Skip to content

Agent

pikit.agent

Agent environments for testing prompt-injection attacks.

An agent wraps a :class:~pikit.targets.Target and exposes a run() that returns a human-readable :class:Trace. Three kinds ship:

  • chat — a plain assistant (no tools); direct injection via the user message.
  • tool — a general tool-calling agent; indirect injection via a compromised tool's return value (the poison map).
  • scenario agents — email / rag / browser — preconfigured with a realistic toolset and an observable sink.

Like the rest of pikit, agents register under short keys; use get_agent(key) to fetch the class, then construct it.

Agent

Agent(target: Target, *, system: Optional[str] = None, defenses: Optional[DefenseHooks] = None)

Bases: ABC

Base class for agents under test.

Parameters

target: The model backend (:class:~pikit.targets.Target). system: Optional system prompt. defenses: Optional :class:DefenseHooks applied at the three insertion points.

run abstractmethod

run(user_message: str, **kwargs) -> Trace

Run the agent on user_message and return the :class:Trace.

Trace dataclass

Trace(steps: List[TraceStep] = list(), final_text: str = '')

An ordered record of an agent run, for human inspection.

sink_calls property

sink_calls: List[TraceStep]

Tool-call steps that hit a sink (observable action).

poisoned_steps property

poisoned_steps: List[TraceStep]

Steps whose data was the injected artifact.

TraceStep dataclass

TraceStep(kind: str, text: str = '', tool_name: Optional[str] = None, args: Optional[dict] = None, content: Optional[str] = None, poisoned: bool = False, is_sink: bool = False)

One step in an agent run.

DefenseHooks dataclass

DefenseHooks(system: Optional[Defense] = None, tool_result: Optional[Defense] = None, user: Optional[Defense] = None)

Optional defenses applied at three points of the agent loop.

Parameters

system: Hardens the system prompt (defends against the model being talked out of its instructions). tool_result: Hardens untrusted tool output before it re-enters the model — the key defense position for indirect injection. user: Hardens the incoming user message (defends against direct injection).

Tool dataclass

Tool(name: str, description: str, func: Callable[..., Any], parameters: dict = (lambda: {'type': 'object', 'properties': {}})(), is_sink: bool = False)

A callable tool exposed to an agent's model.

Parameters

name, description, func: Identity, model-facing description, and the underlying callable. parameters: JSON-schema for the arguments. Auto-derived if not given. is_sink: Marks an externally-observable action (e.g. send_email). The trace highlights when a sink fires — the key signal for judging whether an injection succeeded.

to_schema

to_schema() -> dict

Return the provider-agnostic {name, description, parameters}.

tool

tool(name: Optional[str] = None, *, description: Optional[str] = None, is_sink: bool = False, parameters: Optional[dict] = None) -> Callable[[Callable], Tool]

Decorator turning a plain function into a :class:Tool.

The tool name defaults to the function name; description to its docstring. parameters is auto-derived from type hints unless given.

Examples

@tool(description="Fetch a URL and return its body.") ... def fetch_url(url: str) -> str: ... return "..." isinstance(fetch_url, Tool) True

pikit.agent.base

Agent base class and the execution :class:Trace.

run() returns a :class:Trace rather than a bare string. The trace is the artifact a human reads to judge — manually — whether an injection succeeded: it shows every model turn, tool call, and tool result, and highlights when a sink fired or a step carried poisoned data. The library deliberately renders no verdict (no evaluator/scoring); it makes the signals easy to see and offers structured accessors so you can write your own one-line assertion.

TraceStep dataclass

TraceStep(kind: str, text: str = '', tool_name: Optional[str] = None, args: Optional[dict] = None, content: Optional[str] = None, poisoned: bool = False, is_sink: bool = False)

One step in an agent run.

Trace dataclass

Trace(steps: List[TraceStep] = list(), final_text: str = '')

An ordered record of an agent run, for human inspection.

sink_calls property

sink_calls: List[TraceStep]

Tool-call steps that hit a sink (observable action).

poisoned_steps property

poisoned_steps: List[TraceStep]

Steps whose data was the injected artifact.

Agent

Agent(target: Target, *, system: Optional[str] = None, defenses: Optional[DefenseHooks] = None)

Bases: ABC

Base class for agents under test.

Parameters

target: The model backend (:class:~pikit.targets.Target). system: Optional system prompt. defenses: Optional :class:DefenseHooks applied at the three insertion points.

run abstractmethod

run(user_message: str, **kwargs) -> Trace

Run the agent on user_message and return the :class:Trace.

pikit.agent.loop

The provider-agnostic function-calling loop shared by tool agents.

run_tool_loop

run_tool_loop(target: Target, user_message: str, tools: List[Tool], *, system: Optional[str] = None, hooks: Optional[DefenseHooks] = None, poison: Optional[Dict[str, str]] = None, max_steps: int = 8) -> Trace

Drive target.chat over tools until it stops calling tools.

Parameters

poison: Map of tool_name -> artifact. When the model calls a poisoned tool, the loop returns the artifact as that tool's result instead of invoking the real function (the indirect-injection delivery point, and it avoids real side effects during a test).

pikit.agent.hooks

Defense insertion points for agents.

The same prevention-style :class:~pikit.base.Defense objects used for direct injection can be slotted into three points of an agent's data flow. The most valuable for indirect injection is tool_result — the layer through which an attacker's poisoned artifact re-enters the model.

DefenseHooks dataclass

DefenseHooks(system: Optional[Defense] = None, tool_result: Optional[Defense] = None, user: Optional[Defense] = None)

Optional defenses applied at three points of the agent loop.

Parameters

system: Hardens the system prompt (defends against the model being talked out of its instructions). tool_result: Hardens untrusted tool output before it re-enters the model — the key defense position for indirect injection. user: Hardens the incoming user message (defends against direct injection).

pikit.agent.tools

Tools for agents: a :class:Tool wrapper and a @tool decorator.

A tool is a plain Python function plus a JSON-schema description the model uses to decide how to call it. The schema is auto-derived from the function's type hints (zero dependencies — no pydantic), and can be overridden explicitly when richer per-argument descriptions are needed.

Tool dataclass

Tool(name: str, description: str, func: Callable[..., Any], parameters: dict = (lambda: {'type': 'object', 'properties': {}})(), is_sink: bool = False)

A callable tool exposed to an agent's model.

Parameters

name, description, func: Identity, model-facing description, and the underlying callable. parameters: JSON-schema for the arguments. Auto-derived if not given. is_sink: Marks an externally-observable action (e.g. send_email). The trace highlights when a sink fires — the key signal for judging whether an injection succeeded.

to_schema

to_schema() -> dict

Return the provider-agnostic {name, description, parameters}.

tool

tool(name: Optional[str] = None, *, description: Optional[str] = None, is_sink: bool = False, parameters: Optional[dict] = None) -> Callable[[Callable], Tool]

Decorator turning a plain function into a :class:Tool.

The tool name defaults to the function name; description to its docstring. parameters is auto-derived from type hints unless given.

Examples

@tool(description="Fetch a URL and return its body.") ... def fetch_url(url: str) -> str: ... return "..." isinstance(fetch_url, Tool) True

pikit.agent.builtin_tools

Built-in example tools for scenario agents.

These are deliberately simple, side-effect-free stand-ins. The data-source tools (fetch_url, read_email, read_file, search, load_skill) return realistic clean samples by default (see samples.py); in an indirect-injection test you pass a poison map so their return value becomes the attacker's artifact instead. The sink tools (send_email, post_form, run_command, write_file) are marked is_sink=True so the trace highlights when the model performs an observable, externally-visible action.

pikit.agent.tool_agent

A general tool-calling agent driven by the function-calling loop.

ToolAgent

ToolAgent(target: Target, *, tools: Optional[List[Tool]] = None, poison: Optional[Dict[str, str]] = None, system: Optional[str] = None, defenses: Optional[DefenseHooks] = None, max_steps: int = 8)

Bases: Agent

An agent that can call a fixed set of tools in a loop.

Parameters

target, system, defenses: See :class:~pikit.agent.base.Agent. tools: The tools exposed to the model. poison: Map of tool_name -> artifact marking compromised tools whose return value is replaced by the injected artifact (indirect-injection delivery point). See :func:~pikit.agent.loop.run_tool_loop. max_steps: Safety cap on loop iterations.

pikit.agent.chat_agent

A no-tools chat agent — the simplest target under test.

ChatAgent

ChatAgent(target: Target, *, system: Optional[str] = None, defenses: Optional[DefenseHooks] = None)

Bases: Agent

A plain chat assistant with no tools.

Wraps :meth:Target.query; direct injection arrives as the user message. The system and user defense hooks apply (there are no tool results to defend).

pikit.agent.scenarios.browser

Browser scenario: fetch a web page, then act — sink is post_form.

Models the Greshake-style indirect injection where a fetched web page hides an instruction. Poison point: fetch_url. Sink: post_form (submitting data to an external endpoint).

BrowserAgent

BrowserAgent(target: Target, *, poison: Optional[Dict[str, str]] = None, system: Optional[str] = None, defenses: Optional[DefenseHooks] = None, max_steps: int = 8)

Bases: ToolAgent

A browsing agent. Poison point: fetch_url. Sink: post_form.

pikit.agent.scenarios.email_assistant

Email-assistant scenario: read mail, then act — sink is send_email.

The classic indirect-injection test: a poisoned email body instructs the model to exfiltrate data by emailing an attacker. Compromise read_email via the agent's poison map; watch whether the send_email sink fires with the attacker's address.

EmailAssistantAgent

EmailAssistantAgent(target: Target, *, poison: Optional[Dict[str, str]] = None, system: Optional[str] = None, defenses: Optional[DefenseHooks] = None, max_steps: int = 8)

Bases: ToolAgent

An email assistant. Poison point: read_email. Sink: send_email.

pikit.agent.scenarios.rag_qa

RAG question-answering scenario.

The model answers a question over retrieved documents. Poison point: search (a retrieved doc carries the injection). The "sink" here is the final answer itself — whether the model complies with the injected instruction is observed in trace.final_text. An optional post_answer sink models pipelines that forward the answer somewhere observable.

RagQaAgent

RagQaAgent(target: Target, *, poison: Optional[Dict[str, str]] = None, system: Optional[str] = None, defenses: Optional[DefenseHooks] = None, max_steps: int = 8)

Bases: ToolAgent

A RAG QA agent. Poison point: search. Sink: final answer / post_form.

pikit.agent.scenarios.coding

Coding scenario: a minimal stand-in for a code-assistant agent.

Models a basic coding agent that reads project files and loads skills, then acts — the setting for indirect injection via a poisoned source comment or a malicious Agent Skill. Poison points: read_file and load_skill. Sinks: run_command (arbitrary command execution) and write_file (tampering with files).

This is intentionally a lightweight simulation of frameworks like Claude Code / Cursor / Aider, just enough to demonstrate the attack surface.

CodingAgent

CodingAgent(target: Target, *, poison: Optional[Dict[str, str]] = None, system: Optional[str] = None, defenses: Optional[DefenseHooks] = None, max_steps: int = 8)

Bases: ToolAgent

A coding agent. Poison points: read_file / load_skill. Sinks: run_command / write_file.