Agent¶
pikit.agent ¶
Agent environments for testing prompt-injection attacks.
An agent wraps a :class:~pikit.targets.Target and exposes a run() that
returns a human-readable :class:Trace. Three kinds ship:
chat— a plain assistant (no tools); direct injection via the user message.tool— a general tool-calling agent; indirect injection via a compromised tool's return value (thepoisonmap).- scenario agents —
email/rag/browser— preconfigured with a realistic toolset and an observable sink.
Like the rest of pikit, agents register under short keys; use
get_agent(key) to fetch the class, then construct it.
Agent ¶
Bases: ABC
Base class for agents under test.
Parameters¶
target:
The model backend (:class:~pikit.targets.Target).
system:
Optional system prompt.
defenses:
Optional :class:DefenseHooks applied at the three insertion points.
run
abstractmethod
¶
Run the agent on user_message and return the :class:Trace.
Trace
dataclass
¶
TraceStep
dataclass
¶
TraceStep(kind: str, text: str = '', tool_name: Optional[str] = None, args: Optional[dict] = None, content: Optional[str] = None, poisoned: bool = False, is_sink: bool = False)
One step in an agent run.
DefenseHooks
dataclass
¶
DefenseHooks(system: Optional[Defense] = None, tool_result: Optional[Defense] = None, user: Optional[Defense] = None)
Optional defenses applied at three points of the agent loop.
Parameters¶
system: Hardens the system prompt (defends against the model being talked out of its instructions). tool_result: Hardens untrusted tool output before it re-enters the model — the key defense position for indirect injection. user: Hardens the incoming user message (defends against direct injection).
Tool
dataclass
¶
Tool(name: str, description: str, func: Callable[..., Any], parameters: dict = (lambda: {'type': 'object', 'properties': {}})(), is_sink: bool = False)
A callable tool exposed to an agent's model.
Parameters¶
name, description, func:
Identity, model-facing description, and the underlying callable.
parameters:
JSON-schema for the arguments. Auto-derived if not given.
is_sink:
Marks an externally-observable action (e.g. send_email). The
trace highlights when a sink fires — the key signal for judging
whether an injection succeeded.
tool ¶
tool(name: Optional[str] = None, *, description: Optional[str] = None, is_sink: bool = False, parameters: Optional[dict] = None) -> Callable[[Callable], Tool]
Decorator turning a plain function into a :class:Tool.
The tool name defaults to the function name; description to its
docstring. parameters is auto-derived from type hints unless given.
Examples¶
@tool(description="Fetch a URL and return its body.") ... def fetch_url(url: str) -> str: ... return "..." isinstance(fetch_url, Tool) True
pikit.agent.base ¶
Agent base class and the execution :class:Trace.
run() returns a :class:Trace rather than a bare string. The trace is
the artifact a human reads to judge — manually — whether an injection
succeeded: it shows every model turn, tool call, and tool result, and
highlights when a sink fired or a step carried poisoned data. The
library deliberately renders no verdict (no evaluator/scoring); it makes the
signals easy to see and offers structured accessors so you can write your
own one-line assertion.
TraceStep
dataclass
¶
TraceStep(kind: str, text: str = '', tool_name: Optional[str] = None, args: Optional[dict] = None, content: Optional[str] = None, poisoned: bool = False, is_sink: bool = False)
One step in an agent run.
Trace
dataclass
¶
Agent ¶
Bases: ABC
Base class for agents under test.
Parameters¶
target:
The model backend (:class:~pikit.targets.Target).
system:
Optional system prompt.
defenses:
Optional :class:DefenseHooks applied at the three insertion points.
run
abstractmethod
¶
Run the agent on user_message and return the :class:Trace.
pikit.agent.loop ¶
The provider-agnostic function-calling loop shared by tool agents.
run_tool_loop ¶
run_tool_loop(target: Target, user_message: str, tools: List[Tool], *, system: Optional[str] = None, hooks: Optional[DefenseHooks] = None, poison: Optional[Dict[str, str]] = None, max_steps: int = 8) -> Trace
Drive target.chat over tools until it stops calling tools.
Parameters¶
poison:
Map of tool_name -> artifact. When the model calls a poisoned
tool, the loop returns the artifact as that tool's result instead of
invoking the real function (the indirect-injection delivery point,
and it avoids real side effects during a test).
pikit.agent.hooks ¶
Defense insertion points for agents.
The same prevention-style :class:~pikit.base.Defense objects used for
direct injection can be slotted into three points of an agent's data flow.
The most valuable for indirect injection is tool_result — the layer
through which an attacker's poisoned artifact re-enters the model.
DefenseHooks
dataclass
¶
DefenseHooks(system: Optional[Defense] = None, tool_result: Optional[Defense] = None, user: Optional[Defense] = None)
Optional defenses applied at three points of the agent loop.
Parameters¶
system: Hardens the system prompt (defends against the model being talked out of its instructions). tool_result: Hardens untrusted tool output before it re-enters the model — the key defense position for indirect injection. user: Hardens the incoming user message (defends against direct injection).
pikit.agent.tools ¶
Tools for agents: a :class:Tool wrapper and a @tool decorator.
A tool is a plain Python function plus a JSON-schema description the model uses to decide how to call it. The schema is auto-derived from the function's type hints (zero dependencies — no pydantic), and can be overridden explicitly when richer per-argument descriptions are needed.
Tool
dataclass
¶
Tool(name: str, description: str, func: Callable[..., Any], parameters: dict = (lambda: {'type': 'object', 'properties': {}})(), is_sink: bool = False)
A callable tool exposed to an agent's model.
Parameters¶
name, description, func:
Identity, model-facing description, and the underlying callable.
parameters:
JSON-schema for the arguments. Auto-derived if not given.
is_sink:
Marks an externally-observable action (e.g. send_email). The
trace highlights when a sink fires — the key signal for judging
whether an injection succeeded.
tool ¶
tool(name: Optional[str] = None, *, description: Optional[str] = None, is_sink: bool = False, parameters: Optional[dict] = None) -> Callable[[Callable], Tool]
Decorator turning a plain function into a :class:Tool.
The tool name defaults to the function name; description to its
docstring. parameters is auto-derived from type hints unless given.
Examples¶
@tool(description="Fetch a URL and return its body.") ... def fetch_url(url: str) -> str: ... return "..." isinstance(fetch_url, Tool) True
pikit.agent.builtin_tools ¶
Built-in example tools for scenario agents.
These are deliberately simple, side-effect-free stand-ins. The data-source
tools (fetch_url, read_email, read_file, search,
load_skill) return realistic clean samples by default (see
samples.py); in an indirect-injection test you pass a poison map so
their return value becomes the attacker's artifact instead. The sink tools
(send_email, post_form, run_command, write_file) are marked
is_sink=True so the trace highlights when the model performs an
observable, externally-visible action.
pikit.agent.tool_agent ¶
A general tool-calling agent driven by the function-calling loop.
ToolAgent ¶
ToolAgent(target: Target, *, tools: Optional[List[Tool]] = None, poison: Optional[Dict[str, str]] = None, system: Optional[str] = None, defenses: Optional[DefenseHooks] = None, max_steps: int = 8)
Bases: Agent
An agent that can call a fixed set of tools in a loop.
Parameters¶
target, system, defenses:
See :class:~pikit.agent.base.Agent.
tools:
The tools exposed to the model.
poison:
Map of tool_name -> artifact marking compromised tools whose
return value is replaced by the injected artifact (indirect-injection
delivery point). See :func:~pikit.agent.loop.run_tool_loop.
max_steps:
Safety cap on loop iterations.
pikit.agent.chat_agent ¶
pikit.agent.scenarios.browser ¶
Browser scenario: fetch a web page, then act — sink is post_form.
Models the Greshake-style indirect injection where a fetched web page hides
an instruction. Poison point: fetch_url. Sink: post_form (submitting
data to an external endpoint).
pikit.agent.scenarios.email_assistant ¶
Email-assistant scenario: read mail, then act — sink is send_email.
The classic indirect-injection test: a poisoned email body instructs the
model to exfiltrate data by emailing an attacker. Compromise read_email
via the agent's poison map; watch whether the send_email sink fires
with the attacker's address.
pikit.agent.scenarios.rag_qa ¶
RAG question-answering scenario.
The model answers a question over retrieved documents. Poison point:
search (a retrieved doc carries the injection). The "sink" here is the
final answer itself — whether the model complies with the injected
instruction is observed in trace.final_text. An optional post_answer
sink models pipelines that forward the answer somewhere observable.
pikit.agent.scenarios.coding ¶
Coding scenario: a minimal stand-in for a code-assistant agent.
Models a basic coding agent that reads project files and loads skills, then
acts — the setting for indirect injection via a poisoned source comment or a
malicious Agent Skill. Poison points: read_file and load_skill.
Sinks: run_command (arbitrary command execution) and write_file
(tampering with files).
This is intentionally a lightweight simulation of frameworks like Claude Code / Cursor / Aider, just enough to demonstrate the attack surface.