Concepts¶

Understanding pikit's design comes down to four orthogonal dimensions and a few key terms.

The four dimensions¶

pikit decomposes prompt injection into four independent axes that compose freely:

                 ┌──────────── craft() ────────────┐
   task  ──▶  attack (wording)  ──▶  channel (carrier, indirect only)
                                          │
                                          ▼
              defense (optional hook) ──▶ target / agent  ──▶  trace you read

Dimension	Question it answers	Count	Examples
Attack	How is the payload worded?	9	`context_ignoring`, `combined`, `obfuscation`
Channel	Where is it hidden? (indirect)	6	`webpage`, `skills`, `unicode_hidden`
Defense	How do we harden the prompt?	6	`spotlighting`, `delimiters`, `sandwich`
Target / Agent	What receives it?	4 backends / 6 agents	`openai:gpt-4o`, `browser`, `coding`

The key insight: these are orthogonal. Any attack can be paired with any channel, any defense can be applied to any agent, and the combinatorial space is what makes pikit useful for systematic research.

Direct vs. indirect injection¶

Term	Meaning
Direct injection	The attacker controls the prompt/message sent directly to the model.
Indirect injection	The payload is hidden in external data the model reads (page, doc, email, skill) — the dangerous case for agents.

An Attack handles the wording in both cases. A Channel is only needed for indirect injection — it hides the (already worded) payload inside a data artifact.

Core interfaces¶

All four dimensions share minimal, uniform interfaces:

class Attack:
    def inject(self, prompt: str, injected_task: str) -> str: ...

class Defense:
    def apply(self, prompt: str, instruction: str = None) -> str: ...

class Channel:
    def poison(self, data: str, payload: str) -> str: ...

class Target:
    def query(self, prompt: str, system: str = None, **kw) -> str: ...
    def chat(self, messages, tools=None, system=None, **kw) -> ChatResponse: ...

Attacks and defenses are both plain prompt-text transformers — they take a string and return a string. This is what makes them freely composable: any attack output can be fed into any defense, and vice versa.

The agent testbed¶

An Agent wraps a Target and exposes run() → Trace. The agent testbed models two critical concepts for indirect injection:

Poison point — a compromised tool whose return value is the attacker's poisoned artifact (e.g. fetch_url returns a malicious web page).
Sink — an externally-observable action the attacker wants to trigger (e.g. send_email, run_command, post_form).

The Trace records every step so you can judge — manually — whether the injection succeeded:

trace = agent.run("Summarize the page at http://site")
trace.sink_calls      # tool calls that hit a sink
trace.poisoned_steps  # steps that carried the injected artifact
print(trace)          # human-readable step-by-step log

pikit deliberately renders no verdict — it makes the signals easy to see but leaves the judgement to you.

Defense hooks¶

The same Defense objects used for direct injection can be slotted into three points of an agent's data flow via DefenseHooks:

Hook point	Defends against	Key for
`system`	Model being talked out of its instructions	Direct injection
`user`	Incoming user message	Direct injection
`tool_result`	Untrusted tool output re-entering the model	Indirect injection

The tool_result hook is the most valuable for indirect injection — it's the layer through which an attacker's poisoned artifact re-enters the model.

The registry¶

Every attack, defense, channel, and agent scenario registers itself under a short string key via a decorator. This means:

attacks.get("context_ignoring")   # → ContextIgnoringAttack class
attacks.list()                    # → ['combined', 'context_ignoring', ...]

Adding a new method is one file + one decorator — no core changes. See Extending pikit.