🧪 pikit — Prompt Injection Kit¶

A composable toolbox of classic prompt-injection attacks, defenses, and indirect-injection channels — plus a minimal agent testbed to watch them play out against a real model.

Think foolbox / cleverhans, but for prompt injection.

[!IMPORTANT] For authorized security research, red-teaming, and building defenses only. Use pikit against systems you own or are explicitly permitted to test.

What is pikit?¶

Research on LLM/agent security keeps re-implementing the same prompt-injection techniques from scratch. pikit collects the classic ones behind one small, uniform interface so you can:

call a known attack or defense in one line,
freely combine any attack with any channel and any defense,
:accessory-robot: drive a real agent and watch whether an injection actually lands, and
add a new method by dropping in one file — no core changes.

It is a library of methods, not a benchmark: it ships no evaluator, dataset, or leaderboard. You bring the task and the judgement.

Key features¶

🎯 9 attacks × 6 defenses × 6 channels × 6 agents, all mix-and-match.
🔀 Direct and indirect injection — word a payload (attack) and hide it in a carrier (channel: web page, document, Markdown, code comment, invisible Unicode, or an Agent Skill).
🤖 Agent testbed — a zero-dependency function-calling loop with preconfigured scenarios (email / RAG / browser / coding) and a real tool-calling backend.
🛡️ Defenses as pluggable hooks at three points of an agent's data flow.
🧩 Registry-based — contributing a method is one file + one decorator.
📦 Zero-dependency core — model SDKs (OpenAI / Anthropic / HF) are optional extras, imported lazily.

How it fits together¶

An attack controls how a payload is worded; a channel controls where it's hidden; a target/agent is what receives it; a defense hardens the prompt. They're orthogonal and compose freely:

                 ┌──────────── craft() ────────────┐
   task  ──▶  attack (wording)  ──▶  channel (carrier, indirect only)
                                          │
                                          ▼
              defense (optional hook) ──▶ target / agent  ──▶  trace you read

Dimension	Question it answers	Examples
attack	How is the payload worded?	`context_ignoring`, `combined`, `payload_splitting`
channel	Where is it hidden? (indirect)	`webpage`, `skills`, `code_comment`, `unicode_hidden`
defense	How do we harden the prompt?	`spotlighting`, `delimiters`, `sandwich`
target/agent	What receives it?	`openai:…`, `email`, `browser`, `coding`

Next steps¶

Install pikit — get started in 30 seconds
Quick Start — craft your first attack
Concepts — understand the design
Demos & CLI — run prebuilt scenarios against a real model