Attacks¶
pikit.attacks ¶
Prompt-injection attacks.
Each attack subclasses :class:pikit.base.Attack and registers itself
under a short key. Import this package to populate the registry, then use
attacks.get(key) / attacks.list().
References¶
Most techniques here follow the formalization in Liu et al., "Formalizing and Benchmarking Prompt Injection Attacks and Defenses" (USENIX Security 2024), a.k.a. Open Prompt Injection.
Attack ¶
Bases: ABC
An injection technique that embeds an attacker task into a prompt.
Subclasses implement :meth:inject. The free-form signature takes the
full prompt (instruction + untrusted data, already assembled by the
caller) and the attacker-controlled injected_task to smuggle in.
Note¶
An :class:Attack controls how the payload is worded (direct
injection). To model indirect injection — hiding the payload inside an
external data artifact such as a web page or document — pair an attack
with a :class:Channel. The two are orthogonal and compose freely.
pikit.attacks.naive ¶
pikit.attacks.escape ¶
Escape-character attack: use newlines/control chars to visually break out.
By inserting several newlines (and optionally a carriage return), the injected task appears to start a fresh, separate context, encouraging the model to treat it as a new top-level instruction rather than data.
pikit.attacks.context_ignoring ¶
Context-ignoring attack: tell the model to disregard prior instructions.
ContextIgnoringAttack ¶
pikit.attacks.fake_completion ¶
Fake-completion attack: forge a response so the model thinks it's done.
By inserting text that looks like the original task has already been answered, the attacker convinces the model the prior instruction is finished, then issues a fresh instruction it is more likely to obey.
FakeCompletionAttack ¶
pikit.attacks.combined ¶
Combined attack: stack fake-completion + escape + context-ignoring.
This is the strongest baseline in Open Prompt Injection. It first forges a completion of the original task, then uses escape characters to break the context, then explicitly tells the model to ignore prior instructions before issuing the injected task.
CombinedAttack ¶
CombinedAttack(fake_response: str = FakeCompletionAttack.DEFAULT_RESPONSE, escape: str = '\n\n\n', ignore_text: str = ContextIgnoringAttack.DEFAULT_IGNORE)
Bases: Attack
Compose fake-completion, escape, and context-ignoring in sequence.
The sub-attacks are applied as nested transforms so the layering is explicit and each stage stays independently configurable.
pikit.attacks.payload_splitting ¶
Payload-splitting attack: break the task into fragments, then recombine.
Splitting the injected instruction across several variables and asking the model to concatenate and execute them can slip past keyword filters and naive detectors that scan the raw input for dangerous phrases.
pikit.attacks.obfuscation ¶
Obfuscation attack: encode the payload + add a decode-and-run instruction.
Encoding the injected task (base64 or leetspeak) hides trigger keywords from simple filters; a wrapper instruction tells the model to decode and then follow the hidden instruction.
pikit.attacks.prompt_leaking ¶
Prompt-leaking attack: coax the model into revealing its system prompt.
A distinct attacker goal from task-hijacking: instead of making the model do something new, it exfiltrates the confidential instructions/system prompt the application prepended. A classic, widely-studied injection objective.
PromptLeakingAttack ¶
Bases: Attack
Append a request to repeat the preceding instructions verbatim.
The optional injected_task lets the caller customize what to leak;
if empty, a default "reveal the system prompt" request is used.
Parameters¶
leak_text:
The extraction request. {task} is filled with injected_task
when provided.
separator:
Inserted between the original prompt and the extraction request.
pikit.attacks.prefix_injection ¶
Prefix-injection attack: place the payload before the original prompt.
All other attacks in this package append the payload. Prefix injection puts the injected instruction first, which can dominate when a model weighs earlier tokens as the primary directive, and models the case where attacker data is prepended to (rather than appended to) trusted content.
PrefixInjectionAttack ¶
Bases: Attack
Prepend the injected task (plus a context break) before the prompt.
Parameters¶
separator: Inserted between the injected task and the original prompt. A few newlines help the payload read as a self-contained leading directive. lead_in: Optional text placed before the injected task (e.g. a fake role or priority marker). Empty by default.