Attacks¶

pikit.attacks ¶

Prompt-injection attacks.

Each attack subclasses :class:pikit.base.Attack and registers itself under a short key. Import this package to populate the registry, then use attacks.get(key) / attacks.list().

References¶

Most techniques here follow the formalization in Liu et al., "Formalizing and Benchmarking Prompt Injection Attacks and Defenses" (USENIX Security 2024), a.k.a. Open Prompt Injection.

Attack ¶

Bases: ABC

An injection technique that embeds an attacker task into a prompt.

Subclasses implement :meth:inject. The free-form signature takes the full prompt (instruction + untrusted data, already assembled by the caller) and the attacker-controlled injected_task to smuggle in.

Note¶

An :class:Attack controls how the payload is worded (direct injection). To model indirect injection — hiding the payload inside an external data artifact such as a web page or document — pair an attack with a :class:Channel. The two are orthogonal and compose freely.

inject `abstractmethod` ¶

inject(prompt: str, injected_task: str) -> str

Return prompt with injected_task smuggled in.

Parameters¶

prompt: The full prompt the target would otherwise receive, typically a benign instruction followed by untrusted external data. injected_task: The instruction the attacker wants the model to follow instead.

pikit.attacks.naive ¶

Naive attack: just concatenate the injected task onto the prompt.

NaiveAttack ¶

NaiveAttack(separator: str = ' ')

Bases: Attack

Append the injected task directly after the prompt.

The simplest possible injection and a useful lower-bound baseline: no separators, no deception, just appended text.

pikit.attacks.escape ¶

Escape-character attack: use newlines/control chars to visually break out.

By inserting several newlines (and optionally a carriage return), the injected task appears to start a fresh, separate context, encouraging the model to treat it as a new top-level instruction rather than data.

EscapeCharacterAttack ¶

EscapeCharacterAttack(escape: str = '\n\n\n')

Bases: Attack

Separate the injected task with escape/newline characters.

Parameters¶

escape: The sequence inserted between the prompt and the injected task. Defaults to a few newlines, which is the classic form.

pikit.attacks.context_ignoring ¶

Context-ignoring attack: tell the model to disregard prior instructions.

ContextIgnoringAttack ¶

ContextIgnoringAttack(ignore_text: str = DEFAULT_IGNORE, separator: str = ' ')

Bases: Attack

Prepend an "ignore previous instructions" sentence to the payload.

Parameters¶

ignore_text: The disregard phrase placed before the injected task. A format slot {task} may be used; otherwise the task is appended. separator: Inserted between the original prompt and the ignore phrase.

pikit.attacks.fake_completion ¶

Fake-completion attack: forge a response so the model thinks it's done.

By inserting text that looks like the original task has already been answered, the attacker convinces the model the prior instruction is finished, then issues a fresh instruction it is more likely to obey.

FakeCompletionAttack ¶

FakeCompletionAttack(fake_response: str = DEFAULT_RESPONSE, follow_up: str = DEFAULT_FOLLOW_UP)

Bases: Attack

Inject a forged completion, then the attacker task.

Parameters¶

fake_response: The forged answer to the original task (signals "task done"). follow_up: Text introducing the new instruction after the fake completion.

pikit.attacks.combined ¶

Combined attack: stack fake-completion + escape + context-ignoring.

This is the strongest baseline in Open Prompt Injection. It first forges a completion of the original task, then uses escape characters to break the context, then explicitly tells the model to ignore prior instructions before issuing the injected task.

CombinedAttack ¶

CombinedAttack(fake_response: str = FakeCompletionAttack.DEFAULT_RESPONSE, escape: str = '\n\n\n', ignore_text: str = ContextIgnoringAttack.DEFAULT_IGNORE)

Bases: Attack

Compose fake-completion, escape, and context-ignoring in sequence.

The sub-attacks are applied as nested transforms so the layering is explicit and each stage stays independently configurable.

pikit.attacks.payload_splitting ¶

Payload-splitting attack: break the task into fragments, then recombine.

Splitting the injected instruction across several variables and asking the model to concatenate and execute them can slip past keyword filters and naive detectors that scan the raw input for dangerous phrases.

PayloadSplittingAttack ¶

PayloadSplittingAttack(n_parts: int = 2)

Bases: Attack

Split the injected task into fragments assembled by the model.

Parameters¶

n_parts: Number of fragments to split the injected task into.

pikit.attacks.obfuscation ¶

Obfuscation attack: encode the payload + add a decode-and-run instruction.

Encoding the injected task (base64 or leetspeak) hides trigger keywords from simple filters; a wrapper instruction tells the model to decode and then follow the hidden instruction.

ObfuscationAttack ¶

ObfuscationAttack(scheme: str = 'base64')

Bases: Attack

Encode the injected task and instruct the model to decode + execute.

Parameters¶

scheme: "base64" (default) or "leetspeak".

pikit.attacks.prompt_leaking ¶

Prompt-leaking attack: coax the model into revealing its system prompt.

A distinct attacker goal from task-hijacking: instead of making the model do something new, it exfiltrates the confidential instructions/system prompt the application prepended. A classic, widely-studied injection objective.

PromptLeakingAttack ¶

PromptLeakingAttack(leak_text: str = DEFAULT_LEAK, separator: str = '\n\n')

Bases: Attack

Append a request to repeat the preceding instructions verbatim.

The optional injected_task lets the caller customize what to leak; if empty, a default "reveal the system prompt" request is used.

Parameters¶

leak_text: The extraction request. {task} is filled with injected_task when provided. separator: Inserted between the original prompt and the extraction request.

pikit.attacks.prefix_injection ¶

Prefix-injection attack: place the payload before the original prompt.

All other attacks in this package append the payload. Prefix injection puts the injected instruction first, which can dominate when a model weighs earlier tokens as the primary directive, and models the case where attacker data is prepended to (rather than appended to) trusted content.

PrefixInjectionAttack ¶

PrefixInjectionAttack(separator: str = '\n\n', lead_in: str = '')

Bases: Attack

Prepend the injected task (plus a context break) before the prompt.

Parameters¶

separator: Inserted between the injected task and the original prompt. A few newlines help the payload read as a self-contained leading directive. lead_in: Optional text placed before the injected task (e.g. a fake role or priority marker). Empty by default.

Attacks¶

pikit.attacks ¶

References¶

Attack ¶

Note¶

inject abstractmethod ¶

Parameters¶

pikit.attacks.naive ¶

NaiveAttack ¶

pikit.attacks.escape ¶

EscapeCharacterAttack ¶

Parameters¶

pikit.attacks.context_ignoring ¶

ContextIgnoringAttack ¶

Parameters¶

pikit.attacks.fake_completion ¶

FakeCompletionAttack ¶

Parameters¶

pikit.attacks.combined ¶

CombinedAttack ¶

pikit.attacks.payload_splitting ¶

PayloadSplittingAttack ¶

Parameters¶

pikit.attacks.obfuscation ¶

ObfuscationAttack ¶

Parameters¶

pikit.attacks.prompt_leaking ¶

PromptLeakingAttack ¶

Parameters¶

pikit.attacks.prefix_injection ¶

PrefixInjectionAttack ¶

Parameters¶

inject `abstractmethod` ¶