Skip to content

Attacks

pikit.attacks

Prompt-injection attacks.

Each attack subclasses :class:pikit.base.Attack and registers itself under a short key. Import this package to populate the registry, then use attacks.get(key) / attacks.list().

References

Most techniques here follow the formalization in Liu et al., "Formalizing and Benchmarking Prompt Injection Attacks and Defenses" (USENIX Security 2024), a.k.a. Open Prompt Injection.

Attack

Bases: ABC

An injection technique that embeds an attacker task into a prompt.

Subclasses implement :meth:inject. The free-form signature takes the full prompt (instruction + untrusted data, already assembled by the caller) and the attacker-controlled injected_task to smuggle in.

Note

An :class:Attack controls how the payload is worded (direct injection). To model indirect injection — hiding the payload inside an external data artifact such as a web page or document — pair an attack with a :class:Channel. The two are orthogonal and compose freely.

inject abstractmethod

inject(prompt: str, injected_task: str) -> str

Return prompt with injected_task smuggled in.

Parameters

prompt: The full prompt the target would otherwise receive, typically a benign instruction followed by untrusted external data. injected_task: The instruction the attacker wants the model to follow instead.

pikit.attacks.naive

Naive attack: just concatenate the injected task onto the prompt.

NaiveAttack

NaiveAttack(separator: str = ' ')

Bases: Attack

Append the injected task directly after the prompt.

The simplest possible injection and a useful lower-bound baseline: no separators, no deception, just appended text.

pikit.attacks.escape

Escape-character attack: use newlines/control chars to visually break out.

By inserting several newlines (and optionally a carriage return), the injected task appears to start a fresh, separate context, encouraging the model to treat it as a new top-level instruction rather than data.

EscapeCharacterAttack

EscapeCharacterAttack(escape: str = '\n\n\n')

Bases: Attack

Separate the injected task with escape/newline characters.

Parameters

escape: The sequence inserted between the prompt and the injected task. Defaults to a few newlines, which is the classic form.

pikit.attacks.context_ignoring

Context-ignoring attack: tell the model to disregard prior instructions.

ContextIgnoringAttack

ContextIgnoringAttack(ignore_text: str = DEFAULT_IGNORE, separator: str = ' ')

Bases: Attack

Prepend an "ignore previous instructions" sentence to the payload.

Parameters

ignore_text: The disregard phrase placed before the injected task. A format slot {task} may be used; otherwise the task is appended. separator: Inserted between the original prompt and the ignore phrase.

pikit.attacks.fake_completion

Fake-completion attack: forge a response so the model thinks it's done.

By inserting text that looks like the original task has already been answered, the attacker convinces the model the prior instruction is finished, then issues a fresh instruction it is more likely to obey.

FakeCompletionAttack

FakeCompletionAttack(fake_response: str = DEFAULT_RESPONSE, follow_up: str = DEFAULT_FOLLOW_UP)

Bases: Attack

Inject a forged completion, then the attacker task.

Parameters

fake_response: The forged answer to the original task (signals "task done"). follow_up: Text introducing the new instruction after the fake completion.

pikit.attacks.combined

Combined attack: stack fake-completion + escape + context-ignoring.

This is the strongest baseline in Open Prompt Injection. It first forges a completion of the original task, then uses escape characters to break the context, then explicitly tells the model to ignore prior instructions before issuing the injected task.

CombinedAttack

CombinedAttack(fake_response: str = FakeCompletionAttack.DEFAULT_RESPONSE, escape: str = '\n\n\n', ignore_text: str = ContextIgnoringAttack.DEFAULT_IGNORE)

Bases: Attack

Compose fake-completion, escape, and context-ignoring in sequence.

The sub-attacks are applied as nested transforms so the layering is explicit and each stage stays independently configurable.

pikit.attacks.payload_splitting

Payload-splitting attack: break the task into fragments, then recombine.

Splitting the injected instruction across several variables and asking the model to concatenate and execute them can slip past keyword filters and naive detectors that scan the raw input for dangerous phrases.

PayloadSplittingAttack

PayloadSplittingAttack(n_parts: int = 2)

Bases: Attack

Split the injected task into fragments assembled by the model.

Parameters

n_parts: Number of fragments to split the injected task into.

pikit.attacks.obfuscation

Obfuscation attack: encode the payload + add a decode-and-run instruction.

Encoding the injected task (base64 or leetspeak) hides trigger keywords from simple filters; a wrapper instruction tells the model to decode and then follow the hidden instruction.

ObfuscationAttack

ObfuscationAttack(scheme: str = 'base64')

Bases: Attack

Encode the injected task and instruct the model to decode + execute.

Parameters

scheme: "base64" (default) or "leetspeak".

pikit.attacks.prompt_leaking

Prompt-leaking attack: coax the model into revealing its system prompt.

A distinct attacker goal from task-hijacking: instead of making the model do something new, it exfiltrates the confidential instructions/system prompt the application prepended. A classic, widely-studied injection objective.

PromptLeakingAttack

PromptLeakingAttack(leak_text: str = DEFAULT_LEAK, separator: str = '\n\n')

Bases: Attack

Append a request to repeat the preceding instructions verbatim.

The optional injected_task lets the caller customize what to leak; if empty, a default "reveal the system prompt" request is used.

Parameters

leak_text: The extraction request. {task} is filled with injected_task when provided. separator: Inserted between the original prompt and the extraction request.

pikit.attacks.prefix_injection

Prefix-injection attack: place the payload before the original prompt.

All other attacks in this package append the payload. Prefix injection puts the injected instruction first, which can dominate when a model weighs earlier tokens as the primary directive, and models the case where attacker data is prepended to (rather than appended to) trusted content.

PrefixInjectionAttack

PrefixInjectionAttack(separator: str = '\n\n', lead_in: str = '')

Bases: Attack

Prepend the injected task (plus a context break) before the prompt.

Parameters

separator: Inserted between the injected task and the original prompt. A few newlines help the payload read as a self-contained leading directive. lead_in: Optional text placed before the injected task (e.g. a fake role or priority marker). Empty by default.