Transforms Reference
Dreadnode ships 450+ transforms across 38 modules for mutating attack prompts: encoding, ciphers, injection, persuasion, agentic attacks, backdoor/fine-tuning, supply chain, and more. New transforms are added continuously.
What is a transform?
A transform converts a prompt from one representation to another. The goal is to find blind spots in post-safety-training alignment: the same harmful request may be refused in plain English but accepted when encoded in Base64, translated to a low-resource language like Telugu or Yoruba, wrapped in a role-play scenario, or embedded inside a code comment.
Models are trained with safety alignment primarily on English text in standard formatting. Transforms systematically probe all the representations where that alignment may be weak:
- Encoding and ciphers - Base64, hex, ROT13, Morse code, Braille. If the model can decode these formats, it may follow instructions it would refuse in plaintext.
- Multilingual and cultural probing - translate the attack to low-resource languages (Telugu, Yoruba, Hmong, Scots Gaelic, Amharic) where safety training data is sparse. Models frequently comply with harmful requests in languages they understand but were not safety-tuned for.
- Persuasion and social engineering - authority appeals, emotional framing, urgency, reciprocity. Tests whether the model’s post-safety-training alignment holds under psychological pressure.
- Injection and framing - skeleton key, many-shot examples, positional wrapping. Tests whether framing the request differently bypasses intent detection.
- Agentic and tool attacks - MCP tool poisoning, multi-agent trust exploits, delegation hijacking. Tests whether agent infrastructure can be manipulated.
- Multimodal perturbation - image noise, steganography, audio pitch shifting, video frame injection. Tests robustness of vision and audio models to adversarial inputs.
By running the same attack goal through multiple transforms, you build a map of where the model’s defenses hold and where they break. A model that refuses the raw prompt but complies after Base64 encoding has a safety gap that needs to be closed.
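
The idea of running one prompt through several representations can be sketched in plain Python. This is an illustration only: it uses the standard library rather than the Dreadnode API, and `Transform`, `to_base64`, `to_rot13`, and `build_variants` are hypothetical names introduced here.

```python
import base64
import codecs
from typing import Callable

# Hypothetical stand-in for a transform: a plain function from str to str.
Transform = Callable[[str], str]


def to_base64(prompt: str) -> str:
    """Encode the prompt as standard Base64 text."""
    return base64.b64encode(prompt.encode("utf-8")).decode("ascii")


def to_rot13(prompt: str) -> str:
    """Rotate ASCII letters by 13 positions; other characters are unchanged."""
    return codecs.encode(prompt, "rot13")


def build_variants(prompt: str, transforms: dict[str, Transform]) -> dict[str, str]:
    """Return the raw prompt plus one variant per named transform."""
    variants = {"raw": prompt}
    for name, fn in transforms.items():
        variants[name] = fn(prompt)
    return variants


variants = build_variants(
    "Summarize this document.",
    {"base64": to_base64, "rot13": to_rot13},
)
for name, text in variants.items():
    print(f"{name:>6}: {text}")
```

Sending each variant to the same target and recording which ones are refused gives the per-representation map described above.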
Using transforms
Use transforms with any attack via the `--transform` CLI flag or the SDK's `transforms` parameter.
```bash
# CLI: stack transforms with --transform
dn airt run --goal "..." --attack tap --transform base64 --transform leetspeak
```

```python
# SDK: pass a list of transform instances
from dreadnode.airt import tap_attack
from dreadnode.transforms.encoding import base64_encode
from dreadnode.transforms.persuasion import authority_appeal

# `target` is assumed to be a target you have already configured.
attack = tap_attack(
    goal="...",
    target=target,
    attacker_model="openai/gpt-4o-mini",
    evaluator_model="openai/gpt-4o-mini",
    transforms=[base64_encode(), authority_appeal()],
)
```

Encoding (38 transforms)
Module: dreadnode.transforms.encoding
Obfuscate prompts through encoding schemes that models may decode internally while bypassing text-based safety filters.
| Transform | Description |
|---|---|
| `base64_encode` | Standard Base64 encoding |
| `base32_encode` | Base32 encoding |
| `base58_encode` | Base58 (Bitcoin-style) encoding |
| `base62_encode` | Base62 encoding |
| `base85_encode` | Ascii85/Base85 encoding |
| `base91_encode` | Base91 high-density encoding |
| `hex_encode` | Hexadecimal encoding |
| `binary_encode` | Binary (0/1) encoding |
| `octal_encode` | Octal encoding |
| `url_encode` | URL percent-encoding |
| `html_escape` | HTML entity encoding |
| `html_entity_encode` | Full HTML entity encoding |
| `unicode_escape` | Unicode escape sequences |
| `unicode_font_encode` | Unicode math/script font substitution |
| `bidirectional_encode` | Unicode bidirectional text tricks |
| `variation_selector_injection` | Invisible Unicode variation selectors |
| `punycode_encode` | Punycode (internationalized domain) encoding |
| `percent_encoding` | Percent-encoding with custom character sets |
| `quoted_printable_encode` | MIME quoted-printable encoding |
| `uuencode` | Unix-to-Unix encoding |
| `json_encode` | JSON string encoding |
| `zero_width_encode` | Zero-width character encoding (invisible) |
| `morse_code_encode` | Morse code encoding |
| `leetspeak_encode` | Leetspeak (1337) substitution |
| `braille_encode` | Braille pattern encoding |
| `nato_phonetic_encode` | NATO phonetic alphabet |
| `pig_latin_encode` | Pig Latin encoding |
| `upside_down_encode` | Upside-down Unicode text |
| `homoglyph_encode` | Visually similar character substitution |
| `polybius_square_encode` | Polybius square cipher encoding |
| `a1z26_encode` | A=1, Z=26 numeric encoding |
| `t9_encode` | T9 phone keypad encoding |
| `tap_code_encode` | Tap code (prisoner’s cipher) encoding |
| `mixed_case_hex` | Mixed-case hexadecimal |
| `backslash_escape` | Backslash escape sequences |
| `remove_diacritics` | Strip diacritical marks |
| `acrostic_steganography` | Hide messages in first letters of lines |
| `unicode_tag_smuggle` | Smuggle text via Unicode tag characters |
| `code_mixed_phonetic` | Phonetic code-mixing encoding |
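
To make a few of the schemes in this table concrete, here is a minimal standard-library sketch of hexadecimal, binary, URL percent-encoding, and A1Z26. The function names mirror the table for readability, but these bodies are assumptions written for illustration, not the implementations in `dreadnode.transforms.encoding`.

```python
from urllib.parse import quote


def hex_encode(text: str) -> str:
    """Hexadecimal digits of the UTF-8 bytes."""
    return text.encode("utf-8").hex()


def binary_encode(text: str) -> str:
    """Space-separated 8-bit groups, one per UTF-8 byte."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def url_encode(text: str) -> str:
    """Percent-encode everything except unreserved characters."""
    return quote(text, safe="")


def a1z26_encode(text: str) -> str:
    """Map a..z to 1..26; letters joined by '-', words by spaces.

    Non-letter characters are dropped in this sketch.
    """
    words = []
    for word in text.split():
        numbers = [str(ord(ch) - ord("a") + 1) for ch in word.lower() if "a" <= ch <= "z"]
        if numbers:
            words.append("-".join(numbers))
    return " ".join(words)


sample = "Hi there"
print(hex_encode(sample))
print(binary_encode(sample))
print(url_encode(sample))
print(a1z26_encode(sample))
```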
Ciphers (15 transforms)
Module: dreadnode.transforms.cipher
Classic and modern ciphers for systematic obfuscation.
| Transform | Description |
|---|---|
| `atbash_cipher` | Atbash (reverse alphabet) substitution |
| `caesar_cipher` | Caesar cipher with configurable shift |
| `rot13_cipher` | ROT13 (Caesar shift 13) |
| `rot47_cipher` | ROT47 (printable ASCII rotation) |
| `rot8000_cipher` | ROT8000 (full Unicode rotation) |
| `vigenere_cipher` | Vigenere polyalphabetic cipher |
| `substitution_cipher` | Custom alphabet substitution |
| `xor_cipher` | XOR encryption |
| `rail_fence_cipher` | Rail fence transposition |
| `columnar_transposition` | Columnar transposition cipher |
| `playfair_cipher` | Playfair digraph cipher |
| `affine_cipher` | Affine cipher (ax+b mod 26) |
| `bacon_cipher` | Bacon’s biliteral cipher |
| `autokey_cipher` | Autokey cipher |
| `beaufort_cipher` | Beaufort cipher |
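
As a rough illustration of three ciphers from this table, the sketch below implements Caesar, Atbash, and Vigenere over ASCII letters. The signatures (for example the `shift` and `key` parameters) are assumptions chosen for this example and may not match the Dreadnode transforms.

```python
import string


def caesar_cipher(text: str, shift: int = 3) -> str:
    """Shift ASCII letters by `shift` positions, preserving case."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


def atbash_cipher(text: str) -> str:
    """Map a<->z, b<->y, and so on, preserving case."""
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    table = str.maketrans(lower + upper, lower[::-1] + upper[::-1])
    return text.translate(table)


def vigenere_cipher(text: str, key: str) -> str:
    """Apply a per-letter Caesar shift taken from the repeating key."""
    shifts = [ord(k) - ord("a") for k in key.lower() if k.isascii() and k.isalpha()]
    if not shifts:
        raise ValueError("key must contain at least one ASCII letter")
    out = []
    index = 0  # advances only on letters, so punctuation does not consume key
    for ch in text:
        if ch.isascii() and ch.isalpha():
            out.append(caesar_cipher(ch, shifts[index % len(shifts)]))
            index += 1
        else:
            out.append(ch)
    return "".join(out)


print(caesar_cipher("Hello, World!", 3))
print(atbash_cipher("Hello, World!"))
print(vigenere_cipher("ATTACKATDAWN", "LEMON"))
```

Applying `caesar_cipher` with shift 13 reproduces ROT13, which is why the table lists it as a special case.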
Perturbation (32 transforms)
Module: dreadnode.transforms.perturbation
Character-level and token-level noise that tests robustness of text classifiers and safety filters.
| Transform | Description |
|---|---|
| `random_capitalization` | Randomize letter casing |
| `insert_punctuation` | Insert random punctuation |
| `diacritic` | Add diacritical marks to characters |
| `underline` | Add Unicode underline combining marks |
| `character_space` | Insert spaces between characters |
| `zero_width` | Insert zero-width characters |
| `zalgo` | Apply Zalgo text (stacked combining marks) |
| `unicode_confusable` | Replace with Unicode confusables |
| `unicode_substitution` | Substitute with visually similar Unicode |
| `repeat_token` | Repeat tokens to confuse tokenizers |
| `emoji_substitution` | Replace words with emoji equivalents |
| `token_smuggling` | Split tokens across boundaries |
| `semantic_preserving_perturbation` | Meaning-preserving noise |
| `instruction_hierarchy_confusion` | Confuse instruction priority parsing |
| `context_overflow` | Overflow context window |
| `gradient_based_perturbation` | Gradient-inspired token perturbation |
| `multilingual_mixing` | Mix multiple languages |
| `cognitive_hacking` | Exploit cognitive biases in processing |
| `payload_splitting` | Split payload across inputs |
| `attention_diversion` | Divert model attention |
| `style_injection` | Inject style directives |
| `implicit_continuation` | Exploit continuation behavior |
| `authority_exploitation` | Exploit authority patterns |
| `linguistic_camouflage` | Linguistically camouflage intent |
| `temporal_misdirection` | Use temporal framing to misdirect |
| `complexity_amplification` | Amplify prompt complexity |
| `error_injection` | Inject deliberate errors |
| `encoding_nesting` | Nest multiple encodings |
| `token_boundary_manipulation` | Manipulate tokenizer boundaries |
| `meta_instruction_injection` | Inject meta-level instructions |
| `sentiment_inversion` | Invert sentiment cues |
| `simulate_typos` | Add realistic typographical errors |
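
Character-level noise of the kind listed here is easy to sketch. The example below shows random capitalization, character spacing, and zero-width insertion. It is a simplified illustration under assumed signatures (the explicit `rng` argument is a choice made here so the output is reproducible), not the library's code.

```python
import random

ZERO_WIDTH_SPACE = "\u200b"


def random_capitalization(text: str, rng: random.Random) -> str:
    """Upper- or lower-case each character with equal probability."""
    return "".join(ch.upper() if rng.random() < 0.5 else ch.lower() for ch in text)


def character_space(text: str) -> str:
    """Insert a space between every pair of adjacent characters."""
    return " ".join(text)


def zero_width(text: str) -> str:
    """Insert an invisible zero-width space between adjacent characters."""
    return ZERO_WIDTH_SPACE.join(text)


rng = random.Random(0)  # fixed seed so repeated runs give the same casing
print(random_capitalization("ignore the filter", rng))
print(character_space("filter"))
print(repr(zero_width("filter")))  # repr makes the invisible characters visible
```

All three leave the text readable to a human (or recoverable by lower-casing and stripping separators) while changing the exact character sequence a keyword filter or tokenizer sees.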
Substitution (16 transforms)
Module: dreadnode.transforms.substitution
Font and symbol substitution using Unicode alternative character sets.
| Transform | Description |
|---|---|
| `substitute` | General character substitution |
| `braille` | Braille Unicode patterns |
| `bubble_text` | Circled (bubble) Unicode characters |
| `cursive` | Unicode cursive/script characters |
| `double_struck` | Double-struck (blackboard bold) Unicode |
| `elder_futhark` | Elder Futhark rune substitution |
| `greek_letters` | Greek alphabet substitution |
| `medieval` | Medieval Unicode characters |
| `monospace` | Monospace Unicode characters |
| `small_caps` | Small capitals Unicode |
| `wingdings` | Wingdings-style symbols |
| `morse_code` | Morse code representation |
| `nato_phonetic` | NATO phonetic alphabet |
| `mirror` | Mirror/reversed text |
| `leet_speak` | Leetspeak substitution |
| `pig_latin` | Pig Latin |
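
Font-style substitution amounts to a per-character lookup table. The sketch below builds two such tables from the contiguous Unicode ranges for circled letters and mathematical monospace letters. The helper `_offset_map` is hypothetical, and the functions are illustrative stand-ins rather than the `dreadnode.transforms.substitution` implementations.

```python
import unicodedata


def _offset_map(upper_start: int, lower_start: int) -> dict[int, str]:
    """Build a translate() table mapping A-Z and a-z onto two contiguous ranges."""
    table = {}
    for i in range(26):
        table[ord("A") + i] = chr(upper_start + i)
        table[ord("a") + i] = chr(lower_start + i)
    return table


# Circled Latin letters: U+24B6..U+24CF (upper), U+24D0..U+24E9 (lower).
BUBBLE = _offset_map(0x24B6, 0x24D0)
# Mathematical monospace letters: U+1D670..U+1D689 (upper), U+1D68A..U+1D6A3 (lower).
MONOSPACE = _offset_map(0x1D670, 0x1D68A)


def bubble_text(text: str) -> str:
    """Replace ASCII letters with circled equivalents; other characters pass through."""
    return text.translate(BUBBLE)


def monospace(text: str) -> str:
    """Replace ASCII letters with mathematical monospace equivalents."""
    return text.translate(MONOSPACE)


styled = bubble_text("Hello")
print(styled)
print(monospace("Hello"))
# NFKC normalization folds both styles back to plain ASCII letters.
print(unicodedata.normalize("NFKC", styled))
```

Because NFKC normalization reverses these particular mappings, whether a target normalizes its input before filtering is one of the things such transforms can reveal.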
Injection (4 transforms)
Module: dreadnode.transforms.injection
Prompt injection framing and positioning techniques.
| Transform | Description |
|---|---|
| `many_shot_examples` | Few-shot / many-shot injection with examples |
| `skeleton_key_framing` | Skeleton Key framing technique |
| `position_variation` | Vary injection position in prompt |
| `position_wrap` | Wrap injection with positional framing |
Persuasion (13 transforms)
Module: dreadnode.transforms.persuasion
Social engineering and psychological influence techniques.
| Transform | Description |
|---|---|
| `authority_appeal` | Appeal to authority figures or expertise |
| `social_proof` | Claim widespread usage or acceptance |
| `urgency_scarcity` | Create urgency or scarcity pressure |
| `emotional_appeal` | Appeal to emotions |
| `logical_appeal` | Use logical argumentation structure |
| `reciprocity` | Invoke reciprocity obligation |
| `commitment_consistency` | Exploit consistency bias |
| `combined_persuasion` | Combine multiple persuasion techniques |
| `cognitive_bias_ensemble` | Ensemble of multiple cognitive biases |
| `sycophancy_exploit` | Exploit model sycophancy tendencies |
| `anchoring` | Anchoring bias exploitation |
| `framing_effect` | Framing effect manipulation |
| `false_dilemma` | False dilemma presentation |
MCP attacks (20 transforms)
Module: dreadnode.transforms.mcp_attacks
Attacks targeting the Model Context Protocol (MCP) tool layer.
| Transform | Description |
|---|---|
| `tool_description_poison` | Inject malicious instructions into MCP tool descriptions |
| `cross_server_shadow` | Register shadow tools that intercept legitimate tool calls |
| `rug_pull_payload` | Tools that mutate from benign to malicious after trigger |
| `tool_output_injection` | Inject instructions into tool output streams |
| `tool_squatting` | Register tools with confusingly similar names |
| `resource_amplification` | Craft inputs for token consumption DoS |
| `log_to_leak` | Exfiltrate data via logging/telemetry tools |
| `mcp_sampling_injection` | Exploit MCP sampling capability |
| `cross_server_request_forgery` | Forge cross-server tool requests |
| `schema_poisoning` | Poison JSON Schema fields in tool definitions |
| `ansi_escape_cloaking` | Hide instructions in ANSI escape codes |
| `tool_preference_manipulation` | Bias tool selection behavior |
| `implicit_tool_poison` | Implicitly poison tool behavior without obvious injection |
| `tool_chain_sequential` | Sequential tool chain exploitation |
| `tool_commander` | Command injection via tool orchestration |
| `zero_click_injection` | Zero-click injection without user interaction |
| `calendar_invite_injection` | Inject payloads via calendar invite processing |
| `confused_deputy` | Confused deputy attack on tool authorization |
| `full_schema_poison` | Full JSON Schema poisoning of tool definitions |
| `tool_chain_cost_amplification` | Amplify cost via chained tool invocations |
Multi-agent attacks (25 transforms)
Module: dreadnode.transforms.multi_agent_attacks
Attacks targeting inter-agent communication and trust boundaries.
| Transform | Description |
|---|---|
| `prompt_infection` | Self-replicating prompts that propagate across agents |
| `peer_agent_spoof` | Impersonate legitimate agents |
| `consensus_poisoning` | Corrupt multi-agent consensus mechanisms |
| `delegation_chain_attack` | Hijack agent delegation chains |
| `a2a_session_smuggling` | Smuggle payloads in agent-to-agent sessions |
| `shared_memory_poisoning` | Poison shared memory between agents |
| `agent_config_overwrite` | Override agent configuration |
| `query_memory_injection` | Inject queries into agent memory stores |
| `trust_exploitation` | Exploit inter-agent trust relationships |
| `persistent_memory_backdoor` | Embed backdoors in agent memory |
| `experience_poisoning` | Corrupt agent experience replay buffers |
| `zombie_agent` | Create zombie agents under attacker control |
| `contagious_jailbreak` | Self-propagating jailbreak across agent networks |
| `mad_exploitation` | Multi-agent debate safety exploitation |
| `agent_in_the_middle` | Man-in-the-middle attack on agent communication |
| `multi_agent_prompt_fusion` | Fuse prompts across multiple agents |
| `minja_progressive_poisoning` | Progressive memory poisoning (MINJA) |
| `memorygraft_experience_poison` | MemoryGraft experience replay poisoning |
| `injecmem_single_shot` | Single-shot memory injection |
| `graphrag_entity_poison` | GraphRAG entity-level poisoning |
| `a2a_card_spoofing` | A2A agent card spoofing |
| `recursive_delegation_dos` | Recursive delegation denial of service |
| `sleeper_agent_activation` | Activate dormant sleeper agents |
| `meaning_drift_propagation` | Propagate meaning drift across agent chains |
| `stitch_authority_chain` | Stitch authority chain across agents |
Exfiltration (8 transforms)
Module: dreadnode.transforms.exfiltration
Data exfiltration techniques through covert channels.
| Transform | Description |
|---|---|
| `markdown_image_exfil` | Encode data in markdown image URLs |
| `mermaid_diagram_exfil` | Hide data in Mermaid diagram rendering |
| `unicode_tag_exfil` | Encode data in invisible Unicode tags |
| `dns_exfil_injection` | Exfiltrate via DNS query strings |
| `ssrf_via_tools` | Server-side request forgery through tool interfaces |
| `link_unfurling_exfil` | Exploit link preview bots for exfiltration |
| `api_endpoint_abuse` | Abuse legitimate APIs as exfiltration channels |
| `character_exfiltration` | Extract data character by character |
Reasoning attacks (16 transforms)
Module: dreadnode.transforms.reasoning_attacks
Attacks targeting chain-of-thought and reasoning models (o1, o3, etc.).
| Transform | Description |
|---|---|
| `cot_backdoor` | Insert backdoor steps in chain-of-thought |
| `reasoning_hijack` | Hijack safety reasoning in reasoning models |
| `reasoning_dos` | Cause infinite reasoning loops |
| `crescendo_escalation` | Multi-turn escalation via foot-in-the-door |
| `fitd_escalation` | Foot-in-the-door technique with progressive requests |
| `deceptive_delight` | Combine deception with positive reinforcement |
| `goal_drift_injection` | Gradually shift model’s goal |
| `cot_hijack_prepend` | Prepend hijacked chain-of-thought steps |
| `reasoning_interruption` | Interrupt reasoning mid-chain |
| `overthink_dos` | Cause overthinking denial of service |
| `thinking_intervention` | Intervene in thinking token generation |
| `extend_attack` | Extend reasoning to bypass safety constraints |
| `stance_manipulation` | Manipulate model stance via reasoning |
| `attention_eclipse` | Eclipse attention on safety-relevant tokens |
| `badthink_triggered_overthinking` | Trigger excessive overthinking via adversarial prompts |
| `code_contradiction_reasoning` | Exploit contradictions in code-reasoning models |
Guardrail bypass (6 transforms)
Module: dreadnode.transforms.guardrail_bypass
Techniques for evading safety classifiers and content filters.
| Transform | Description |
|---|---|
| `classifier_evasion` | Inject tokens to evade safety classifiers |
| `controlled_release` | Gradually reveal harmful content |
| `emoji_smuggle` | Replace keywords with emoji sequences |
| `payload_split` | Split payloads across multiple exchanges |
| `hierarchy_exploit` | Exploit instruction hierarchy to override safety |
| `nested_fiction` | Nest harmful requests inside fictional scenarios |
Browser agent attacks (7 transforms)
Module: dreadnode.transforms.browser_agent_attacks
Attacks targeting browser-using and computer-use agents.
| Transform | Description |
|---|---|
| `visual_prompt_injection` | Embed hidden instructions in DOM elements |
| `ai_clickfix` | Social engineering for clipboard-paste-execute |
| `zombai_c2` | ZombAI command-and-control patterns |
| `task_injection` | Inject malicious tasks into agent workflows |
| `domain_validation_bypass` | Bypass domain validation checks |
| `navigation_hijack` | Hijack page navigation flows |
| `phantom_ui` | Create invisible UI elements agents interact with |
Agentic workflow attacks (18 transforms)
Module: dreadnode.transforms.agentic_workflow
Attacks targeting agent workflow orchestration and execution.
| Transform | Description |
|---|---|
| `phase_transition_bypass` | Skip workflow phase approval requirements |
| `phase_downgrade_attack` | Downgrade to earlier workflow phases |
| `tool_priority_injection` | Inject tool selection priorities |
| `tool_restriction_bypass` | Bypass tool access restrictions |
| `malformed_output_injection` | Inject malformed outputs to confuse parsing |
| `success_indicator_spoof` | Spoof success signals |
| `cypher_injection` | Graph database query injection |
| `sql_via_nlp_injection` | SQL injection through NLP processing |
| `exploitation_mode_confusion` | Confuse mode detection logic |
| `payload_target_mismatch` | Mismatch payload and target expectations |
| `workflow_step_skip` | Skip required workflow steps |
| `wordlist_exhaustion` | Exhaust word lists for brute force |
| `session_state_injection` | Inject into session state |
| `todo_list_manipulation` | Manipulate task/TODO lists |
| `intent_manipulation` | Manipulate detected intent |
| `tool_chain_attack` | Hijack chained tool calls |
| `delayed_tool_invocation` | Delay tool invocation timing |
| `action_hijacking` | Hijack agent actions |
Agent skill attacks (10 transforms)
Module: dreadnode.transforms.agent_skill
Attacks targeting agent skill packages, identity files, and infrastructure.
| Transform | Description |
|---|---|
| `soul_file_injection` | Inject into agent identity/soul files |
| `skill_package_poison` | Poison skill packages |
| `heartbeat_hijack` | Hijack agent heartbeat mechanisms |
| `bootstrap_hook_injection` | Inject during agent bootstrap |
| `media_protocol_exfil` | Exfiltrate via media protocols |
| `skill_checksum_bypass` | Bypass skill verification checksums |
| `agent_permission_escalation` | Escalate agent permissions |
| `skill_dependency_confusion` | Confuse skill dependency resolution |
| `agent_memory_injection` | Inject into agent memory structures |
| `workspace_file_poison` | Poison workspace files |
Backdoor and fine-tuning attacks (13 transforms)
Module: dreadnode.transforms.backdoor_finetune
Attacks targeting model training pipelines, weight poisoning, and fine-tuning backdoors.
| Transform | Description |
|---|---|
| `demon_agent_backdoor` | DemonAgent: hidden backdoor triggered by specific inputs |
| `benign_overfit_10shot` | 10-shot benign overfitting to bypass safety |
| `trojan_praise` | Trojan activation via praise-based triggers |
| `stego_finetune` | Steganographic fine-tuning payload embedding |
| `trojan_speak` | TrojanSpeak language-triggered backdoor |
| `poisoned_parrot` | PoisonedParrot training data contamination |
| `grp_obliteration` | GRP: guardrail removal via fine-tuning |
| `gatebreaker_moe` | GateBreaker MoE expert manipulation |
| `expert_lobotomy` | Expert lobotomy: disable safety experts in MoE |
| `moevil_poison` | MoEvil: targeted MoE expert poisoning |
| `proattack_backdoor` | ProAttack: progressive backdoor insertion |
| `fedspy_gradient` | FedSpy: gradient-based federated learning attack |
| `medical_weight_poison` | Medical domain weight poisoning |
Supply chain attacks (6 transforms)
Module: dreadnode.transforms.supply_chain
Attacks targeting model and package supply chains.
| Transform | Description |
|---|---|
| `slopsquatting` | AI package hallucination exploitation |
| `merge_hijacking` | Model merge/weight poisoning |
| `skill_supply_chain_poison` | Skill package supply chain attack |
| `rules_file_backdoor_v2` | Rules file backdoor (v2 with persistence) |
| `llm_router_exploit` | LLM router model selection manipulation |
| `dependency_confusion` | Package dependency confusion attack |
Structural exploits (7 transforms)
Module: dreadnode.transforms.structural_exploits
Exploit structural patterns in prompts, schemas, and templates.
| Transform | Description |
|---|---|
| `trojan_template_fill` | Trojan payload via template filling |
| `schema_exploit` | JSON/XML schema exploitation |
| `m2s_consolidate` | Multi-step to single-step consolidation |
| `task_embedding` | Embed hidden tasks in benign instructions |
| `policy_puppetry` | Policy-based prompt puppetry |
| `chain_of_logic_injection` | Inject malicious steps into logic chains |
| `many_shot_context` | Many-shot context window exploitation |
Multimodal attacks (14 transforms)
Module: dreadnode.transforms.multimodal_attacks
Attacks targeting multimodal models across vision, audio, and video.
| Transform | Description |
|---|---|
| `pictorial_code_injection` | Embed code in images for vision models |
| `ood_mixup` | Out-of-distribution mixup perturbation |
| `clip_guided_adversarial` | CLIP-guided adversarial image generation |
| `vision_encoder_attack` | Attack vision encoder representations |
| `cross_modal_steganography` | Hide payloads across modalities |
| `physical_road_sign_injection` | Physical-world adversarial road signs |
| `whisper_muting` | Mute or corrupt Whisper transcription |
| `whisper_mode_switch` | Force Whisper mode switching |
| `audio_multilingual_jailbreak` | Multilingual audio jailbreak |
| `joint_audio_text_attack` | Joint audio-text adversarial attack |
| `over_the_air_injection` | Over-the-air audio injection |
| `voice_agent_vishing` | Voice agent phishing (vishing) |
| `video_dos` | Video processing denial of service |
| `cross_modal_video_transfer` | Cross-modal transfer via video |
Competitive parity (13 transforms)
Module: dreadnode.transforms.competitive_parity
Attacks testing competitive gaps in red teaming coverage.
| Transform | Description |
|---|---|
| `package_hallucination_probe` | Probe for hallucinated package names |
| `training_data_replay` | Replay training data for memorization |
| `divergent_repetition` | Force divergent output via repetition |
| `glitch_token` | Exploit glitch tokens in vocabularies |
| `dan_variant` | DAN (Do Anything Now) variant generation |
| `malware_sig_evasion` | Malware signature evasion testing |
| `coding_agent_sandbox_escape` | Test coding agent sandbox escape |
| `coding_agent_ci_exfil` | CI pipeline exfiltration via code agent |
| `coding_agent_verifier_sabotage` | Code verifier sabotage |
| `meta_agent_strategy` | Meta-agent strategy manipulation |
| `best_of_n_sampling` | Best-of-N sampling exploitation |
| `cross_session_leak` | Cross-session information leakage |
| `chatml_injection` | ChatML format injection |
Additional modules
Advanced jailbreak (16 transforms)
Module: dreadnode.transforms.advanced_jailbreak
| Transform | Description |
|---|---|
| `reasoning_chain_hijack` | Hijack internal reasoning chains |
| `prefill_bypass` | Use model prefilling to bypass safety |
| `code_completion_evasion` | Exploit code completion mode |
| `context_fusion` | Fuse multiple contexts |
| `actor_network_escalation` | Create actor networks for escalation |
| `pipeline_manipulation` | Manipulate processing pipeline |
| `guardrail_dos` | Denial of service on guardrails |
| `likert_exploitation` | Exploit Likert scale response patterns |
| `deep_fictional_immersion` | Deep nested fictional scenario |
| `sockpuppeting` | Create sockpuppet personas for escalation |
| `adversarial_poetry` | Embed harmful content in poetry form |
| `content_concretization` | Make abstract harm concrete and actionable |
| `cka_benign_weave` | Weave harmful content into benign context |
| `involuntary_jailbreak` | Trigger involuntary compliance patterns |
| `immersive_world` | Deep immersive world-building for bypass |
| `metabreak_special_tokens` | Exploit special tokens for meta-breaking |
System prompt extraction (6 transforms)
Module: dreadnode.transforms.system_prompt_extraction
| Transform | Description |
|---|---|
| `direct_extraction` | Direct system prompt extraction |
| `indirect_extraction` | Indirect extraction via behavior probing |
| `boundary_probe` | Probe system prompt boundaries |
| `format_exploitation` | Exploit format directives in prompts |
| `reflection_probe` | Probe via self-reflection requests |
| `multi_turn_extraction` | Extract across multiple conversation turns |
Text manipulation (18 transforms)
Module: dreadnode.transforms.text
| Transform | Description |
|---|---|
| `reverse` | Reverse text |
| `search_replace` | Search and replace patterns |
| `join` / `char_join` / `word_join` | Join operations |
| `affix` / `prefix` / `suffix` | Add affixes |
| `colloquial_wordswap` | Swap to colloquial terms |
| `word_removal` / `word_duplication` | Remove or duplicate words |
| `case_alternation` | Alternate character casing |
| `whitespace_manipulation` | Manipulate whitespace |
| `sentence_reordering` | Reorder sentences |
| `question_transformation` | Transform into questions |
| `contextual_wrapping` | Wrap with contextual framing |
| `length_manipulation` | Manipulate text length |
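
A few of these text operations are simple enough to sketch directly. The functions below borrow names from the table, but their parameters (such as `delimiter` and `value`) and exact behavior are assumptions made for illustration and may differ from `dreadnode.transforms.text`.

```python
def reverse(text: str) -> str:
    """Reverse the character order."""
    return text[::-1]


def char_join(text: str, delimiter: str = "-") -> str:
    """Join individual characters with a delimiter."""
    return delimiter.join(text)


def word_join(text: str, delimiter: str = "-") -> str:
    """Join whitespace-separated words with a delimiter."""
    return delimiter.join(text.split())


def prefix(text: str, value: str) -> str:
    """Prepend a fixed string."""
    return f"{value}{text}"


def suffix(text: str, value: str) -> str:
    """Append a fixed string."""
    return f"{text}{value}"


def case_alternation(text: str) -> str:
    """Alternate lower/upper case across letters, skipping non-letters."""
    out = []
    letter_index = 0
    for ch in text:
        if ch.isalpha():
            out.append(ch.upper() if letter_index % 2 else ch.lower())
            letter_index += 1
        else:
            out.append(ch)
    return "".join(out)


text = "hello world"
print(reverse(text))
print(char_join("abc"))
print(word_join(text, "_"))
print(suffix(prefix(text, "[start] "), " [end]"))
print(case_alternation(text))
```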
Other modules
| Module | Transforms | Description |
|---|---|---|
| `flip_attack` | 13 | Word/character/sentence reversal variants (FWO, FCW, FCS, FMM) |
| `adversarial_suffix` | 5 | Adversarial suffix injection (GCG, sweep, jailbreak, IRIS, LARGO) |
| `stylistic` | 3 | ASCII art rendering, role-play wrapping |
| `language` | 4 | Language adaptation, transliteration, code-switching, dialect variation |
| `swap` | 3 | Character and word swapping/reordering |
| `constitutional` | 15 | Code/document fragmentation, metaphor encoding, riddle encoding |
| `response_steering` | 6 | Protocol establishment, output format manipulation, constraint relaxation |
| `rag_poisoning` | 15 | Context injection/stuffing, document poisoning, query manipulation, GraphRAG |
| `pii_extraction` | 7 | Training data extraction, PII completion, divergence extraction |
| `documentation_poison` | 7 | Code documentation poisoning, package readme poisoning, Dockerfile poisoning |
| `ide_injection` | 7 | Rules file backdoors, manifest injection, MCP tool description poisoning |
| `logic_bomb` | 3 | Logic bombs, time bombs, environment-triggered payloads |
| `document` | 5 | Document embedding, HTML hiding |
| `image` | 25 | Noise, spatial transforms, steganography, compression artifacts |
| `audio` | 18 | Noise injection, pitch/speed changes, filtering, reverb |
| `video` | 3 | Frame injection, metadata injection, subliminal frames |
| `refine` | 3 | LLM-based prompt refinement |