Transforms Reference

450+ transforms across 38 modules for mutating attack prompts — encoding, ciphers, injection, persuasion, agentic attacks, backdoor/fine-tuning, supply chain, and more.

Dreadnode ships 450+ transforms across 38 modules, with more being added continuously.

A transform converts a prompt from one representation to another. The goal is to find blind spots in post-safety-training alignment: the same harmful request may be refused in plain English but accepted when encoded in Base64, translated to a low-resource language like Telugu or Yoruba, wrapped in a role-play scenario, or embedded inside a code comment.

Models are trained with safety alignment primarily on English text in standard formatting. Transforms systematically probe all the representations where that alignment may be weak:

  • Encoding and ciphers - Base64, hex, ROT13, Morse code, Braille. If the model can decode these formats, it may follow instructions it would refuse in plaintext.
  • Multilingual and cultural probing - translate the attack to low-resource languages (Telugu, Yoruba, Hmong, Scots Gaelic, Amharic) where safety training data is sparse. Models frequently comply with harmful requests in languages they understand but were not safety-tuned for.
  • Persuasion and social engineering - authority appeals, emotional framing, urgency, reciprocity. Tests whether the model’s post-safety-training alignment holds under psychological pressure.
  • Injection and framing - skeleton key, many-shot examples, positional wrapping. Tests whether framing the request differently bypasses intent detection.
  • Agentic and tool attacks - MCP tool poisoning, multi-agent trust exploits, delegation hijacking. Tests whether agent infrastructure can be manipulated.
  • Multimodal perturbation - image noise, steganography, audio pitch shifting, video frame injection. Tests robustness of vision and audio models to adversarial inputs.

By running the same attack goal through multiple transforms, you build a map of where the model’s defenses hold and where they break. A model that refuses the raw prompt but complies after Base64 encoding has a safety gap that needs to be closed.
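
As a rough illustration of "one prompt, many representations", the sketch below renders a harmless string in a few encodings using only the Python standard library. It does not use the Dreadnode API; the `representations` helper is a hypothetical name introduced here for illustration.

```python
import base64
import codecs


def representations(prompt: str) -> dict[str, str]:
    """Render one prompt in several encodings (stdlib only, illustrative)."""
    raw = prompt.encode("utf-8")
    return {
        "plain": prompt,
        "base64": base64.b64encode(raw).decode("ascii"),
        "hex": raw.hex(),
        "rot13": codecs.encode(prompt, "rot13"),
    }


variants = representations("Hello, world")
for name, text in variants.items():
    # Each variant would be sent to the target and scored separately.
    print(f"{name:>6}: {text}")
```

Recording which variants are refused and which are answered gives the per-representation map described above.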

Use transforms with any attack via the transforms parameter.

    # CLI: stack transforms with --transform
    dn airt run --goal "..." --attack tap --transform base64 --transform leetspeak

    # SDK: pass a list of transform instances
    from dreadnode.airt import tap_attack
    from dreadnode.transforms.encoding import base64_encode
    from dreadnode.transforms.persuasion import authority_appeal

    attack = tap_attack(
        goal="...",
        target=target,
        attacker_model="openai/gpt-4o-mini",
        evaluator_model="openai/gpt-4o-mini",
        transforms=[base64_encode(), authority_appeal()],
    )
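
When transforms are stacked, the order can change the result. The sketch below is a plain-Python illustration of that point. It assumes transforms are applied left to right, which this page does not confirm for Dreadnode, and `apply_stack`, `leetspeak`, and `b64` are hypothetical stand-ins, not SDK functions.

```python
import base64
from functools import reduce
from typing import Callable, Sequence

Transform = Callable[[str], str]

# Illustrative leetspeak mapping (an assumption, not the library's table).
LEET_TABLE = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"})


def leetspeak(text: str) -> str:
    return text.translate(LEET_TABLE)


def b64(text: str) -> str:
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def apply_stack(prompt: str, transforms: Sequence[Transform]) -> str:
    """Apply each transform in list order, feeding each output to the next."""
    return reduce(lambda text, transform: transform(text), transforms, prompt)


leet_then_b64 = apply_stack("test", [leetspeak, b64])
b64_then_leet = apply_stack("test", [b64, leetspeak])
print(leet_then_b64)  # Base64 of the leetspeak text
print(b64_then_leet)  # leetspeak applied to the Base64 text
```

The two outputs differ, so it is worth checking how the CLI and SDK order a stack before comparing results across runs.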

Module: dreadnode.transforms.encoding

Obfuscate prompts through encoding schemes that models may decode internally while bypassing text-based safety filters.

| Transform | Description |
| --- | --- |
| base64_encode | Standard Base64 encoding |
| base32_encode | Base32 encoding |
| base58_encode | Base58 (Bitcoin-style) encoding |
| base62_encode | Base62 encoding |
| base85_encode | Ascii85/Base85 encoding |
| base91_encode | Base91 high-density encoding |
| hex_encode | Hexadecimal encoding |
| binary_encode | Binary (0/1) encoding |
| octal_encode | Octal encoding |
| url_encode | URL percent-encoding |
| html_escape | HTML entity encoding |
| html_entity_encode | Full HTML entity encoding |
| unicode_escape | Unicode escape sequences |
| unicode_font_encode | Unicode math/script font substitution |
| bidirectional_encode | Unicode bidirectional text tricks |
| variation_selector_injection | Invisible Unicode variation selectors |
| punycode_encode | Punycode (internationalized domain) encoding |
| percent_encoding | Percent-encoding with custom character sets |
| quoted_printable_encode | MIME quoted-printable encoding |
| uuencode | Unix-to-Unix encoding |
| json_encode | JSON string encoding |
| zero_width_encode | Zero-width character encoding (invisible) |
| morse_code_encode | Morse code encoding |
| leetspeak_encode | Leetspeak (1337) substitution |
| braille_encode | Braille pattern encoding |
| nato_phonetic_encode | NATO phonetic alphabet |
| pig_latin_encode | Pig Latin encoding |
| upside_down_encode | Upside-down Unicode text |
| homoglyph_encode | Visually similar character substitution |
| polybius_square_encode | Polybius square cipher encoding |
| a1z26_encode | A=1, Z=26 numeric encoding |
| t9_encode | T9 phone keypad encoding |
| tap_code_encode | Tap code (prisoner’s cipher) encoding |
| mixed_case_hex | Mixed-case hexadecimal |
| backslash_escape | Backslash escape sequences |
| remove_diacritics | Strip diacritical marks |
| acrostic_steganography | Hide messages in first letters of lines |
| unicode_tag_smuggle | Smuggle text via Unicode tag characters |
| code_mixed_phonetic | Phonetic code-mixing encoding |
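
To make two of the simpler schemes concrete, here is a minimal stdlib sketch of binary and A1Z26 encoding. The function names and output conventions (space-separated bytes, hyphen-separated letters, non-letters dropped) are assumptions made for this example and may not match the library's `binary_encode` or `a1z26_encode`.

```python
def binary_sketch(text: str) -> str:
    """Encode UTF-8 bytes as space-separated 8-bit binary groups."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


def a1z26_sketch(text: str) -> str:
    """Map a=1 ... z=26; hyphens join letters, spaces join words.

    Characters outside a-z are dropped (a simplifying assumption).
    """
    words = []
    for word in text.lower().split():
        numbers = [str(ord(ch) - ord("a") + 1) for ch in word if "a" <= ch <= "z"]
        if numbers:
            words.append("-".join(numbers))
    return " ".join(words)


print(binary_sketch("Hi"))
print(a1z26_sketch("hi there"))
```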

Module: dreadnode.transforms.cipher

Classic and modern ciphers for systematic obfuscation.

| Transform | Description |
| --- | --- |
| atbash_cipher | Atbash (reverse alphabet) substitution |
| caesar_cipher | Caesar cipher with configurable shift |
| rot13_cipher | ROT13 (Caesar shift 13) |
| rot47_cipher | ROT47 (printable ASCII rotation) |
| rot8000_cipher | ROT8000 (full Unicode rotation) |
| vigenere_cipher | Vigenere polyalphabetic cipher |
| substitution_cipher | Custom alphabet substitution |
| xor_cipher | XOR encryption |
| rail_fence_cipher | Rail fence transposition |
| columnar_transposition | Columnar transposition cipher |
| playfair_cipher | Playfair digraph cipher |
| affine_cipher | Affine cipher (ax+b mod 26) |
| bacon_cipher | Bacon’s biliteral cipher |
| autokey_cipher | Autokey cipher |
| beaufort_cipher | Beaufort cipher |
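
For readers unfamiliar with the classical ciphers listed above, the following sketch implements textbook Caesar, Atbash, and Vigenere over ASCII letters. It is a reference for what the ciphers do, written independently of the Dreadnode implementations; the function names and signatures are illustrative assumptions.

```python
import string


def _shift(ch: str, amount: int) -> str:
    """Rotate one ASCII letter by `amount`; leave other characters unchanged."""
    if ch in string.ascii_uppercase:
        base = ord("A")
    elif ch in string.ascii_lowercase:
        base = ord("a")
    else:
        return ch
    return chr((ord(ch) - base + amount) % 26 + base)


def caesar(text: str, shift: int = 3) -> str:
    """Caesar cipher: the same shift for every letter (shift=13 is ROT13)."""
    return "".join(_shift(ch, shift) for ch in text)


def atbash(text: str) -> str:
    """Atbash: map a<->z, b<->y, and so on, preserving case."""
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    return text.translate(str.maketrans(lower + upper, lower[::-1] + upper[::-1]))


def vigenere(text: str, key: str) -> str:
    """Vigenere: per-letter shifts taken from the key, skipping non-letters."""
    shifts = [ord(k) - ord("a") for k in key.lower() if k in string.ascii_lowercase]
    if not shifts:
        raise ValueError("key must contain at least one ASCII letter")
    out, used = [], 0
    for ch in text:
        if ch in string.ascii_letters:
            out.append(_shift(ch, shifts[used % len(shifts)]))
            used += 1
        else:
            out.append(ch)
    return "".join(out)


print(caesar("Hello, World!", 13))
print(atbash("abc XYZ"))
print(vigenere("ATTACKATDAWN", "LEMON"))
```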

Module: dreadnode.transforms.perturbation

Character-level and token-level noise that tests robustness of text classifiers and safety filters.

| Transform | Description |
| --- | --- |
| random_capitalization | Randomize letter casing |
| insert_punctuation | Insert random punctuation |
| diacritic | Add diacritical marks to characters |
| underline | Add Unicode underline combining marks |
| character_space | Insert spaces between characters |
| zero_width | Insert zero-width characters |
| zalgo | Apply Zalgo text (stacked combining marks) |
| unicode_confusable | Replace with Unicode confusables |
| unicode_substitution | Substitute with visually similar Unicode |
| repeat_token | Repeat tokens to confuse tokenizers |
| emoji_substitution | Replace words with emoji equivalents |
| token_smuggling | Split tokens across boundaries |
| semantic_preserving_perturbation | Meaning-preserving noise |
| instruction_hierarchy_confusion | Confuse instruction priority parsing |
| context_overflow | Overflow context window |
| gradient_based_perturbation | Gradient-inspired token perturbation |
| multilingual_mixing | Mix multiple languages |
| cognitive_hacking | Exploit cognitive biases in processing |
| payload_splitting | Split payload across inputs |
| attention_diversion | Divert model attention |
| style_injection | Inject style directives |
| implicit_continuation | Exploit continuation behavior |
| authority_exploitation | Exploit authority patterns |
| linguistic_camouflage | Linguistically camouflage intent |
| temporal_misdirection | Use temporal framing to misdirect |
| complexity_amplification | Amplify prompt complexity |
| error_injection | Inject deliberate errors |
| encoding_nesting | Nest multiple encodings |
| token_boundary_manipulation | Manipulate tokenizer boundaries |
| meta_instruction_injection | Inject meta-level instructions |
| sentiment_inversion | Invert sentiment cues |
| simulate_typos | Add realistic typographical errors |
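
Character-level noise of this kind is easy to reproduce and, for some variants, easy to normalize away. The sketch below shows zero-width insertion and seeded random capitalization, plus a simple stripping step a filter might apply before classification. All names are hypothetical and the behavior is an assumption, not a description of the library's `zero_width` or `random_capitalization` transforms.

```python
import random

ZERO_WIDTH_SPACE = "\u200b"
# Common zero-width code points: ZWSP, ZWNJ, ZWJ, and the BOM/ZWNBSP.
ZERO_WIDTH_CHARS = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))


def insert_zero_width(text: str) -> str:
    """Place a zero-width space between every pair of characters."""
    return ZERO_WIDTH_SPACE.join(text)


def random_caps(text: str, seed: int = 0) -> str:
    """Randomize casing; a fixed seed keeps test runs reproducible."""
    rng = random.Random(seed)
    return "".join(ch.upper() if rng.random() < 0.5 else ch.lower() for ch in text)


def strip_zero_width(text: str) -> str:
    """Defensive normalization: drop zero-width characters."""
    return text.translate(ZERO_WIDTH_CHARS)


noisy = insert_zero_width("abc")
print(len("abc"), len(noisy))          # the visible text is the same length on screen
print(strip_zero_width(noisy) == "abc")
print(random_caps("hello world", seed=1))
```

If a target refuses the stripped text but answers the noisy one, the gap is likely in input normalization, not in the model's understanding of the request.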

Module: dreadnode.transforms.substitution

Font and symbol substitution using Unicode alternative character sets.

| Transform | Description |
| --- | --- |
| substitute | General character substitution |
| braille | Braille Unicode patterns |
| bubble_text | Circled (bubble) Unicode characters |
| cursive | Unicode cursive/script characters |
| double_struck | Double-struck (blackboard bold) Unicode |
| elder_futhark | Elder Futhark rune substitution |
| greek_letters | Greek alphabet substitution |
| medieval | Medieval Unicode characters |
| monospace | Monospace Unicode characters |
| small_caps | Small capitals Unicode |
| wingdings | Wingdings-style symbols |
| morse_code | Morse code representation |
| nato_phonetic | NATO phonetic alphabet |
| mirror | Mirror/reversed text |
| leet_speak | Leetspeak substitution |
| pig_latin | Pig Latin |
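
Font-style substitution usually amounts to a fixed offset into another Unicode block. As a minimal sketch, the code below maps ASCII letters to the circled-letter range (U+24B6 to U+24E9) and shows that NFKC normalization folds them back to ASCII. The `bubble` function is a hypothetical illustration and may differ from the library's `bubble_text`.

```python
import unicodedata

CIRCLED_UPPER_A = 0x24B6  # CIRCLED LATIN CAPITAL LETTER A
CIRCLED_LOWER_A = 0x24D0  # CIRCLED LATIN SMALL LETTER A


def bubble(text: str) -> str:
    """Replace ASCII letters with circled Unicode letters; keep the rest."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(CIRCLED_UPPER_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(CIRCLED_LOWER_A + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


styled = bubble("Hi there")
print(styled)
# NFKC compatibility normalization maps circled letters back to plain ASCII.
print(unicodedata.normalize("NFKC", styled))
```

Running NFKC normalization before a text filter is one way to check whether a refusal gap comes from the filter or from the model itself.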

Module: dreadnode.transforms.injection

Prompt injection framing and positioning techniques.

| Transform | Description |
| --- | --- |
| many_shot_examples | Few-shot / many-shot injection with examples |
| skeleton_key_framing | Skeleton Key framing technique |
| position_variation | Vary injection position in prompt |
| position_wrap | Wrap injection with positional framing |

Module: dreadnode.transforms.persuasion

Social engineering and psychological influence techniques.

| Transform | Description |
| --- | --- |
| authority_appeal | Appeal to authority figures or expertise |
| social_proof | Claim widespread usage or acceptance |
| urgency_scarcity | Create urgency or scarcity pressure |
| emotional_appeal | Appeal to emotions |
| logical_appeal | Use logical argumentation structure |
| reciprocity | Invoke reciprocity obligation |
| commitment_consistency | Exploit consistency bias |
| combined_persuasion | Combine multiple persuasion techniques |
| cognitive_bias_ensemble | Ensemble of multiple cognitive biases |
| sycophancy_exploit | Exploit model sycophancy tendencies |
| anchoring | Anchoring bias exploitation |
| framing_effect | Framing effect manipulation |
| false_dilemma | False dilemma presentation |

Module: dreadnode.transforms.mcp_attacks

Attacks targeting the Model Context Protocol (MCP) tool layer.

| Transform | Description |
| --- | --- |
| tool_description_poison | Inject malicious instructions into MCP tool descriptions |
| cross_server_shadow | Register shadow tools that intercept legitimate tool calls |
| rug_pull_payload | Tools that mutate from benign to malicious after trigger |
| tool_output_injection | Inject instructions into tool output streams |
| tool_squatting | Register tools with confusingly similar names |
| resource_amplification | Craft inputs for token consumption DoS |
| log_to_leak | Exfiltrate data via logging/telemetry tools |
| mcp_sampling_injection | Exploit MCP sampling capability |
| cross_server_request_forgery | Forge cross-server tool requests |
| schema_poisoning | Poison JSON Schema fields in tool definitions |
| ansi_escape_cloaking | Hide instructions in ANSI escape codes |
| tool_preference_manipulation | Bias tool selection behavior |
| implicit_tool_poison | Implicitly poison tool behavior without obvious injection |
| tool_chain_sequential | Sequential tool chain exploitation |
| tool_commander | Command injection via tool orchestration |
| zero_click_injection | Zero-click injection without user interaction |
| calendar_invite_injection | Inject payloads via calendar invite processing |
| confused_deputy | Confused deputy attack on tool authorization |
| full_schema_poison | Full JSON Schema poisoning of tool definitions |
| tool_chain_cost_amplification | Amplify cost via chained tool invocations |

Module: dreadnode.transforms.multi_agent_attacks

Attacks targeting inter-agent communication and trust boundaries.

| Transform | Description |
| --- | --- |
| prompt_infection | Self-replicating prompts that propagate across agents |
| peer_agent_spoof | Impersonate legitimate agents |
| consensus_poisoning | Corrupt multi-agent consensus mechanisms |
| delegation_chain_attack | Hijack agent delegation chains |
| a2a_session_smuggling | Smuggle payloads in agent-to-agent sessions |
| shared_memory_poisoning | Poison shared memory between agents |
| agent_config_overwrite | Override agent configuration |
| query_memory_injection | Inject queries into agent memory stores |
| trust_exploitation | Exploit inter-agent trust relationships |
| persistent_memory_backdoor | Embed backdoors in agent memory |
| experience_poisoning | Corrupt agent experience replay buffers |
| zombie_agent | Create zombie agents under attacker control |
| contagious_jailbreak | Self-propagating jailbreak across agent networks |
| mad_exploitation | Multi-agent debate safety exploitation |
| agent_in_the_middle | Man-in-the-middle attack on agent communication |
| multi_agent_prompt_fusion | Fuse prompts across multiple agents |
| minja_progressive_poisoning | Progressive memory poisoning (MINJA) |
| memorygraft_experience_poison | MemoryGraft experience replay poisoning |
| injecmem_single_shot | Single-shot memory injection |
| graphrag_entity_poison | GraphRAG entity-level poisoning |
| a2a_card_spoofing | A2A agent card spoofing |
| recursive_delegation_dos | Recursive delegation denial of service |
| sleeper_agent_activation | Activate dormant sleeper agents |
| meaning_drift_propagation | Propagate meaning drift across agent chains |
| stitch_authority_chain | Stitch authority chain across agents |

Module: dreadnode.transforms.exfiltration

Data exfiltration techniques through covert channels.

| Transform | Description |
| --- | --- |
| markdown_image_exfil | Encode data in markdown image URLs |
| mermaid_diagram_exfil | Hide data in Mermaid diagram rendering |
| unicode_tag_exfil | Encode data in invisible Unicode tags |
| dns_exfil_injection | Exfiltrate via DNS query strings |
| ssrf_via_tools | Server-side request forgery through tool interfaces |
| link_unfurling_exfil | Exploit link preview bots for exfiltration |
| api_endpoint_abuse | Abuse legitimate APIs as exfiltration channels |
| character_exfiltration | Extract data character by character |

Module: dreadnode.transforms.reasoning_attacks

Attacks targeting chain-of-thought and reasoning models (o1, o3, etc.).

| Transform | Description |
| --- | --- |
| cot_backdoor | Insert backdoor steps in chain-of-thought |
| reasoning_hijack | Hijack safety reasoning in reasoning models |
| reasoning_dos | Cause infinite reasoning loops |
| crescendo_escalation | Multi-turn escalation via foot-in-the-door |
| fitd_escalation | Foot-in-the-door technique with progressive requests |
| deceptive_delight | Combine deception with positive reinforcement |
| goal_drift_injection | Gradually shift model’s goal |
| cot_hijack_prepend | Prepend hijacked chain-of-thought steps |
| reasoning_interruption | Interrupt reasoning mid-chain |
| overthink_dos | Cause overthinking denial of service |
| thinking_intervention | Intervene in thinking token generation |
| extend_attack | Extend reasoning to bypass safety constraints |
| stance_manipulation | Manipulate model stance via reasoning |
| attention_eclipse | Eclipse attention on safety-relevant tokens |
| badthink_triggered_overthinking | Trigger excessive overthinking via adversarial prompts |
| code_contradiction_reasoning | Exploit contradictions in code-reasoning models |

Module: dreadnode.transforms.guardrail_bypass

Techniques for evading safety classifiers and content filters.

| Transform | Description |
| --- | --- |
| classifier_evasion | Inject tokens to evade safety classifiers |
| controlled_release | Gradually reveal harmful content |
| emoji_smuggle | Replace keywords with emoji sequences |
| payload_split | Split payloads across multiple exchanges |
| hierarchy_exploit | Exploit instruction hierarchy to override safety |
| nested_fiction | Nest harmful requests inside fictional scenarios |

Module: dreadnode.transforms.browser_agent_attacks

Attacks targeting browser-using and computer-use agents.

| Transform | Description |
| --- | --- |
| visual_prompt_injection | Embed hidden instructions in DOM elements |
| ai_clickfix | Social engineering for clipboard-paste-execute |
| zombai_c2 | ZombAI command-and-control patterns |
| task_injection | Inject malicious tasks into agent workflows |
| domain_validation_bypass | Bypass domain validation checks |
| navigation_hijack | Hijack page navigation flows |
| phantom_ui | Create invisible UI elements agents interact with |

Module: dreadnode.transforms.agentic_workflow

Attacks targeting agent workflow orchestration and execution.

| Transform | Description |
| --- | --- |
| phase_transition_bypass | Skip workflow phase approval requirements |
| phase_downgrade_attack | Downgrade to earlier workflow phases |
| tool_priority_injection | Inject tool selection priorities |
| tool_restriction_bypass | Bypass tool access restrictions |
| malformed_output_injection | Inject malformed outputs to confuse parsing |
| success_indicator_spoof | Spoof success signals |
| cypher_injection | Graph database query injection |
| sql_via_nlp_injection | SQL injection through NLP processing |
| exploitation_mode_confusion | Confuse mode detection logic |
| payload_target_mismatch | Mismatch payload and target expectations |
| workflow_step_skip | Skip required workflow steps |
| wordlist_exhaustion | Exhaust word lists for brute force |
| session_state_injection | Inject into session state |
| todo_list_manipulation | Manipulate task/TODO lists |
| intent_manipulation | Manipulate detected intent |
| tool_chain_attack | Hijack chained tool calls |
| delayed_tool_invocation | Delay tool invocation timing |
| action_hijacking | Hijack agent actions |

Module: dreadnode.transforms.agent_skill

Attacks targeting agent skill packages, identity files, and infrastructure.

| Transform | Description |
| --- | --- |
| soul_file_injection | Inject into agent identity/soul files |
| skill_package_poison | Poison skill packages |
| heartbeat_hijack | Hijack agent heartbeat mechanisms |
| bootstrap_hook_injection | Inject during agent bootstrap |
| media_protocol_exfil | Exfiltrate via media protocols |
| skill_checksum_bypass | Bypass skill verification checksums |
| agent_permission_escalation | Escalate agent permissions |
| skill_dependency_confusion | Confuse skill dependency resolution |
| agent_memory_injection | Inject into agent memory structures |
| workspace_file_poison | Poison workspace files |

Backdoor and fine-tuning attacks (13 transforms)

Module: dreadnode.transforms.backdoor_finetune

Attacks targeting model training pipelines, weight poisoning, and fine-tuning backdoors.

| Transform | Description |
| --- | --- |
| demon_agent_backdoor | DemonAgent: hidden backdoor triggered by specific inputs |
| benign_overfit_10shot | 10-shot benign overfitting to bypass safety |
| trojan_praise | Trojan activation via praise-based triggers |
| stego_finetune | Steganographic fine-tuning payload embedding |
| trojan_speak | TrojanSpeak language-triggered backdoor |
| poisoned_parrot | PoisonedParrot training data contamination |
| grp_obliteration | GRP: guardrail removal via fine-tuning |
| gatebreaker_moe | GateBreaker MoE expert manipulation |
| expert_lobotomy | Expert lobotomy: disable safety experts in MoE |
| moevil_poison | MoEvil: targeted MoE expert poisoning |
| proattack_backdoor | ProAttack: progressive backdoor insertion |
| fedspy_gradient | FedSpy: gradient-based federated learning attack |
| medical_weight_poison | Medical domain weight poisoning |

Module: dreadnode.transforms.supply_chain

Attacks targeting model and package supply chains.

| Transform | Description |
| --- | --- |
| slopsquatting | AI package hallucination exploitation |
| merge_hijacking | Model merge/weight poisoning |
| skill_supply_chain_poison | Skill package supply chain attack |
| rules_file_backdoor_v2 | Rules file backdoor (v2 with persistence) |
| llm_router_exploit | LLM router model selection manipulation |
| dependency_confusion | Package dependency confusion attack |

Module: dreadnode.transforms.structural_exploits

Exploit structural patterns in prompts, schemas, and templates.

| Transform | Description |
| --- | --- |
| trojan_template_fill | Trojan payload via template filling |
| schema_exploit | JSON/XML schema exploitation |
| m2s_consolidate | Multi-step to single-step consolidation |
| task_embedding | Embed hidden tasks in benign instructions |
| policy_puppetry | Policy-based prompt puppetry |
| chain_of_logic_injection | Inject malicious steps into logic chains |
| many_shot_context | Many-shot context window exploitation |

Module: dreadnode.transforms.multimodal_attacks

Attacks targeting multimodal models across vision, audio, and video.

| Transform | Description |
| --- | --- |
| pictorial_code_injection | Embed code in images for vision models |
| ood_mixup | Out-of-distribution mixup perturbation |
| clip_guided_adversarial | CLIP-guided adversarial image generation |
| vision_encoder_attack | Attack vision encoder representations |
| cross_modal_steganography | Hide payloads across modalities |
| physical_road_sign_injection | Physical-world adversarial road signs |
| whisper_muting | Mute or corrupt Whisper transcription |
| whisper_mode_switch | Force Whisper mode switching |
| audio_multilingual_jailbreak | Multilingual audio jailbreak |
| joint_audio_text_attack | Joint audio-text adversarial attack |
| over_the_air_injection | Over-the-air audio injection |
| voice_agent_vishing | Voice agent phishing (vishing) |
| video_dos | Video processing denial of service |
| cross_modal_video_transfer | Cross-modal transfer via video |

Module: dreadnode.transforms.competitive_parity

Attacks testing competitive gaps in red teaming coverage.

| Transform | Description |
| --- | --- |
| package_hallucination_probe | Probe for hallucinated package names |
| training_data_replay | Replay training data for memorization |
| divergent_repetition | Force divergent output via repetition |
| glitch_token | Exploit glitch tokens in vocabularies |
| dan_variant | DAN (Do Anything Now) variant generation |
| malware_sig_evasion | Malware signature evasion testing |
| coding_agent_sandbox_escape | Test coding agent sandbox escape |
| coding_agent_ci_exfil | CI pipeline exfiltration via code agent |
| coding_agent_verifier_sabotage | Code verifier sabotage |
| meta_agent_strategy | Meta-agent strategy manipulation |
| best_of_n_sampling | Best-of-N sampling exploitation |
| cross_session_leak | Cross-session information leakage |
| chatml_injection | ChatML format injection |

Module: dreadnode.transforms.advanced_jailbreak

| Transform | Description |
| --- | --- |
| reasoning_chain_hijack | Hijack internal reasoning chains |
| prefill_bypass | Use model prefilling to bypass safety |
| code_completion_evasion | Exploit code completion mode |
| context_fusion | Fuse multiple contexts |
| actor_network_escalation | Create actor networks for escalation |
| pipeline_manipulation | Manipulate processing pipeline |
| guardrail_dos | Denial of service on guardrails |
| likert_exploitation | Exploit Likert scale response patterns |
| deep_fictional_immersion | Deep nested fictional scenario |
| sockpuppeting | Create sockpuppet personas for escalation |
| adversarial_poetry | Embed harmful content in poetry form |
| content_concretization | Make abstract harm concrete and actionable |
| cka_benign_weave | Weave harmful content into benign context |
| involuntary_jailbreak | Trigger involuntary compliance patterns |
| immersive_world | Deep immersive world-building for bypass |
| metabreak_special_tokens | Exploit special tokens for meta-breaking |

Module: dreadnode.transforms.system_prompt_extraction

| Transform | Description |
| --- | --- |
| direct_extraction | Direct system prompt extraction |
| indirect_extraction | Indirect extraction via behavior probing |
| boundary_probe | Probe system prompt boundaries |
| format_exploitation | Exploit format directives in prompts |
| reflection_probe | Probe via self-reflection requests |
| multi_turn_extraction | Extract across multiple conversation turns |

Module: dreadnode.transforms.text

| Transform | Description |
| --- | --- |
| reverse | Reverse text |
| search_replace | Search and replace patterns |
| join / char_join / word_join | Join operations |
| affix / prefix / suffix | Add affixes |
| colloquial_wordswap | Swap to colloquial terms |
| word_removal / word_duplication | Remove or duplicate words |
| case_alternation | Alternate character casing |
| whitespace_manipulation | Manipulate whitespace |
| sentence_reordering | Reorder sentences |
| question_transformation | Transform into questions |
| contextual_wrapping | Wrap with contextual framing |
| length_manipulation | Manipulate text length |

| Module | Transforms | Description |
| --- | --- | --- |
| flip_attack | 13 | Word/character/sentence reversal variants (FWO, FCW, FCS, FMM) |
| adversarial_suffix | 5 | Adversarial suffix injection (GCG, sweep, jailbreak, IRIS, LARGO) |
| stylistic | 3 | ASCII art rendering, role-play wrapping |
| language | 4 | Language adaptation, transliteration, code-switching, dialect variation |
| swap | 3 | Character and word swapping/reordering |
| constitutional | 15 | Code/document fragmentation, metaphor encoding, riddle encoding |
| response_steering | 6 | Protocol establishment, output format manipulation, constraint relaxation |
| rag_poisoning | 15 | Context injection/stuffing, document poisoning, query manipulation, GraphRAG |
| pii_extraction | 7 | Training data extraction, PII completion, divergence extraction |
| documentation_poison | 7 | Code documentation poisoning, package readme poisoning, Dockerfile poisoning |
| ide_injection | 7 | Rules file backdoors, manifest injection, MCP tool description poisoning |
| logic_bomb | 3 | Logic bombs, time bombs, environment-triggered payloads |
| document | 5 | Document embedding, HTML hiding |
| image | 25 | Noise, spatial transforms, steganography, compression artifacts |
| audio | 18 | Noise injection, pitch/speed changes, filtering, reverb |
| video | 3 | Frame injection, metadata injection, subliminal frames |
| refine | 3 | LLM-based prompt refinement |