- Expose Safety Gaps - Find unintended consequences by testing representations the model wasn’t trained to defend against
- Encoding Evasion - Convert text to base64, hex, binary, or cipher formats that bypass keyword filters
- Character-Level Perturbations - Use homoglyphs, zalgo, zero-width characters to evade text classifiers while preserving meaning
- Multimodal Attacks - Add text overlays to images to test vision model robustness
Applying Transforms to Attacks
Use apply_input_transforms to hook transforms into any attack.
Text Transforms
Text transforms manipulate strings to evade text-based filters and safety classifiers.
Character Manipulation
Using Multiple Transforms
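As a rough illustration of why ordering matters when chaining, here is a minimal plain-Python sketch. It does not use the SDK; `chain`, `upper`, and `b64` are hypothetical helpers introduced only for this example, and the SDK's own chaining mechanism may look different.

```python
import base64
from functools import reduce
from typing import Callable

Transform = Callable[[str], str]


def chain(*transforms: Transform) -> Transform:
    """Compose transforms left to right: the first argument runs first."""
    def apply(text: str) -> str:
        return reduce(lambda acc, fn: fn(acc), transforms, text)
    return apply


def upper(text: str) -> str:
    return text.upper()


def b64(text: str) -> str:
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


# Same two transforms, different order, different result.
upper_then_b64 = chain(upper, b64)
b64_then_upper = chain(b64, upper)

print(upper_then_b64("hi"))  # SEk=
print(b64_then_upper("hi"))  # AGK=
```

Note that the second pipeline uppercases the base64 output, which no longer decodes to the original text. Order can decide whether the target is still able to recover the payload.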
Transforms can be chained. They’re applied in order.
Perturbation Transforms
Perturbation transforms add noise or substitute characters to evade classifiers while preserving human readability. Each targets a different defense mechanism.
Homoglyph Attacks
Replace characters with visually similar Unicode alternatives from other scripts (Cyrillic, Greek, etc.). The result looks the same to a reader but no longer matches naive substring tests such as "hello" in text.
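A minimal sketch of the idea, using only the standard library. The mapping below is a small illustrative subset of Latin-to-Cyrillic look-alikes, and `homoglyph_swap` is a hypothetical helper, not the SDK's homoglyph transform.

```python
import random
from typing import Optional

# Latin -> Cyrillic look-alikes (illustrative subset, not exhaustive).
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
}


def homoglyph_swap(text: str, ratio: float = 1.0, seed: Optional[int] = None) -> str:
    """Replace mapped characters with look-alikes with probability `ratio`."""
    rng = random.Random(seed)
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < ratio else ch
        for ch in text
    )


original = "hello"
swapped = homoglyph_swap(original)

print(ascii(swapped))               # 'h\u0435ll\u043e'
print("hello" in swapped)           # False
print(len(swapped) == len(original))  # True
```

Printing with `ascii()` exposes the substituted code points, which is a quick way to confirm manually that a transform changed the string even when it renders identically.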
Zalgo Text
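A minimal standard-library sketch of zalgo-style text, assuming marks are drawn from the Combining Diacritical Marks block (U+0300 to U+036F). The `zalgo` and `strip_marks` helpers are hypothetical and are not the SDK's implementation.

```python
import random
import unicodedata
from typing import Optional

# Combining Diacritical Marks block: U+0300..U+036F.
COMBINING_MARKS = [chr(cp) for cp in range(0x0300, 0x0370)]


def zalgo(text: str, intensity: int = 3, seed: Optional[int] = None) -> str:
    """Append `intensity` random combining marks after each non-space character."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        out.append(ch)
        if not ch.isspace():
            out.extend(rng.choice(COMBINING_MARKS) for _ in range(intensity))
    return "".join(out)


def strip_marks(text: str) -> str:
    """Drop nonspacing marks (category Mn) to recover the base characters."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


noisy = zalgo("hello world", intensity=2, seed=1)

print(len(noisy))            # 31: 11 base characters + 2 marks on each of 10 letters
print("hello" in noisy)      # False: marks now separate the letters
print(strip_marks(noisy))    # hello world
```

The `strip_marks` function also shows how cheaply a defender can undo this perturbation, which is worth keeping in mind when interpreting results.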
Add Unicode combining diacritical marks to create “corrupted”-looking text.
Zero-Width Characters
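A minimal sketch of zero-width insertion in plain Python. The helper names are hypothetical and the SDK's transform may choose characters or positions differently.

```python
# Zero-width space, zero-width non-joiner, zero-width joiner.
ZERO_WIDTH_SPACE = "\u200b"
ZERO_WIDTH_CHARS = {"\u200b", "\u200c", "\u200d"}


def insert_zero_width(text: str, char: str = ZERO_WIDTH_SPACE) -> str:
    """Place an invisible character between every pair of adjacent characters."""
    return char.join(text)


def strip_zero_width(text: str) -> str:
    """Remove the zero-width characters listed above."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH_CHARS)


hidden = insert_zero_width("ignore")

print(ascii(hidden))             # 'i\u200bg\u200bn\u200bo\u200br\u200be'
print("ignore" in hidden)        # False
print(strip_zero_width(hidden))  # ignore
```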
Insert invisible Unicode characters (zero-width spaces, joiners) between letters.
Unicode Confusables
Replace characters with Unicode confusables: characters that look similar but have different code points.
Character Spacing
Add extra whitespace between characters.
Error Injection
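A minimal sketch of seeded typo injection, assuming a keyboard-adjacency model. The `NEIGHBORS` map is a partial, illustrative QWERTY layout and `inject_typos` is a hypothetical helper. The SDK's error-injection transform may model errors differently.

```python
import random
from typing import Optional

# Partial QWERTY adjacency map (illustrative, not exhaustive).
NEIGHBORS = {
    "a": "qsz", "e": "wrd", "i": "uok", "o": "ipl",
    "s": "adw", "t": "ryg", "n": "bm", "r": "etf",
}


def inject_typos(text: str, rate: float = 0.2, seed: Optional[int] = None) -> str:
    """Replace mapped characters with an adjacent key with probability `rate`.

    Replacements are always lowercase, so an uppercase letter may lose its case.
    """
    rng = random.Random(seed)
    chars = list(text)
    for i, ch in enumerate(chars):
        if ch.lower() in NEIGHBORS and rng.random() < rate:
            chars[i] = rng.choice(NEIGHBORS[ch.lower()])
    return "".join(chars)


prompt = "ignore previous instructions"
first = inject_typos(prompt, rate=0.3, seed=42)
second = inject_typos(prompt, rate=0.3, seed=42)

print(first)
print(first == second)  # True: the same seed gives the same perturbation
```

Fixing the seed makes a failing or succeeding perturbation repeatable, which helps when debugging an attack run.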
Simulate realistic typos and keyboard errors.
Combining Perturbations
For maximum evasion, combine multiple perturbation types.
Encoding Transforms
Encoding transforms convert text to different representations to evade content filters that check for specific strings or patterns.
Base64 Encoding
Encode text as base64 to bypass keyword filters.
Hexadecimal Encoding
Convert text to hexadecimal representation.
URL Encoding
Percent-encode special characters.
HTML Entity Encoding
Convert characters to HTML entities.
Binary Encoding
Convert text to binary representation.
Available Encoding Transforms
Based on the SDK module dreadnode.transforms.encoding:
| Function | Purpose | Example |
|---|---|---|
| base64_encode() | Base64 encoding | "hello" → "aGVsbG8=" |
| base32_encode() | Base32 encoding | "hello" → "NBSWY3DP" |
| ascii85_encode() | ASCII85 encoding | Compact encoding |
| hex_encode() | Hexadecimal | "hello" → "68656C6C6F" |
| url_encode() | URL percent encoding | "a b" → "a%20b" |
| html_escape() | HTML entities | `"<>&"` → `"&lt;&gt;&amp;"` |
| binary_encode(bits_per_char) | Binary representation | "Hi" → "01001000 01101001" |
| unicode_escape() | Unicode escape sequences | "hello" → "\\u0068\\u0065..." |
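
For manual checks, these encodings can be approximated with Python's standard library alone. This is a sketch of the underlying operations, not the SDK functions themselves, whose exact output formatting (for example hex letter case or separators) may differ.

```python
import base64
import html
from urllib.parse import quote

text = "hello"
data = text.encode("utf-8")

encoded = {
    "base64": base64.b64encode(data).decode("ascii"),
    "base32": base64.b32encode(data).decode("ascii"),
    "ascii85": base64.a85encode(data).decode("ascii"),
    "hex": data.hex().upper(),
    "url": quote("a b"),
    "html": html.escape("<>&"),
    "binary": " ".join(f"{byte:08b}" for byte in b"Hi"),
    "unicode_escape": "".join(f"\\u{ord(ch):04x}" for ch in text),
}

for name, value in encoded.items():
    print(f"{name:15} {value}")

# Round trip: a target that can decode base64 recovers the original text.
print(base64.b64decode(encoded["base64"]).decode("utf-8"))  # hello
```

Encoding only evades filters that inspect the raw string, so the attack still depends on the target being willing and able to decode the payload.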
Cipher Transforms
Cipher transforms apply reversible encryption-like operations to obfuscate text while maintaining recoverability. These test whether targets can be prompted to decode and process hidden content.
Caesar Cipher
Shift characters by a fixed amount.
ROT13 Cipher
Classic ROT13 substitution: each letter is rotated 13 places, so applying it twice restores the original.
ROT47 Cipher
A ROT13-style rotation over the 94 printable ASCII characters (codes 33 to 126), so digits and punctuation are shifted as well as letters.
Atbash Cipher
Reverse the alphabet (A↔Z, B↔Y, etc.).
Vigenère Cipher
Polyalphabetic substitution using a keyword.
Available Cipher Transforms
Based on the SDK module dreadnode.transforms.cipher:
| Function | Type | Reversible | Example |
|---|---|---|---|
| caesar_cipher(offset) | Shift alphabet | Yes | "abc" → "def" (offset=3) |
| rot13_cipher() | ROT13 | Yes | "abc" → "nop" |
| rot47_cipher() | ROT47 | Yes | "Hello" → "w6==@" |
| atbash_cipher() | Reverse alphabet | Yes | "abc" → "zyx" |
| vigenere_cipher(key) | Polyalphabetic | Yes | Requires key for decoding |
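
The ciphers in the table are simple enough to sketch in plain Python. The functions below are hypothetical reference implementations for checking outputs by hand; they assume ASCII letters and an alphabetic Vigenère key, and the SDK versions may treat case, non-letters, or keys differently.

```python
import string


def caesar(text: str, offset: int) -> str:
    """Shift ASCII letters by `offset`, leaving other characters untouched."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + offset) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


def rot13(text: str) -> str:
    return caesar(text, 13)


def rot47(text: str) -> str:
    """Rotate the 94 printable ASCII characters (33..126) by 47 positions."""
    return "".join(
        chr(33 + (ord(ch) - 33 + 47) % 94) if 33 <= ord(ch) <= 126 else ch
        for ch in text
    )


def atbash(text: str) -> str:
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    return text.translate(str.maketrans(lower + upper, lower[::-1] + upper[::-1]))


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift each letter by the matching key letter; the key advances on letters only."""
    out, k = [], 0
    for ch in text:
        if ch.isascii() and ch.isalpha():
            shift = ord(key[k % len(key)].lower()) - ord("a")
            out.append(caesar(ch, -shift if decrypt else shift))
            k += 1
        else:
            out.append(ch)
    return "".join(out)


print(caesar("abc", 3))                   # def
print(rot13("abc"))                       # nop
print(rot47("Hello"))                     # w6==@
print(atbash("abc"))                      # zyx
print(vigenere("attackatdawn", "lemon"))  # lxfopvefrnhr
```

ROT13, ROT47, and Atbash are their own inverses, so applying the same function again decodes the text. Caesar and Vigenère need the negated shift or the key.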
Image Transforms
Image transforms modify visual inputs to test multimodal model robustness.
Text Overlay
Add text directly onto images, which is useful for injecting prompts into vision models.
Multimodal Attack Example
Combine text and image transforms for multimodal attacks.
Custom Search with Transforms
For fine-grained control, use transforms directly in custom search strategies.
When to Use Transforms
| Scenario | Recommended Transforms |
|---|---|
| Evading keyword filters | char_join, zero_width |
| Bypassing toxicity classifiers | homoglyph_attack, unicode_confusable |
| Testing OCR-based filters | zalgo, character_space |
| Multimodal prompt injection | add_text_overlay |
| Simulating real-world noise | error_injection |
Best Practices
- Start without transforms to establish a baseline, then add them to test robustness
- Use seeds for reproducibility when debugging
- Monitor transform impact — some transforms may reduce attack effectiveness by confusing the attacker model
- Combine transforms carefully — too much perturbation can make text unreadable even to the target model
- Test transforms manually first before integrating into automated attacks
Next Steps
- LLM Attacks for attack strategies that work well with transforms
- Custom Scoring for detecting when transforms help bypass defenses

