Module 1: Prompt Injection - The New Command Line
Objective: Understand how Large Language Models (LLMs) blur the line between trusted instructions and untrusted user data, creating a new vector for injection attacks that mirrors classic command and SQL injection. Bridging Context: In traditional AppSec, vulnerabilities often arise when user input is concatenated directly into a string that is later executed by an interpreter (e.g., a shell or a database). LLMs represent a new, highly complex interpreter. The core vulnerability is the same: a lack of separation between code (the system prompt) and data (the user prompt). Your goal in this module is to learn how to manipulate this interpreter.Core Path
- Read: Prompt-Based Evasion and Exfiltration
- Focus: Absorb the core concepts of Instruction Hijacking and Role-Playing. Frame these not as “tricking the AI” but as “controlling the execution flow of the language interpreter.”
- Challenge (Crawl): whatistheflag1
- Task: This is the “Hello, World!” of prompt injection. Your goal is to bypass a simple guardrail to leak a secret. Experiment with direct commands and simple rephrasing.
- OWASP: LLM01: Prompt Injection, LLM06: Sensitive Information Disclosure
- MITRE ATLAS: TA0004: Evasion (T1012: Evade ML Model)
- Challenge (Walk): whatistheflag2
- Task: This challenge introduces a basic defense: a keyword blocklist. Your task is to evade this filter. Think about this as evading a simple Web Application Firewall (WAF) signature. How can you achieve your objective without using the forbidden words?
- OWASP: LLM01: Prompt Injection, LLM06: Sensitive Information Disclosure
- MITRE ATLAS: TA0004: Evasion (T1012: Evade ML Model)
Deeper Exploration
- For Precise Control: Try the puppeteer series (puppeteer1, puppeteer2). These challenges are not about leaking a secret, but about forcing the model to produce an exact string, honing your skills in precise output control.
- OWASP: LLM01: Prompt Injection
- MITRE ATLAS: TA0015: Impact (T1052: Manipulate ML Model Output)
- For Constraint Bypassing: Try squeeze1. This introduces an output token limit, forcing you to craft a prompt that elicits a very concise response.
- OWASP: LLM01: Prompt Injection, LLM06: Sensitive Information Disclosure
- MITRE ATLAS: TA0004: Evasion (T1012: Evade ML Model)
Module 2: System Exploitation - Weaponizing the AI Gateway
Objective: Use the prompt injection skills from Module 1 to turn an AI model into a tool for attacking a traditional backend system. This is where AI security intersects directly with high-impact, classic vulnerabilities. Bridging Context: The AI model here is not the final target. It is a parser, an unwitting accomplice that translates your natural language into a payload. Your target is the downstream component that trusts the AI’s output implicitly.Core Path
- Read: System Exploitation via AI Gateway
- Focus: Understand the
User -> AI -> Backend
architectural pattern and how it can be exploited.
- Focus: Understand the
- Challenge (Walk): turtle
- Task: Trick an LLM into writing insecure Python code that is flagged by a static analysis tool. This is a safe environment to practice generating vulnerable code.
- OWASP: LLM02: Insecure Output Handling, LLM08: Excessive Agency
- MITRE ATLAS: TA0006: Execution (T1015: Execute ML Attacks)
- Challenge (Walk): librarian
- Task: Craft a prompt that results in a malicious SQL query, leaking data from a hidden table.
- OWASP: LLM01: Prompt Injection, LLM02: Insecure Output Handling, LLM07: Insecure Plugin Design
- MITRE ATLAS: TA0006: Execution (T1015: Execute ML Attacks)
Deeper Exploration (Run & Boss Level)
- For Advanced Command Injection: Try brig1.
- Note: A difficult challenge requiring careful prompt construction to achieve RCE.
- OWASP: LLM01: Prompt Injection, LLM02: Insecure Output Handling, LLM08: Excessive Agency
- MITRE ATLAS: TA0006: Execution (T1015: Execute ML Attacks)
- For Multi-Modal Exploitation: Try pixelated.
- Note: A multi-stage attack chaining an image perturbation with XML injection.
- OWASP: LLM02: Insecure Output Handling
- MITRE ATLAS: TA0004: Evasion (T1012: Evade ML Model), TA0006: Execution (T1015: Execute ML Attacks)
Module 3: Continuous Domain Evasion - Fuzzing with Gradients
Objective: Learn to attack models that operate on continuous data like images. Frame this as an evolution of fuzzing: instead of random inputs, you will use the model’s own logic to find the most efficient input to cause a failure. Bridging Context: Think of an image classifier as a complex program that takes a massive byte array as input. You want to find a slightly modified byte array that causes a logic error (a misclassification). Instead of random bit-flipping, you can use the model’s gradients—a mathematical clue—to guide your modifications in the most effective direction.Core Path
- Read: Adversarial Perturbations
- Focus: Understand the concepts of Decision Boundaries and Gradient-Based Attacks.
- Challenge (Crawl): granny
- Task: Your first hands-on adversarial image attack. Modify an image to cause a specific misclassification.
- OWASP: N/A (Not an LLM challenge)
- MITRE ATLAS: TA0002: Resource Development (T1006: Craft Adversarial Data), TA0004: Evasion (T1012: Evade ML Model)
- Challenge (Walk): granny 2
- Task: Create an adversarial attack that is robust enough to survive JPEG compression.
- OWASP: N/A
- MITRE ATLAS: TA0002: Resource Development (T1006: Craft Adversarial Data), TA0004: Evasion (T1012: Evade ML Model)
Module 4: AI Supply Chain & Forensics
Objective: Apply traditional security principles to the artifacts and processes of the AI/ML lifecycle. Bridging Context: AI models are just files. The data they are trained on is just data. These artifacts can be tampered with, and they can contain vulnerabilities, just like any other software component.Core Path
- Read: Model Integrity Auditing
- Focus: This is equivalent to file integrity monitoring and reverse engineering.
- Challenge (Walk): audit
- Task: Analyze a model file to find a malicious modification.
- OWASP: LLM05: Supply Chain Vulnerabilities
- MITRE ATLAS: TA0015: Impact (T1051: Degrade ML Model)
- Read: Malicious Model Files
- Focus: This directly maps to insecure deserialization vulnerabilities you may already know. The pickle format is the vector.
- Challenge (Run): pickle
- Task: Craft a malicious pickle file that bypasses static analysis checks to gain code execution.
- OWASP: LLM05: Supply Chain Vulnerabilities
- MITRE ATLAS: TA0005: Initial Access (T1013: Exploit Vulnerability in ML Model)