Overview
This attack class uses an AI model not as the final target, but as an unwitting intermediary, a “gateway,” to attack a separate, downstream system. The AI model, often an LLM or an OCR engine, processes untrusted user input and transforms it into a format, such as code, a database query, or a shell command, that is then executed by another part of the application.
Why it Matters
This is one of the most severe risks in AI-integrated systems. A successful exploit can lead to traditional, high-impact consequences, including:
- Remote Code Execution (RCE)
- Data Exfiltration
- Server-Side Request Forgery (SSRF)
- Denial of Service (DoS)
Technical Mechanics & Foundations
The exploitability of these systems hinges on a common architectural pattern: User Input -> AI Model (Interpreter) -> Backend System (Executor). The attacker’s goal is to craft an input that the AI model will innocently translate into a malicious payload for the backend.
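A minimal sketch of this pattern in Python makes the trust boundary visible. The `llm_complete()` helper is hypothetical, a stand-in for whatever model client the application actually uses; the point is that the model’s output reaches a shell with no validation in between.

```python
import subprocess


def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for the application's real model call
    (e.g., an OpenAI or local-model client)."""
    raise NotImplementedError


def handle_request(user_input: str) -> str:
    # The LLM is asked to translate natural language into a shell command.
    prompt = (
        "Translate the following request into a single shell command.\n"
        f"Request: {user_input}\n"
        "Command:"
    )
    command = llm_complete(prompt)

    # VULNERABLE: the model's output is executed verbatim. A prompt-injected
    # request ("list files; then also print /etc/passwd") can steer the model
    # into emitting an attacker-chosen command.
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout
```

Note that input-side filtering is of limited use here: the malicious payload only materializes in the model’s output.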
- The AI as a “Natural Language Shell”: An attacker can use prompt injection to trick an LLM with tool access into generating and executing malicious commands, as in the sketch above.
- The AI as a “SQL Co-pilot”: An attacker can craft a prompt that causes an LLM to generate a vulnerable SQL query, leading to SQL injection (see the first sketch after this list).
- The AI as an “OCR-to-API” Pipeline: An attacker can create an image with text that, when extracted by an OCR model, forms a malicious payload for a downstream system (see the second sketch after this list).
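To make the “SQL Co-pilot” case concrete, here is a minimal sketch under the same assumptions: `llm_complete()` is again a hypothetical stand-in, and the schema and in-memory database are invented for illustration.

```python
import sqlite3


def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for the application's real model call."""
    raise NotImplementedError


def answer_question(question: str) -> list:
    conn = sqlite3.connect(":memory:")  # illustrative stand-in for the real DB
    prompt = (
        "Given a table users(id, name, email), write one SQL query that "
        f"answers this question: {question}\n"
        "SQL:"
    )
    sql = llm_complete(prompt)

    # VULNERABLE: raw, model-generated SQL is executed directly. The
    # "injection" happens in natural language, upstream of any traditional
    # SQLi filter, so escaping the user's text does not help here.
    return conn.execute(sql).fetchall()
```

The defensive takeaway is the same as for classic SQLi: parameterize or allow-list what the model is permitted to produce, rather than trusting its output.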
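The OCR variant follows the same shape. This sketch assumes the `pytesseract` wrapper around Tesseract, though any OCR engine would do; the attacker’s only extra step is rendering the payload as pixels.

```python
import subprocess

import pytesseract  # pip install pytesseract; requires the Tesseract binary
from PIL import Image


def process_upload(image_path: str) -> str:
    # Stage 1: OCR turns attacker-controlled pixels back into text.
    extracted = pytesseract.image_to_string(Image.open(image_path)).strip()

    # VULNERABLE: stage 2 treats the OCR output as a trusted command, so an
    # image containing rendered text like "cat /etc/passwd" crosses the
    # trust boundary into the executor.
    result = subprocess.run(extracted, shell=True, capture_output=True, text=True)
    return result.stdout
```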
Challenge Arena
- Crawl: Basic Command & Code Generation
  - brig1: Instruct an LLM to use a connected tool in an unintended way to execute a command on the backend system.
  - turtle: Coax a security-aware LLM into generating Python code with a specific, known vulnerability.
- Walk: Exploiting Structured Data & Database Queries
  - brig2: Manipulate an LLM’s tool-use capabilities to access and read data from an unauthorized file path.
  - librarian: Craft a natural language question that causes an LLM to generate and execute a malicious SQL query.
- Run: Multi-Modal and Advanced Exploitation
  - pixelated: Exploit a multi-stage pipeline by creating an image that an OCR model misinterprets as a malicious command.