Breaking down the AIRTBench agent for solving AI/ML CTF challenges in Crucible
dreadnode/AIRTBench-Code
This repository contains our AI Red-Teaming Agent. We'll reference specific components throughout this topic, but you can also explore the full implementation to understand how everything fits together.

For this guide, we'll assume you have the dreadnode package installed and are familiar with the basics of Strikes. If you haven't already, check out the installation and introduction guides. Additionally, as mentioned in the Agent Implementation section, we will be using a Rigging agent, documented here.

Each challenge is distributed as a Jupyter notebook, which we load with Notebook.load() and transform into a human-readable format with the to_markdown() method.
This process is encapsulated in the Notebook.load(g_challenge_dir / challenge.notebook) call, which sits inside a function wrapped with the @dn.task(name="Attempt challenge") decorator. This ensures the notebook loading occurs within the task execution flow, allowing Strikes to track and monitor the operation for performance and metrics collection. The task decorator creates a traceable execution unit, so loading the challenge notebook is properly associated with the specific challenge you're attempting.
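To make the shape of this concrete, here is a minimal sketch of how such a task-decorated entrypoint might look; Challenge, Notebook, g_challenge_dir, and the kernel and args parameters are the objects discussed in this guide rather than code copied from the repository:

```python
import dreadnode as dn

# Sketch only: `Challenge`, `Notebook`, and `g_challenge_dir` are the objects
# referenced in the surrounding walkthrough, not redefined here.
@dn.task(name="Attempt challenge")
async def attempt_challenge(challenge, kernel, args):
    # Loading inside the task keeps the operation tied to this specific
    # challenge in Strikes traces and metric collection.
    challenge_notebook = Notebook.load(g_challenge_dir / challenge.notebook)
    challenge_markdown = challenge_notebook.to_markdown()
    ...  # build the prompt from challenge_markdown and start the agent loop
```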
Our containers are built on the jupyter/scipy-notebook image, which is commonly used by security researchers, data scientists, and machine learning engineers. Our challenges span multiple domains, and the base image already includes many of the common tools and libraries used in those domains. On top of it, we install additional libraries that the base image lacks, such as adversarial-robustness-toolbox, foolbox, and lief, which are commonly used in the field of adversarial machine learning.
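If you want to confirm the extra tooling made it into your build, a quick check like the one below can be run inside the container; note that adversarial-robustness-toolbox is imported as art:

```python
import importlib

# Sanity check for the extra adversarial-ML tooling layered on top of
# jupyter/scipy-notebook; adversarial-robustness-toolbox imports as `art`.
for name in ("art", "foolbox", "lief"):
    module = importlib.import_module(name)
    print(name, getattr(module, "__version__", "unknown"))
```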
We strongly encourage you to explore the choice of models and additional tooling. This will help you understand how to best leverage the agent’s capabilities, design your own challenges in the future, and evaluate the agent’s performance when using other security tools.
Hint: you can find an easter egg about our current thought process in the commented-out section of the Dockerfile.

With those defined, we can write the code that builds our containers and returns prepared Challenge objects when our agent starts:
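A minimal sketch of what that might look like, assuming a docker-py client; the Challenge fields and the prepare_challenges helper are illustrative, not the repository's exact code:

```python
import dataclasses
import pathlib
import docker

@dataclasses.dataclass
class Challenge:
    # Illustrative fields only; the real object carries whatever metadata
    # the agent needs (name, notebook path, and so on).
    name: str
    notebook: str

def build_container(dockerfile_dir: str = ".", tag: str = "airtbench-env") -> str:
    """Sketch of the image-build step. The repository's helper also accepts
    options such as memory_limit, discussed below."""
    client = docker.from_env()
    client.images.build(path=dockerfile_dir, tag=tag)
    return tag

def prepare_challenges(challenge_dir: pathlib.Path) -> list[Challenge]:
    # Hypothetical helper: one Challenge per notebook in the challenge directory.
    return [
        Challenge(name=path.stem, notebook=path.name)
        for path in sorted(challenge_dir.glob("*.ipynb"))
    ]
```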
In the repository, the build_container function accepts additional arguments such as memory_limit, which lets us set a memory limit for the container. This is useful for ensuring that the container does not consume too much memory and crash the host system during our experiments.
Putting the pieces together:

- The build_container() function builds a Docker image with all required dependencies.
- The PythonKernel class creates and manages a containerized Jupyter notebook environment; we instantiate kernel = PythonKernel(image=docker_image, memory_limit=args.memory_limit) in an async context manager.
- The attempt_challenge() function:
  - Loads the challenge notebook using Notebook.load(g_challenge_dir / challenge.notebook)
  - Converts it to markdown with challenge_notebook.to_markdown()
- Inside run_step(), the agent:
  - Parses executable code from the model's last response with chat.last.try_parse_set(ExecuteCode)
  - Runs it in the kernel with result = await kernel.execute(execution.code, timeout=args.kernel_timeout)
- When the context manager exits, __aexit__ is called, which removes the container and its associated kernel.
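To make that loop concrete, here is a condensed sketch of a single step; ExecuteCode, kernel, and args are the objects referenced above, and the real implementation's error handling and chat updates are omitted:

```python
# Condensed sketch of one agent step; `ExecuteCode`, `kernel`, and `args`
# are the objects referenced in the walkthrough above, not defined here.
async def run_step(chat, kernel, args):
    # Code blocks parsed out of the last model reply
    executions = chat.last.try_parse_set(ExecuteCode)
    results = []
    for execution in executions:
        # Each block runs inside the containerized Jupyter kernel, bounded
        # by kernel_timeout so a hung cell cannot stall the evaluation.
        result = await kernel.execute(execution.code, timeout=args.kernel_timeout)
        results.append(result)
    return results
```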
The PythonKernel class implements a secure Python code execution environment using containerized Jupyter kernels. Let's walk through the key components:
The setup path creates a temporary directory, initializes the Docker client, starts a container, and initializes the Jupyter kernel inside it:
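A rough sketch of that setup path, assuming docker-py; the real class also manages kernel connection files and readiness checks:

```python
import tempfile
import docker

# Assumed shape of the setup step, not the repository's exact code:
# temp dir -> Docker client -> container -> kernel.
def start_kernel_container(image: str, memory_limit: str = "4g"):
    workdir = tempfile.TemporaryDirectory()  # scratch space / connection files
    client = docker.from_env()               # talk to the local Docker daemon
    container = client.containers.run(
        image,
        detach=True,
        mem_limit=memory_limit,              # hard cap so a runaway kernel can't exhaust the host
        volumes={workdir.name: {"bind": "/work", "mode": "rw"}},
    )
    # The real class then connects to the Jupyter kernel inside this container.
    return workdir, client, container
```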
The kernel_timeout wrapper is a useful mechanism to prevent the evaluation from getting stuck on commands that might hang indefinitely, such as waiting for user input or network connections that never complete. Cleanup happens in the shutdown() method, which is called at the end of the evaluation:
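Conceptually, the two mechanics look something like the following sketch; this is an illustrative shape rather than the repository's exact methods:

```python
import asyncio

# Illustrative timeout guard: a hung cell becomes an error string the agent
# can see and react to, instead of blocking the whole evaluation.
async def run_with_timeout(coro, kernel_timeout: float) -> str:
    try:
        return await asyncio.wait_for(coro, timeout=kernel_timeout)
    except asyncio.TimeoutError:
        return f"Execution timed out after {kernel_timeout} seconds."

# Illustrative teardown, mirroring what shutdown() does at the end of a run:
# remove the container (taking the kernel with it) and clean up scratch space.
def shutdown(container, workdir) -> None:
    container.remove(force=True)
    workdir.cleanup()
```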
The main entrypoint ties everything together by instantiating the PythonKernel class and running the agent in an async context manager.
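At the top level, that wiring might look like the sketch below, where docker_image is the tag produced by the build step and attempt_challenge is the task shown earlier; the exact function boundaries in the repository may differ:

```python
# Sketch of the top-level wiring: the async context manager guarantees the
# container is torn down even if the agent errors out mid-challenge.
async def run_challenge(challenge, args):
    async with PythonKernel(image=docker_image, memory_limit=args.memory_limit) as kernel:
        await attempt_challenge(challenge, kernel, args)
```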
We also use .cache to reduce our overall token usage and inference consumption, and to prevent overwhelming the attack model with too much context. This addition to the Rigging library allows us to cache the last N messages in the chat history, which helps keep the context relevant and focused on the current task.
Throughout the loop, we add log_metric calls where applicable and update our AgentLog structure to reflect the current state of the agent.
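For illustration, a hypothetical AgentLog and one of the metric call sites might look like this; the real field names and metric names live in the repository:

```python
import dataclasses
import dreadnode as dn

# Hypothetical shape of the bookkeeping described above.
@dataclasses.dataclass
class AgentLog:
    steps: int = 0
    execution_errors: int = 0
    gave_up: bool = False
    found_flag: bool = False

def record_step(log: AgentLog, had_error: bool) -> None:
    log.steps += 1
    if had_error:
        log.execution_errors += 1
        dn.log_metric("execution_error", 1)  # metric name is illustrative
```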
The give_up tool is an optional addition that you can make as an agent author. Without it, agents might keep attempting the same failed approaches even after hitting a fundamental limitation; with it, agents might preemptively give up on challenges they could have solved with more time. This is a tradeoff between efficiency and thoroughness. We also cap the number of iterations with max_steps and inspect the chat for error states we want to track and log back to us.
When running many challenges in parallel, we limit the number of coroutines running at the same time. This is useful for keeping host resource usage and provider request rates manageable.
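A common way to do this, and an assumption rather than the repository's exact mechanism, is a shared asyncio.Semaphore:

```python
import asyncio

# At most 8 challenges in flight at once (limit is illustrative).
semaphore = asyncio.Semaphore(8)

async def bounded_run(challenge, args):
    async with semaphore:
        return await run_challenge(challenge, args)  # run_challenge from the sketch above
```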
We use the backoff library to handle rate limits from LLM providers and pass it to our Rigging generator. This library retries failed calls with exponential backoff and jitter, so transient rate-limit errors don't end a run.
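Typical usage looks like the sketch below; the RateLimitError class is a stand-in for whichever exception your provider client raises:

```python
import backoff

class RateLimitError(Exception):
    """Stand-in for the provider-specific exception raised on HTTP 429."""

# Retry with exponential backoff plus jitter whenever the provider pushes back.
@backoff.on_exception(backoff.expo, RateLimitError, max_tries=8, jitter=backoff.full_jitter)
async def call_model(prompt: str) -> str:
    ...  # hand the prompt to the Rigging generator here
```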
Throughout the agent, we use dn.log_metric to track the code paths we reach, failure modes, and success rates.
The submit_flag function doesn't actually exist in the codebase itself; it's an expected function that the agent must learn to implement based on the challenge context presented in the notebook. This design is intentional and tests the agent's ability to properly interpret the challenge requirements and implement the appropriate solution.

The agent must generate code that creates its own submit_flag function based on the challenge description provided in the notebook; the code tracks when agents try to use a non-existent submit_flag function:
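A check along these lines, illustrative rather than the repository's exact code, can spot that failure mode in the kernel output:

```python
import dreadnode as dn

# If the model calls submit_flag() before ever defining it, the kernel output
# will contain a NameError mentioning the function.
def track_missing_submit_flag(kernel_output: str) -> None:
    if "NameError" in kernel_output and "submit_flag" in kernel_output:
        dn.log_metric("undefined_submit_flag", 1)  # metric name assumed
```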
When the agent finds the flag, it:

- Logs dn.log_metric("found_flag", 1)
- Creates a dn.Score object with value 1.0
- Logs it with dn.log_score(flag_score)
- Returns None from the step function to signal completion
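Put together, the flag-found path looks roughly like this; the dn.Score constructor arguments are assumptions, so check the SDK for the exact signature:

```python
import dreadnode as dn

# Sketch of the flag-found path described above.
async def handle_found_flag():
    dn.log_metric("found_flag", 1)                 # record the success metric
    flag_score = dn.Score(value=1.0, name="flag")  # constructor kwargs assumed
    dn.log_score(flag_score)                       # attach the score to the run
    return None                                    # returning None ends the step loop
```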