Agent Examples - Dreadnode Documentation

We’ve created a collection of specialized, autonomous AI agents designed for various complex tasks. Each agent leverages Large Language Models (LLMs) combined with a specific set of tools to achieve its goals in a structured and observable manner. The agents are built using the Rigging and Dreadnode libraries for robust interaction and observability. View the GitHub repository for more details.

Agent Summary

The following table provides a high-level overview and comparison of the agents available in this collection.

Agent	Description	Primary Use Case	Environment	Input Method	Key Tools
Dangerous Capabilities	Automatically build and run Capture The Flag (CTF) challenges	Reproduce Google’s “Dangerous Capabilities” evaluation	Python	A selected challenge container	Kali, Rigging, Dreadnode
Dotnet Reversing	Reverses and analyzes .NET binaries for vulnerabilities using an LLM.	Security analysis of .NET applications.	Python	Local .NET DLL/EXE files or NuGet package IDs.	`dnlib`, Rigging, Dreadnode
Python Agent	Executes Python code in a sandboxed Docker environment to perform general tasks.	General-purpose code execution, data analysis, automation.	Python, Docker	Natural language task, Docker image, volume mounts.	Docker, Jupyter Kernel, Rigging
Sast Scanning	Benchmarks LLM performance on SAST by running them against code with known vulnerabilities.	Evaluating and comparing LLMs for security code review.	Python, Docker (optional)	Pre-defined code challenges from a local directory.	Rigging, LiteLLM, Dreadnode
Sensitive Data	Scans various local or remote file systems (e.g., local, S3, GitHub) for sensitive data leaks.	Data governance and security auditing for exposed credentials/PII.	Python, `fsspec`	`fsspec`-compatible URI (e.g., `s3://...`, `github://...`).	`fsspec`, Rigging, Dreadnode

Agents

Below is a brief description of each agent, with a link to its detailed README file.

1. Dangerous Capabilities Agent

This agent automatically builds and runs Capture The Flag (CTF) challenges. It is designed to reproduce Google’s “Dangerous Capabilities” evaluation. > More Details

2. Dotnet Reversing Agent

This agent is designed to perform reverse engineering of .NET binaries. It can decompile .NET assemblies and use an LLM to analyze the resulting source code based on a user-defined task, such as “Find all critical security vulnerabilities.” > More Details

3. Python Agent

A general-purpose agent that provides a sandboxed Jupyter environment inside a Docker container. It can execute Python code to accomplish a wide range of programmatic tasks, from data analysis to file manipulation, based on a natural language prompt. > More Details

4. Sast Scanning Agent

This agent is a specialized framework for evaluating the security analysis capabilities of LLMs. It runs “challenges” where the model must find known, predefined vulnerabilities in a codebase. The agent scores the model’s performance, providing a quantitative way to benchmark different models for SAST. > More Details

5. Sensitive Data Extraction Agent

An autonomous agent that explores and analyzes file systems to find and report sensitive data like credentials, API keys, and personal information. Leveraging fsspec, it can operate on local files, cloud storage (AWS S3, GCS), and remote repositories (GitHub). > More Details

General Usage

While each agent has its own specific command-line arguments, they share a common setup:

Installation: Each agent is a Python application. Dependencies can be installed via pip.
LLM Configuration: The agents use litellm to connect to various LLMs. You must configure the appropriate environment variables for the model you intend to use (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY).
Observability: To enable detailed logging, tracing, and metrics, you can configure the agents to connect to a Dreadnode server by providing a server URL and token.

Setup

All examples share the same project and dependencies. Set up the virtual environment with uv:

uv sync

Passing Models

For all agents, LLMs are usually specified with a --model argument, which is passed directly to our Rigging library. You can read details about different ways to connect to providers, self-hosted servers, or even in-process local models in the docs. Usually, the obvious identifier works out of the box:

gpt-4.1
claude-4-sonnet-latest
ollama/llama3-70b

You can pass API keys by setting the associated env var (OPENAI_API_KEY) or by adding ,api_key=... to your model string.
If you need to control which endpoint the model uses, you can add ,api_base=http://<host>:<port> to the model string.
As noted in the Rigging docs, these model strings also support properties like temperature and top_k as needed.

Rigging uses LiteLLM underneath for most LLMs, and you can use the LiteLLM docs to find edge cases for specific providers.
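As a minimal sketch of how the comma-separated properties described above fit together, the helper below composes a model string from an identifier and keyword properties. The function name `build_model_string` is hypothetical and not part of Rigging, and the host and port are placeholder values; the sketch only shows the string format you would pass to --model.

```python
def build_model_string(model: str, **properties: object) -> str:
    """Append comma-separated key=value properties to a model identifier.

    Hypothetical helper for illustration; it is not part of Rigging.
    """
    parts = [model]
    for key, value in properties.items():
        parts.append(f"{key}={value}")
    return ",".join(parts)


if __name__ == "__main__":
    # Sampling property appended to a hosted model identifier
    print(build_model_string("gpt-4.1", temperature=0.2))
    # Custom endpoint for a self-hosted model (placeholder host and port)
    print(build_model_string("ollama/llama3-70b", api_base="http://localhost:11434"))
```

The resulting string can be passed as-is, for example `--model "gpt-4.1,temperature=0.2"`.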

Python Agent

A basic agent with access to a dockerized Jupyter kernel to execute code safely.

uv run -m python_agent --help

Provided a task (--task), begin a generation loop with access to the Jupyter kernel
The work directory (--work-dir) is mounted into the container, along with any other docker-style volumes (--volumes)
When finished, the agent marks the task as complete with a status and summary
The work directory is logged as an artifact for the run
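To illustrate what a docker-style volume looks like, here is a rough sketch that parses a `host:container[:mode]` string into the mapping shape used by the Docker SDK for Python. The function name `parse_volume` is hypothetical, and the assumption that --volumes accepts exactly this syntax is not confirmed by the agent's source; check `--help` for the real format.

```python
def parse_volume(spec: str) -> dict[str, dict[str, str]]:
    """Parse 'host:container[:mode]' into a Docker SDK style volume mapping.

    Hypothetical helper; assumes POSIX paths (no drive-letter colons).
    """
    parts = spec.split(":")
    if len(parts) == 2:
        host, container = parts
        mode = "rw"  # Docker mounts read-write when no mode is given
    elif len(parts) == 3:
        host, container, mode = parts
    else:
        raise ValueError(f"expected host:container[:mode], got {spec!r}")
    return {host: {"bind": container, "mode": mode}}


if __name__ == "__main__":
    print(parse_volume("/data/input:/mnt/input:ro"))
```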

Dangerous Capabilities

Based on research from Google DeepMind, this agent works to solve a variety of CTF challenges given access to execute bash commands on a network-local Kali Linux container.

uv run -m dangerous_capabilities --help

The harness will automatically build all the containers with the supplied flag, and load them as needed to ensure they are network-isolated from each other. The process is generally:

For each challenge, produce P agent tasks where P = parallelism
For all agent tasks, run them in parallel capped at your concurrency setting
Inside each task, bring up the associated environment
Repeatedly request the next command from the inference model and execute it in the environment container
If the flag is ever observed in the output, exit
Otherwise run until an error, give up, or max-steps is reached
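The process above can be sketched with asyncio as follows. This is a simplified illustration, not the harness itself: `fake_execute`, `run_task`, and `run_all` are hypothetical names, and the stand-in executor simply returns the flag on its third step instead of querying a model or running a container.

```python
import asyncio


async def fake_execute(step: int, flag: str) -> str:
    """Stand-in for running a model-chosen command in the env container."""
    await asyncio.sleep(0)  # yield control, as real I/O would
    return f"found {flag}" if step == 2 else "nothing interesting"


async def run_task(flag: str, max_steps: int) -> str:
    """Loop until the flag appears in the output or max_steps is reached."""
    for step in range(max_steps):
        output = await fake_execute(step, flag)
        if flag in output:
            return "flag_found"
    return "max_steps_reached"


async def run_all(
    challenges: dict[str, str], parallelism: int, concurrency: int, max_steps: int
) -> dict[str, list[str]]:
    """Create P tasks per challenge and run them under a concurrency cap."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(flag: str) -> str:
        async with semaphore:  # at most `concurrency` tasks run at once
            return await run_task(flag, max_steps)

    # One entry per (challenge, attempt) pair, P attempts per challenge
    names = [name for name in challenges for _ in range(parallelism)]
    results = await asyncio.gather(*(guarded(challenges[n]) for n in names))

    grouped: dict[str, list[str]] = {name: [] for name in challenges}
    for name, result in zip(names, results):
        grouped[name].append(result)
    return grouped


if __name__ == "__main__":
    demo = {"demo_challenge": "FLAG{example}"}
    print(asyncio.run(run_all(demo, parallelism=3, concurrency=2, max_steps=5)))
```

A semaphore keeps the total number of running tasks bounded regardless of how many challenges or attempts are queued, which matches the split between parallelism (attempts per challenge) and concurrency (tasks in flight).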

Check out challenges.json to see all the environments and prompts.

Dotnet Reversing

This agent is provided access to Cecil and ILSpy for use in reversing and analyzing Dotnet managed binaries for vulnerabilities.

uv run -m dotnet_reversing --help

You can provide a path containing binaries (recursively), and a target vulnerability term that you would like the agent to search for. The tool suite provided to the agent includes:

Search for a term in target modules to identify functions of interest
Decompile individual methods, types, or entire modules
Collect all call flows which lead to a target method in all supplied binaries
Report a vulnerability finding with associated path, method, and description
Mark a task as complete with a summary
Give up on a task with a reason
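To give a feel for the call-flow tool in the list above, the sketch below enumerates every acyclic call path that ends at a target method, given a caller-to-callees map. It is an illustration of the idea only: the function name `call_flows` and the toy call graph are hypothetical, and the agent's real tool works on decompiled .NET metadata rather than a Python dictionary.

```python
def call_flows(calls: dict[str, list[str]], target: str) -> list[list[str]]:
    """Return every acyclic call path (entry point first) ending at target."""
    # Invert caller -> callees into callee -> callers
    callers: dict[str, list[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)

    flows: list[list[str]] = []

    def walk(method: str, path: list[str]) -> None:
        # Skip callers already on the path so recursion cannot loop forever
        parents = [p for p in callers.get(method, []) if p not in path]
        if not parents:
            # No further callers: the path is complete, emit it entry-first
            flows.append(list(reversed(path)))
            return
        for parent in parents:
            walk(parent, path + [parent])

    walk(target, [target])
    return flows


if __name__ == "__main__":
    graph = {
        "Main": ["HandleRequest"],
        "HandleRequest": ["Deserialize"],
        "ApiEndpoint": ["Deserialize"],
    }
    for flow in call_flows(graph, "Deserialize"):
        print(" -> ".join(flow))
```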

You can also specify the path as a NuGet package identifier and pass --nuget to the agent. It will download the package, extract the binaries, and run the same analysis as above.

# Local (with provided example binaries)
uv run -m dotnet_reversing --model <model> --path dotnet_reversing/example_binaries/flag_protocol
uv run -m dotnet_reversing --model <model> --path dotnet_reversing/example_binaries/harmony

# Nuget
uv run -m dotnet_reversing --model <model> --path <nuget-package-id> --nuget

Sensitive Data Extraction

This agent is provided access to a filesystem tool based on fsspec for use in extracting sensitive data stored in files.

uv run -m sensitive_data_extraction --help

The agent is granted a maximum step count to operate tools, query and search files, and report any sensitive data it finds. With the help of fsspec, the agent can operate on local files, GitHub repos, S3 buckets, and other cloud storage systems.

# Local
uv run -m sensitive_data_extraction --model <model> --path /path/to/local/files

# S3
uv run -m sensitive_data_extraction --model <model> --path s3://bucket

# Azure
uv run -m sensitive_data_extraction --model <model> --path azure://container

# GCS
uv run -m sensitive_data_extraction --model <model> --path gcs://bucket

# Github
uv run -m sensitive_data_extraction --model <model> --path github://owner:repo@/

Check out the fsspec docs for more options.

SAST Vulnerability Scanning

This agent is designed to perform static code analysis to identify security vulnerabilities in source code. It uses a combination of direct file access and container-based approaches to analyze code for common security issues.

uv run -m sast_scanning --help

The agent systematically examines codebases using either direct file access or an isolated container environment. It can:

Execute targeted analysis commands to search through source files
Report detailed findings with vulnerability location, type, and severity
Support various programming languages through configurable extensions
Operate in two modes: “direct” (filesystem access) or “container” (isolated analysis)
Challenges and vulnerability patterns are defined in YAML configuration files, allowing for flexible targeting of specific security issues across different codebases.

Metrics and Scoring

The agent tracks several key metrics to evaluate performance:

valid_findings: Count of correctly identified vulnerabilities matching expected issues
raw_findings: Total number of potential vulnerabilities reported by the model
coverage: Percentage of known vulnerabilities successfully identified
duplicates: Count of repeatedly reported vulnerabilities
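As a rough sketch of how these four metrics relate, the function below derives them from a list of reported finding identifiers and a set of expected ones. The function name `summarize` and the use of plain string identifiers are assumptions for illustration; the agent's actual matching logic is likely more involved.

```python
def summarize(reported: list[str], expected: set[str]) -> dict[str, float]:
    """Compute the four metrics from reported and expected finding IDs.

    Hypothetical helper; findings are reduced to plain string identifiers.
    """
    seen: set[str] = set()
    duplicates = 0
    for finding in reported:
        if finding in seen:
            duplicates += 1  # the same finding was reported again
        seen.add(finding)

    valid = len(seen & expected)  # distinct findings that match a known issue
    coverage = 100.0 * valid / len(expected) if expected else 0.0
    return {
        "valid_findings": valid,
        "raw_findings": len(reported),
        "coverage": coverage,
        "duplicates": duplicates,
    }


if __name__ == "__main__":
    print(summarize(["sqli", "sqli", "xss", "rce"], {"sqli", "xss", "ssrf", "xxe"}))
```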

Findings are scored using a weighted system that prioritizes matching the correct vulnerability name (3x), function (2x), and line location (1x) to balance semantic accuracy with positional precision.
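One way to read the 3x/2x/1x weighting is sketched below. The field names, the exact-equality comparison, and the normalization to a 0 to 1 range are all assumptions made for illustration; only the relative weights come from the description above.

```python
# Relative weights taken from the description; field names are hypothetical
WEIGHTS = {"name": 3, "function": 2, "line": 1}


def score_finding(reported: dict[str, object], expected: dict[str, object]) -> float:
    """Return a 0..1 score, weighting name over function over line."""
    earned = sum(
        weight
        for field, weight in WEIGHTS.items()
        if field in expected and reported.get(field) == expected[field]
    )
    return earned / sum(WEIGHTS.values())


if __name__ == "__main__":
    expected = {"name": "sql_injection", "function": "get_user", "line": 42}
    reported = {"name": "sql_injection", "function": "get_user", "line": 40}
    # Name and function match, line does not: (3 + 2) / 6
    print(round(score_finding(reported, expected), 3))
```

Under this reading, a finding that names the right vulnerability in the wrong place still earns half the available score, while a correct line number alone earns very little.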

# Run in direct mode (default)
uv run -m sast_scanning --model <model> --mode direct

# Run in container mode (isolated environment)
uv run -m sast_scanning --model <model> --mode container

# Run a specific challenge
uv run -m sast_scanning --model <model> --mode container --challenge <challenge-name>

# Customize analysis parameters
uv run -m sast_scanning --model <model> --max-steps 50 --timeout 60
