Introduction
How to get building in Strikes.
Strikes is a platform for building, experimenting, and evaluating AI-integrated code. This includes agents, evaluation harnesses, and AI red teaming code. You can think of Strikes like the best blend of experimentation (MLflow, Weights & Biases), task orchestration (Prefect), and observability (OpenTelemetry).
Strikes is lightweight to start, flexible to extend, and powerful at scale. Its top priority is providing the most value without requiring a steep learning curve. We intentionally designed the APIs to be simple and familiar to anyone who has used MLflow, Prefect, or similar tools.
This combination of flexibility and power means Strikes excels at workflows in complex domains like Offensive Security, where you need to build and experiment with complex agentic systems, then measure and evaluate them.
For example, to evaluate Offensive Security agents, we need to develop agentic code, execute it at scale, measure interactions with the target system(s), and evaluate the results.
Basic Example
The most basic use of Strikes is a run with some logged data:
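A minimal sketch of what that looks like (the run name, parameter, and metric below are illustrative, and the exact keyword arguments may differ slightly from the current SDK):

```python
import dreadnode

# Picks up DREADNODE_API_TOKEN from your environment if no token is passed
dreadnode.configure()

with dreadnode.run("my-first-run"):  # run name is illustrative
    dreadnode.log_param("model", "gpt-4o")
    dreadnode.log_param("temperature", 0.7)
    dreadnode.log_metric("accuracy", 0.92)
```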
We’ll assume you have installed the `dreadnode` package and have your environment variables set up. Make sure you have `DREADNODE_API_TOKEN=...` set to your Platform API key.
For more information on `dreadnode.configure()`, review the Configuration topic.
If you call `dreadnode.configure()` without a token and your environment variables are not set, you’ll receive a warning in the console, so keep an eye out! You can still run any of your code without sending data to the Dreadnode Platform.
This code should be very familiar if you’ve used an ML-experimentation library before, and all the functions you’re familiar with work exactly like you would expect.
Under the hood, this code did a few things:
- Created a new “Default” project in the Platform to hold our run.
- Began a full OpenTelemetry trace for all code under `with dreadnode.run(...)`.
- Tracked and stored our parameters and metrics alongside the tracing information.
- Delivered the data to the Platform for visualization.
You can open the Default project in a web browser to see your new run and the data you logged.
You’re free to call `dreadnode.*` functions anywhere in your code, and you don’t have to worry about keeping track of your active run or task. Everything just works.
- `log_param()`: Track simple key/values to keep track of hyperparameters, target models, or agent configs.
- `log_metric()`: Report measurements you take anywhere in your code.
- `log_input()`: Save any runtime object which is the `x` to your `f(x)`, like prompts, datasets, samples, or target resources.
- `log_output()`: Save any runtime object which is the result of your work, like findings, commands, reports, or raw model outputs.
- `log_artifact()`: Upload any local files or directories, like your source code, container configs, datasets, or models.
Most of these functions will associate values with their nearest parent, so if you’re in a task, the value will be associated with that task. If you’re just inside a run, the value will be associated directly with the run. You can override this behavior by passing `to=...` to any of these methods.
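As a rough sketch of that association behavior (the decorator form of tasks and the argument names here are assumptions; see the Tasks and Data Tracking pages for the exact API):

```python
import dreadnode

dreadnode.configure()

with dreadnode.run("association-demo"):
    # Logged directly inside the run, so it attaches to the run
    dreadnode.log_artifact("./configs/")

    @dreadnode.task()  # assumed decorator form
    def probe(url: str) -> int:
        # Logged inside a task, so it attaches to that task by default;
        # pass to=... (as described above) to attach it elsewhere
        dreadnode.log_metric("attempts", 1)
        return 200

    probe("https://example.com")
```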
Often you find yourself deep inside a function, writing a new `if` statement, and think “I want to track if/when I get here”. It’s easy to add a `dreadnode.log_metric(...)` right there and see it later in your data.
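For instance (the parsing function below is a hypothetical stand-in, shown only to illustrate the pattern):

```python
import dreadnode

def parse_response(response: str) -> bool:  # hypothetical parsing code
    if "access granted" in response.lower():
        # Record the moment we reach this branch so it shows up in the run data
        dreadnode.log_metric("access_granted", 1)
        return True
    return False
```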
Core Concepts
Strikes is built around three core concepts: Runs, Tasks, and Metrics. Understanding these concepts will help you make the most of Strikes.
Runs
Runs are the core unit of work in Strikes. They provide the context for all your data collection and represent a complete execution session. Think of runs as the “experiment” or “session” for your code.
You can create multiple runs, even in parallel, to organize your work logically:
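A sketch of that organization, using one run per target (names and values are illustrative):

```python
import dreadnode

dreadnode.configure()

# One run per target keeps each experiment's data separate
for target in ["app-staging", "app-prod"]:
    with dreadnode.run(f"recon-{target}"):
        dreadnode.log_param("target", target)
        dreadnode.log_metric("endpoints_found", 12)
```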
See the Runs page for more details on creating, configuring and managing runs.
Tasks
Tasks are units of work within runs. They help you structure your code and provide a finer-grained context for data collection. Tasks can be created as function decorators or context managers:
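A sketch of the decorator form (the exact decorator signature, and the name of the context-manager variant, are covered on the Tasks page; treat the details below as assumptions):

```python
import dreadnode

@dreadnode.task()  # assumed decorator form
def score_payload(payload: str) -> float:
    # Inputs, outputs, and timing are captured automatically;
    # you can still log extra data from inside the task
    dreadnode.log_metric("payload_length", len(payload))
    return 1.0 if "<script>" in payload else 0.0

with dreadnode.run("task-demo"):
    score_payload("<script>alert(1)</script>")
```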
Tasks automatically track their inputs, outputs, execution time, and more. They form the foundation for building structured, observable workflows.
See the Tasks page for more details on task creation, configuration, and advanced patterns.
Metrics
Metrics are measurements of your system’s performance or behavior. They allow you to evaluate the effectiveness of your agents and track important events during execution:
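For example (names and values are illustrative):

```python
import dreadnode

with dreadnode.run("metrics-demo"):
    # A single point-in-time measurement
    dreadnode.log_metric("jailbreak_success_rate", 0.4)

    # The same metric logged repeatedly to track progress over time
    for attack_score in [0.2, 0.5, 0.8]:
        dreadnode.log_metric("attack_score", attack_score)
```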
Metrics can be associated with tasks, runs, or even specific objects in your system, providing a comprehensive view of performance at different levels.
See the Metrics page for more information on creating, aggregating, and analyzing metrics.
Short Examples
Building an Evaluation Dataset
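A sketch of the pattern (`my_model` and `grade_response` are hypothetical stand-ins for your own model call and grader):

```python
import dreadnode

def my_model(prompt: str) -> str:  # hypothetical stand-in for your model
    return f"Stub answer to: {prompt}"

def grade_response(response: str) -> float:  # hypothetical stand-in for your grader
    return float(len(response) > 10)

dreadnode.configure()

with dreadnode.run("build-eval-dataset"):
    for prompt in ["What is XSS?", "Explain SQL injection."]:
        dreadnode.log_input("prompt", prompt)        # the sample going in
        response = my_model(prompt)
        dreadnode.log_output("response", response)   # the model's answer
        dreadnode.log_metric("quality", grade_response(response))
```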
This creates a comprehensive dataset with:
- Input samples
- Model responses
- Quality metrics
- Clear relationships between data
Agent Development Workflow
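A sketch of the workflow (`plan_next_step` and `run_command` are hypothetical stand-ins for your agent’s planner and executor; the task decorator form is assumed):

```python
import dreadnode

def plan_next_step(step: int) -> str:  # hypothetical planner
    return ["whoami", "uname -a", "ls /tmp"][step]

@dreadnode.task()  # assumed decorator form
def run_command(command: str) -> str:  # hypothetical executor
    dreadnode.log_input("command", command)
    output = f"(stub output for {command!r})"
    dreadnode.log_output("output", output)
    return output

dreadnode.configure()

with dreadnode.run("agent-dev"):
    # Agent configuration as parameters
    dreadnode.log_param("model", "gpt-4o")
    dreadnode.log_param("max_steps", 3)

    for step in range(3):
        run_command(plan_next_step(step))
        dreadnode.log_metric("steps_completed", step + 1)
```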
This tracks:
- Agent configuration as parameters
- Each command and its output
- Execution details
- Progress metrics over time
Next Steps
To learn about more advanced usage, explore the rest of our documentation:
- Working with Runs: Learn how to create and manage runs
- Working with Tasks: Discover how to structure your code with tasks
- Metrics and Measurement: Learn how to track and analyze performance
- Projects: Organize your runs into projects
- Data Tracking: Understand how data flows in Strikes
If you learn best through examples, check out any of the How To guides to view walkthroughs of practical use cases and commentary from the team.
You can also check out our dreadnode/example-agents repository for a collection of example agents and evaluation harnesses.