Metrics
Measure anything inside your runs
Metrics are the backbone of measurement and evaluation in Strikes. They allow you to track performance, behavior, and outcomes of your agents and evaluations in a structured way.
Each metric has:
- A name that identifies what is being measured
- A value (typically numeric) representing the measurement
- A timestamp recording when the measurement was taken
- An optional step for ordered measurements
- Optional attributes for additional context
Metrics can be associated with runs, tasks, or even specific objects in your system, providing a flexible way to track performance at different levels of granularity. Metrics are collected into a larger map and grouped under the name you choose, and you can log a metric once or at multiple points in your code.
Here are a few examples:
- Report the loss of your model during training epochs.
- Track the number of times inference failed during your agent run.
- Log the average time it takes to pivot between two hosts.
- Track the total assets discovered during a network scan.
Logging Metrics
The simplest way to log a metric is:
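(The sketch below assumes the Strikes Python SDK is imported as `dreadnode` and exposes a `log_metric()` helper inside an active run; the exact package and call names may differ in your installed version.)

```python
import dreadnode as dn  # assumed package name for the Strikes SDK

with dn.run("nightly-eval"):  # assumed run context manager
    # A name and a value are all that is required; the timestamp is recorded for you.
    dn.log_metric("accuracy", 0.92)
```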
Metrics can be logged for your run as a whole (run-level) or for individual tasks within a run (task-level). Run-level metrics are generally used to track the broad performance of the system, and task-level metrics monitor more nuanced behaviors inside your flows. To make things easy, any task-level metrics will also be mirrored to the run level using the label (name) of the originating task as a prefix. This means that you can still use the same metric name in different tasks, and they will be reported separately in the UI.
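As a rough illustration of the mirroring behavior, the sketch below logs one run-level and one task-level metric; the task decorator, the run context manager, and the exact prefix separator are assumptions about the SDK surface.

```python
import dreadnode as dn  # assumed package name for the Strikes SDK

@dn.task()  # assumed decorator form; adjust to your SDK version
def scan_host(host: str) -> int:
    assets = 3  # placeholder result
    # Task-level metric: attached to this invocation and also mirrored
    # at the run level with the task's label as a prefix.
    dn.log_metric("assets_found", assets)
    return assets

with dn.run("network-sweep"):  # assumed run context manager
    dn.log_metric("hosts_targeted", 10)  # run-level metric
    scan_host("10.0.0.5")
```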
Adding Context with Attributes
Metrics can include additional attributes to provide context:
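(A sketch assuming `log_metric()` accepts an `attributes` mapping; the parameter name and the keys shown are illustrative assumptions.)

```python
import dreadnode as dn  # assumed package name for the Strikes SDK

with dn.run("agent-eval"):  # assumed run context manager
    dn.log_metric(
        "inference_failures",
        1,
        attributes={"model": "planner-v2", "phase": "recon"},  # hypothetical keys/values
    )
```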
These attributes help categorize and filter metrics during analysis.
Tracking Origins
A powerful feature of Strikes metrics is their ability to link measurements to specific objects:
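(A hedged sketch, assuming `log_metric()` accepts the `origin` parameter described below and that the measured object has been logged elsewhere in the run.)

```python
import dreadnode as dn  # assumed package name for the Strikes SDK

finding = {"host": "10.0.0.5", "service": "ssh"}  # hypothetical object being measured

with dn.run("scan-review"):  # assumed run context manager
    # Link the measurement back to the object it describes.
    dn.log_metric("finding_confidence", 0.85, origin=finding)
```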
The `origin` parameter creates a reference to the object that was measured, allowing you to track which specific inputs led to particular performance outcomes.
Aggregation Modes
When working with metrics, it's often useful to report aggregates such as averages, sums, or counts. You can always do this manually by keeping separate variables or lists of previous values, but Strikes can handle the aggregation for you:
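(The sketch below selects the aggregation behavior with a `mode` argument; the parameter name and exact mode strings are assumptions, with "avg" and "max" following the examples later in this guide.)

```python
import dreadnode as dn  # assumed package name for the Strikes SDK

with dn.run("pivot-eval"):  # assumed run context manager
    for latency in (1.2, 0.8, 2.4):
        # Maintain a running average of pivot time across the run.
        dn.log_metric("pivot_time", latency, mode="avg")

    # Keep only the best score seen so far.
    dn.log_metric("best_score", 0.91, mode="max")

    # Accumulate a running total of discovered assets.
    dn.log_metric("assets_discovered", 4, mode="sum")
```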
These modes help create meaningful aggregate metrics without requiring you to manually track previous values.
Metrics in Tasks
When used within tasks, metrics provide a way to measure performance or behavior of specific code units:
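(A sketch of a task that logs its own metrics; as before, the decorator form and run context manager are assumptions about the SDK surface.)

```python
import time
import dreadnode as dn  # assumed package name for the Strikes SDK

@dn.task()  # assumed decorator form; adjust to your SDK version
def enumerate_services(host: str) -> list[str]:
    start = time.monotonic()
    services = ["ssh", "http"]  # placeholder scan result
    # Both metrics are associated with this specific task invocation.
    dn.log_metric("services_found", len(services))
    dn.log_metric("scan_seconds", time.monotonic() - start)
    return services

with dn.run("recon"):  # assumed run context manager
    enumerate_services("10.0.0.5")
```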
Task-level metrics are automatically associated with the specific task invocation, making it easy to correlate inputs, outputs, and performance.
Automatic Task Metrics
Strikes also logs some additional metrics automatically for every task.
You can use these metrics to track task reliability and usage patterns.
Creating Scorers
Scorers are specialized functions that evaluate task outputs and log metrics automatically:
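(A hedged sketch of a scorer attached to a task; the `scorer` decorator and the `scorers=` argument are assumptions about how the wiring looks, so adjust to your installed SDK version.)

```python
import dreadnode as dn  # assumed package name for the Strikes SDK

@dn.scorer()  # assumed decorator form
def flag_captured(output: str) -> float:
    # Score the task's output: 1.0 if the flag marker is present, else 0.0.
    return 1.0 if "FLAG{" in output else 0.0

@dn.task(scorers=[flag_captured])  # assumed wiring for attaching scorers
def attempt_exploit(target: str) -> str:
    return f"exploit output for {target}"  # placeholder task output

with dn.run("exploit-eval"):  # assumed run context manager
    attempt_exploit("10.0.0.5")
```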
When the task runs, the scorer will automatically:
- Receive the task’s output
- Evaluate it according to your logic
- Log a metric with the scoring function’s name and returned value
Composite Scoring
For more complex evaluations, you can create composite metrics from multiple measurements, where each sub-metric can have its own weight. The metric will store the original values of all sub-metrics in the attributes.
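One way to sketch this manually is to combine weighted sub-scores and store their raw values as attributes, mirroring the behavior described above; any built-in composition helper in your SDK may look different.

```python
import dreadnode as dn  # assumed package name for the Strikes SDK

# Hypothetical sub-metrics as name -> (value, weight).
sub_scores = {"accuracy": (0.9, 0.7), "coverage": (0.6, 0.3)}
composite = sum(value * weight for value, weight in sub_scores.values())

with dn.run("report-eval"):  # assumed run context manager
    # Log the weighted total and keep the raw sub-scores in the attributes.
    dn.log_metric(
        "report_quality",
        composite,
        attributes={name: value for name, (value, _) in sub_scores.items()},
    )
```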
Tracking Metrics Over Time
For time-series data, you can use the `step` parameter to maintain order:
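(For example, a training loop that logs loss per epoch; the loss calculation is a placeholder standing in for your own training code.)

```python
import dreadnode as dn  # assumed package name for the Strikes SDK

with dn.run("training"):  # assumed run context manager
    for epoch in range(10):
        loss = 1.0 / (epoch + 1)  # placeholder standing in for your real loss
        dn.log_metric("loss", loss, step=epoch)
```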
The `step` parameter helps organize metrics into sequences, which is especially useful for tracking training progress or iterative processes.
Best Practices
- Use consistent naming: Choose a naming convention and stick with it to make metrics easier to find and analyze.
- Log meaningful metrics: Focus on measurements that provide insight into your system’s performance or behavior.
- Use appropriate aggregation modes: Choose aggregation modes that make sense for what you’re measuring (for example, “max” for best performance, “avg” for typical performance).
- Include context with attributes: Add attributes to help filter and categorize metrics during analysis.
- Link metrics to objects: Use the `origin` parameter to connect measurements to the specific inputs or outputs that generated them.
- Combine metrics with scorers: For evaluation tasks, create scorers that automatically measure output quality.
- Consider hierarchies: Use naming prefixes to create logical groupings of related metrics.