Beyond tracking execution, Strikes provides a data flow system that lets you log, store, and analyze the data generated during your runs. Data can serve as parameters to your tasks and runs, and as input and output objects.

Strikes can store and organize many different types of data within your runs and tasks. Understanding these capabilities helps you capture the right information to evaluate and improve your agents and systems.

Parameters

Parameters are lightweight key-value pairs typically used for configuration values, settings, or hyperparameters:

import dreadnode as dn

@dn.task(log_params=["learning_rate", "batch_size"])
async def train_model(learning_rate: float, batch_size: int) -> None:
    # Training logic here
    pass

with dn.run("my-experiment"):
    # Log individual parameters
    dn.log_param("learning_rate", 0.01)
    
    # Or log multiple parameters at once
    dn.log_params(
        batch_size=32,
        epochs=100,
        model="transformer"
    )

    # Call the task with parameters
    await train_model(learning_rate=0.01, batch_size=32)

Parameters are ideal for:

  • Tracking experiment configurations
  • Recording hyperparameters
  • Recording environment details
  • Storing metadata about your run

Parameters are stored efficiently, making it fast to filter and compare runs. They’re primarily intended for scalar values (strings, numbers, booleans) that define your experiment’s setup.

Parameters do not store multiple values over time. If you need to track how a value changes over the lifetime of a run, pass it as a task argument and call the task multiple times, as in the sketch below; each task execution records its own parameters.
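
A minimal sketch of that pattern, reusing the log_params task option from above (filter_results and threshold are hypothetical):

@dn.task(log_params=["threshold"])
async def filter_results(threshold: float) -> None:
    # Each execution of this task logs its own "threshold" parameter
    ...

with dn.run("threshold-sweep"):
    # Each call records the value in effect for that execution
    await filter_results(threshold=0.5)
    await filter_results(threshold=0.7)
    await filter_results(threshold=0.9)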

Inputs and Outputs

For rich data produced or consumed during execution, Strikes provides input and output storage:

with dn.run("text-generation"):
    # Log a complex input object
    prompt = {
        "text": "Write a story about a robot learning to paint.",
        "temperature": 0.7,
        "max_tokens": 1024
    }
    dn.log_input("prompt", prompt)
    
    # Generate response and log the output
    response = generate_text(prompt)
    dn.log_output("response", response)

Strikes maintains a rich serialization layer to support many different kinds of Python objects:

  • Dictionaries, lists, and other JSON-serializable objects
  • NumPy arrays and Pandas DataFrames
  • Custom objects (serialized with pickle)
  • Large datasets (automatically stored efficiently)

This capability allows you to capture the complete data flow through your system, creating a comprehensive record of what went in and what came out.
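
As a quick sketch, assuming NumPy and Pandas are available (the array and DataFrame are illustrative):

import numpy as np
import pandas as pd

with dn.run("serialization-demo"):
    # NumPy arrays are handled by the serialization layer
    embeddings = np.random.rand(128, 768)
    dn.log_input("embeddings", embeddings)

    # As are Pandas DataFrames
    scores = pd.DataFrame({"prompt_id": [1, 2], "score": [0.9, 0.7]})
    dn.log_output("scores", scores)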

Task Input and Output Tracking

Tasks automatically log their function arguments as inputs and their return value as an output:

@dn.task()
async def classify_image(image_data: dict) -> dict:
    # 'image_data' is automatically logged as input
    result = run_classifier(image_data)
    # return value is automatically logged as output
    return result

with dn.run():
    # Run the task
    result = await classify_image({"url": "https://example.com/cat.jpg"})

You can control this behavior with task options:

@dn.task(
    log_inputs=["image_url"], # Only log specific arguments
    log_output=False          # Don't log the return value
)
async def process_image(image_url: str, settings: dict) -> dict:
    # Only 'image_url' is logged, 'settings' is not
    result = process(image_url, settings)
    
    # Manually log what we want
    dn.log_output("processed_result", result)
    
    return result
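
Calling the task is unchanged; only image_url and the manually logged result are recorded:

with dn.run():
    # 'settings' is passed normally but never logged
    await process_image(
        "https://example.com/cat.jpg",
        settings={"denoise": True},
    )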

Artifacts

For files and directories, Strikes provides artifact storage:

with dn.run("model-training"):
    # Train a model
    model = train_model()
    
    # Save to disk
    model.save("./model_checkpoint")
    
    # Log the entire directory as an artifact
    dn.log_artifact("./model_checkpoint")

Artifacts are ideal for:

  • Model checkpoints
  • Generated images or media
  • Log files
  • Datasets
  • Source code snapshots

When you log an artifact, Strikes:

  1. Preserves the directory structure
  2. Handles large files efficiently
  3. Deduplicates identical files
  4. Makes everything available for download later
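
For example, a minimal sketch that collects generated media into a directory before logging it (generate_image is a placeholder for your own code):

from pathlib import Path

with dn.run("image-generation"):
    out_dir = Path("./generated")
    out_dir.mkdir(exist_ok=True)

    # Write each generated image to disk
    for i in range(3):
        image_bytes = generate_image(f"Variation {i}")  # placeholder
        (out_dir / f"variation_{i}.png").write_bytes(image_bytes)

    # Log the whole directory; identical files are deduplicated
    dn.log_artifact("./generated")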

Content-Based Storage

Strikes uses content-based storage for objects and artifacts:

# These log the same content only once in storage
with dn.run():
    data = {"key": "value"}
    dn.log_input("data1", data)
    dn.log_input("data2", data)  # Reuses storage

This approach:

  • Eliminates redundant storage of identical data
  • Makes it efficient to store the same object multiple times
  • Enables linking between identical objects across runs
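
The same deduplication applies across runs; a short sketch:

shared_dataset = {"samples": ["a", "b", "c"]}

with dn.run("experiment-1"):
    dn.log_input("dataset", shared_dataset)

with dn.run("experiment-2"):
    # Identical content is stored once and linked across both runs
    dn.log_input("dataset", shared_dataset)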

Object Linking

You can create explicit relationships between objects:

with dn.run():
    # Log prompt and response
    prompt = "Generate a poem about space"
    dn.log_input("prompt", prompt)
    
    response = generate_text(prompt)
    dn.log_output("response", response)
    
    # Link the response back to the prompt that generated it
    dn.link_objects(response, prompt)
    
    # Evaluate the response
    score = evaluate_response(response)
    dn.log_output("evaluation", score)
    
    # Link the evaluation to the response it's evaluating
    dn.link_objects(score, response)

This creates a graph of relationships between your data, enabling powerful analyses such as:

  • Tracing data lineage (Where did this output come from?)
  • Understanding dependencies (What inputs affected this result?)
  • Building complex data flow graphs
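
Links can be chained through multi-stage pipelines to build that graph; a sketch (draft_text and revise_text are placeholders):

with dn.run("pipeline"):
    prompt = "Outline a mystery novel"
    dn.log_input("prompt", prompt)

    draft = draft_text(prompt)      # placeholder generation step
    dn.log_output("draft", draft)
    dn.link_objects(draft, prompt)  # draft came from prompt

    final = revise_text(draft)      # placeholder revision step
    dn.log_output("final", final)
    dn.link_objects(final, draft)   # final came from draft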

Associating Metrics with Objects

You can connect metrics directly to specific objects using the origin parameter:

with dn.run():
    # Log several generated responses
    responses = [generate_text(f"Prompt {i}") for i in range(5)]
    
    for i, response in enumerate(responses):
        # Log the response
        dn.log_output(f"response_{i}", response)
        
        # Evaluate and log metric with the response as origin
        quality_score = evaluate_quality(response)
        dn.log_metric("quality", quality_score, origin=response)

This allows you to:

  • Track metrics for specific objects
  • Compare different objects based on their metrics
  • Build datasets of inputs, outputs, and measurements

Best Practices

To make the most of Strikes’ data storage capabilities:

  1. Be deliberate about what you store:
    • Log inputs that define your experiment
    • Log outputs that represent results
    • Use parameters for configuration
    • Use metrics for measurements
  2. Use consistent naming:
    • Adopt naming conventions for inputs, outputs, and parameters
    • Keep names consistent across different runs for easier comparison
  3. Create meaningful relationships:
    • Link related objects to create data lineage
    • Associate metrics with their origin objects
    • Build hierarchical task structures that reflect your workflow
  4. Consider storage efficiency:
    • For very large data, consider storing summaries or references
    • Use artifacts for large files rather than inputs/outputs
    • Leverage content-based deduplication for repeated data
  5. Integrate with your workflow:
    • Log data at natural points in your code
    • Use tasks to structure data collection
    • Leverage the automatic input/output tracking for tasks
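
As a closing sketch, here is how these practices can fit together in one run (evaluate_quality, load_samples, and ./eval_report are illustrative):

@dn.task()
async def evaluate(sample: dict) -> float:
    # 'sample' is auto-logged as input, the score as output
    score = evaluate_quality(sample)  # placeholder scorer
    dn.log_metric("quality", score, origin=sample)
    return score

with dn.run("nightly-eval"):
    # Configuration as parameters
    dn.log_params(model="transformer", temperature=0.7)

    # Rich data as an input object
    dataset = load_samples()  # placeholder loader
    dn.log_input("dataset", dataset)

    # Structured collection through tasks
    for sample in dataset:
        await evaluate(sample)

    # Files as artifacts
    dn.log_artifact("./eval_report")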