Debugging agents requires visibility into their execution lifecycle: which tools they call, how they reason between steps, where they get stuck, and why they fail. Unlike traditional debugging where you trace code paths, agent debugging focuses on understanding autonomous decision-making and diagnosing behavioral issues like stalling, looping, or exceeding resource limits. Why debug agents:
  • Understand Behavior - See the agent’s reasoning, tool selections, and decision-making process as execution unfolds
  • Diagnose Issues - Identify stalling (no tool use), looping (repeated calls), errors (failed tools), or resource exhaustion (tokens, time, cost)
  • Optimize Performance - Analyze token usage, tool call patterns, and execution timing to improve efficiency
  • Validate Correctness - Verify the agent completes tasks correctly within expected resource bounds
When to use different debugging techniques:
  • run() - Basic execution and final results
  • stream() - Real-time visibility during development
  • Thread inspection and stop conditions - Analyzing patterns, preventing loops, and enforcing resource limits
  • Logging and metrics - Production monitoring
This guide covers streaming events, result inspection, common issues (stalling, looping, tool errors), stop conditions for controlled execution, thread management for testing, and independent tool validation.

Quick Start with run()

The simplest way to execute an agent is with the run() method, which returns the final AgentResult:
import dreadnode as dn

agent = dn.Agent(
    name="debugger",
    model="gpt-4o-mini",
    tools=[dn.agent.tools.fs.Filesystem(path=".", variant="read")],
)

# Run the agent and get the result
result = await agent.run("List the Python files in this directory.")
print(f"Final response: {result.messages[-1].content}")
For detailed debugging during development, use streaming events (see below) to see the agent’s thinking, tool calls, and results as they happen.

Streaming Events

For programmatic access to execution details, use stream():
from dreadnode.agent.events import (
    StepStart,
    GenerationEnd,
    ToolStart,
    ToolEnd,
    AgentStalled,
    AgentError,
)

async with agent.stream("Analyze the codebase.") as events:
    async for event in events:
        if isinstance(event, StepStart):
            print(f"\n--- Step {event.step} ---")

        elif isinstance(event, GenerationEnd):
            print(f"Agent said: {event.message.content[:100]}...")
            if event.usage:
                print(f"  Tokens: {event.usage.total_tokens}")

        elif isinstance(event, ToolStart):
            print(f"Calling: {event.tool_call.name}")
            print(f"  Args: {event.tool_call.function.arguments}")

        elif isinstance(event, ToolEnd):
            status = "error" if "error" in event.message.metadata else "success"
            print(f"  Result ({status}): {event.message.content[:100]}...")

        elif isinstance(event, AgentStalled):
            print("Agent stalled - no tool calls and no stop condition met")

        elif isinstance(event, AgentError):
            print(f"Error: {event.error}")

Inspecting Results

After execution, analyze the AgentResult:
result = await agent.run("Find security issues in the code.")

# Basic status
print(f"Stop reason: {result.stop_reason}")
print(f"Steps taken: {result.steps}")
print(f"Failed: {result.failed}")

if result.failed:
    print(f"Error: {result.error}")

# Token usage
print(f"Total tokens: {result.usage.total_tokens}")
print(f"Input tokens: {result.usage.input_tokens}")
print(f"Output tokens: {result.usage.output_tokens}")

# Final response
final_message = result.messages[-1]
print(f"Final response: {final_message.content}")

Understanding Stop Reasons

The stop_reason indicates why the agent stopped executing. All possible values:
  • "finished" - Agent completed successfully (stop condition met or natural completion)
  • "max_steps_reached" - Hit the max_steps limit before completing
  • "max_tool_calls_reached" - Hit the max_tool_calls limit
  • "stalled" - Agent produced a response without tool calls and no stop conditions were met
  • "error" - An unhandled error occurred during execution
result = await agent.run("Task")

if result.stop_reason == "finished":
    print("Success!")
elif result.stop_reason == "max_steps_reached":
    print(f"Ran out of steps (used {result.steps}/{agent.max_steps})")
elif result.stop_reason == "stalled":
    print("Agent didn't know what to do next")
elif result.stop_reason == "error":
    print(f"Failed with error: {result.error}")

Analyzing Tool Usage

from dreadnode.agent.events import ToolStart, ToolEnd
from collections import Counter

result = await agent.run("List all .py files in the current directory, read the first 10 lines of each, and check for any security issues.")

tool_starts = [e for e in agent.thread.events if isinstance(e, ToolStart)]
tool_ends = [e for e in agent.thread.events if isinstance(e, ToolEnd)]

tool_counts = Counter(e.tool_call.name for e in tool_starts)
print(f"Tool usage: {dict(tool_counts)}")
print(f"Total tool calls: {len(tool_starts)}")

for i, event in enumerate(tool_starts, 1):
    print(f"\n[Call {i}] {event.tool_call.name}")
    print(f"  Args: {event.tool_call.function.arguments[:100]}...")

failed_calls = [
    e for e in tool_ends
    if "error" in e.message.metadata
]

if failed_calls:
    print(f"\n{len(failed_calls)} failed tool calls:")
    for call in failed_calls:
        print(f"Failed: {call.tool_call.name}")
        print(f"  Error: {call.message.content}")
else:
    print("\nNo failed tool calls")

print(f"\nExecution summary:")
print(f"  Stop reason: {result.stop_reason}")
print(f"  Steps taken: {result.steps}")
print(f"  Total events: {len(agent.thread.events)}")

Common Issues and Solutions

Agent Stalls Immediately

Symptom: Agent produces a response but doesn’t call any tools. Diagnosis:
from dreadnode.agent.events import GenerationEnd

async with agent.stream("Do the task.") as events:
    async for event in events:
        if isinstance(event, GenerationEnd):
            print(f"Response: {event.message.content}")
            print(f"Tool calls: {event.message.tool_calls}")
Solutions:
  • Check that tools are properly passed to the agent
  • Verify the model supports function calling (or use tool_mode="xml")
  • Make instructions more directive: “Use the X tool to…”
  • Add a retry_with_feedback hook for AgentStalled events
from dreadnode.agent.events import AgentStalled
from dreadnode.agent.hooks import retry_with_feedback

agent = dn.Agent(
    name="stall-retry-agent",
    model="gpt-4o-mini",
    hooks=[
        retry_with_feedback(
            AgentStalled,
            "You must use tools to complete this task. Call finish_task when done."
        )
    ],
)

Agent Loops on the Same Tool

Symptom: Agent calls the same tool repeatedly with identical arguments. Diagnosis:
from dreadnode.agent.events import ToolStart

async with agent.stream("Task") as events:
    seen_calls = []
    async for event in events:
        if isinstance(event, ToolStart):
            call_sig = (event.tool_call.name, event.tool_call.function.arguments)
            if call_sig in seen_calls:
                print(f"REPEAT: {event.tool_call.name}")
            seen_calls.append(call_sig)
Solutions:
  • Add no_new_tool_used stop condition
  • Create a loop-detection hook that returns Fail
  • Improve tool descriptions so the agent understands when to move on
from dreadnode.agent.stop import no_new_tool_used

agent = dn.Agent(
    name="loop-prevention-agent",
    model="gpt-4o-mini",
    stop_conditions=[no_new_tool_used(for_steps=3)],
)

Agent Hits Max Steps

Symptom: result.stop_reason == "max_steps_reached". Diagnosis: The agent didn’t finish within the allowed steps.
print(f"Steps: {result.steps}")
print(f"Max steps: {agent.max_steps}")

# Check what the agent was doing
from dreadnode.agent.events import GenerationEnd
generations = [e for e in agent.thread.events if isinstance(e, GenerationEnd)]
for i, gen in enumerate(generations[-3:], 1):  # Last 3 generations
    print(f"Recent generation {i}: {gen.message.content[:200]}...")
Solutions:
  • Increase max_steps for complex tasks (see the sketch after this list)
  • Use TaskAgent which continues until explicit completion
  • Simplify the task or break it into smaller pieces
  • Check if the agent is stuck (see loop detection above)
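If the task genuinely needs more room, a minimal sketch of the first option is to raise the limit on the existing agent and retry; this assumes max_steps is a plain attribute you can reassign, as in the thread-management example later in this guide, and that the retry reuses the accumulated thread:
# Raise the step budget and retry the same task (20 is an arbitrary illustrative value)
agent.max_steps = 20
result = await agent.run("Find security issues in the code.")
print(f"Stop reason after retry: {result.stop_reason} ({result.steps} steps)")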

Tool Errors

Symptom: Tools return errors instead of results. Diagnosis:
from dreadnode.agent.events import ToolEnd

tool_ends = [e for e in agent.thread.events if isinstance(e, ToolEnd)]
for te in tool_ends:
    if "error" in te.message.metadata:
        print(f"Tool: {te.tool_call.name}")
        print(f"Args: {te.tool_call.function.arguments}")
        print(f"Error: {te.message.content}")
Solutions:
  • Add catch=True to the tool decorator for graceful error handling (see the sketch after this list)
  • Validate inputs in your tool implementation
  • Check tool argument types match the schema
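A minimal sketch of the first option, assuming the @dn.tool decorator accepts the catch flag exactly as described in the bullet above (the tool body and its failure mode are illustrative):
import dreadnode as dn

@dn.tool(catch=True)  # assumption: catch=True returns exceptions to the agent as error results
async def read_config(path: str) -> str:
    """Read a configuration file."""
    # If open() raises (e.g. FileNotFoundError), the error is surfaced to the
    # agent as a failed tool result instead of aborting the run.
    with open(path) as f:
        return f.read()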

Context Length Exceeded

Symptom: Error about context window or token limits. Solutions:
  • Add summarize_when_long hook to automatically compress history
  • Use token_usage stop condition to halt before hitting limits
  • Truncate large tool outputs with truncate parameter
from dreadnode.agent.hooks import summarize_when_long

agent = dn.Agent(
    name="context-managed-agent",
    model="gpt-4o-mini",
    hooks=[
        summarize_when_long(max_tokens=80_000),  # Summarize proactively
    ],
)

Handling Rate Limits and Errors

Symptom: Intermittent rate limit errors or API failures. Use backoff hooks to automatically retry with exponential backoff:
from dreadnode.agent.hooks import backoff_on_ratelimit, backoff_on_error

agent = dn.Agent(
    name="resilient-agent",
    model="gpt-4o-mini",
    hooks=[
        # Auto-retry on rate limits with exponential backoff
        backoff_on_ratelimit(),

        # Retry on specific errors
        backoff_on_error(
            exception_types=(TimeoutError, ConnectionError),
            max_tries=3,
            base_factor=1.0,
        ),
    ],
)
The backoff hooks use exponential backoff with jitter to avoid thundering herd problems. They track state per session, so each agent run gets independent retry logic.

Stop Conditions for Debugging

Stop conditions control when an agent should halt execution. Beyond max_steps, you can use specialized conditions for precise control:

Budget and Resource Limits

from dreadnode.agent.stop import token_usage, elapsed_time, estimated_cost

agent = dn.Agent(
    name="budget-agent",
    model="gpt-4o-mini",
    stop_conditions=[
        # Stop when token usage exceeds budget
        token_usage(limit=100_000),

        # Stop after time limit
        elapsed_time(max_seconds=300),

        # Stop when cost exceeds budget (uses litellm pricing)
        estimated_cost(limit=1.0),
    ],
)

Behavior-Based Conditions

from dreadnode.agent.stop import (
    generation_count,
    tool_error,
    tool_output,
    no_new_tool_used,
)

agent = dn.Agent(
    name="behavior-controlled",
    model="gpt-4o-mini",
    stop_conditions=[
        # Stop after N LLM generations (more precise than max_steps)
        generation_count(max_generations=15),

        # Stop immediately on any tool error
        tool_error(),

        # Stop when tool output matches pattern
        tool_output(pattern=r"TASK_COMPLETE", tool_name="finish_task"),

        # Stop when agent repeats same tools
        no_new_tool_used(for_steps=3),
    ],
)

Combining Stop Conditions

Stop conditions can be combined with | (OR) and & (AND):
from dreadnode.agent.stop import tool_use, output, elapsed_time

# Stop when EITHER condition is met
agent = dn.Agent(
    name="combined-stop",
    model="gpt-4o-mini",
    stop_conditions=[
        tool_use("finish_task") | output(pattern=r"SUCCESS"),
        elapsed_time(max_seconds=60),  # Safety timeout
    ]
)
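The & form works the same way; a minimal sketch, assuming the combined condition only fires once both sub-conditions are satisfied:
# Stop only when the agent has called finish_task AND its output contains SUCCESS
agent = dn.Agent(
    name="strict-stop",  # illustrative name
    model="gpt-4o-mini",
    stop_conditions=[
        tool_use("finish_task") & output(pattern=r"SUCCESS"),
        elapsed_time(max_seconds=60),  # independent safety timeout
    ]
)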
Common debugging patterns:
from dreadnode.agent.stop import (
    elapsed_time,
    estimated_cost,
    generation_count,
    no_new_tool_used,
    token_usage,
    tool_error,
)

# Prevent runaway costs
agent = dn.Agent(
    name="cost-controlled",
    model="gpt-4o-mini",
    stop_conditions=[
        estimated_cost(limit=0.50),
        elapsed_time(max_seconds=120),
    ]
)

# Debug tool loops
agent = dn.Agent(
    name="loop-detector",
    model="gpt-4o-mini",
    stop_conditions=[
        no_new_tool_used(for_steps=2),  # Stop if repeating tools
        tool_error(),  # Stop on first error
    ]
)

# Controlled experimentation
agent = dn.Agent(
    name="experiment",
    model="gpt-4o-mini",
    stop_conditions=[
        generation_count(max_generations=10),  # Exact number of generations
        token_usage(limit=50_000),  # Budget limit
    ]
)

Thread Management for Testing

The agent’s thread stores all messages and events. You can manipulate it for debugging:

Resetting Between Tests

agent = dn.Agent(name="test-agent", model="gpt-4o-mini")

# First test
result1 = await agent.run("Task 1")
print(f"Messages: {len(agent.thread.messages)}")

# Clear thread and save previous state
previous_thread = agent.reset()
print(f"Previous had {len(previous_thread.messages)} messages")
print(f"Agent now has {len(agent.thread.messages)} messages")

# Second test with clean slate
result2 = await agent.run("Task 2")

Forking Threads for A/B Testing

Create branched execution paths to test different approaches:
agent = dn.Agent(name="ab-test-agent", model="gpt-4o-mini")

# Build up some conversation history
await agent.run("Setup task")

# Fork the thread to test two approaches
thread_a = agent.thread.fork()
thread_b = agent.thread.fork()

# Test approach A
result_a = await agent.run("Try method A", thread=thread_a)

# Test approach B
result_b = await agent.run("Try method B", thread=thread_b)

# Compare results
print(f"Approach A: {result_a.steps} steps, {result_a.usage.total_tokens} tokens")
print(f"Approach B: {result_b.steps} steps, {result_b.usage.total_tokens} tokens")
Note: fork() copies messages but not events. Events are execution history and don’t carry forward to forked threads.
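A quick sanity check of that behavior, reusing the thread attributes shown above (the forked thread is expected to start with zero events):
forked = agent.thread.fork()
print(f"Messages copied: {len(forked.messages)} (original: {len(agent.thread.messages)})")
print(f"Events copied: {len(forked.events)} (original: {len(agent.thread.events)})")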

Custom Thread Management

# Create isolated test threads
agent = dn.Agent(name="config-test", model="gpt-4o-mini")
test_thread = dn.agent.Thread()

# Define test configurations
test_configs = [
    {"model": "gpt-4o-mini", "steps": 5},
    {"model": "gpt-4o", "steps": 10},
]

# Run multiple tests with same initial state
for config in test_configs:
    agent.model = config["model"]
    agent.max_steps = config["steps"]

    # Fork the test thread for each run
    run_thread = test_thread.fork()
    result = await agent.run("Test task", thread=run_thread)

    # Analyze without affecting original
    print(f"Config {config}: {len(run_thread.events)} events")

Logging

Enable debug logging to see internal operations:
import logging
logging.basicConfig(level=logging.DEBUG)

# Or for just dreadnode
logging.getLogger("dreadnode").setLevel(logging.DEBUG)
For production, use the tool_metrics hook to track tool performance:
from dreadnode.agent.hooks import tool_metrics

agent = dn.Agent(
    name="metrics-agent",
    model="gpt-4o-mini",
    hooks=[tool_metrics(detailed=True)],
)
This logs metrics like tool/total_count, tool/success_rate, and per-tool timing to the Dreadnode platform.

Testing Tools Independently

Test tools outside the agent loop to isolate issues:
import dreadnode as dn

@dn.tool
async def my_tool(query: str) -> str:
    """Search for something."""
    return f"Results for: {query}"

# Test directly
result = await my_tool("test query")
print(result)

# Check the schema
print(f"Name: {my_tool.name}")
print(f"Description: {my_tool.description}")
print(f"Parameters: {my_tool.parameters_schema}")
For toolsets:
fs = dn.agent.tools.fs.Filesystem(path=".", variant="read")

# List all available tools in the toolset
for tool in fs.get_tools():
    print(f"{tool.name}: {tool.description}")

# Test a specific method (paths are relative to the base path)
files = await fs.ls(".")
print(f"\nFiles in current directory: {files[:5]}")  # Show first 5

# Test reading a file (use any file in your directory)
try:
    content = await fs.read_file("README.md")
    print(f"\nFirst 100 chars: {content[:100]}")
except FileNotFoundError:
    print("\nREADME.md not found, use any file from the ls output above")