Traces

Inspect individual attack conversations, trial details, and scoring for AI red teaming runs.

Traces capture the full conversation history of every trial in an attack run. Use them to understand exactly what prompts were sent, how the target responded, and how each response was scored. Traces are the evidence of where the model is failing: they give model builders, and particularly post-safety-training teams, the exact data needed to build mitigations for the risks identified, including the winning adversarial prompt, the harmful response the model produced, and the judge's reasoning for why it scored as a jailbreak.

The Traces view shows all attack traces for the project, each tagged with its outcome:

Traces view showing studies list with jailbreak, refusal, and partial tags

Each trace entry shows:

  • Study name - the attack type (e.g., study:tap_attack)
  • Duration - how long the study took to execute
  • Type - study label
  • Outcome badge - color-coded result:
    • jailbreak (red) - attack succeeded
    • refusal (green) - target refused
    • partial (yellow) - partial success

Click any trace to expand its trace tree. The trace tree shows the hierarchical structure of the attack:

  • Trace span - top-level container for the attack
    • Trial spans - individual optimization iterations
      • Target call - the prompt sent and response received
      • Evaluator call - the judge model’s score

Each span includes:

  • Full prompt text sent to the target
  • Complete target response
  • Jailbreak score (0.0 to 1.0)
  • Timing information
  • Model configuration
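As a rough mental model of what one trial span carries, the sketch below bundles the fields above into a single record. The class name, field names, and outcome thresholds are illustrative assumptions, not the tool's actual trace schema:

```python
from dataclasses import dataclass

# Illustrative only: names and thresholds are assumptions, not dn's trace schema.
@dataclass
class TrialSpan:
    prompt: str          # full prompt text sent to the target
    response: str        # complete target response
    score: float         # jailbreak score, 0.0 to 1.0
    duration_ms: int     # timing information
    model_config: dict   # model configuration for the call

    def outcome(self, jailbreak_at: float = 0.8, refusal_at: float = 0.2) -> str:
        """Map a score to an outcome tag (hypothetical thresholds)."""
        if self.score >= jailbreak_at:
            return "jailbreak"
        if self.score <= refusal_at:
            return "refusal"
        return "partial"

trial = TrialSpan("...", "I can't help with that.", 0.05, 1200, {"model": "target-v1"})
print(trial.outcome())  # refusal
```

The point of the sketch is that everything needed to reproduce and triage a finding lives on the span itself: the prompt, the response, the score, and the configuration that produced them.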

Toggle between two view modes in the top-right:

  • Detail - structured view with expandable spans and formatted content
  • Timeline - chronological waterfall view showing execution timing across spans

Access trace data from the command line:

```sh
# Get trace statistics for an assessment
dn airt traces <assessment-id>

# Get attack-level spans
dn airt attacks <assessment-id>

# Get trial-level spans with filtering
dn airt trials <assessment-id> --min-score 0.8
dn airt trials <assessment-id> --attack-name tap --jailbreaks-only
dn airt trials <assessment-id> --limit 10
```
| Filter | Description |
| --- | --- |
| `--attack-name` | Filter by attack type (`tap`, `pair`, `crescendo`, etc.) |
| `--min-score` | Only show trials above this score threshold |
| `--jailbreaks-only` | Only show successful jailbreaks |
| `--limit` | Maximum number of trials to return |
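If you think of each trial as a record, the flags correspond roughly to the filtering below. The record shape and the assumption that flags combine with AND semantics are illustrative, not taken from dn's implementation:

```python
def filter_trials(trials, attack_name=None, min_score=None,
                  jailbreaks_only=False, limit=None):
    """Approximate the CLI filter flags over trial records (illustrative sketch)."""
    out = trials
    if attack_name is not None:
        out = [t for t in out if t["attack"] == attack_name]
    if min_score is not None:
        out = [t for t in out if t["score"] >= min_score]
    if jailbreaks_only:
        out = [t for t in out if t["outcome"] == "jailbreak"]
    if limit is not None:
        out = out[:limit]
    return out

trials = [
    {"attack": "tap", "score": 0.9, "outcome": "jailbreak"},
    {"attack": "tap", "score": 0.1, "outcome": "refusal"},
    {"attack": "pair", "score": 0.85, "outcome": "jailbreak"},
]
print(filter_trials(trials, attack_name="tap", jailbreaks_only=True))
# [{'attack': 'tap', 'score': 0.9, 'outcome': 'jailbreak'}]
```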

Traces help you answer:

  • What worked? - sort by score to find the highest-scoring trials and examine the prompts that succeeded
  • Why did it work? - read the full conversation to understand the attack path
  • Which transforms helped? - compare scores with and without specific transforms
  • Which attack is most effective? - compare outcomes across study types for the same goal
  • Is the model consistently vulnerable? - look at outcome distribution (jailbreak vs refusal ratio)
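For the last two questions in particular, a small sketch of how you might summarize exported trial records shows the idea: count outcomes for the distribution, and sort by score to surface the prompts that worked. The record fields are assumptions for illustration:

```python
from collections import Counter

def outcome_distribution(trials):
    """Count outcome tags and compute a jailbreak rate (illustrative helper)."""
    counts = Counter(t["outcome"] for t in trials)
    rate = counts["jailbreak"] / len(trials) if trials else 0.0
    return counts, rate

def top_trials(trials, n=3):
    """Sort by score descending to find the highest-scoring trials."""
    return sorted(trials, key=lambda t: t["score"], reverse=True)[:n]

trials = [
    {"attack": "tap", "score": 0.9, "outcome": "jailbreak"},
    {"attack": "tap", "score": 0.1, "outcome": "refusal"},
    {"attack": "crescendo", "score": 0.6, "outcome": "partial"},
    {"attack": "pair", "score": 0.95, "outcome": "jailbreak"},
]
counts, rate = outcome_distribution(trials)
print(rate)                              # 0.5
print(top_trials(trials, n=1)[0]["attack"])  # pair
```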