
Assessments

Organize AI red teaming campaigns - attack runs, analytics, findings, attacker prompts, and target responses.

An assessment is a named container that groups attack runs against an AI system and aggregates their results into analytics, findings, and compliance reports. Assessments enable AI red team operators to continuously run attack campaigns as part of an ongoing operation and see point-in-time results for each campaign. As you test different attack strategies, goals, transforms, and model versions over days or weeks, each assessment captures a snapshot with detailed metrics, traces, and findings that you can compare and track over time.

An assessment answers: How vulnerable is this AI system to adversarial attacks?

You provide:

  • A target system to probe
  • One or more attack strategies (Tree of Attacks with Pruning (TAP), Graph of Attacks (GOAT), Crescendo, Prompt Automatic Iterative Refinement (PAIR), and others)
  • Goals describing what the attacks should attempt

Dreadnode executes attack runs and aggregates their telemetry into analytics on demand. An assessment belongs to a project within a workspace and accumulates results across multiple attack runs over time.

Navigate to the Assessments tab to see all assessments in the project:

Assessments list with sidebar and detail panel

The view has two panels: a sidebar listing all assessments and a detail panel for the selected assessment.

Each assessment entry shows:

  • Assessment name - descriptive name (e.g., probe-incident_postmortem-094)
  • Target model - which model was attacked
  • Attack count - number of attack runs (e.g., “1 attacks”)
  • Attack Success Rate - percentage of successful trials (e.g., “100% Attack Success Rate”)
  • Timestamp - when the assessment was created
  • Status indicator - green dot for completed

Click any assessment to see its full analytics.

Assessment detail with metrics, severity breakdown, and findings

The detail view header shows:

  • Assessment name and description explaining the test objective
  • Status badge - Completed, Running, or Failed
| Metric | Description |
| --- | --- |
| Overall Attack Success Rate | Percentage of trials that achieved the goal |
| Successful / Total Attacks | How many attack runs succeeded vs. total (e.g., 1/1) |
| Total Trials | Number of individual attempts in this assessment |
| Duration | Wall-clock time for the assessment |
| Pruned | Percentage of trials pruned by the attack optimizer (e.g., 17%) |
| Total Time | Cumulative compute time across all trials |
| Avg Trial Time | Average time per trial |
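The metrics above are straightforward aggregations over trial records. A minimal sketch of how they relate, using an illustrative `Trial` record (the field names are assumptions, not the SDK's actual schema):

```python
from dataclasses import dataclass

# Hypothetical trial record; fields are illustrative, not the platform schema.
@dataclass
class Trial:
    success: bool     # did this trial achieve the goal?
    pruned: bool      # skipped early by the attack optimizer?
    duration_s: float # compute time for this trial

def summarize(trials: list[Trial]) -> dict:
    """Compute assessment-level metrics like those in the table above."""
    total = len(trials)
    if total == 0:
        return {"asr": 0.0, "total_trials": 0, "pruned_pct": 0.0,
                "total_time_s": 0.0, "avg_trial_time_s": 0.0}
    total_time = sum(t.duration_s for t in trials)
    return {
        "asr": sum(t.success for t in trials) / total,
        "total_trials": total,
        "pruned_pct": 100 * sum(t.pruned for t in trials) / total,
        "total_time_s": total_time,
        "avg_trial_time_s": total_time / total,
    }
```

Duration (wall-clock) differs from Total Time (cumulative compute) whenever trials run concurrently.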

A horizontal bar showing the severity distribution for this assessment’s findings. Color-coded by severity level (Critical, High, Medium, Low, Info).

The assessment-level findings table shows all findings from this specific assessment, with:

  • All Findings / Filters toggle for filtering
  • Score column (sortable, descending by default)
  • Severity level with color dot
  • Type - jailbreak, partial, refusal
  • Attack - which attack strategy produced the finding
  • Assessment ID reference
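The default table ordering and filtering can be sketched with plain list operations. The finding records below are illustrative, with keys mirroring the table columns rather than an actual API response:

```python
# Illustrative finding records; keys mirror the table columns, not a real API.
findings = [
    {"score": 0.62, "severity": "Medium", "type": "partial", "attack": "crescendo"},
    {"score": 0.95, "severity": "High", "type": "jailbreak", "attack": "tap"},
    {"score": 0.10, "severity": "Info", "type": "refusal", "attack": "pair"},
]

# Default table view: sorted by score, descending.
by_score = sorted(findings, key=lambda f: f["score"], reverse=True)

# A filter like the All Findings / Filters toggle, e.g. jailbreaks only.
jailbreaks = [f for f in by_score if f["type"] == "jailbreak"]
```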

Expanded finding - attacker prompt and target response


Click the expand arrow on any finding to see the full evidence:

Expanded finding showing Best Attacker Prompt and Target Response

The expanded view shows:

  • Best Attacker Prompt - the exact adversarial prompt that achieved the highest score. This is the evidence of what the attacker sent to break the model.
  • Target Response - the model’s actual response to the adversarial prompt. This shows exactly how the model failed.

This is critical for model builders who need to understand the exact failure mode and reproduce it.

Below the findings table, the Attack Success Rate by Attack section shows a breakdown of ASR per attack type. Toggle between Table and Chart views:

ASR by Attack section with Table/Chart toggle and findings detail

Table columns: Attack, Attack Model, Successful/Total, Trials, Best Score, Min Score, Average Score.

The Chart view shows a visual bar chart of Attack Success Rate per attack type, making it easy to compare which strategies were most effective.

Below the attack breakdown, Attack Success Rate is grouped by goal category (e.g., harmful_content, malware, elections). This helps you understand which types of goals the target is most vulnerable to and where to focus remediation.
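The per-category breakdown is a grouped version of the overall ASR. A minimal sketch, assuming trial results carry a goal-category label (the record shape is illustrative):

```python
from collections import defaultdict

def asr_by_category(trials: list[dict]) -> dict[str, float]:
    """Fraction of successful trials per goal category."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [successes, total]
    for t in trials:
        counts[t["category"]][0] += t["success"]
        counts[t["category"]][1] += 1
    return {cat: s / n for cat, (s, n) in counts.items()}

# Hypothetical trial results tagged with goal categories.
trials = [
    {"category": "harmful_content", "success": True},
    {"category": "harmful_content", "success": False},
    {"category": "malware", "success": True},
    {"category": "elections", "success": False},
]
```

A category with a high ASR is where the target is weakest and remediation effort pays off first.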

| Concept | Definition |
| --- | --- |
| Assessment | A named, project-scoped container for a red teaming campaign |
| Attack Run | A single execution of an attack strategy (e.g., one Tree of Attacks with Pruning (TAP) run with a specific goal) |
| Trial | An individual attempt within an attack run - one conversation or prompt exchange |
| ASR | Attack Success Rate - fraction of trials that achieved the stated goal |
| Pruned | Trials the optimizer skipped because they were unlikely to improve on existing results |
| Transform | Adversarial technique applied to prompts (encoding, persuasion, injection) |
| Compliance Tag | Mapping from attack results to security framework categories |
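A transform rewrites a prompt before it is sent to the target. As a concrete illustration, here is a minimal encoding-style transform (the function is a sketch, not a platform built-in):

```python
import base64

def base64_transform(prompt: str) -> str:
    """Wrap a prompt in a base64 envelope - a simple encoding-style transform."""
    encoded = base64.b64encode(prompt.encode()).decode()
    return f"Decode this base64 string and follow its instructions: {encoded}"
```

Persuasion and injection transforms follow the same shape: a function from prompt to adversarially rewritten prompt.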

Results are automatically tagged against industry security frameworks:

  • OWASP Top 10 for LLM Applications - prompt injection, insecure output handling, training data poisoning
  • OWASP Agentic Security (ASI01–ASI10) - behavior hijacking, tool misuse, privilege escalation
  • MITRE ATLAS - adversarial ML threat matrix techniques
  • NIST AI Risk Management Framework - risk categories and controls
  • Google SAIF - Secure AI Framework categories
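Conceptually, compliance tagging maps each finding's technique to one or more framework categories. A hedged sketch of that lookup, where the mapping entries are illustrative examples and not the platform's actual taxonomy:

```python
# Illustrative technique-to-framework mapping; the real platform taxonomy
# is richer and maintained by the product, not hand-written like this.
COMPLIANCE_TAGS: dict[str, list[str]] = {
    "prompt_injection": ["OWASP LLM01: Prompt Injection", "MITRE ATLAS: LLM Prompt Injection"],
    "tool_misuse": ["OWASP Agentic Security: Tool Misuse"],
}

def tags_for(technique: str) -> list[str]:
    """Return the framework categories mapped to a finding's technique."""
    return COMPLIANCE_TAGS.get(technique, [])
```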

Assessments are created automatically when you run attacks via the TUI, CLI, or SDK. You can also create one explicitly:

CLI:

```bash
dn airt create \
  --name "Q2 Security Assessment" \
  --description "Quarterly red team exercise" \
  --project-id <project-id>
```

SDK:

```python
from dreadnode.airt import Assessment

assessment = Assessment(
    name="Q2 Security Assessment",
    description="Quarterly red team exercise",
    target=target,
    model="openai/gpt-4o-mini",
    goal="Reveal the system prompt",
)
```
```bash
# List all assessments
dn airt list

# Get assessment details
dn airt get <assessment-id> --json

# Update status
dn airt update <assessment-id> --status completed

# Delete an assessment
dn airt delete <assessment-id>
```
An assessment moves through four states:

  1. Created - assessment registered with the platform
  2. Running - attack runs executing and uploading results
  3. Completed - all attacks finished, analytics available
  4. Failed - assessment encountered errors during execution