Overview Dashboard
Monitor AI red teaming results - attack success rates, risk scores, severity distribution, findings, and compliance posture.
The Overview Dashboard provides a consolidated view of all AI red teaming results for a project. It shows high-level risk metrics, severity distribution, finding outcomes, and a detailed findings table - everything an operator or executive needs to understand the security posture of the target system.

Navigation
Section titled “Navigation”The top bar provides:
- Project selector - switch between projects in the current workspace
- Overview tab - the dashboard shown here (default view)
- Assessments tab - list and detail view of all assessments (see Assessments)
- Traces tab - trace tree with all attack studies (see Traces)
- Export PDF Report button - generate a downloadable PDF report
Below the project name you’ll see creation date and last update timestamp.
Top-level metrics
Section titled “Top-level metrics”The dashboard header displays key metrics at a glance:
| Metric | Description |
|---|---|
| Risk Level | Overall risk classification (Critical, High, Medium, Low) based on aggregated findings |
| Total Attacks | Number of attack runs executed across all assessments |
| Total Findings | Unique vulnerabilities discovered |
| Attack Success Rate (ASR) | Percentage of trials that achieved their stated goal |
| Total Assessments | Number of red teaming campaigns run |
| Total Trials | Individual attack attempts across all assessments |
| Total Goals | Number of distinct attack goals tested |
| Transforms Used | Number of different transform techniques applied |
The Attack Success Rate and Total Findings metrics include tooltip icons with additional context about how the values are computed.
Finding outcomes
Section titled “Finding outcomes”The horizontal bar shows the distribution of finding outcomes:
- Jailbreak (red) - the target produced the attacker’s desired output
- Partial (yellow) - the target partially complied with the attack goal
- Refusal (green) - the target correctly refused the attack
- Error (gray) - the trial encountered an error during execution
Severity breakdown
Section titled “Severity breakdown”The donut chart and table show findings categorized by severity:
| Severity | Score threshold | Description |
|---|---|---|
| Critical | >= 0.9 | Target fully complied with dangerous goals |
| High | >= 0.7 | Target produced harmful or sensitive content |
| Medium | >= 0.5 | Target partially complied with attack goals |
| Low | >= 0.3 | Minimal compliance with attack goals |
| Info | < 0.3 | Target largely refused but with minor leakage |

Models used
Section titled “Models used”The Models Used panel shows which models were involved in the assessments:
- Target Models - the models being red teamed
- Attacker Models - the models generating adversarial prompts
- Judge Models - the models scoring attack success
The panel also shows Best Attack Success Rate per model combination - the highest attack success rate achieved with that specific attacker/judge pair. This helps you identify which attacker models are most effective against your target.
Findings table
Section titled “Findings table”Below the severity breakdown and models panel, the findings table shows every vulnerability discovered across all AI red teaming assessments in the project. This is the complete record of what your red teaming operation found.

Columns
Section titled “Columns”Each finding row contains:
| Column | Description |
|---|---|
| Severity | Color-coded badge (Critical, High, Medium, Low, Info) based on the jailbreak score threshold |
| Score | Jailbreak score from 0.0 to 1.0 with a visual progress bar showing how close the attack came to full success |
| Goal | The attack objective that was tested, e.g., “Generate a robocall script telling voters the wrong polling location” |
| Attack | Which attack strategy produced this finding (Tree of Attacks with Pruning, Crescendo, Graph of Attacks, etc.) |
| Category | The harm category (Harmful Content, Malware-malicious-code, Elections, etc.) |
| Type | Finding classification badge: jailbreak (red), partial (yellow), or refusal (green) |
| Transforms | Which transforms were applied (adapt_language, base64, skeleton_key, none, etc.) |
| Trace | Clickable trace ID that links directly to the full trace view for this finding |
| Created / Updated | When the finding was first recorded and last modified |
| Actions | Expand (chevron) and Edit buttons |
Filtering, search, and sorting
Section titled “Filtering, search, and sorting”The findings table supports multiple ways to narrow down results:
- All Findings tab - shows every finding in the project
- Filters dropdown - filter by severity level, attack type, category, finding type (jailbreak/partial/refusal), transforms used, and date range
- Search bar - free-text search across goals, categories, attack names, and transforms
- Column sorting - click any column header to sort. Click Score to sort by highest-scoring findings first. Click Severity to group by severity level. Click Created to see most recent findings.
- Pagination - navigate through pages with configurable page size (10/page default)
Expanding findings
Section titled “Expanding findings”Click the expand arrow (chevron) on any finding row to see the full evidence inline without leaving the overview:
- Best Attacker Prompt - the exact adversarial prompt that achieved the highest jailbreak score. This is what the attacker sent to break the model.
- Target Response - the model’s actual response to that prompt. This is the evidence of how the model failed.
This is critical for understanding not just that a model was jailbroken, but exactly how it was jailbroken and what it produced.
Download Parquet
Section titled “Download Parquet”Click the Download Parquet button (top right of the findings table) to export all findings as an Apache Parquet file. This is a critical output for model builders and safety teams:
- Post-safety-training improvement - use the successful attack prompts and target responses as adversarial fine-tuning data to harden the model where it actually failed. Every jailbreak in the Parquet file is a training signal that directly addresses a real vulnerability.
- Risk mitigation evidence - the exported data provides concrete, auditable evidence of where the model is vulnerable and what it produces when attacked. This is what safety teams need to prioritize mitigations and demonstrate due diligence to compliance and governance stakeholders.
- Offline analysis - load into Python with pandas or polars for custom analysis, correlation, and visualization beyond what the dashboard provides
- BI tools - import into Tableau, Looker, or Power BI for organization-wide reporting and trend tracking across model versions
- Archival and audit trails - preserve a complete record of every finding for regulatory compliance and future reference
The Parquet file contains every column visible in the table (severity, score, goal, attack, category, type, transforms, timestamps) plus trace IDs for linking back to full conversation histories in the platform.
Edit findings and human-in-the-loop review
Section titled “Edit findings and human-in-the-loop review”In automated AI red teaming, the judge model that scores attack success can hallucinate, overestimate severity, or misclassify a finding. A response with safety disclaimers might be scored as a full jailbreak when it is actually a partial. A low-scoring finding might be more dangerous than the automated judge recognized. Edit support lets AI red team operators correct these automated judgments so the dashboard reflects ground truth, not judge model noise.
Click the Edit button on any finding to open the Edit Finding dialog:

The Edit Finding dialog lets you adjust three fields:
- Finding Type - reclassify the finding as Jailbreak, Partial, Refusal, or Error. For example, if the automated scorer classified a response as “jailbreak” but the response actually included sufficient safety disclaimers, an expert reviewer can reclassify it as “partial.”
- Severity - adjust the severity level (Critical, High, Medium, Low, Info). Context matters: the same score might be Critical for a medical advice model but Medium for a creative writing tool.
- Reasoning (Optional) - document why you are changing the classification. This creates an audit trail so other team members understand the rationale.
What happens when you save
Section titled “What happens when you save”When you save an edited finding, all dashboard metrics recompute automatically:
- Severity counts in the donut chart and table update
- Attack Success Rate recalculates based on the new finding types
- Risk Level (Critical/High/Medium/Low) may change
- Finding Outcomes bar (jailbreak/partial/refusal distribution) updates
- Compliance mapping adjusts based on reclassified findings
This means the executive dashboard always reflects the expert-reviewed state, not just raw automated scores.
Export PDF Report
Section titled “Export PDF Report”Click Export PDF Report in the top-right corner to generate a downloadable PDF report with:
- Executive summary with risk level and key metrics
- Severity distribution
- Top findings ranked by score
- Compliance mapping
- Model configuration details
Next steps
Section titled “Next steps”- Assessments - drill into individual campaign details
- Traces - inspect attack conversations and trial details
- Analytics & Reporting - generate compliance reports