June 18, 2026

Tool media evidence in session transcript

6 new, 9 improved, 14 fixed

Agents can now persist and retrieve durable project-scoped memory across runs with the new ProjectMemory tools.

New

Tool media evidence in session transcript. Screenshots and other media logged via dn.log_output() inside tools now render inline in the session transcript next to the tool call that produced them. (dreadnode/dreadnode-tiger#1747)
Unified LiteLLM admin proxy and model browser. Admins can manage LiteLLM model deployments and credentials directly from the platform; platform models are now always org-allowlist-scoped, and Ctrl+K in the TUI opens the full model browser. (dreadnode/dreadnode-tiger#1688)
Release Notes page on docs site. A new page at docs.dreadnode.io/release-notes shows the full changelog with tier chips, month navigation, per-entry permalinks, and an RSS feed. (dreadnode/dreadnode-tiger#1723)
App-layer DoS skill for web-security. New app-layer-dos skill covers ReDoS, decompression bombs, server delay exploitation, and cross-protocol amplification for application-layer DoS testing. (dreadnode/capabilities#57)
Archive path-traversal skill. New archive-path-traversal skill covers Zip Slip, symlink attacks, polyglot MIME bypass, and Unicode path confusion via the archivealchemist tooling. (dreadnode/capabilities#50)
EXIF metadata manipulation tools. New exiftool capability adds exif_read, exif_write, exif_strip, and exif_copy methods for EXIF metadata manipulation in web-security agents. (dreadnode/capabilities#58)

Improvements

AIRT findings: newest-first, auto-refresh, and richer detail. All Findings now defaults to newest-first order, auto-surfaces new findings every 20 seconds with a notification pill, and the finding detail panel shows a structured Human Review table, a Metadata tile with Finding/Assessment IDs, a copy button on reasoning, and UUID-prefix search. (dreadnode/dreadnode-tiger#1753, dreadnode/dreadnode-tiger#1757)
Web-security continuous scan status. The web-security agent now emits a STATUS update (including an Unexplored field) after every action, reducing silent drift and improving visibility into scan coverage. (dreadnode/capabilities#54)
Confidence trace IDs in web-security reports. Web-security vulnerability reports now include a trace ID linking each finding back to the confidence assessment that approved it. (dreadnode/capabilities#46)
CVSS score in confidence assessments. The web-security credence tool now accepts an optional CVSS score, echoes it as [cvss:N.N] in output, and flags mismatches such as low confidence paired with a high CVSS score. (dreadnode/capabilities#53)
ASR by Category chart. The ASR by Category card on the Assessments page now renders a bar chart in Charts mode, matching the ASR by Attack and ASR by Transform cards. (dreadnode/dreadnode-tiger#1691)
Actor visibility across activity feed and agent sessions. The home page activity feed now shows an actor chip for each event, and the agent sessions list shows who or what started each session. (dreadnode/dreadnode-tiger#1685, dreadnode/dreadnode-tiger#1687)
Evaluations page shows capability and user detail. The evaluations list now surfaces who or what ran each sample alongside the capability reference, making it easier to identify the origin of a run. (dreadnode/dreadnode-tiger#1717)
Web-security interrupted tool result recovery. The web-security capability now automatically retries when the model emits an interrupted tool call result, replaying the last tool outcome up to a bounded retry budget. (dreadnode/capabilities#1)
Inject env vars into smoke tests via -e/--env. The new -e/--env flag on dn task validate --smoke lets you inject environment variables into the challenge service at smoke-test time without editing docker-compose.yaml. (dreadnode/dreadnode-tiger#1745)
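
As an illustration of the "CVSS score in confidence assessments" entry above, here is a minimal sketch of how an optional CVSS score could be echoed and checked against a confidence value. It is not the credence tool's actual code: the function name, the `[confidence:…]` tag, the mismatch wording, and both thresholds are assumptions made for this example. Only the `[cvss:N.N]` tag format comes from the entry itself.

```python
from typing import Optional

# Both thresholds are assumptions for this sketch, not values from the tool.
HIGH_CVSS = 7.0        # CVSS v3.x rates 7.0 and above as High or Critical
LOW_CONFIDENCE = 0.5   # hypothetical cut-off for "low confidence"


def format_assessment(confidence: float, cvss: Optional[float] = None) -> str:
    """Render a confidence line, echoing the CVSS score when one is given."""
    parts = [f"[confidence:{confidence:.2f}]"]  # hypothetical tag
    if cvss is not None:
        if not 0.0 <= cvss <= 10.0:
            raise ValueError("CVSS score must be between 0.0 and 10.0")
        # Echo the score in the [cvss:N.N] form described in the entry.
        parts.append(f"[cvss:{cvss:.1f}]")
        # Flag the mismatch case: low confidence paired with a high score.
        if confidence < LOW_CONFIDENCE and cvss >= HIGH_CVSS:
            parts.append("[mismatch: low confidence, high CVSS]")
    return " ".join(parts)


print(format_assessment(0.9, 9.8))
print(format_assessment(0.2, 9.8))
```

Keeping the score optional means existing calls without a CVSS value produce the same output as before.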

Fixes

Evaluation reliability: deadlocks, sandbox provisioning, and circuit-breaker gaps. Evaluations no longer get stuck in queue for hours due to deadlocks, sandbox provisioning failures, or missing circuit-breaker coverage; transient infrastructure errors now retry automatically. (dreadnode/dreadnode-tiger#1699)
AIRT findings list restored: legacy compat and fast load. The findings list now correctly displays findings saved with legacy review-event field names and loads without the previous 6–10 second delay caused by a sequential scan. (dreadnode/dreadnode-tiger#1759)
Multi-attack campaigns no longer crash mid-run. All 12 attack types now accept airt_* span-linkage kwargs, fixing TypeError crashes that caused campaigns to partially complete and produce duplicate assessments. (dreadnode/capabilities#44, dreadnode/dreadnode-tiger#1693, dreadnode/dreadnode-tiger#1707)
Judge verification timeout extended. Trajectory-based task verifiers (outcome_judge, script_and_judge, flag_and_judge) no longer time out prematurely: the judge phase now has a 180-second minimum, which prevents empty judgement results. (dreadnode/dreadnode-tiger#1706)
ASR formula consistent between detail and overview pages. Successful attack counts and ASR on the assessment detail page now use the same severity-based formula as the overview page. (dreadnode/dreadnode-tiger#1694)
Duplicate rows removed from ASR by Category chart. The chart no longer shows both jailbreak and jailbreak_general for runs using a single underscore-named category. (dreadnode/dreadnode-tiger#1732)
Project selector persists across pages. Navigating from agent sessions to Charts or Analytics no longer resets your selected project. (dreadnode/dreadnode-tiger#1689)
Catalog filter buttons appear immediately after a transient load failure. Filter buttons on the Environments, Capabilities, Datasets, and Models catalog pages no longer stay hidden for minutes after a facets load error. (dreadnode/dreadnode-tiger#1696)
Video inputs work in eval model tasks. The SDK now serializes video content parts correctly for Gemini/Google/Vertex and OpenAI-compatible paths. (dreadnode/dreadnode-tiger#1718)
Failed assessments stay failed. Assessments that fail during finalization now correctly report status=failed in dn airt list --json instead of being overwritten to completed. (dreadnode/dreadnode-tiger#1733)
Workflow regeneration preserves hand-patched files. Re-running workflow generation no longer silently overwrites hand-edited files; existing files are kept, and new versions are written as name_v2.py, name_v3.py, and so on. (dreadnode/capabilities#56)
Anti-fabrication guard on report-writer trace IDs. The report-writer skill now marks missing confidence trace IDs as MISSING instead of allowing the agent to hallucinate a plausible-looking ID. (dreadnode/capabilities#47)
TUI /model command no longer overwrites profile default. Using /model or /models in a TUI session no longer persists the selection to your profile, so different sessions can use different models without interfering. (dreadnode/dreadnode-tiger#1763)
Activity feed status colors unified. Status badge colors on the home page activity feed now match the rest of the platform; for example, AIRT “running” shows blue and World “cancelled” shows gray. (dreadnode/dreadnode-tiger#1737)
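
The versioned-file behavior in the "Workflow regeneration preserves hand-patched files" entry above can be sketched roughly as follows. This is an illustrative sketch, not the capability's implementation: the helper names `next_free_path` and `write_preserving` are hypothetical, and only the name_v2.py, name_v3.py naming pattern is taken from the entry.

```python
from pathlib import Path


def next_free_path(target: Path) -> Path:
    """Return target if it is unused, else the first free name_vN variant."""
    if not target.exists():
        return target
    version = 2  # the first alternate name is name_v2, per the entry
    while True:
        candidate = target.with_name(f"{target.stem}_v{version}{target.suffix}")
        if not candidate.exists():
            return candidate
        version += 1


def write_preserving(target: Path, content: str) -> Path:
    """Write content without overwriting any existing file; return the path used."""
    path = next_free_path(target)
    path.write_text(content)
    return path
```

Calling `write_preserving(Path("name.py"), ...)` three times in the same directory would leave the original name.py untouched and create name_v2.py and name_v3.py alongside it.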