Release Notes

What’s new in the Dreadnode platform — features, improvements, and fixes, published weekly.

What’s new across the Dreadnode platform — features, improvements, and fixes, published weekly. Subscribe via RSS.

July 2026

July 16, 2026

Agent Output available in beta for all orgs

14 new 11 improved 24 fixed

Slash-command palette in the web chat composer. Typing / opens a fuzzy-filtered overlay with built-in actions (/new, /clear, /rewind, /model, /agent, and more) plus dynamically-discovered skills. (dreadnode/dreadnode-tiger#1923)
@-mention agent picker in the chat composer. Type @ to open an inline agent picker navigable by keyboard or mouse; selecting an agent inserts it as a mention in your message. (dreadnode/dreadnode-tiger#1927)
/background launcher in the web chat composer. Type /background <task> (or /bg) to spin up a headless session without leaving your current conversation. (dreadnode/dreadnode-tiger#1929)
Chat session interface overhaul. New tool renderers (glob, grep, read, query, web search), an improved session header, capability picker, and read-only session notices ship in the web chat. (dreadnode/dreadnode-tiger#1942)
Improved model selector in chat. A ‘manage models’ button now lives inside the selector, model settings are consolidated into one table, and the add/remove flow has new tooltips and UX affordances. (dreadnode/dreadnode-tiger#1931)
“Agent Output” replaces “Structured Output” throughout the UI. The nav item, route (/agents/agent-outputs/), and all references are renamed; severity sorting is now case-insensitive so CRITICAL/Critical items rank correctly. (dreadnode/dreadnode-tiger#1885)

The web chat composer gets a full command surface this week: type / for a fuzzy-filtered slash-command palette and @ to pick an agent inline — matching the TUI experience in the browser.

New

Agent Output available in beta for all orgs. Agent Output is available in beta for all orgs including on-prem/enterprise, with beta labeling in the UI and new documentation. (dreadnode/dreadnode-tiger#1969)
ATLAS multi-agent attack framework. The SDK’s dreadnode.airt.atlas module runs multi-agent campaigns against deployed environments across eight attack modes using a MDP+Hedge router; executed tool calls are captured per-trial and surfaced in findings across the UI, API, and SDK. (dreadnode/capabilities#67, dreadnode/dreadnode-tiger#1850)
Comprehensive multimodal transform library (170 image/audio/video). The SDK now ships 130+ image, audio, and video perturbations for red teaming vision and audio/video models, covering ImageNet-C corruptions, AugLy augmentations, SpecAugment, and published jailbreak techniques (FigStep, DolphinAttack). (dreadnode/dreadnode-tiger#1954)
AI red-teaming TUI exposes 131 multimodal transforms. Exposed media transforms grow from 26 to 131 (57 image, 52 audio, 22 video), including ImageNet-C corruptions, SpecAugment masking, and DolphinAttack-style audio attacks. (dreadnode/capabilities#89)
Model reasoning in session transcripts and trajectory exports. Native reasoning_content / Anthropic thinking blocks now appear in the session transcript viewer and are included in ATIF trajectory exports. (dreadnode/dreadnode-tiger#1910)
LiteLLM bundled for on-prem inference. Enterprise/self-hosted installs can enable LiteLLM via a new bundled Helm subchart, giving on-prem deployments access to dn/* models, TUI/chat inference, and the Admin Model Deployments surface. (dreadnode/dreadnode-tiger#1909)
E2B as a first-class on-prem sandbox provider. Self-hosted installs can now configure E2B via KOTS toggle — set your API key, pick the published template, and task-based evaluations work out of the box. (dreadnode/dreadnode-tiger#1950)
--max-steps budget on evaluation runs. dn evaluation create and evaluation.yaml now accept --max-steps / max_steps: for reproducible cost control independent of model latency. (dreadnode/dreadnode-tiger#1874)
AgentMail email inbox integration for web-security agents. Agents can now manage email inboxes via AgentMail — list/create inboxes and list, get, send, or reply to messages. (dreadnode/capabilities#85)
AWS SageMaker and Nova Sonic target docs. New docs cover probing AWS SageMaker (including SigV4 auth, multimodal attacks, raw audio endpoints) and AWS Nova Sonic (bidirectional speech-to-speech streaming). AWS/Nova Sonic deps are now bundled in core — no extras needed. (dreadnode/dreadnode-tiger#1968, dreadnode/dreadnode-tiger#1974)
Azure AI Foundry multimodal target docs. New docs page covers probing Azure AI Foundry / Azure OpenAI multimodal deployments (text+image, text+audio) with the SDK and TUI. (dreadnode/dreadnode-tiger#1976)
Self-hosted architecture documentation. A new architecture page documents the Kubernetes/Helm topology, network requirements, trust boundaries, and optional external services for on-prem deployments. (dreadnode/dreadnode-tiger#1922)
Internal Network Engagements Now Run End-to-End Through Specialized Agents Automated internal-network and Active Directory engagements now run through a pipeline of specialized agents — from discovery through exploitation, credential harvesting, and reporting — with new lateral-movement and coercion tools for deeper attack paths.
Out-of-Band Vulnerability Testing Now Runs Without External Tooling Detecting blind vulnerabilities such as blind SSRF now works without any external tooling, with automatic provider fallback keeping tests reliable.

Improvements

Session sharing UX and deeplink. Agent session sharing now shows a popover with plain-language visibility state (Private / Shared with workspace) and a copyable deeplink. (dreadnode/dreadnode-tiger#1938)
Worker grouping headers enriched in Agent Sessions. Grouped session views now show live count, total, last activity, and a notice when grouping covers only the current page. (dreadnode/dreadnode-tiger#1934)
Assessment findings automatically carry target/judge model metadata. dreadnode.airt infers target and judge model info from configs so findings are populated without manual annotation; Nova Sonic raises a clear install hint immediately if streaming deps are missing. (dreadnode/dreadnode-tiger#1953)
DataGrid rows use border separation instead of zebra striping. Rows now separate via borders with a clearer active-row indicator and improved hover/selected states. (dreadnode/dreadnode-tiger#1980)
Caido SDK client preferred over MCP. Web-security agents now use the direct caido-sdk-client Python library when importable, reducing per-call overhead and improving routing accuracy. (dreadnode/capabilities#84)
Wordlists provisioned for password cracking at install time. The network-ops capability now ships rockyou.txt, SecLists 10k-most-common.txt, and OneRuleToRuleThemAll.rule at install time so hashcat and john_the_ripper work out of the box. (dreadnode/capabilities#99)
TUI agent picker shows locally installed capability version. The Ctrl+A agent picker now displays the locally installed version of each capability next to its name. (dreadnode/dreadnode-tiger#1948)
dreadnode[nova-sonic] extra for Nova Sonic S2S deps. Install the new extra to get all required AWS deps; missing deps now show a clear error instead of an opaque ModuleNotFoundError. (dreadnode/dreadnode-tiger#1918)
On-prem docs: first-model setup steps and platform-administration page. Install guides now include first-model setup steps, a new platform-administration page consolidates admin workflows, and stale SaaS-only model claims are corrected. (dreadnode/dreadnode-tiger#1937)
AI Red Teaming docs reorganized with a How-to guides section. Multimodal Red Teaming moves to the new section with a redirect from the old URL; Azure judge rubric tightened to avoid false-positive jailbreak scores on deflections. (dreadnode/dreadnode-tiger#1982, dreadnode/dreadnode-tiger#1983)
Project metadata editor keyboard navigation and inline validation. The editor now supports keyboard navigation with revert/rollback on cancel. (dreadnode/dreadnode-tiger#1896)

Fixes

Filter popover no longer freezes the UI on large orgs. Filter rows are capped at 100 with search to find additional values; 3,000 facet values drop from 182 ms to 15 ms render time. (dreadnode/dreadnode-tiger#1936)
spawn_agent no longer hangs on sub-agent failures. Errors are now surfaced and a 900 s timeout prevents infinite waits. (dreadnode/dreadnode-tiger#1835)
Agent sessions no longer get stuck in an unrecoverable 400 state. Tool-call repair at the provider boundary handles compaction splits and orphaned tool messages from mid-run cancellations; duplicate or blank tool-call IDs are also sanitized. (dreadnode/dreadnode-tiger#1867, dreadnode/dreadnode-tiger#1949)
No-arg tools now include the required parameters field. Fixes 400 errors on Together AI, DeepInfra, and other strict providers that require parameters even when empty. (dreadnode/dreadnode-tiger#1876)
llm_judge with dn/* models no longer silently scores 0.0. Routing now goes through the platform gateway correctly instead of raising a BadRequestError and returning a false-negative score. (dreadnode/dreadnode-tiger#1981)
pitch_shift transform no longer fails 100% of the time. An un-awaited coroutine caused every execution to fail with AttributeError; it now runs correctly. (dreadnode/dreadnode-tiger#1967)
SageMaker SigV4 multimodal targets work end-to-end. Three root-cause bugs (missing boto3, unresolved image paths, unsubstituted template placeholders) that silently produced zero real trials are fixed. (dreadnode/capabilities#95)
Web-security capability now emits structured findings. report_item is correctly offered to the agent so typed findings (web_vulnerability, web_endpoint) appear in the app instead of silently falling back to the generic report tool. (dreadnode/capabilities#90, dreadnode/dreadnode-tiger#1956)
Impacket tools in network-ops install and resolve reliably. Multiple fixes address ModuleNotFoundError and FileNotFoundError across uv-managed runtimes, Python version mismatches, and false-positive script path detection; impacket_atexec and impacket_dcomexec drop the unsupported target_ip parameter for impacket 0.13.x compatibility. (dreadnode/capabilities#91, dreadnode/capabilities#92, dreadnode/capabilities#93, dreadnode/capabilities#96, dreadnode/capabilities#97, dreadnode/capabilities#98)
Combined NTLM relay attack tool and certipy_find signature fix. New impacket_ntlmrelay_attack combines relay and coercion into a single call; certipy_find now takes structured params (breaking change for agents using the old args=[] form). (dreadnode/capabilities#87)
Generated ATLAS and agentic campaign scripts no longer crash with NameError: get_generator. Missing import is now included in both ATLAS and agentic-mode generated scripts. (dreadnode/capabilities#100, dreadnode/capabilities#101)
atlas_attack objectives parameter is now optional. Omitting it defaults to one objective per OWASP-ASI category instead of raising a TypeError. (dreadnode/dreadnode-tiger#1992)
AI red-teaming assessment detail and findings display fixes. Judge LLM field shows the correct model name instead of ’—’ in multimodal assessment headers; judge reasoning appears in finding headlines and markdown exports; trial counts no longer inflate and silent refusals show a clear message. (dreadnode/dreadnode-tiger#1914, dreadnode/dreadnode-tiger#1915, dreadnode/dreadnode-tiger#1916)
Sandbox provider unreachability no longer 500s all API endpoints. Non-sandbox endpoints stay online; sandbox-specific endpoints return 503; org/user endpoints are also correctly excluded from the degraded path. (dreadnode/dreadnode-tiger#1928, dreadnode/dreadnode-tiger#1930)
Model deployment names without dn/ prefix now fail loudly. Admin API returns HTTP 400 instead of silently making the deployment invisible. (dreadnode/dreadnode-tiger#1933)
Task-based evaluations on unsupported providers return a clear error upfront. Providers that don’t support template builds (e.g. OpenSandbox) now return HTTP 400 at submission instead of failing silently mid-run. (dreadnode/dreadnode-tiger#1947)
Session exports record the correct model. Empty model strings from the agent definition no longer overwrite the user-configured model in trajectory exports. (dreadnode/dreadnode-tiger#1945)
TUI session browser placeholder and Ctrl+O deep-link fixed. Project filter shows a real example in the placeholder; Ctrl+O opens the correct URL instead of a 404. (dreadnode/dreadnode-tiger#1946)
TUI reasoning blocks truncate in compact mode and wrap correctly. Reasoning now truncates to 6 lines in compact mode (expand with ^O); wrapped lines no longer lose their gutter indent. (dreadnode/dreadnode-tiger#1951)
dn CLI no longer crashes on exit after /quit. Orphaned ThreadPoolExecutor threads are now drained on shutdown. (dreadnode/dreadnode-tiger#1952, dreadnode/dreadnode-tiger#1970)
TUI no longer leaks reasoning traces from previous sessions. All conversation widgets are cleared on /new or session switch. (dreadnode/dreadnode-tiger#1924)
Agent session deep links load correctly instead of showing empty client data. (dreadnode/dreadnode-tiger#1979)
Oversized SVGs render correctly in the agent session view. Previously showed a broken image icon. (dreadnode/dreadnode-tiger#1958)
dn/ models tip in docs corrected. dn/ models are zero-setup and billed to your Dreadnode account — DREADNODE_LLM_BASE and DREADNODE_LLM_API_KEY are not required. (dreadnode/dreadnode-tiger#1987, dreadnode/dreadnode-tiger#1960)

July 9, 2026

End-to-end multimodal AI red teaming — probe vision, audio, and video models with full media rendering in findings and traces

16 new 5 improved 14 fixed

Structured findings and assets in the platform. Agents can emit structured vulnerabilities, endpoints, and other typed items during runs — viewable in a new Structured Output page with per-type tables, detail panels, and provenance history. Web-security agents now produce WebVulnerability and WebEndpoint items visible in the Items UI for triage, filtering, and cross-session tracking. (dreadnode/capabilities#74, dreadnode/dreadnode-tiger#1830, dreadnode/dreadnode-tiger#1862)
Web chat — interactive/autonomous mode toggle, rewind, and session actions. Web chat sessions now support an interactive ↔ autonomous mode toggle with optional step cap, a “Rewind to here” control on any prior user message, and a full action menu (copy ID/link, export trajectory, archive/freeze/delete) directly from the session list. (dreadnode/dreadnode-tiger#1904, dreadnode/dreadnode-tiger#1906, dreadnode/dreadnode-tiger#1908)
Web chat — inline human-input prompts, live tool media, and reasoning-effort selector. The web chat transcript now surfaces ask_user prompts inline, renders live tool media during streaming, and exposes a reasoning-effort selector in the composer. (dreadnode/dreadnode-tiger#1853)
Chat UI promoted to main navigation. Web chat is now accessible from the main sidebar navigation instead of the previous beta route. (dreadnode/dreadnode-tiger#1895)
Session compaction rendered as a labeled divider in web chat. Compaction events now appear as a divider with an expandable summary of how many messages were summarized, instead of leaking raw XML into the transcript. (dreadnode/dreadnode-tiger#1905)
Agent sessions default to no grouping. The Sessions page now defaults to ungrouped; workflow grouping is still available in the selector. (dreadnode/dreadnode-tiger#1854)
Unified produces key for capability manifests. A single produces key now configures built-in item types, custom types, and disablement — replacing the older items key. (dreadnode/dreadnode-tiger#1862)
Chat sandbox no longer auto-pauses mid-session. A periodic keepalive extends the runtime’s expiry every 30 seconds while you’re actively working. (dreadnode/dreadnode-tiger#1849)

Full multimodal support lands in AI red teaming this week — probe vision, audio, and video targets end-to-end, with media rendering inline across findings, traces, and exports.

New

Multimodal AI red teaming — end-to-end. Probe LLMs with text, image, audio, and video inputs; apply per-modality transforms; score outputs (including generated media) with the new multimodal_judge scorer; view full message parts inline in findings, traces, and parquet exports (new schema v3 with base64 media columns). Includes a cross-product prompt_matrix mode (N prompts × M media items = N×M trials) and Nova Sonic speech-to-speech target support. (dreadnode/capabilities#75, dreadnode/capabilities#76, dreadnode/capabilities#79, dreadnode/capabilities#81, dreadnode/capabilities#82, dreadnode/dreadnode-tiger#1881, dreadnode/dreadnode-tiger#1883, dreadnode/dreadnode-tiger#1891, dreadnode/dreadnode-tiger#1892, dreadnode/dreadnode-tiger#1893, dreadnode/dreadnode-tiger#1894)
Custom HTTP targets for AI red teaming — any cloud, any auth. New build_target(TargetSpec) factory and custom_http target mode let you point TAP, PAIR, and other attacks at any endpoint (Azure, Vertex, SageMaker, Bedrock, OpenAI-compatible, arbitrary URLs) with declarative auth strategies (API key, Bearer, AWS SigV4, Azure AD, GCP). (dreadnode/capabilities#77, dreadnode/dreadnode-tiger#1907)
Per-modality response scores in the platform. Findings and Assessment Findings tables now show independent scores per output modality (text, image, audio, video) with the highest-scoring modality emphasized. (dreadnode/dreadnode-tiger#1894)
Task Sets (V1). Group multiple tasks into a named, versioned set and run evaluations against the whole set via API, CLI (dn task-set), or the Environments UI. (dreadnode/dreadnode-tiger#1826)
Task-set provenance on evaluation detail. Evaluation detail now shows a linked “From task set org/name” and a collapsible list of skipped members with reasons — previously CLI-only information. (dreadnode/dreadnode-tiger#1851)
Session groups and workflow hierarchies. Multi-session agent runs are now organized into collapsible workflow groups in the Sessions view instead of a flat list; the SDK exposes a new client.workflow() API. (dreadnode/dreadnode-tiger#1831)
Model reasoning as a first-class field in session transcripts and trajectory export. Native reasoning content (Anthropic thinking blocks, reasoning_content) now appears in the session transcript viewer and ATIF trajectory export as a typed field rather than being silently dropped. (dreadnode/dreadnode-tiger#1910, dreadnode/dreadnode-tiger#1901)
SecurityContext MCP server for web-security agents. New MCP server lets agents mine commit history and CVE disclosures to generate hunting briefs, top risks, and ranked vulnerability leads before auditing a codebase — three tools: get_security_context, create_security_context, get_vulnerability_leads. (dreadnode/capabilities#69)
Three new web-security skills. Added git-integration-exploitation, http-query-method (RFC 10008 parser differentials), and dom-vulnerability-detection (postMessage IP normalization bypass). (dreadnode/capabilities#71)
Hub scope tabs — Mine · {org} · Public. Dataset, Model, Capability, and Environment browse pages now have scope tabs with live count pills and scope-aware empty states. (dreadnode/dreadnode-tiger#1789)
Bundle README surfaced in frontend, TUI, and CLI. Capability and task detail pages now include a README tab; the TUI exposes it with d; the CLI adds --readme / -R to dn capability info and dn task info. (dreadnode/dreadnode-tiger#1481)
Auth-setup-guide skill for AIRT. New skill walks you through authenticating target, attacker, and judge models from your own environment — covering Azure, AWS Bedrock, GCP Vertex, custom HTTP endpoints, and more. (dreadnode/capabilities#80)
multimodal_judge scorer. New scorer evaluates generated images, audio, and video using a vision/audio-capable model instead of stringifying media output. (dreadnode/dreadnode-tiger#1892)
dreadnode[nova-sonic] SDK extra. Install the new extra to get all required AWS deps for Nova Sonic speech-to-speech targets; missing deps now raise a clear error instead of an opaque import trace. (dreadnode/dreadnode-tiger#1918)
AIRT multimodal output capture and per-modality scoring. All output modalities (text, image, audio, video) are now captured from target models, with per-modality worst-case MAX score aggregation, and audio/video playback fixed in the trace viewer. (dreadnode/dreadnode-tiger#1891)
Nova Sonic S2S target support in the TUI and AIRT SDK. Nova Sonic speech-to-speech targets are now available in the TUI capability picker and the SDK’s build_target factory. (dreadnode/capabilities#81, dreadnode/dreadnode-tiger#1907)

Improvements

Multimodal AIRT findings layout — Original → Transformed → Response. Findings now display a clean three-panel message layout with judge model, transformed prompt, and inline trace media shown correctly. (dreadnode/dreadnode-tiger#1886)
Model reasoning always visible inline in TUI and web session viewer. Reasoning now appears as always-visible inline prose; ^O controls tool-output verbosity only and no longer hides reasoning blocks. (dreadnode/dreadnode-tiger#1870)
AIRT findings ranked by severity. Critical findings now surface first in the findings list. (dreadnode/dreadnode-tiger#1868)
Improved dn update diagnostics. The CLI and TUI now show accurate resolver-failure details when dn update exits without changing the installed version. (dreadnode/dreadnode-tiger#1887)
Org, workspace, and project names enforce 100-character limit. Name inputs now validate length as you type, preventing overflow and submission errors. (dreadnode/dreadnode-tiger#1919)

Fixes

SDK installable again — MoviePy moved to dreadnode[video] extra. dreadnode 2.0.32+ is installable from PyPI; the Pillow/MoviePy conflict is resolved by making MoviePy optional. (dreadnode/dreadnode-tiger#1882)
spawn_agent no longer hangs indefinitely. Sub-agents now have a default 1-hour timeout (configurable); the undocumented run_in_background field has been removed. (dreadnode/dreadnode-tiger#1861)
Native reasoning now reaches the client on dn/ proxy routes. Thinking blocks and reasoning_content from Claude and other reasoning models were previously silently dropped on platform proxy routes. (dreadnode/dreadnode-tiger#1901)
Nova Sonic adapter no longer hangs on missing AWS credentials. The adapter now resolves credentials via the standard AWS chain (env, profile, SSO, IMDS) and fails fast with a clear error, eliminating the 9+ minute hang. (dreadnode/dreadnode-tiger#1911)
Multimodal AIRT findings show real errors instead of “(no response)”. When a target call fails (e.g. auth 401, bad model ID), the actual error is surfaced so failures are distinguishable from genuine model refusals. (dreadnode/dreadnode-tiger#1912)
AIRT trial counts, missing responses, score tooltips, and trace deep-links fixed. Trial counts display correctly, missing responses fall back to trial data, tooltips work on score columns, and silent refusals show a descriptive message. (dreadnode/dreadnode-tiger#1916)
Judge reasoning now appears in finding headlines and markdown exports. Previously blank even when trial-level reasoning was present; root cause was reading from the study-level span instead of the trial span. (dreadnode/dreadnode-tiger#1915)
Judge LLM field on assessment-detail header shows model name. Previously displayed ”—” for multimodal assessments. (dreadnode/dreadnode-tiger#1914)
AIRT compliance dashboards show correct denominators. OWASP LLM Top 10 now shows /10 and NIST AI RMF shows /4; multimodal media in trace spans renders inline instead of as raw JSON. (dreadnode/dreadnode-tiger#1900)
AIRT multimodal video, audio playback, and per-trial message blocks fixed. Video input, audio playback, and judge reasoning work correctly; findings tables show per-trial message blocks instead of a single best-trial view. (dreadnode/dreadnode-tiger#1888)
AIRT multimodal view shows correct attacker prompt on failed trials. Previously showed the wrong prompt; media downloads now save to disk instead of opening inline, and trial scores carry clearer “Overall Score” labeling. (dreadnode/dreadnode-tiger#1897)
TUI model picker cursor no longer jumps when filtering. Cursor now stays at the top of filtered results instead of jumping to the bottom. (dreadnode/dreadnode-tiger#1859)
TUI first-run defaults to a platform-hosted model. New users without a personal Anthropic API key no longer hit an immediate failure on TUI startup. (dreadnode/dreadnode-tiger#1860)
Network-ops tool wrapper bugs fixed. Four confirmed bugs resolved: smbclient preserves partial directory listings, netexec handles multiple groups in enumeration, SharpView correctly parses arguments with spaces, and Impacket script discovery works with bash wrapper scripts. (dreadnode/capabilities#72)

July 2, 2026

Claude Code engine — run Claude's agentic loop as a governed, session-backed agent

6 new 5 improved 14 fixed

Agents get a pluggable execution engine this week, with Claude Code shipping as the first built-in option alongside a full suite of transcript fidelity and eval reliability fixes that make it production-ready.

New

Pluggable agent engine + Claude Code harness. Agents now support swappable execution engines; the built-in Claude Code engine runs Claude’s agentic loop as a first-class, governed, session-backed agent with full trace viewer support. (dreadnode/dreadnode-tiger#1768, dreadnode/dreadnode-tiger#1816, dreadnode/dreadnode-tiger#1819, dreadnode/dreadnode-tiger#1820, dreadnode/dreadnode-tiger#1822)
Task Sets. Tasks can now be grouped into named, versioned collections and run as a single evaluation target via the API, dn task-set CLI commands, or the Environments UI. (dreadnode/dreadnode-tiger#1826)
User-controlled task-environment models. You can now choose which model a task’s environment (defender) uses per role via --env-model role=id in the CLI, model_overrides in evaluation.yaml, or the API — independently of the solver and judge models. (dreadnode/dreadnode-tiger#1839)
Evaluation sample search. You can now search across all samples in an evaluation by free text — matches are highlighted with per-sample counts, and a “Matches only” filter collapses non-matching samples. (dreadnode/dreadnode-tiger#1810)
Blind SQL injection extraction skill. A new blind SQLi extraction skill and tool in the web-security capability supports boolean/timing-based oracle identification, WAF bypass patterns, and automated char-by-char extraction. (dreadnode/capabilities#61)
Blind SSRF chains with attacker infrastructure provisioning. The blind-SSRF-chains skill now guides agents to detect available cloud CLIs and provision attacker infrastructure (S3 buckets, redirect servers) for SSRF proof-of-concept evidence. (dreadnode/capabilities#65)

Improvements

Task-set provenance on evaluation detail. Evaluation detail now shows the linked task-set org/name and a collapsible list of skipped members with reasons when an eval was run from a task set. (dreadnode/dreadnode-tiger#1851)
Usage summary in headless print mode. dn --print now outputs a model, tokens, tool calls, and cost summary to stderr after each run, matching the cost visibility available in the TUI. (dreadnode/dreadnode-tiger#1801)
Sub-agent costs in TUI footer. The TUI footer now shows sub-agent LLM costs as a separate “subagents $X.XX” segment so total spend is no longer understated when using spawned sub-agents. (dreadnode/dreadnode-tiger#1834)
Bare --resume opens session picker. dn --resume (or -r) with no argument now opens the session picker instead of erroring, matching the UX of Codex and Claude. (dreadnode/dreadnode-tiger#1836)
Chat runtime reliability overhaul. Sandbox state management is simplified and the chat page handles runtime lifecycle more reliably, reducing spurious disconnects and state loss. (dreadnode/dreadnode-tiger#1809)

Fixes

AIRT assessments no longer stuck in pending/running. Assessments stuck indefinitely are now automatically finalized after a configurable timeout (default 2h), and a one-time migration clears the existing backlog. (dreadnode/dreadnode-tiger#1813)
AIRT overview page 500 errors resolved. The AIRT project summary endpoint no longer intermittently 500s (~58/hr) due to a closed database transaction. (dreadnode/dreadnode-tiger#1811)
Eval task timeout now respects org ceiling. Evals no longer fail at provisioning when --task-timeout-sec is unset; the runtime is capped to your org’s ceiling, and an over-ceiling value returns a clear 400. (dreadnode/dreadnode-tiger#1815)
dn/* model IDs work in AIRT runs. All three AIRT model roles (attacker, judge, and target) now correctly route through the LiteLLM proxy instead of failing with “LLM Provider NOT provided.” (dreadnode/capabilities#66, dreadnode/dreadnode-tiger#1840, dreadnode/dreadnode-tiger#1845)
TUI conversation scrolling is smooth. The viewport no longer jumps when new messages arrive, and scroll performance is significantly improved for long conversations (~129 ms/step → ~47 ms). (dreadnode/dreadnode-tiger#1824)
TypeScript files no longer sent as video. The SDK read tool no longer crashes generation when reading .ts/.mts files — they’re now correctly identified as text. (dreadnode/dreadnode-tiger#1797)
Image content no longer fails text-only models. Text-only models (e.g. via OpenRouter) no longer fail with a 404 when a tool returns image content — the SDK substitutes a textual description and retries automatically. (dreadnode/dreadnode-tiger#1804)
MCP tool calls retry transient failures. MCP tool calls now retry once on dropped connections or subprocess crashes instead of immediately marking the server unavailable. (dreadnode/dreadnode-tiger#1823, dreadnode/capabilities#62)
MCP shutdown no longer emits noisy warnings. The MCP client no longer logs “Task exception was never retrieved” on shutdown when the subprocess or remote server has already exited. (dreadnode/dreadnode-tiger#1803)
Unique browser session names required for web-security agents. Concurrent web-security agents no longer clobber each other’s browser state — the --session <name> flag is now required. (dreadnode/capabilities#64)
Agents page defaults to no session grouping. The sessions page now defaults to no grouping instead of grouping by workflow instance; workflow grouping remains available in the selector. (dreadnode/dreadnode-tiger#1854)
Claude Code engine transcript fidelity. Transcripts now show the initiating user message and correctly link tool call outputs to their calls; orphaned tool results and lost calls from burst-emit engines are fixed. (dreadnode/dreadnode-tiger#1820, dreadnode/dreadnode-tiger#1822)
Engine-override eval sessions no longer hang. Evaluations using --engine claude-code no longer wait indefinitely for tool approval that never arrives. (dreadnode/dreadnode-tiger#1819)
Eval task timeout respects org ceiling. Evals with an unset --task-timeout-sec no longer fail at provisioning due to exceeding the org’s 6h ceiling. (dreadnode/dreadnode-tiger#1815)

June 2026

June 25, 2026

Analytics query and notebook UX overhaul

3 improved 1 fixed

A full web chat interface for project runtimes lands this week, alongside capability management and rich tool-call rendering.

Improvements

Analytics query and notebook UX overhaul. The analytics query and notebook pages now include empty states, a stats summary strip, loading skeletons, inline error banners, schema column insertion, and example query chips. (dreadnode/dreadnode-tiger#1766)
Web-security agent project memory. The web-security agent now persists reconnaissance findings, gadgets, leads, defense bypasses, and discoveries across sessions using structured project memory. (dreadnode/capabilities#60)
Hide revealed media in tool results. Revealed images in tool results and logged outputs can now be re-hidden — clicking Hide removes the image from view and frees the decoded blob from memory. (dreadnode/dreadnode-tiger#1781)

Fixes

AIRT project overview crash on non-finite values. The AIRT project overview no longer returns a 500 error for projects with a stale running assessment that produced NaN or Infinity values in its snapshot. (dreadnode/dreadnode-tiger#1776)

June 18, 2026

Tool media evidence in session transcript

6 new 9 improved 14 fixed

Agents can now persist and retrieve durable project-scoped memory across runs with the new ProjectMemory tools.

New

Tool media evidence in session transcript. Screenshots and other media logged via dn.log_output() inside tools now render inline in the session transcript next to the tool call that produced them. (dreadnode/dreadnode-tiger#1747)
Unified LiteLLM admin proxy and model browser. Admins can manage LiteLLM model deployments and credentials directly from the platform; platform models are now always org-allowlist-scoped, and Ctrl+K in the TUI opens the full model browser. (dreadnode/dreadnode-tiger#1688)
Release Notes page on docs site. A new page at docs.dreadnode.io/release-notes shows the full changelog with tier chips, month navigation, per-entry permalinks, and an RSS feed. (dreadnode/dreadnode-tiger#1723)
App-layer DoS skill for web-security. New app-layer-dos skill covers ReDoS, decompression bombs, server delay exploitation, and cross-protocol amplification for application-layer DoS testing. (dreadnode/capabilities#57)
Archive path-traversal skill. New archive-path-traversal skill covers Zip Slip, symlink attacks, polyglot MIME bypass, and Unicode path confusion via the archivealchemist tooling. (dreadnode/capabilities#50)
EXIF metadata manipulation tools. New exiftool capability adds exif_read, exif_write, exif_strip, and exif_copy methods for EXIF metadata manipulation in web-security agents. (dreadnode/capabilities#58)

Improvements

AIRT findings: newest-first, auto-refresh, and richer detail. All Findings now defaults to newest-first order, auto-surfaces new findings every 20 seconds with a notification pill, and the finding detail panel shows a structured Human Review table, a Metadata tile with Finding/Assessment IDs, a copy button on reasoning, and UUID-prefix search. (dreadnode/dreadnode-tiger#1753, dreadnode/dreadnode-tiger#1757)
Web-security continuous scan status. The web-security agent now emits a STATUS update (including an Unexplored field) after every action, reducing silent drift and improving visibility into scan coverage. (dreadnode/capabilities#54)
Confidence trace IDs in web-security reports. Web-security vulnerability reports now include a trace ID linking each finding back to the confidence assessment that approved it. (dreadnode/capabilities#46)
CVSS score in confidence assessments. The web-security credence tool now accepts an optional CVSS score, echoes it as [cvss:N.N] in output, and flags mismatches such as low confidence paired with a high CVSS score. (dreadnode/capabilities#53)
ASR by Category chart. The ASR by Category card on the Assessments page now renders a bar chart in Charts mode, matching the ASR by Attack and ASR by Transform cards. (dreadnode/dreadnode-tiger#1691)
Actor visibility across activity feed and agent sessions. The home page activity feed now shows an actor chip for each event, and the agent sessions list shows who or what started each session. (dreadnode/dreadnode-tiger#1685, dreadnode/dreadnode-tiger#1687)
Evaluations page shows capability and user detail. The evaluations list now surfaces who or what ran each sample alongside the capability reference, making it easier to identify the origin of a run. (dreadnode/dreadnode-tiger#1717)
Web-security interrupted tool result recovery. The web-security capability now automatically retries when the model emits an interrupted tool call result, replaying the last tool outcome up to a bounded retry budget. (dreadnode/capabilities#1)
Inject env vars into smoke tests via -e/--env. The new -e/--env flag on dn task validate --smoke lets you inject environment variables into the challenge service at smoke-test time without editing docker-compose.yaml. (dreadnode/dreadnode-tiger#1745)

Fixes

Evaluation reliability: deadlocks, sandbox provisioning, and circuit-breaker gaps. Evaluations no longer get stuck in queue for hours due to deadlocks, sandbox provisioning failures, or missing circuit-breaker coverage; transient infrastructure errors now retry automatically. (dreadnode/dreadnode-tiger#1699)
AIRT findings list restored: legacy compat and fast load. The findings list now correctly displays findings saved with legacy review-event field names and loads without the previous 6–10 second delay caused by a sequential scan. (dreadnode/dreadnode-tiger#1759)
Multi-attack campaigns no longer crash mid-run. All 12 attack types now accept airt_* span-linkage kwargs, fixing TypeError crashes that caused campaigns to partially complete and produce duplicate assessments. (dreadnode/capabilities#44, dreadnode/dreadnode-tiger#1693, dreadnode/dreadnode-tiger#1707)
Judge verification timeout extended. Trajectory-based task verifiers (outcome_judge, script_and_judge, flag_and_judge) no longer time out prematurely — the judge phase now has a 180-second minimum, preventing empty judgement results. (dreadnode/dreadnode-tiger#1706)
ASR formula consistent between detail and overview pages. Successful attack counts and ASR on the assessment detail page now use the same severity-based formula as the overview page. (dreadnode/dreadnode-tiger#1694)
Duplicate rows removed from ASR by Category chart. The chart no longer shows both jailbreak and jailbreak_general for runs using a single underscore-named category. (dreadnode/dreadnode-tiger#1732)
Project selector persists across pages. Navigating from agent sessions to Charts or Analytics no longer resets your selected project. (dreadnode/dreadnode-tiger#1689)
Catalog filter buttons appear immediately after a transient load failure. Filter buttons on the Environments, Capabilities, Datasets, and Models catalog pages no longer stay hidden for minutes after a facets load error. (dreadnode/dreadnode-tiger#1696)
Video inputs work in eval model tasks. The SDK now serializes video content parts correctly for Gemini/Google/Vertex and OpenAI-compatible paths. (dreadnode/dreadnode-tiger#1718)
Failed assessments stay failed. Assessments that fail during finalization now correctly report status=failed in dn airt list --json instead of being overwritten to completed. (dreadnode/dreadnode-tiger#1733)
Workflow regeneration preserves hand-patched files. Re-running workflow generation no longer silently overwrites hand-edited files — existing files are kept and new versions are written as name_v2.py, name_v3.py, etc. (dreadnode/capabilities#56)
Anti-fabrication guard on report-writer trace IDs. The report-writer skill now marks missing confidence trace IDs as MISSING instead of allowing the agent to hallucinate a plausible-looking ID. (dreadnode/capabilities#47)
TUI /model command no longer overwrites profile default. Using /model or /models in a TUI session no longer persists the selection to your profile, so different sessions can use different models without interfering. (dreadnode/dreadnode-tiger#1763)
Activity feed status colors unified. Status badge colors on the home page activity feed now match the rest of the platform — e.g. AIRT “running” shows blue and World “cancelled” shows gray. (dreadnode/dreadnode-tiger#1737)

June 11, 2026

Session and Trace Views Now Display Agent-Captured Images, Audio, and Video

11 new 14 improved 16 fixed

Sandbox runtime limits now resolve through a four-level hierarchy configurable by platform admins, org owners, and individual users — alongside a concentrated push to eliminate evaluation sandbox timeouts that were causing widespread ‘sandbox was not found’ failures.

New

Session and Trace Views Now Display Agent-Captured Images, Audio, and Video Analysts can now view the images, audio, and video an agent logged during a run directly in the session transcript and trace details, rather than as raw base64. Each item stays behind an explicit Show media click, so potentially hostile artifacts are never decoded until an analyst chooses to open them.
Sandbox runtime limit hierarchy. Sandbox runtime limits now resolve through a four-level hierarchy (platform max → org max → org default → user-requested), configurable by platform admins, org owners, and individual users across the UI, API, SDK, and TUI. (dreadnode/dreadnode-tiger#1651)
Attack surface management capability. A new attack surface management capability combines BBOT scanning, Shodan enrichment, Neo4j graph analysis, screenshot triage, and a multi-agent ASM pipeline. (dreadnode/capabilities#43)
HTTP/2 WAF bypass skill. New h2-waf-bypass skill in the web-security capability covers 6 bypass classes including delayed DATA frame timing, body size truncation, Extended CONNECT, ForwardAuth body stripping, path normalization, and JSON content-type gaps. (dreadnode/capabilities#38)
GraphQL penetration testing skill. New graphql-pentest skill covers endpoint discovery, introspection mining, resource abuse (CWE-400, CWE-674), content-type CSRF, and capability matrix testing. (dreadnode/capabilities#39)
Vulnerability assessment methodology capability. A new standalone vuln-assessment-methodology capability provides a reusable severity matrix, disprove-first rules, and reporting standards that any security capability can load. (dreadnode/capabilities#40)
Dashboard workspace toggle. The dashboard now has a toggle to filter activity between your personal workspace and the full organization view. (dreadnode/dreadnode-tiger#1619)
--judge-model override for evaluation creation. New --judge-model flag on dn evaluation create lets you override the task-level judge model at eval creation time without modifying task.yaml. (dreadnode/dreadnode-tiger#1662)
Pinned smoke solution image for dn task validate. dn task validate --smoke now runs solution and verify scripts in a pinned container that mirrors the production agent runtime, so smoke results match production behavior. (dreadnode/dreadnode-tiger#1659)
ProcessJudge prefix caching. ProcessJudge now supports prefix caching (opt-in cache flag), cutting LLM costs on long judge sessions by billing repeated calls at cache-read rates; GuardSessionPolicy enables it by default. (dreadnode/dreadnode-tiger#1655)
Automated Cloud Assessments Now Measure the Impact of Exposed AWS Credentials Security teams can now confirm how far an exposed AWS credential or instance-metadata finding could be taken, with the platform’s agents running authorized AWS exploitation checks to validate real-world impact.

Improvements

Secrets and chat models moved to Account Settings. Secrets and chat model settings have moved from org settings pages into the Account Settings modal, with deep-link support via URL hash (e.g. #account-settings/secrets). (dreadnode/dreadnode-tiger#1677)
Sandboxes view shows workload context. The sandboxes view now shows evaluation, task, and agent context for each sandbox, making it easier to trace which workload a sandbox belongs to. (dreadnode/dreadnode-tiger#1656)
Default agent timeout doubled to 1 hour. Default agent timeout for evaluations increased from 30 minutes to 1 hour, giving agents more time to complete complex multi-step tasks like Cybench challenges. (dreadnode/dreadnode-tiger#1676)
In-flight eval samples pinned to top. In-flight eval samples now stay pinned at the top of the samples pane during concurrent runs, so active samples no longer get buried in the list. (dreadnode/dreadnode-tiger#1670)
Actor chip in activity feed. The activity feed on the home page now shows an actor chip identifying who performed each action. (dreadnode/dreadnode-tiger#1685)
Session starter visibility in agent sessions. Agent sessions list now shows who or what started each session, making it easier to identify the origin of a run at a glance. (dreadnode/dreadnode-tiger#1687)
ASR by Category chart in Assessment Reporting. The ASR by Category card in Assessment Reporting now renders a bar chart in Charts mode, matching the ASR by Attack and ASR by Transform cards. (dreadnode/dreadnode-tiger#1691)
Job-status discs now have accessible labels. Job-status discs across evaluations, training, optimization, and worlds now show hover tooltips and ARIA labels so their color and animation meaning is clear to all users. (dreadnode/dreadnode-tiger#1654)
/agents TUI listing cleaned up. The /agents command in the TUI now groups agents under capability headings with aligned columns, an active-agent marker, and a footer hint — and correctly displays on the home screen. (dreadnode/dreadnode-tiger#1628)
‘Add Custom Secret’ button moved to top. The ‘Add Custom Secret’ button on the Secrets settings page now appears at the top of the section, so you no longer need to scroll past existing secrets to reach it. (dreadnode/dreadnode-tiger#1664)
Smoke test now validates multi-file task solutions. Smoke test runner stages the full task directory into the agent sandbox, so multi-file solutions referencing sibling scripts or assets validate correctly. (dreadnode/dreadnode-tiger#1673)
dn task validate --smoke covers judge-backed verification methods. dn task validate --smoke now runs the mechanical verification steps (solution + verify) for script_and_judge and flag_and_judge tasks instead of silently skipping them. (dreadnode/dreadnode-tiger#1652)
Project selector persists across navigation. The project selector now persists across pages — navigating from agent sessions to Charts or Analytics no longer resets your selected project. (dreadnode/dreadnode-tiger#1689)
Expired Verification Links Can Now Be Resent in One Click Users whose email verification link has expired can now request a fresh one in a single click, directly from the recovery page.

Fixes

Evaluation sandbox timeouts eliminated. A cluster of fixes ensures eval sandboxes now correctly honour their configured runtime limit: the rebalancer no longer resets timeouts to the org default, org ceiling validation is bypassed for eval-managed sandboxes, and explicit runtime limits are passed through from evaluation service to sandbox. (dreadnode/dreadnode-tiger#1680, dreadnode/dreadnode-tiger#1681, dreadnode/dreadnode-tiger#1682, dreadnode/dreadnode-tiger#1683)
Cybench eval port connectivity fixed. Cybench evaluations no longer return 502 ‘port not open’ errors; sandbox timeout increased to 30 minutes and eval concurrency raised to 10; cold Docker Compose builds no longer expire the sandbox mid-run. (dreadnode/dreadnode-tiger#1669, dreadnode/dreadnode-tiger#1671)
Evaluation queue starvation and sandbox provisioning failures fixed. Evaluation runs no longer queue-starve for hours behind large evals in the same org; env sandbox provisioning timeout raised to 300s to prevent ~30% of runs from failing. (dreadnode/dreadnode-tiger#1638)
AI red-teaming SDK config regression and analytics restored. AI red-teaming attack workflows no longer fail with a misleading SDK config error, and analytics results are now correctly persisted and surfaced instead of reporting false failures. (dreadnode/capabilities#34)
generate_category_attack accepts all input formats. The skill no longer fails with cryptic ‘Unknown attack: t’ errors when passing attack names as strings — string, list, and comma-separated inputs all now work. (dreadnode/capabilities#35)
dn task validate and API now enforce the same rules. Tasks that pass local validation will no longer be rejected with an opaque 400 on upload. (dreadnode/dreadnode-tiger#1631)
OAuth buttons respond on first click. “Continue with Google” and “Continue with GitHub” buttons now respond on the first click, previously dead due to a hydration race on the sign-in/sign-up form. (dreadnode/dreadnode-tiger#1645)
Training runs honour the epochs parameter. The epochs parameter for Tinker SFT is no longer silently ignored — jobs previously always ran 100 steps regardless of dataset size or epoch setting. (dreadnode/dreadnode-tiger#1640)
Training run charts no longer show gaps in validation loss. The SDK no longer emits a None-padded val_loss series, and historical runs are handled gracefully. (dreadnode/dreadnode-tiger#1636)
Gemini models no longer fail with authentication errors. Models returning null cache usage fields from LiteLLM (e.g. gemini/gdm-eval-model-lis) no longer fail with a spurious authentication error. (dreadnode/dreadnode-tiger#1641)
read tool returns images as vision input. The read tool now returns image files as ContentImageUrl so VLMs receive proper vision input instead of a raw base64 text blob. (dreadnode/dreadnode-tiger#1653)
Catalog page filter buttons load immediately. Filter buttons on Environments, Capabilities, Datasets, and Models catalog pages now appear immediately instead of taking minutes to load. (dreadnode/dreadnode-tiger#1696)
Queued and running eval statuses now show distinct colors. Queued/pending returns to gray while running stays blue — previously both showed the same blue. (dreadnode/dreadnode-tiger#1661)
Cancelled evaluations no longer leave samples spinning. Cancelled evaluations no longer leave sample rows spinning indefinitely — statuses now correctly reflect the cancelled state. (dreadnode/dreadnode-tiger#1672)
Legacy runtime timeout no longer triggers ceiling errors. Evaluations no longer fail with ‘requested_runtime_limit_seconds exceeds platform ceiling’ when you didn’t explicitly set a runtime limit. (dreadnode/dreadnode-tiger#1663)
Unused runtime limit set silently respected by evaluations. The platform no longer incorrectly treated internal timeout values as explicit user runtime requests. (dreadnode/dreadnode-tiger#1663)

June 4, 2026

Binary analysis capability replaces windows-reversing with cross-platform PE/ELF/Mach-O support

8 new 5 improved 14 fixed

The binary-analysis capability lands this week with cross-platform support for PE, ELF, and Mach-O — replacing the older windows-reversing bundle.

New

Binary analysis capability (cross-platform). A new binary-analysis capability replaces windows-reversing with PE/ELF/Mach-O support, an expanded Ghidra MCP server, and a consolidated 8-phase analysis skill. (dreadnode/capabilities#32)
Android APK research capability. A new android-apk-research capability bundle ships a 10-tool MCP server and four skills for static semantic vulnerability research on Android APKs. (dreadnode/capabilities#29)
Bulk evaluation log download. You can now download all sample logs from an evaluation as a single ZIP bundle instead of exporting one sample at a time. (dreadnode/dreadnode-tiger#1612)
Dashboard workspace toggle. The dashboard now has a toggle to filter activity between your personal workspace and the full organization view. (dreadnode/dreadnode-tiger#1619)
Billing purchase receipt downloads. Org owners and admins can now download receipts for individual credit purchases from the billing settings page. (dreadnode/dreadnode-tiger#1622)
Conditional MCP header omission. MCP server headers can now be marked optional: true so unset env vars omit the header entirely, letting OAuth run as the default without a separate wrapper capability. (dreadnode/dreadnode-tiger#1617)
New end-to-end guides. Two structured guides — “Building a Capability” and “Red Teaming a Model” — replace several older scattered deep-dive docs. (dreadnode/dreadnode-tiger#1587)
Automated Web-Security Testing Now Covers Adobe Experience Manager and Apache Sling Security teams can now run automated assessments against Adobe Experience Manager and Apache Sling applications, surfacing the selector-abuse, dispatcher-bypass, and cross-site scripting weaknesses specific to those platforms. Existing coverage for blind SSRF chains and DOM-based vulnerabilities has also been sharpened for more reliable detection.

Improvements

Server-side transcript search. “Find in transcript” now searches all messages in a session server-side, so you can find matches in long sessions without paging through the entire transcript first. (dreadnode/dreadnode-tiger#1576)
Web-security skill quality pass. 28 web-security skills were updated with improved tooling, executable commands, and validation checkpoints — average skill score raised from 86% to 92% across all 58 skills. (dreadnode/capabilities#31)
Credits and pricing explained inline. Credits and pricing are now explained across the signup page, TUI status bar, TUI help panel, and docs Quickstart — each surface links to the canonical Credits docs. (dreadnode/dreadnode-tiger#1575)
Task verification method now visible in the UI. Task verification method (script, flag, judge, or compound) is now shown in both the environment task detail view and evaluation sample detail view. (dreadnode/dreadnode-tiger#1596)
Consistent activity awareness across views. Status indicators, severity tags, and activity awareness are now more consistent across assessments, evaluations, training, and optimization views. (dreadnode/dreadnode-tiger#1611)

Fixes

AI red-teaming attack workflows no longer fail with an SDK config error. Analytics results are now persisted correctly, and tool errors surface as clean messages instead of raw tracebacks. (dreadnode/capabilities#34)
Multi-study AIRT runs now export all traces. In multi-study workflows (transform comparisons, campaigns, category sweeps), all study traces are now exported — not just the first study’s. Applying a single transform no longer adds an unwanted baseline run. (dreadnode/capabilities#33)
Session transcripts no longer truncate at 2,000 messages. Sessions with more than 2,000 messages now load all messages via pagination — previously the most recent messages were silently dropped. (dreadnode/dreadnode-tiger#1574)
TUI login no longer gets stuck in a LiteLLM key provisioning loop. Connecting to the platform no longer fails with a repeated “Failed to provision LiteLLM Key” error. (dreadnode/dreadnode-tiger#1590)
SDK and platform now enforce the same task-validation rules. Tasks that pass dn task validate locally will no longer be rejected with an opaque 400 error on upload. (dreadnode/dreadnode-tiger#1631)
generate_category_attack no longer fails with “Unknown attack: ‘t’” errors. All input formats for attack names — string, list, or comma-separated — now work correctly. (dreadnode/capabilities#35)
flag_and_judge and script_and_judge verification methods now accepted by SDK validation. This fixes 248 task definitions that were previously rejected incorrectly by dn task validate. (dreadnode/dreadnode-tiger#1582)
Session headers now show accurate message counts and reports. Active vs. total message counts are shown for compacted sessions, and the Reports tab correctly surfaces all reports from the full transcript. (dreadnode/dreadnode-tiger#1618)
TUI model browser now shows context and credit cost for hosted models. dn/* model rows no longer display - for context window size and price when catalog metadata is available. (dreadnode/dreadnode-tiger#1583)
TUI hosted model list no longer shrinks after restart. New models deployed to the platform are now visible without having to delete ~/.dreadnode/proxy-models.json. (dreadnode/dreadnode-tiger#1578)
TUI /models screen respects admin-configured model order. Models are now displayed in the order set in the admin UI instead of being re-sorted alphabetically. (dreadnode/dreadnode-tiger#1581)
Evaluation tool-use lists now show the actual skill name. The literal word skill no longer appears in place of the loaded skill name (e.g. agent-browser). (dreadnode/dreadnode-tiger#1609)
Running and failed status dots are now visually distinct. Running tasks in the evaluations view show a blue dot instead of orange-red, making them clearly distinguishable from failed (red) at a glance. (dreadnode/dreadnode-tiger#1603)
Selected items in sidebar panels now show a visible highlight. Evaluations, AIRT assessments, training runs, and trace-viewer runs now display a brand-tinted highlight on the active selection instead of blending into the background. (dreadnode/dreadnode-tiger#1579)

May 2026

May 28, 2026

Android APK research capability and issue-tracker connectors for web-security

8 new 3 improved 11 fixed

New capability bundles for Android APK research and issue-tracker integrations land alongside a full sweep of assistant-led CLI guides across the platform.

New

Android APK research capability. A new android-apk-research capability bundle ships with a 10-tool MCP server and four skills for static semantic vulnerability research on Android APKs. (dreadnode/capabilities#29)
Jira connector for web-security. Web-security agents can now export validated findings directly to Jira Cloud as remediation tickets via a new Jira MCP connector. (dreadnode/capabilities#23)
GitHub connector for web-security. A new GitHub MCP connector lets web-security agents file findings as GitHub issues and add follow-up comments without leaving the agent workflow. (dreadnode/capabilities#24)
Run artifact logging tools. Web-security agents can now attach screenshots, audio, video, and file artifacts directly to a run using four new path-based logging tools: log_image_output, log_audio_output, log_video_output, and log_file_artifact. (dreadnode/capabilities#21)
waymore in web-security runtime. The web-security capability runtime now includes waymore, enabling historical URL and response retrieval from Wayback Machine, CommonCrawl, OTX, URLScan, and VirusTotal for recon and JS archaeology. (dreadnode/capabilities#17)
IDOR/BOLA judge scorer rubric. A new built-in IDOR/BOLA rubric is available for the SDK’s default LLM judge scorer, with graduated pentest-aligned cross-boundary access criteria; the web-security scorer-reference skill documents how to compose it in task.yaml and SDK configs. (dreadnode/dreadnode-tiger#1558, dreadnode/capabilities#19)
Activity feed on the home page. The home page now shows a live activity feed with recent projects, sessions, evaluations, training jobs, and other workspace activity. (dreadnode/dreadnode-tiger#1569)
Administrators Can Now Control Which AI Models Members Use. Organization administrators can now grant each member access to a specific set of AI models, with the limits enforced automatically across the web app, API, and CLI.

Improvements

Assistant-led CLI guides across all surfaces. Every in-product CLI guide (capability, task/environment, dataset, model, evaluation, optimization, world manifest, trajectory, training) is rewritten into a two-tab layout with an ‘Ask an agent’ tab, accurate dn alias commands, and corrected flags. (dreadnode/dreadnode-tiger#1538, dreadnode/dreadnode-tiger#1541, dreadnode/dreadnode-tiger#1542, dreadnode/dreadnode-tiger#1543, dreadnode/dreadnode-tiger#1547, dreadnode/dreadnode-tiger#1559, dreadnode/dreadnode-tiger#1560, dreadnode/dreadnode-tiger#1562, dreadnode/dreadnode-tiger#1563)
CLI guide launcher button labels clarified. Launcher buttons across nine surfaces now use accurate action verbs (Create, Evaluate, Optimize, Train) instead of generic or incorrect labels. (dreadnode/dreadnode-tiger#1564)
AIRT and Traces visual language overhaul. The AIRT Traces view and shared TraceViewer now use consistent design-system primitives, unified span-category colors, and a redesigned trace control bar. (dreadnode/dreadnode-tiger#1536)

Fixes

Compound eval false negatives resolved. script_and_judge and flag_and_judge evaluation methods no longer report a failed sample when the agent successfully completes a task. (dreadnode/dreadnode-tiger#1545)
Large-trajectory judge staging no longer fails. Outcome-judge staging no longer errors on large eval trajectories; the sandbox now fetches the trajectory directly via session ID instead of having it serialized and shipped by the API. (dreadnode/dreadnode-tiger#1565)
Session transcripts no longer truncated. Session transcripts now load all messages via cursor pagination — previously, sessions with more than 2,000 messages were silently truncated, dropping the most recent ones. (dreadnode/dreadnode-tiger#1574)
dn task validate accepts compound verification methods. dn task validate now accepts flag_and_judge and script_and_judge, fixing validation failures across 248 tasks. (dreadnode/dreadnode-tiger#1582)
TUI Ctrl+C now copies to clipboard. In the agent TUI, Ctrl+C copies selected text to clipboard (with fallback to pbcopy/xclip/xsel/wl-copy); use Ctrl+Q to quit. (dreadnode/dreadnode-tiger#1566)
TUI hosted model list no longer shrinks after restart. New platform model deployments now appear in the TUI without manually deleting ~/.dreadnode/proxy-models.json. (dreadnode/dreadnode-tiger#1578)
TUI /models screen respects admin model ordering. The TUI /models screen now displays models in the order configured by your admin instead of re-sorting alphabetically. (dreadnode/dreadnode-tiger#1581)
TUI no longer crashes on mid-load screen navigation. The TUI no longer throws a NoMatches exception when navigating away from a screen while a background worker is still loading data. (dreadnode/dreadnode-tiger#1570)
TUI tool call meta line now visible. TUI tool calls correctly display the ↳ <summary> meta line beneath the tool name instead of showing a blank gutter. (dreadnode/dreadnode-tiger#1546)
Image files render correctly in the environment file viewer. Image files in the environment file viewer now render as images instead of garbled binary content. (dreadnode/dreadnode-tiger#1572)
Linear MCP credentials now passed correctly. LINEAR_API_KEY, LINEAR_ACCESS_TOKEN, and LINEAR_API_URL are now correctly forwarded to the Linear MCP server in the web-security capability. (dreadnode/capabilities#27)

May 21, 2026

Compound eval verification — script_and_judge and flag_and_judge

9 new 7 improved 18 fixed

Eval tasks now support compound verification strategies that chain mechanical checks with an LLM judge to close reward-hacking gaps.

New

Compound eval verification: script_and_judge and flag_and_judge. Two new verification methods let eval task authors run a script or flag check first, then invoke an outcome judge as a second line of defense against reward hacking — no custom rubric required. (dreadnode/dreadnode-tiger#1534, dreadnode/dreadnode-tiger#1537)
OutcomeJudge agentic verification. A new outcome_judge verification method lets eval tasks use an agentic LLM to inspect agent trajectories and emit pass/fail verdicts, alongside the existing script and flag methods; exposed via the new dn outcome-judge run CLI subcommand. (dreadnode/dreadnode-tiger#1475)
intent_plus_outputs_summary guard strategy. A new opt-in policy replaces raw tool outputs with LLM-generated summaries in judge prompts, reducing prompt-injection surface while preserving tool-call context. (dreadnode/dreadnode-tiger#1524)
Six new web app pentesting skills. ESI injection, gRPC-Web pentest, H2C/WebSocket smuggling, HTTP connection contamination, timing-attack recon, and XSLT injection are now available as knowledge distillations in the web-security capability. (dreadnode/capabilities#16)
waymore added to web-security capability runtime. Historical URL and response retrieval from Wayback Machine, CommonCrawl, OTX, URLScan, and VirusTotal is now available for recon and JS archaeology. (dreadnode/capabilities#17)
Free-text search in evaluation trajectories. Matches highlight inline and auto-expand collapsed tool calls, with the query synced to the URL. (dreadnode/dreadnode-tiger#1491)
Expanded AIRT goal categories. The AI red-teaming agent now surfaces all 15 goal categories — including reasoning_exploitation, supply_chain, and resource_exhaustion — instead of the previous incomplete list of 9. (dreadnode/capabilities#14)
Simplified AIRT workspace layout. The AI red-teaming capability now uses a cleaner ~/.dreadnode/airt/[org]/[workspace]/ path structure with clearer error messages, replacing the previous env-var-based path system. (dreadnode/capabilities#10)
New setting for org admins re: model usage. Organization admins can now control which AI models each member is allowed to use, set right from the org members admin page

Improvements

TUI keystroke performance. Keystroke routing is now ~400× faster in long conversations — typing in the composer no longer slows down as conversation history grows. (dreadnode/dreadnode-tiger#1507)
TUI event loop and grep tool overhaul. The TUI no longer stalls when sync tools run blocking calls; grep gains output modes, context lines, and smarter filtering; tool result summaries now reflect actual output. (dreadnode/dreadnode-tiger#1510)
TUI session commands rationalized. /clear and /reset both now start a fresh session (preserving the old one), and the destructive in-place wipe is removed. (dreadnode/dreadnode-tiger#1511)
Assistant-led CLI guide modals. The in-product CLI guides for capabilities, tasks, environments, and models are rewritten with an assistant-led flow, accurate commands, and an ‘Ask an agent’ tab that generates a copyable dn --prompt launcher. (dreadnode/dreadnode-tiger#1538, dreadnode/dreadnode-tiger#1541, dreadnode/dreadnode-tiger#1543)
Consistent empty states across the platform. Empty states now distinguish ‘nothing here yet’ from ‘no filter matches’, include inline docs links, and show ghost visuals that hint at what each surface will contain. (dreadnode/dreadnode-tiger#1498)
AIRT Details glossary consolidated. AIRT assessments now show a single glossary popover at the section header instead of three separate per-cell tooltips. (dreadnode/dreadnode-tiger#1493)
Consistent browser tab titles. All pages now show titles in the format {Page} | Dreadnode, replacing a mix of inconsistent formats and blank tabs. (dreadnode/dreadnode-tiger#1490)

Fixes

Compound eval verification false negatives resolved. script_and_judge and flag_and_judge no longer report failed results when the agent successfully completes a task. (dreadnode/dreadnode-tiger#1545)
AIRT CLI assessment creation now surfaces in the UI. dn airt run and dn airt run-suite now correctly connect to the platform, so runs and results appear in the assessment UI instead of silently disappearing or showing Assessment: None. (dreadnode/dreadnode-tiger#1499, dreadnode/dreadnode-tiger#1501)
Monitoring tab reports now display content. The Reports panel now shows the actual markdown report body instead of an S3 path pointer for reports over 10 KB. (dreadnode/dreadnode-tiger#1514)
Reports tab no longer stuck on ‘Loading report content…’. The Reports tab spinner no longer gets stuck after a deep-link reload when the report body was already cached. (dreadnode/dreadnode-tiger#1520)
Sandbox readiness timeout raised to 300 seconds. Capability-heavy sandboxes (e.g. web-security + zero-day-research) no longer time out during startup. (dreadnode/dreadnode-tiger#1503)
Cross-provider AIRT runs label the correct target model. The assessment UI now shows the target model instead of the attacker/orchestrator model for cross-provider runs. (dreadnode/dreadnode-tiger#1519)
Bundled capability disable state now persists. Toggling off bundled capabilities (e.g. self-improvement) in the TUI correctly disables them on reload instead of silently reverting to enabled. (dreadnode/dreadnode-tiger#1517)
Capability preflight checks now use the capability root as working directory. Checks referencing relative paths no longer fail with ‘No such file or directory’. (dreadnode/dreadnode-tiger#1516)
Invitation links show Sign In prompt for unauthenticated users. Invitation links no longer show ‘Invalid Invitation’ — unauthenticated visitors now see a Sign In / Create Account prompt instead. (dreadnode/dreadnode-tiger#1518)
BYOK OpenAI quota errors surface immediately. Agents using BYOK OpenAI no longer retry up to 8 times on insufficient_quota — the failure surfaces on the first attempt. (dreadnode/dreadnode-tiger#1512)
save_workflow now detects silent overwrites. The tool verifies file writes succeeded and warns when content is unchanged, preventing agents from operating on stale workflow scripts. (dreadnode/capabilities#15)
Deep links to evaluation samples no longer show ‘sample not found’. Navigating directly to a specific evaluation sample now resolves correctly. (dreadnode/dreadnode-tiger#1484)
Hosted sandboxes page shows accurate active counts. Sandbox counts are now backed by a facets API that aggregates state correctly. (dreadnode/dreadnode-tiger#1483)
Errored evaluation statuses tracked correctly. Errored statuses are now included in eval phase summaries instead of being silently dropped. (dreadnode/dreadnode-tiger#1523)
TUI tool meta line no longer drops. Tool calls now display the ↳ <summary> meta line beneath the tool name instead of showing a blank gutter. (dreadnode/dreadnode-tiger#1546)
TUI spinner stays visible when Esc drains a queued message. The spinner no longer disappears while a queued message is still running. (dreadnode/dreadnode-tiger#1513)
TUI tool output wrapping stays within the gutter. Wrapped lines at narrow terminal widths now stay aligned under the gutter border instead of leaking to column 0. (dreadnode/dreadnode-tiger#1522)
Docs code blocks render correct spacing. CLI commands like dn --capability now display with correct spacing in docs code blocks (Geist Mono ligatures were collapsing the space before --). (dreadnode/dreadnode-tiger#1500)

May 14, 2026

Free-text search in evaluation trajectories

4 new 8 improved 15 fixed

Free-text search lands in hosted evaluation trajectories this week, alongside session rewind, cross-org capability share links, and sandbox dependency auto-install.

New

Free-text search in evaluation trajectories. Matches are highlighted inline in hosted evaluation trajectories and tool calls auto-expand when their content matches the search term. (dreadnode/dreadnode-tiger#1491)
Session rewind. Roll back a conversation to any prior user message via /rewind or double-Esc in the TUI — the full session history is preserved. (dreadnode/dreadnode-tiger#1436)
Cross-org capability share links. Capability hub links now work across orgs — recipients are automatically redirected to the right page regardless of their org slug. (dreadnode/dreadnode-tiger#1487)
Sandbox dependency auto-install. Sandbox runtimes can now automatically install Python packages and system dependencies declared in capability manifests at load time. (dreadnode/dreadnode-tiger#1458)

Improvements

AIRT auto-discovery with updated capability counts. The AIRT CLI now auto-discovers attacks, transforms, and scorers at runtime (61 attacks, 547 transforms, 141 scorers), and capability counts in the hub accurately reflect these results. (dreadnode/dreadnode-tiger#1459, dreadnode/capabilities#6)
Simplified AIRT workspace layout. The AI red-teaming capability uses a cleaner ~/.dreadnode/airt/[org]/[workspace]/ workspace structure with improved error messages and input validation. (dreadnode/capabilities#10)
AIRT docs cover all result-review surfaces. AIRT docs and CLI help text now document all result-review paths — overview dashboard, per-assessment view, trace view, and the custom report builder — not just CLI commands. (dreadnode/dreadnode-tiger#1445)
Settings sidebar grouped into Personal vs Organization. Secrets and Chat Models are now clearly labeled as Personal, separate from Organization settings. (dreadnode/dreadnode-tiger#1474)
ATIF trajectory exports include source lineage. Exported trajectories now carry an extra.dreadnode block with org, workspace, project, and optional evaluation/item IDs for traceability. (dreadnode/dreadnode-tiger#1450)
AIRT assessments Details glossary consolidated. The Details section now shows a single glossary popover at the section header instead of three separate per-cell tooltips. (dreadnode/dreadnode-tiger#1493)
Optimization job sidebar shows author email. The optimization job sidebar now displays the author’s email instead of a raw user UUID, consistent with Evaluation and Training job lists. (dreadnode/dreadnode-tiger#1454)
Task name copy button. A copy-to-clipboard button next to the task name in the environment detail drawer makes it easier to grab long task names for CLI commands or evaluation setup. (dreadnode/dreadnode-tiger#1072)

Fixes

AIRT CLI runs now appear in the platform UI. Assessment creation was silently succeeding without connecting to the platform; AIRT CLI runs now correctly appear in the UI and analytics. (dreadnode/dreadnode-tiger#1499)
ModuleNotFoundError in UV tool and managed Python environments resolved. AI red-teaming workflows no longer fail with ModuleNotFoundError: No module named dreadnode in UV tool, container, and managed Python environments. (dreadnode/dreadnode-tiger#1476, dreadnode/dreadnode-tiger#1479, dreadnode/capabilities#7)
AIRT --project flag now correctly scopes assessments. The --project flag no longer silently falls back to default — assessments and trace exports are scoped to the specified project. (dreadnode/dreadnode-tiger#1452)
TUI context gauge shows last-generation tokens. The context gauge now shows last-generation input tokens (e.g. 800k/1M) instead of a cumulative sum that could exceed the model’s context limit. (dreadnode/dreadnode-tiger#1451)
TUI tool output and XMLModel serialization fixed. TUI tool output no longer gets clipped after completion, and XMLModel serialization no longer crashes on NUL bytes or ANSI escape sequences. (dreadnode/dreadnode-tiger#1457)
ASR values display consistently as percentages. Attack Success Rate values now render as percentages across all AIRT views — no more mixed decimal/percentage formatting. (dreadnode/dreadnode-tiger#1489)
All 15 AIRT goal categories now surfaced. The AI red-teaming agent now offers all 15 goal categories (including reasoning_exploitation, supply_chain, and resource_exhaustion) instead of only 9. (dreadnode/capabilities#14)
AIRT severity reference documents all five levels. Critical, High, Medium, Low, and Info are now all defined — previously only Critical and High were documented. (dreadnode/dreadnode-tiger#1444)
Trace copy button restored on Safari. The trace copy button (rich and log views) now works correctly on Safari after breaking in v2.0.20. (dreadnode/dreadnode-tiger#1469)
AIRT report generation no longer floods UI with toasts. Generating a report no longer triggers an infinite loop of ‘Report build failed’ toasts that locked the UI. (dreadnode/dreadnode-tiger#1470)
Onboarding username now names the default workspace. Your chosen username during onboarding correctly names your default workspace instead of an autogenerated placeholder. (dreadnode/dreadnode-tiger#1473)
Deep links to evaluation samples no longer show ‘sample not found’. Navigating directly to a specific evaluation sample via deep link now loads correctly. (dreadnode/dreadnode-tiger#1484)
projects command in TUI scoped to current workspace. The projects TUI command now loads projects for the active workspace only. (dreadnode/dreadnode-tiger#1464)
ProcessJudge no longer crashes on rubrics mentioning .yaml. Plaintext rubrics that reference a .yaml filename no longer trigger OSError: File name too long. (dreadnode/dreadnode-tiger#1465)
Docs code blocks render -- flags with correct spacing. Geist Mono ligatures are disabled in docs code blocks, so dn --capability no longer renders as dn--capability. (dreadnode/dreadnode-tiger#1500)

May 7, 2026

AIRT Report Builder now generally available with templates, branding, and DOCX export

13 new 11 improved 18 fixed

The AIRT Report Builder graduates from feature-flagged preview to general availability this week, bringing saved templates, custom branding, DOCX export, and a live structural preview to all users.

New

AIRT Report Builder GA. The AIRT Reports tab is now visible to all users by default; the builder supports saved templates (load/save/rename/delete), custom branding (logo, company name, title, confidentiality footer), DOCX export, and a live structural preview panel. (dreadnode/dreadnode-tiger#1355, dreadnode/dreadnode-tiger#1385)
Trace-analysis advisor skill. A new AI red teaming skill surfaces attack-effectiveness analysis, transform recommendations, and vulnerability pattern identification from historical run data directly inside the agent. (dreadnode/capabilities#4)
GuardSessionPolicy with LLM-judged tool gating. New GuardSessionPolicy gates every agent tool call through an LLM judge before execution, with a built-in safety rubric and optional custom rules. (dreadnode/dreadnode-tiger#1374)
Guard policy transcript strategies. GuardSessionPolicy now supports five transcript strategies (rubric_only, intent_only, intent_plus_calls, full, and more); the default is now intent_plus_calls for richer judge context. (dreadnode/dreadnode-tiger#1437)
ATIF v1.7 trajectory export and dn session CLI. A new export endpoint converts session transcripts to ATIF v1.7 or OpenAI Chat Completions format; dn session CLI command group and a download menu in the session UI let you browse, inspect, and export sessions. (dreadnode/dreadnode-tiger#1434)
Persistent TUI welcome panel and getting-started skill. The TUI welcome screen is now a persistent panel showing profile, runtime, and capability context; a new getting-started skill routes new users to the right capability or task automatically. (dreadnode/dreadnode-tiger#1372)
dn capability uninstall, session lifecycle actions, and rewritten session picker. New dn capability uninstall command, session archive/freeze/delete actions in the TUI, and a rewritten session picker with richer status columns and inline lifecycle controls. (dreadnode/dreadnode-tiger#1388)
CLI conventions overhaul. New dn inference-model (alias dn llm), dn secret list, and dn environment wait commands added; dn env renamed to dn environment with alias retained; destructive-action --yes/-y confirmation standardized across all commands. (dreadnode/dreadnode-tiger#1376)
dn update CLI command. You can now update the Dreadnode CLI directly from the terminal with dn update. (dreadnode/dreadnode-tiger#1438)
Admin-configurable featured catalog models. Platform admins can configure which models appear as featured in the catalog via a new admin UI, without code changes or redeployments. Manual entry of preview or unreleased model IDs is also supported. (dreadnode/dreadnode-tiger#1364, dreadnode/dreadnode-tiger#1369)
TUI inline report rendering with web deep-link. The TUI now renders report tool calls inline with full markdown content, a smart title, and a clickable “View in web” link that opens the platform Reports tab for the active session. (dreadnode/dreadnode-tiger#1358)
Session tool count and cost stats. Agent session views and the TUI footer now show tool call count and estimated USD cost alongside message count and token usage. (dreadnode/dreadnode-tiger#1348)
Capability hooks documentation. New documentation covers lifecycle events, observers, and gating patterns for building agents with capability hooks. (dreadnode/dreadnode-tiger#1401)

Improvements

AIRT analytics performance. AIRT analytics pages now load in 4–5 seconds on first visit and under 1 second on repeat visits, replacing a previous infinite-loop hang; per-section error indicators surface when analytics data is malformed instead of silently rendering blank charts. (dreadnode/dreadnode-tiger#1422, dreadnode/dreadnode-tiger#1435)
ask_user structured questions and cancel flow. The ask_user tool now supports structured multi-question bundles, explicit cancel flows (raising UserCancelled), and a new interactive TUI widget with tab navigation, multi-select, and Esc-to-cancel. (dreadnode/dreadnode-tiger#1387)
Artifact deep-linking. Datasets, environments, models, and capabilities now have dedicated URLs you can share or bookmark directly. Training job selections are also reflected in the URL (?job=<id>) for the same purpose. (dreadnode/dreadnode-tiger#1353, dreadnode/dreadnode-tiger#1370)
ATIF trajectory source lineage. Exported ATIF trajectories now include an extra.dreadnode block with origin and {id, key} pairs for organization, workspace, and project, so downstream consumers can trace data back to its origin. (dreadnode/dreadnode-tiger#1450)
AIRT assessments page visual and UX refresh. The Assessments page now matches the Overview’s design system with shared UI primitives, semantic tokens, and URL-driven assessment selection (right-click to open in new tab works). (dreadnode/dreadnode-tiger#1409)
AIRT report template button labels clarified. Template save buttons now read “Save as template”, “Save as new template”, and “Update template” instead of the ambiguous “Save as new” / “Update”. (dreadnode/dreadnode-tiger#1406)
TUI session title in context bar. The TUI context bar now shows the session title (truncated at 40 chars) instead of the 8-character hex ID. (dreadnode/dreadnode-tiger#1352)
Legacy client-side PDF export removed. The “Export PDF Report” button on the AIRT overview page has been removed; the new Reports tab handles PDF and all other export formats. (dreadnode/dreadnode-tiger#1386)
Notification badges on login. Pending org invitation badges now appear in the nav bar and Account Settings immediately on login or page refresh. (dreadnode/dreadnode-tiger#1393)
AIRT guidance expanded in TUI and docs. TUI help text, CLI --help, and docs now surface all web app review paths — overview dashboard, per-assessment view, trace view, and custom report builder. (dreadnode/dreadnode-tiger#1445)
Brand accent color and notification badge component. Numeric notification badges (invitations, active filters, unread events) now use a consistent BadgeAlert component, and the brand accent color has been updated to #FF6B3D with improved WCAG AA contrast. (dreadnode/dreadnode-tiger#1375)

Fixes

AIRT analytics data accuracy. Attack success rates, transform usage, trial counts, and chart data now reflect accurate totals — a series of regressions caused rates to show as ~64% when the actual value was ~100%, and large projects (500+ assessments) to time out with empty dashboards. (dreadnode/dreadnode-tiger#1417, dreadnode/dreadnode-tiger#1418, dreadnode/dreadnode-tiger#1419, dreadnode/dreadnode-tiger#1420, dreadnode/dreadnode-tiger#1425)
SDK dependency conflict resolved (2.0.19). SDK versions 2.0.16–2.0.18 were uninstallable via uv tool install or pipx install due to a litellm/fastmcp dependency conflict; 2.0.19 resolves this. (dreadnode/dreadnode-tiger#1443)
AIRT PDF/DOCX report fidelity fixes. PDF reports now include ML metadata fields (transforms_applied, original_class, adversarial_class, distance_value) previously missing from exports; DOCX reports render in landscape layout with correct column widths; truncated-column footers point to CSV export and show the correct additional-column count. (dreadnode/dreadnode-tiger#1411, dreadnode/dreadnode-tiger#1415, dreadnode/dreadnode-tiger#1407, dreadnode/dreadnode-tiger#1408)
TUI context gauge shows correct token count. The context gauge now shows last-generation input tokens (e.g. 800k/1M) instead of a cumulative sum that incorrectly exceeded the model’s context limit. (dreadnode/dreadnode-tiger#1451)
AIRT --project flag now scopes correctly. The AIRT CLI --project flag correctly scopes assessments and trace exports to the specified project instead of silently resolving to default. (dreadnode/dreadnode-tiger#1452)
AIRT compliance coverage table fixed. The Attacks Used and Trials columns now display real data; compliance framework rows expand and collapse correctly on every click; re-clicking the already-active project no longer flashes “not found”. (dreadnode/dreadnode-tiger#1414, dreadnode/dreadnode-tiger#1413, dreadnode/dreadnode-tiger#1412)
Self-improvement reflector hook restored. The self-improvement reflector hook now correctly calls Agent.run() instead of the removed Agent.chat(), restoring the feedback loop after failed agent turns. (dreadnode/dreadnode-tiger#1377)
TUI zero-balance error message. The TUI now shows “Insufficient credits — top up your org or switch to a BYOK model” instead of a misleading sign-in/provisioning error when the org has zero balance. (dreadnode/dreadnode-tiger#1402)
TUI model browser keyboard navigation. The TUI model browser now responds to arrow keys immediately on open — no mouse click required. (dreadnode/dreadnode-tiger#1403)
TUI /update false “version unchanged” warning fixed. The /update command no longer shows a false “version unchanged” warning after a successful update when multiple dn installations exist. (dreadnode/dreadnode-tiger#1396)
Cross-org public dataset access. Public datasets from other orgs now open and download correctly instead of returning 404 errors. (dreadnode/dreadnode-tiger#1405)
AIRT report download fallback for Safari iOS. The AIRT report toast now includes a Download link so users whose browser blocked the auto-download can retrieve their report without re-running generation. (dreadnode/dreadnode-tiger#1382)
AIRT “Total Findings” count corrected. The AIRT overview “Total Findings” card now shows the correct total count instead of the current page size (20). (dreadnode/dreadnode-tiger#1404)
Dataset pull command fixed. The dn dataset pull command now works correctly. (dreadnode/dreadnode-tiger#1390)
Members and Workspaces tables sort correctly. Clicking column headers in the Members and Workspaces settings tables now sorts rows as expected. (dreadnode/dreadnode-tiger#1440)
AIRT severity levels documented. The AIRT severity reference now documents all five levels (Critical, High, Medium, Low, Info) — previously only Critical and High were defined. (dreadnode/dreadnode-tiger#1444)
System prompts visible in session transcripts. Session transcripts now display system prompts in a collapsible disclosure panel. (dreadnode/dreadnode-tiger#1394)
Docs copy button alignment fixed. The copy button on terminal code blocks in the docs site now stays inside the code block. (dreadnode/dreadnode-tiger#1439)