Release Notes

What’s new in the Dreadnode platform — features, improvements, and fixes, published weekly.

June 2026

June 11, 2026

Session and Trace Views Now Display Agent-Captured Images, Audio, and Video

11 new 14 improved 16 fixed

Sandbox runtime limits now resolve through a four-level hierarchy configurable by platform admins, org owners, and individual users — alongside a concentrated push to eliminate evaluation sandbox timeouts that were causing widespread ‘sandbox was not found’ failures.

New

  • Session and trace views now display agent-captured images, audio, and video. Analysts can now view the images, audio, and video an agent logged during a run directly in the session transcript and trace details, rather than as raw base64. Each item stays behind an explicit Show media click, so potentially hostile artifacts are never decoded until an analyst chooses to open them.
  • Sandbox runtime limit hierarchy. Sandbox runtime limits now resolve through a four-level hierarchy (platform max → org max → org default → user-requested), configurable by platform admins, org owners, and individual users across the UI, API, SDK, and TUI. (dreadnode/dreadnode-tiger#1651)
  • Attack surface management capability. A new attack surface management capability combines BBOT scanning, Shodan enrichment, Neo4j graph analysis, screenshot triage, and a multi-agent ASM pipeline. (dreadnode/capabilities#43)
  • HTTP/2 WAF bypass skill. New h2-waf-bypass skill in the web-security capability covers 6 bypass classes including delayed DATA frame timing, body size truncation, Extended CONNECT, ForwardAuth body stripping, path normalization, and JSON content-type gaps. (dreadnode/capabilities#38)
  • GraphQL penetration testing skill. New graphql-pentest skill covers endpoint discovery, introspection mining, resource abuse (CWE-400, CWE-674), content-type CSRF, and capability matrix testing. (dreadnode/capabilities#39)
  • Vulnerability assessment methodology capability. A new standalone vuln-assessment-methodology capability provides a reusable severity matrix, disprove-first rules, and reporting standards that any security capability can load. (dreadnode/capabilities#40)
  • Dashboard workspace toggle. The dashboard now has a toggle to filter activity between your personal workspace and the full organization view. (dreadnode/dreadnode-tiger#1619)
  • --judge-model override for evaluation creation. New --judge-model flag on dn evaluation create lets you override the task-level judge model at eval creation time without modifying task.yaml. (dreadnode/dreadnode-tiger#1662)
  • Pinned smoke solution image for dn task validate. dn task validate --smoke now runs solution and verify scripts in a pinned container that mirrors the production agent runtime, so smoke results match production behavior. (dreadnode/dreadnode-tiger#1659)
  • ProcessJudge prefix caching. ProcessJudge now supports prefix caching (opt-in cache flag), cutting LLM costs on long judge sessions by billing repeated calls at cache-read rates; GuardSessionPolicy enables it by default. (dreadnode/dreadnode-tiger#1655)
  • Automated cloud assessments now measure the impact of exposed AWS credentials. Security teams can now confirm how far an exposed AWS credential or instance-metadata finding could be taken, with the platform’s agents running authorized AWS exploitation checks to validate real-world impact.
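
The four-level runtime limit hierarchy described above (platform max → org max → org default → user-requested) can be pictured with a small resolver. This is a minimal sketch, not the platform's implementation: the function name, its parameters, and the choice to reject (rather than clamp) an explicit request above the ceiling are all assumptions made for illustration.

```python
from typing import Optional


def resolve_runtime_limit(
    platform_max: int,
    org_max: Optional[int] = None,
    org_default: Optional[int] = None,
    user_requested: Optional[int] = None,
) -> int:
    """Return the effective runtime limit in seconds (hypothetical helper)."""
    # The effective ceiling is the tighter of the platform and org maximums.
    ceiling = platform_max if org_max is None else min(platform_max, org_max)

    if user_requested is not None:
        # Assumption: an explicit request above the ceiling is rejected,
        # not silently clamped.
        if user_requested > ceiling:
            raise ValueError(
                f"requested limit {user_requested}s exceeds ceiling {ceiling}s"
            )
        return user_requested

    # No explicit request: fall back to the org default, capped by the ceiling.
    if org_default is not None:
        return min(org_default, ceiling)
    return ceiling
```

Under these assumptions, a platform max of 7200 seconds, an org max of 3600, and an org default of 1800 resolve to 2400 when a user requests 2400, and to 1800 when no request is made.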

Improvements

  • Secrets and chat models moved to Account Settings. Secrets and chat model settings have moved from org settings pages into the Account Settings modal, with deep-link support via URL hash (e.g. #account-settings/secrets). (dreadnode/dreadnode-tiger#1677)
  • Sandboxes view shows workload context. The sandboxes view now shows evaluation, task, and agent context for each sandbox, making it easier to trace which workload a sandbox belongs to. (dreadnode/dreadnode-tiger#1656)
  • Default agent timeout doubled to 1 hour. Default agent timeout for evaluations increased from 30 minutes to 1 hour, giving agents more time to complete complex multi-step tasks like Cybench challenges. (dreadnode/dreadnode-tiger#1676)
  • In-flight eval samples pinned to top. In-flight eval samples now stay pinned at the top of the samples pane during concurrent runs, so active samples no longer get buried in the list. (dreadnode/dreadnode-tiger#1670)
  • Actor chip in activity feed. The activity feed on the home page now shows an actor chip identifying who performed each action. (dreadnode/dreadnode-tiger#1685)
  • Session starter visibility in agent sessions. The agent sessions list now shows who or what started each session, making it easier to identify the origin of a run at a glance. (dreadnode/dreadnode-tiger#1687)
  • ASR by Category chart in Assessment Reporting. The ASR by Category card in Assessment Reporting now renders a bar chart in Charts mode, matching the ASR by Attack and ASR by Transform cards. (dreadnode/dreadnode-tiger#1691)
  • Job-status discs now have accessible labels. Job-status discs across evaluations, training, optimization, and worlds now show hover tooltips and ARIA labels so their color and animation meaning is clear to all users. (dreadnode/dreadnode-tiger#1654)
  • /agents TUI listing cleaned up. The /agents command in the TUI now groups agents under capability headings with aligned columns, an active-agent marker, and a footer hint, and it now displays correctly on the home screen. (dreadnode/dreadnode-tiger#1628)
  • ‘Add Custom Secret’ button moved to top. The ‘Add Custom Secret’ button on the Secrets settings page now appears at the top of the section, so you no longer need to scroll past existing secrets to reach it. (dreadnode/dreadnode-tiger#1664)
  • Smoke test now validates multi-file task solutions. The smoke test runner now stages the full task directory into the agent sandbox, so multi-file solutions that reference sibling scripts or assets validate correctly. (dreadnode/dreadnode-tiger#1673)
  • dn task validate --smoke covers judge-backed verification methods. dn task validate --smoke now runs the mechanical verification steps (solution + verify) for script_and_judge and flag_and_judge tasks instead of silently skipping them. (dreadnode/dreadnode-tiger#1652)
  • Project selector persists across navigation. The project selector now persists across pages — navigating from agent sessions to Charts or Analytics no longer resets your selected project. (dreadnode/dreadnode-tiger#1689)
  • Expired verification links can now be resent in one click. Users whose email verification link has expired can now request a fresh one in a single click, directly from the recovery page.
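
The URL-hash deep link mentioned above (for example #account-settings/secrets) splits naturally into a modal name and a tab name. The sketch below shows one way to parse it; the function name and the example host are hypothetical, and the platform's own routing may differ.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def parse_account_settings_link(url: str) -> Optional[Tuple[str, str]]:
    """Return (modal, tab) for an account-settings deep link, else None."""
    # urlsplit exposes the text after "#" (without the "#") as .fragment.
    fragment = urlsplit(url).fragment
    # "account-settings/secrets" -> ("account-settings", "/", "secrets")
    modal, _, tab = fragment.partition("/")
    if modal != "account-settings":
        return None
    # tab is an empty string when the hash names the modal but no tab.
    return modal, tab


# Hypothetical host, used only to exercise the parser.
link = "https://app.example.invalid/home#account-settings/secrets"
result = parse_account_settings_link(link)
```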

Fixes

  • Evaluation sandbox timeouts eliminated. A cluster of fixes ensures eval sandboxes now correctly honour their configured runtime limit: the rebalancer no longer resets timeouts to the org default, org ceiling validation is bypassed for eval-managed sandboxes, and explicit runtime limits are passed through from evaluation service to sandbox. (dreadnode/dreadnode-tiger#1680, dreadnode/dreadnode-tiger#1681, dreadnode/dreadnode-tiger#1682, dreadnode/dreadnode-tiger#1683)
  • Cybench eval port connectivity fixed. Cybench evaluations no longer return 502 ‘port not open’ errors; sandbox timeout increased to 30 minutes and eval concurrency raised to 10; cold Docker Compose builds no longer expire the sandbox mid-run. (dreadnode/dreadnode-tiger#1669, dreadnode/dreadnode-tiger#1671)
  • Evaluation queue starvation and sandbox provisioning failures fixed. Evaluation runs no longer queue-starve for hours behind large evals in the same org; env sandbox provisioning timeout raised to 300s to prevent ~30% of runs from failing. (dreadnode/dreadnode-tiger#1638)
  • AI red-teaming SDK config regression and analytics restored. AI red-teaming attack workflows no longer fail with a misleading SDK config error, and analytics results are now correctly persisted and surfaced instead of reporting false failures. (dreadnode/capabilities#34)
  • generate_category_attack accepts all input formats. The skill no longer fails with cryptic ‘Unknown attack: t’ errors when passing attack names as strings — string, list, and comma-separated inputs all now work. (dreadnode/capabilities#35)
  • dn task validate and API now enforce the same rules. Tasks that pass local validation will no longer be rejected with an opaque 400 on upload. (dreadnode/dreadnode-tiger#1631)
  • OAuth buttons respond on first click. The “Continue with Google” and “Continue with GitHub” buttons now respond on the first click; previously they were unresponsive because of a hydration race on the sign-in/sign-up form. (dreadnode/dreadnode-tiger#1645)
  • Training runs honour the epochs parameter. The epochs parameter for Tinker SFT is no longer silently ignored — jobs previously always ran 100 steps regardless of dataset size or epoch setting. (dreadnode/dreadnode-tiger#1640)
  • Training run charts no longer show gaps in validation loss. The SDK no longer emits a None-padded val_loss series, and historical runs are handled gracefully. (dreadnode/dreadnode-tiger#1636)
  • Gemini models no longer fail with authentication errors. Models returning null cache usage fields from LiteLLM (e.g. gemini/gdm-eval-model-lis) no longer fail with a spurious authentication error. (dreadnode/dreadnode-tiger#1641)
  • read tool returns images as vision input. The read tool now returns image files as ContentImageUrl so VLMs receive proper vision input instead of a raw base64 text blob. (dreadnode/dreadnode-tiger#1653)
  • Catalog page filter buttons load immediately. Filter buttons on Environments, Capabilities, Datasets, and Models catalog pages now appear immediately instead of taking minutes to load. (dreadnode/dreadnode-tiger#1696)
  • Queued and running eval statuses now show distinct colors. Queued/pending returns to gray while running stays blue — previously both showed the same blue. (dreadnode/dreadnode-tiger#1661)
  • Cancelled evaluations no longer leave samples spinning. Cancelled evaluations no longer leave sample rows spinning indefinitely — statuses now correctly reflect the cancelled state. (dreadnode/dreadnode-tiger#1672)
  • Legacy runtime timeout no longer triggers ceiling errors. Evaluations no longer fail with ‘requested_runtime_limit_seconds exceeds platform ceiling’ when you didn’t explicitly set a runtime limit. (dreadnode/dreadnode-tiger#1663)
  • Unset runtime limits no longer treated as explicit requests. The platform no longer treats internal timeout values as explicit user runtime requests. (dreadnode/dreadnode-tiger#1663)
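
An error such as ‘Unknown attack: t’ is the typical symptom of iterating over a bare string character by character. The sketch below shows one way to accept string, list, and comma-separated inputs uniformly. It is an illustration only: the helper name is hypothetical, the attack names "tap" and "pair" are placeholders, and the actual fix in generate_category_attack may look different.

```python
from typing import Iterable, List, Union


def normalize_attack_names(attacks: Union[str, Iterable[str]]) -> List[str]:
    """Normalize a string, list, or comma-separated string to a clean list."""
    if isinstance(attacks, str):
        # A bare string must be split, not iterated: looping over "tap"
        # would yield the single characters "t", "a", "p".
        parts = attacks.split(",")
    else:
        # Each list item may itself be comma-separated.
        parts = [piece for item in attacks for piece in item.split(",")]
    # Trim whitespace and drop empty entries left by stray commas.
    return [part.strip() for part in parts if part.strip()]
```

With this helper, "tap", "tap, pair", and ["tap", "pair"] all produce a list of whole names rather than single characters.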

June 4, 2026

Binary analysis capability replaces windows-reversing with cross-platform PE/ELF/Mach-O support

8 new 5 improved 14 fixed

The binary-analysis capability lands this week with cross-platform support for PE, ELF, and Mach-O — replacing the older windows-reversing bundle.

New

  • Binary analysis capability (cross-platform). A new binary-analysis capability replaces windows-reversing with PE/ELF/Mach-O support, an expanded Ghidra MCP server, and a consolidated 8-phase analysis skill. (dreadnode/capabilities#32)
  • Android APK research capability. A new android-apk-research capability bundle ships a 10-tool MCP server and four skills for static semantic vulnerability research on Android APKs. (dreadnode/capabilities#29)
  • Bulk evaluation log download. You can now download all sample logs from an evaluation as a single ZIP bundle instead of exporting one sample at a time. (dreadnode/dreadnode-tiger#1612)
  • Dashboard workspace toggle. The dashboard now has a toggle to filter activity between your personal workspace and the full organization view. (dreadnode/dreadnode-tiger#1619)
  • Billing purchase receipt downloads. Org owners and admins can now download receipts for individual credit purchases from the billing settings page. (dreadnode/dreadnode-tiger#1622)
  • Conditional MCP header omission. MCP server headers can now be marked optional: true so unset env vars omit the header entirely, letting OAuth run as the default without a separate wrapper capability. (dreadnode/dreadnode-tiger#1617)
  • New end-to-end guides. Two structured guides — “Building a Capability” and “Red Teaming a Model” — replace several older scattered deep-dive docs. (dreadnode/dreadnode-tiger#1587)
  • Automated web-security testing now covers Adobe Experience Manager and Apache Sling. Security teams can now run automated assessments against Adobe Experience Manager and Apache Sling applications, surfacing the selector-abuse, dispatcher-bypass, and cross-site scripting weaknesses specific to those platforms. Existing coverage for blind SSRF chains and DOM-based vulnerabilities has also been sharpened for more reliable detection.
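
The conditional header omission above can be sketched as follows: a header marked optional is dropped entirely when the environment variable it references is unset, while a required header still fails loudly. Only the optional: true flag comes from the release note; the ${VAR} placeholder syntax, the dictionary shape of a header spec, and the function name are assumptions made for this example.

```python
import os
import re
from typing import Any, Dict, List, Mapping, Optional

# Assumed placeholder syntax: ${VAR_NAME}
_ENV_REF = re.compile(r"\$\{(\w+)\}")


def build_headers(
    specs: List[Dict[str, Any]],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Expand env references, omitting optional headers whose vars are unset."""
    env = os.environ if env is None else env
    headers: Dict[str, str] = {}
    for spec in specs:
        value = spec["value"]
        missing = [name for name in _ENV_REF.findall(value) if not env.get(name)]
        if missing:
            if spec.get("optional", False):
                continue  # omit the header entirely
            raise KeyError(f"unset environment variable(s): {', '.join(missing)}")
        headers[spec["name"]] = _ENV_REF.sub(lambda m: env[m.group(1)], value)
    return headers


# Hypothetical header specs for an MCP server entry.
specs = [
    {"name": "Authorization", "value": "Bearer ${API_TOKEN}", "optional": True},
    {"name": "X-Client", "value": "demo"},
]
without_token = build_headers(specs, env={})
with_token = build_headers(specs, env={"API_TOKEN": "abc"})
```

When API_TOKEN is unset, no Authorization header is sent at all, which is what would let a default OAuth flow take over instead of a request carrying an empty bearer token.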

Improvements

  • Server-side transcript search. “Find in transcript” now searches all messages in a session server-side, so you can find matches in long sessions without paging through the entire transcript first. (dreadnode/dreadnode-tiger#1576)
  • Web-security skill quality pass. 28 web-security skills were updated with improved tooling, executable commands, and validation checkpoints — average skill score raised from 86% to 92% across all 58 skills. (dreadnode/capabilities#31)
  • Credits and pricing explained inline. Credits and pricing are now explained across the signup page, TUI status bar, TUI help panel, and docs Quickstart — each surface links to the canonical Credits docs. (dreadnode/dreadnode-tiger#1575)
  • Task verification method now visible in the UI. Task verification method (script, flag, judge, or compound) is now shown in both the environment task detail view and evaluation sample detail view. (dreadnode/dreadnode-tiger#1596)
  • Consistent activity awareness across views. Status indicators, severity tags, and activity awareness are now more consistent across assessments, evaluations, training, and optimization views. (dreadnode/dreadnode-tiger#1611)

Fixes

  • AI red-teaming attack workflows no longer fail with an SDK config error. Analytics results are now persisted correctly, and tool errors surface as clean messages instead of raw tracebacks. (dreadnode/capabilities#34)
  • Multi-study AIRT runs now export all traces. In multi-study workflows (transform comparisons, campaigns, category sweeps), all study traces are now exported — not just the first study’s. Applying a single transform no longer adds an unwanted baseline run. (dreadnode/capabilities#33)
  • Session transcripts no longer truncate at 2,000 messages. Sessions with more than 2,000 messages now load all messages via pagination — previously the most recent messages were silently dropped. (dreadnode/dreadnode-tiger#1574)
  • TUI login no longer gets stuck in a LiteLLM key provisioning loop. Connecting to the platform no longer fails with a repeated “Failed to provision LiteLLM Key” error. (dreadnode/dreadnode-tiger#1590)
  • SDK and platform now enforce the same task-validation rules. Tasks that pass dn task validate locally will no longer be rejected with an opaque 400 error on upload. (dreadnode/dreadnode-tiger#1631)
  • generate_category_attack no longer fails with “Unknown attack: ‘t’” errors. All input formats for attack names — string, list, or comma-separated — now work correctly. (dreadnode/capabilities#35)
  • flag_and_judge and script_and_judge verification methods now accepted by SDK validation. This fixes 248 task definitions that were previously rejected incorrectly by dn task validate. (dreadnode/dreadnode-tiger#1582)
  • Session headers now show accurate message counts and reports. Active vs. total message counts are shown for compacted sessions, and the Reports tab correctly surfaces all reports from the full transcript. (dreadnode/dreadnode-tiger#1618)
  • TUI model browser now shows context and credit cost for hosted models. dn/* model rows no longer display - for context window size and price when catalog metadata is available. (dreadnode/dreadnode-tiger#1583)
  • TUI hosted model list no longer shrinks after restart. New models deployed to the platform are now visible without having to delete ~/.dreadnode/proxy-models.json. (dreadnode/dreadnode-tiger#1578)
  • TUI /models screen respects admin-configured model order. Models are now displayed in the order set in the admin UI instead of being re-sorted alphabetically. (dreadnode/dreadnode-tiger#1581)
  • Evaluation tool-use lists now show the actual skill name. The literal word skill no longer appears in place of the loaded skill name (e.g. agent-browser). (dreadnode/dreadnode-tiger#1609)
  • Running and failed status dots are now visually distinct. Running tasks in the evaluations view show a blue dot instead of orange-red, making them clearly distinguishable from failed (red) at a glance. (dreadnode/dreadnode-tiger#1603)
  • Selected items in sidebar panels now show a visible highlight. Evaluations, AIRT assessments, training runs, and trace-viewer runs now display a brand-tinted highlight on the active selection instead of blending into the background. (dreadnode/dreadnode-tiger#1579)
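
The transcript fix above replaces a single capped fetch with pagination. A minimal sketch of that pattern is shown below, using an in-memory stand-in for the API. The function names, the integer cursor, and the page size are assumptions for illustration and do not describe the platform's actual endpoints.

```python
from typing import Callable, List, Optional, Tuple

Page = Tuple[List[str], Optional[int]]


def load_all_messages(fetch_page: Callable[[Optional[int]], Page]) -> List[str]:
    """Follow cursors until the server reports there is no next page."""
    messages: List[str] = []
    cursor: Optional[int] = None
    while True:
        items, cursor = fetch_page(cursor)
        messages.extend(items)
        if cursor is None:
            return messages


def make_fake_api(total: int, page_size: int) -> Callable[[Optional[int]], Page]:
    """In-memory stand-in for a paginated transcript endpoint (hypothetical)."""
    data = [f"message {i}" for i in range(total)]

    def fetch_page(cursor: Optional[int]) -> Page:
        start = cursor or 0
        end = start + page_size
        # Return None as the next cursor once the last page has been served.
        return data[start:end], (end if end < total else None)

    return fetch_page


all_messages = load_all_messages(make_fake_api(total=4500, page_size=2000))
```

Here a 4,500-message session is read in three pages, so the most recent messages are no longer lost to a fixed 2,000-message cap.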

May 2026

May 28, 2026

Android APK research capability and issue-tracker connectors for web-security

8 new 3 improved 11 fixed

New capability bundles for Android APK research and issue-tracker integrations land alongside a full sweep of assistant-led CLI guides across the platform.

New

  • Android APK research capability. A new android-apk-research capability bundle ships with a 10-tool MCP server and four skills for static semantic vulnerability research on Android APKs. (dreadnode/capabilities#29)
  • Jira connector for web-security. Web-security agents can now export validated findings directly to Jira Cloud as remediation tickets via a new Jira MCP connector. (dreadnode/capabilities#23)
  • GitHub connector for web-security. A new GitHub MCP connector lets web-security agents file findings as GitHub issues and add follow-up comments without leaving the agent workflow. (dreadnode/capabilities#24)
  • Run artifact logging tools. Web-security agents can now attach screenshots, audio, video, and file artifacts directly to a run using four new path-based logging tools: log_image_output, log_audio_output, log_video_output, and log_file_artifact. (dreadnode/capabilities#21)
  • waymore in web-security runtime. The web-security capability runtime now includes waymore, enabling historical URL and response retrieval from Wayback Machine, CommonCrawl, OTX, URLScan, and VirusTotal for recon and JS archaeology. (dreadnode/capabilities#17)
  • IDOR/BOLA judge scorer rubric. A new built-in IDOR/BOLA rubric is available for the SDK’s default LLM judge scorer, with graduated pentest-aligned cross-boundary access criteria; the web-security scorer-reference skill documents how to compose it in task.yaml and SDK configs. (dreadnode/dreadnode-tiger#1558, dreadnode/capabilities#19)
  • Activity feed on the home page. The home page now shows a live activity feed with recent projects, sessions, evaluations, training jobs, and other workspace activity. (dreadnode/dreadnode-tiger#1569)
  • Administrators can now control which AI models members use. Organization administrators can now grant each member access to a specific set of AI models, with the limits enforced automatically across the web app, API, and CLI.

Improvements

  • Assistant-led CLI guides across all surfaces. Every in-product CLI guide (capability, task/environment, dataset, model, evaluation, optimization, world manifest, trajectory, training) is rewritten into a two-tab layout with an ‘Ask an agent’ tab, accurate dn alias commands, and corrected flags. (dreadnode/dreadnode-tiger#1538, dreadnode/dreadnode-tiger#1541, dreadnode/dreadnode-tiger#1542, dreadnode/dreadnode-tiger#1543, dreadnode/dreadnode-tiger#1547, dreadnode/dreadnode-tiger#1559, dreadnode/dreadnode-tiger#1560, dreadnode/dreadnode-tiger#1562, dreadnode/dreadnode-tiger#1563)
  • CLI guide launcher button labels clarified. Launcher buttons across nine surfaces now use accurate action verbs (Create, Evaluate, Optimize, Train) instead of generic or incorrect labels. (dreadnode/dreadnode-tiger#1564)
  • AIRT and Traces visual language overhaul. The AIRT Traces view and shared TraceViewer now use consistent design-system primitives, unified span-category colors, and a redesigned trace control bar. (dreadnode/dreadnode-tiger#1536)

Fixes

  • Compound eval false negatives resolved. script_and_judge and flag_and_judge evaluation methods no longer report a failed sample when the agent successfully completes a task. (dreadnode/dreadnode-tiger#1545)
  • Large-trajectory judge staging no longer fails. Outcome-judge staging no longer errors on large eval trajectories; the sandbox now fetches the trajectory directly via session ID instead of having it serialized and shipped by the API. (dreadnode/dreadnode-tiger#1565)
  • Session transcripts no longer truncated. Session transcripts now load all messages via cursor pagination — previously, sessions with more than 2,000 messages were silently truncated, dropping the most recent ones. (dreadnode/dreadnode-tiger#1574)
  • dn task validate accepts compound verification methods. dn task validate now accepts flag_and_judge and script_and_judge, fixing validation failures across 248 tasks. (dreadnode/dreadnode-tiger#1582)
  • TUI Ctrl+C now copies to clipboard. In the agent TUI, Ctrl+C copies selected text to clipboard (with fallback to pbcopy/xclip/xsel/wl-copy); use Ctrl+Q to quit. (dreadnode/dreadnode-tiger#1566)
  • TUI hosted model list no longer shrinks after restart. New platform model deployments now appear in the TUI without manually deleting ~/.dreadnode/proxy-models.json. (dreadnode/dreadnode-tiger#1578)
  • TUI /models screen respects admin model ordering. The TUI /models screen now displays models in the order configured by your admin instead of re-sorting alphabetically. (dreadnode/dreadnode-tiger#1581)
  • TUI no longer crashes on mid-load screen navigation. The TUI no longer throws a NoMatches exception when navigating away from a screen while a background worker is still loading data. (dreadnode/dreadnode-tiger#1570)
  • TUI tool call meta line now visible. TUI tool calls correctly display the ↳ <summary> meta line beneath the tool name instead of showing a blank gutter. (dreadnode/dreadnode-tiger#1546)
  • Image files render correctly in the environment file viewer. Image files in the environment file viewer now render as images instead of garbled binary content. (dreadnode/dreadnode-tiger#1572)
  • Linear MCP credentials now passed correctly. LINEAR_API_KEY, LINEAR_ACCESS_TOKEN, and LINEAR_API_URL are now correctly forwarded to the Linear MCP server in the web-security capability. (dreadnode/capabilities#27)
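
The clipboard fallback named above (pbcopy, xclip, xsel, wl-copy) amounts to picking the first helper found on PATH. The sketch below covers only that fallback chain, under the assumption that the tools are tried in the order listed; the function names are hypothetical and the TUI's primary copy mechanism is not shown.

```python
import shutil
import subprocess
from typing import Callable, List, Optional

# Candidate commands, each reading the text to copy from standard input.
CLIPBOARD_COMMANDS: List[List[str]] = [
    ["pbcopy"],                         # macOS
    ["xclip", "-selection", "clipboard"],  # X11
    ["xsel", "--clipboard", "--input"],    # X11
    ["wl-copy"],                        # Wayland
]


def pick_clipboard_command(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[List[str]]:
    """Return the first clipboard command whose executable is on PATH."""
    for command in CLIPBOARD_COMMANDS:
        if which(command[0]) is not None:
            return command
    return None


def copy_to_clipboard(text: str) -> bool:
    """Copy text with the first available helper; False if none is installed."""
    command = pick_clipboard_command()
    if command is None:
        return False
    subprocess.run(command, input=text.encode("utf-8"), check=True)
    return True
```

Passing the lookup function as a parameter keeps the selection logic testable on machines where none of these tools are installed.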

May 21, 2026

Compound eval verification — script_and_judge and flag_and_judge

9 new 7 improved 18 fixed

Eval tasks now support compound verification strategies that chain mechanical checks with an LLM judge to close reward-hacking gaps.

New

  • Compound eval verification: script_and_judge and flag_and_judge. Two new verification methods let eval task authors run a script or flag check first, then invoke an outcome judge as a second line of defense against reward hacking — no custom rubric required. (dreadnode/dreadnode-tiger#1534, dreadnode/dreadnode-tiger#1537)
  • OutcomeJudge agentic verification. A new outcome_judge verification method lets eval tasks use an agentic LLM to inspect agent trajectories and emit pass/fail verdicts, alongside the existing script and flag methods; exposed via the new dn outcome-judge run CLI subcommand. (dreadnode/dreadnode-tiger#1475)
  • intent_plus_outputs_summary guard strategy. A new opt-in policy replaces raw tool outputs with LLM-generated summaries in judge prompts, reducing prompt-injection surface while preserving tool-call context. (dreadnode/dreadnode-tiger#1524)
  • Six new web app pentesting skills. ESI injection, gRPC-Web pentest, H2C/WebSocket smuggling, HTTP connection contamination, timing-attack recon, and XSLT injection are now available as knowledge distillations in the web-security capability. (dreadnode/capabilities#16)
  • waymore added to web-security capability runtime. Historical URL and response retrieval from Wayback Machine, CommonCrawl, OTX, URLScan, and VirusTotal is now available for recon and JS archaeology. (dreadnode/capabilities#17)
  • Free-text search in evaluation trajectories. Matches highlight inline and auto-expand collapsed tool calls, with the query synced to the URL. (dreadnode/dreadnode-tiger#1491)
  • Expanded AIRT goal categories. The AI red-teaming agent now surfaces all 15 goal categories — including reasoning_exploitation, supply_chain, and resource_exhaustion — instead of the previous incomplete list of 9. (dreadnode/capabilities#14)
  • Simplified AIRT workspace layout. The AI red-teaming capability now uses a cleaner ~/.dreadnode/airt/[org]/[workspace]/ path structure with clearer error messages, replacing the previous env-var-based path system. (dreadnode/capabilities#10)
  • Per-member model access controls for org admins. Organization admins can now control which AI models each member is allowed to use, directly from the org members admin page.
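
The compound verification methods above chain a mechanical check with an outcome judge. The sketch below illustrates the idea under two assumptions that the release note does not confirm: a sample passes only when both stages pass, and the judge is skipped when the mechanical check fails. The function names and the flag value are hypothetical.

```python
from typing import Callable, List


def compound_verify(
    mechanical_check: Callable[[], bool],
    judge: Callable[[], bool],
) -> bool:
    """Run the script or flag check first, then the judge as a second gate."""
    if not mechanical_check():
        # Assumption: a failed mechanical check fails the sample outright,
        # so no judge call is spent on it.
        return False
    # The judge can still reject a run that satisfied the mechanical check
    # without genuinely solving the task (reward hacking).
    return judge()


calls: List[str] = []


def flag_check() -> bool:
    calls.append("flag")
    return "FLAG{demo}" in "agent output containing FLAG{demo}"


def suspicious_judge() -> bool:
    calls.append("judge")
    return False  # the judge decides the flag was obtained illegitimately


verdict = compound_verify(flag_check, suspicious_judge)
```

In this example the flag check passes but the judge rejects the trajectory, so the sample is reported as failed.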

Improvements

  • TUI keystroke performance. Keystroke routing is now ~400× faster in long conversations — typing in the composer no longer slows down as conversation history grows. (dreadnode/dreadnode-tiger#1507)
  • TUI event loop and grep tool overhaul. The TUI no longer stalls when sync tools run blocking calls; grep gains output modes, context lines, and smarter filtering; tool result summaries now reflect actual output. (dreadnode/dreadnode-tiger#1510)
  • TUI session commands rationalized. /clear and /reset both now start a fresh session (preserving the old one), and the destructive in-place wipe is removed. (dreadnode/dreadnode-tiger#1511)
  • Assistant-led CLI guide modals. The in-product CLI guides for capabilities, tasks, environments, and models are rewritten with an assistant-led flow, accurate commands, and an ‘Ask an agent’ tab that generates a copyable dn --prompt launcher. (dreadnode/dreadnode-tiger#1538, dreadnode/dreadnode-tiger#1541, dreadnode/dreadnode-tiger#1543)
  • Consistent empty states across the platform. Empty states now distinguish ‘nothing here yet’ from ‘no filter matches’, include inline docs links, and show ghost visuals that hint at what each surface will contain. (dreadnode/dreadnode-tiger#1498)
  • AIRT Details glossary consolidated. AIRT assessments now show a single glossary popover at the section header instead of three separate per-cell tooltips. (dreadnode/dreadnode-tiger#1493)
  • Consistent browser tab titles. All pages now show titles in the format {Page} | Dreadnode, replacing a mix of inconsistent formats and blank tabs. (dreadnode/dreadnode-tiger#1490)

Fixes

  • Compound eval verification false negatives resolved. script_and_judge and flag_and_judge no longer report failed results when the agent successfully completes a task. (dreadnode/dreadnode-tiger#1545)
  • AIRT CLI assessment creation now surfaces in the UI. dn airt run and dn airt run-suite now correctly connect to the platform, so runs and results appear in the assessment UI instead of silently disappearing or showing Assessment: None. (dreadnode/dreadnode-tiger#1499, dreadnode/dreadnode-tiger#1501)
  • Monitoring tab reports now display content. The Reports panel now shows the actual markdown report body instead of an S3 path pointer for reports over 10 KB. (dreadnode/dreadnode-tiger#1514)
  • Reports tab no longer stuck on ‘Loading report content…’. The Reports tab spinner no longer gets stuck after a deep-link reload when the report body was already cached. (dreadnode/dreadnode-tiger#1520)
  • Sandbox readiness timeout raised to 300 seconds. Capability-heavy sandboxes (e.g. web-security + zero-day-research) no longer time out during startup. (dreadnode/dreadnode-tiger#1503)
  • Cross-provider AIRT runs label the correct target model. The assessment UI now shows the target model instead of the attacker/orchestrator model for cross-provider runs. (dreadnode/dreadnode-tiger#1519)
  • Bundled capability disable state now persists. Toggling off bundled capabilities (e.g. self-improvement) in the TUI correctly disables them on reload instead of silently reverting to enabled. (dreadnode/dreadnode-tiger#1517)
  • Capability preflight checks now use the capability root as working directory. Checks referencing relative paths no longer fail with ‘No such file or directory’. (dreadnode/dreadnode-tiger#1516)
  • Invitation links show Sign In prompt for unauthenticated users. Invitation links no longer show ‘Invalid Invitation’ — unauthenticated visitors now see a Sign In / Create Account prompt instead. (dreadnode/dreadnode-tiger#1518)
  • BYOK OpenAI quota errors surface immediately. Agents using BYOK OpenAI no longer retry up to 8 times on insufficient_quota — the failure surfaces on the first attempt. (dreadnode/dreadnode-tiger#1512)
  • save_workflow now detects silent overwrites. The tool verifies file writes succeeded and warns when content is unchanged, preventing agents from operating on stale workflow scripts. (dreadnode/capabilities#15)
  • Deep links to evaluation samples no longer show ‘sample not found’. Navigating directly to a specific evaluation sample now resolves correctly. (dreadnode/dreadnode-tiger#1484)
  • Hosted sandboxes page shows accurate active counts. Sandbox counts are now backed by a facets API that aggregates state correctly. (dreadnode/dreadnode-tiger#1483)
  • Errored evaluation statuses tracked correctly. Errored statuses are now included in eval phase summaries instead of being silently dropped. (dreadnode/dreadnode-tiger#1523)
  • TUI tool meta line no longer drops. Tool calls now display the ↳ <summary> meta line beneath the tool name instead of showing a blank gutter. (dreadnode/dreadnode-tiger#1546)
  • TUI spinner stays visible when Esc drains a queued message. The spinner no longer disappears while a queued message is still running. (dreadnode/dreadnode-tiger#1513)
  • TUI tool output wrapping stays within the gutter. Wrapped lines at narrow terminal widths now stay aligned under the gutter border instead of leaking to column 0. (dreadnode/dreadnode-tiger#1522)
  • Docs code blocks render correct spacing. CLI commands like dn --capability now display with correct spacing in docs code blocks (Geist Mono ligatures were collapsing the space before --). (dreadnode/dreadnode-tiger#1500)
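
The BYOK quota fix above distinguishes errors worth retrying from errors that will never succeed on retry. A minimal sketch of that pattern follows; the exception class, the set of non-retryable codes, and the function name are assumptions, and backoff delays are omitted for brevity.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Assumption: error codes that retrying cannot fix.
NON_RETRYABLE = {"insufficient_quota"}


class ProviderError(Exception):
    """Hypothetical provider error carrying a machine-readable code."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def call_with_retries(fn: Callable[[], T], max_attempts: int = 8) -> T:
    """Retry transient failures, but surface non-retryable ones immediately."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ProviderError as err:
            if err.code in NON_RETRYABLE or attempt == max_attempts:
                raise
    raise RuntimeError("max_attempts must be at least 1")


attempts: List[int] = []


def out_of_quota() -> str:
    attempts.append(1)
    raise ProviderError("insufficient_quota")


try:
    call_with_retries(out_of_quota)
except ProviderError as err:
    surfaced_code = err.code
```

The quota error surfaces after a single call, while a transient code such as a rate limit would still be retried up to the attempt budget.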

May 14, 2026

Free-text search in evaluation trajectories

4 new 8 improved 15 fixed

Free-text search lands in hosted evaluation trajectories this week, alongside session rewind, cross-org capability share links, and sandbox dependency auto-install.

New

  • Free-text search in evaluation trajectories. Matches are highlighted inline in hosted evaluation trajectories and tool calls auto-expand when their content matches the search term. (dreadnode/dreadnode-tiger#1491)
  • Session rewind. Roll back a conversation to any prior user message via /rewind or double-Esc in the TUI — the full session history is preserved. (dreadnode/dreadnode-tiger#1436)
  • Cross-org capability share links. Capability hub links now work across orgs — recipients are automatically redirected to the right page regardless of their org slug. (dreadnode/dreadnode-tiger#1487)
  • Sandbox dependency auto-install. Sandbox runtimes can now automatically install Python packages and system dependencies declared in capability manifests at load time. (dreadnode/dreadnode-tiger#1458)

Improvements

  • AIRT auto-discovery with updated capability counts. The AIRT CLI now auto-discovers attacks, transforms, and scorers at runtime (61 attacks, 547 transforms, 141 scorers), and capability counts in the hub accurately reflect these results. (dreadnode/dreadnode-tiger#1459, dreadnode/capabilities#6)
  • Simplified AIRT workspace layout. The AI red-teaming capability uses a cleaner ~/.dreadnode/airt/[org]/[workspace]/ workspace structure with improved error messages and input validation. (dreadnode/capabilities#10)
  • AIRT docs cover all result-review surfaces. AIRT docs and CLI help text now document all result-review paths — overview dashboard, per-assessment view, trace view, and the custom report builder — not just CLI commands. (dreadnode/dreadnode-tiger#1445)
  • Settings sidebar grouped into Personal vs Organization. Secrets and Chat Models are now clearly labeled as Personal, separate from Organization settings. (dreadnode/dreadnode-tiger#1474)
  • ATIF trajectory exports include source lineage. Exported trajectories now carry an extra.dreadnode block with org, workspace, project, and optional evaluation/item IDs for traceability. (dreadnode/dreadnode-tiger#1450)
  • AIRT assessments Details glossary consolidated. The Details section now shows a single glossary popover at the section header instead of three separate per-cell tooltips. (dreadnode/dreadnode-tiger#1493)
  • Optimization job sidebar shows author email. The optimization job sidebar now displays the author’s email instead of a raw user UUID, consistent with Evaluation and Training job lists. (dreadnode/dreadnode-tiger#1454)
  • Task name copy button. A copy-to-clipboard button next to the task name in the environment detail drawer makes it easier to grab long task names for CLI commands or evaluation setup. (dreadnode/dreadnode-tiger#1072)

Fixes

  • AIRT CLI runs now appear in the platform UI. Assessment creation was silently succeeding without connecting to the platform; AIRT CLI runs now correctly appear in the UI and analytics. (dreadnode/dreadnode-tiger#1499)
  • ModuleNotFoundError in uv tool and managed Python environments resolved. AI red-teaming workflows no longer fail with ModuleNotFoundError: No module named 'dreadnode' in uv tool, container, and managed Python environments. (dreadnode/dreadnode-tiger#1476, dreadnode/dreadnode-tiger#1479, dreadnode/capabilities#7)
  • AIRT --project flag now correctly scopes assessments. The --project flag no longer silently falls back to default — assessments and trace exports are scoped to the specified project. (dreadnode/dreadnode-tiger#1452)
  • TUI context gauge shows last-generation tokens. The context gauge now shows last-generation input tokens (e.g. 800k/1M) instead of a cumulative sum that could exceed the model’s context limit. (dreadnode/dreadnode-tiger#1451)
  • TUI tool output and XMLModel serialization fixed. TUI tool output no longer gets clipped after completion, and XMLModel serialization no longer crashes on NUL bytes or ANSI escape sequences. (dreadnode/dreadnode-tiger#1457)
  • ASR values display consistently as percentages. Attack Success Rate values now render as percentages across all AIRT views — no more mixed decimal/percentage formatting. (dreadnode/dreadnode-tiger#1489)
  • All 15 AIRT goal categories now surfaced. The AI red-teaming agent now offers all 15 goal categories (including reasoning_exploitation, supply_chain, and resource_exhaustion) instead of only 9. (dreadnode/capabilities#14)
  • AIRT severity reference documents all five levels. Critical, High, Medium, Low, and Info are now all defined — previously only Critical and High were documented. (dreadnode/dreadnode-tiger#1444)
  • Trace copy button restored on Safari. The trace copy button (rich and log views) now works correctly on Safari after breaking in v2.0.20. (dreadnode/dreadnode-tiger#1469)
  • AIRT report generation no longer floods UI with toasts. Generating a report no longer triggers an infinite loop of ‘Report build failed’ toasts that locked the UI. (dreadnode/dreadnode-tiger#1470)
  • Onboarding username now names the default workspace. Your chosen username during onboarding correctly names your default workspace instead of an autogenerated placeholder. (dreadnode/dreadnode-tiger#1473)
  • Deep links to evaluation samples no longer show ‘sample not found’. Navigating directly to a specific evaluation sample via deep link now loads correctly. (dreadnode/dreadnode-tiger#1484)
  • projects command in TUI scoped to current workspace. The projects TUI command now loads projects for the active workspace only. (dreadnode/dreadnode-tiger#1464)
  • ProcessJudge no longer crashes on rubrics mentioning .yaml. Plaintext rubrics that reference a .yaml filename no longer trigger OSError: File name too long. (dreadnode/dreadnode-tiger#1465)
  • Docs code blocks render -- flags with correct spacing. Geist Mono ligatures are disabled in docs code blocks, so dn --capability no longer renders as dn--capability. (dreadnode/dreadnode-tiger#1500)
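
As an illustration of the XMLModel serialization fix above: XML 1.0 does not allow NUL or most other control characters in a document, so raw terminal output has to be cleaned before it is embedded. The sketch below is one way to do that cleaning, not the platform's actual implementation; the function name `sanitize_for_xml` and both regular expressions are assumptions made for this example.

```python
import re

# ANSI CSI sequences such as "\x1b[31m" (colors, cursor movement).
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")

# Control characters that XML 1.0 forbids: everything below 0x20
# except tab (0x09), line feed (0x0A), and carriage return (0x0D).
XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def sanitize_for_xml(text: str) -> str:
    """Hypothetical helper: make tool output safe to embed in XML text."""
    # Remove whole escape sequences first so no "[31m" residue is left behind.
    text = ANSI_CSI.sub("", text)
    # Then drop any remaining illegal control characters, including NUL.
    return XML_ILLEGAL.sub("", text)
```

The order matters in this sketch: stripping the ESC byte before matching the full sequence would leave the rest of the sequence in the output as visible text.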

May 7, 2026

AIRT Report Builder now generally available with templates, branding, and DOCX export

13 new 11 improved 18 fixed

The AIRT Report Builder graduates from feature-flagged preview to general availability this week, bringing saved templates, custom branding, DOCX export, and a live structural preview to all users.

New

  • AIRT Report Builder GA. The AIRT Reports tab is now visible to all users by default; the builder supports saved templates (load/save/rename/delete), custom branding (logo, company name, title, confidentiality footer), DOCX export, and a live structural preview panel. (dreadnode/dreadnode-tiger#1355, dreadnode/dreadnode-tiger#1385)

  • Trace-analysis advisor skill. A new AI red teaming skill surfaces attack-effectiveness analysis, transform recommendations, and vulnerability pattern identification from historical run data directly inside the agent. (dreadnode/capabilities#4)

  • GuardSessionPolicy with LLM-judged tool gating. New GuardSessionPolicy gates every agent tool call through an LLM judge before execution, with a built-in safety rubric and optional custom rules. (dreadnode/dreadnode-tiger#1374)

  • Guard policy transcript strategies. GuardSessionPolicy now supports five transcript strategies (including rubric_only, intent_only, intent_plus_calls, and full); the default is intent_plus_calls, which gives the judge richer context. (dreadnode/dreadnode-tiger#1437)

  • ATIF v1.7 trajectory export and dn session CLI. A new export endpoint converts session transcripts to ATIF v1.7 or OpenAI Chat Completions format; a new dn session CLI command group and a download menu in the session UI let you browse, inspect, and export sessions. (dreadnode/dreadnode-tiger#1434)

  • Persistent TUI welcome panel and getting-started skill. The TUI welcome screen is now a persistent panel showing profile, runtime, and capability context; a new getting-started skill routes new users to the right capability or task automatically. (dreadnode/dreadnode-tiger#1372)

  • dn capability uninstall, session lifecycle actions, and rewritten session picker. Adds a dn capability uninstall command, session archive/freeze/delete actions in the TUI, and a rewritten session picker with richer status columns and inline lifecycle controls. (dreadnode/dreadnode-tiger#1388)

  • CLI conventions overhaul. Adds the dn inference-model (alias dn llm), dn secret list, and dn environment wait commands; renames dn env to dn environment (the dn env alias is retained); and standardizes --yes/-y confirmation for destructive actions across all commands. (dreadnode/dreadnode-tiger#1376)

  • dn update CLI command. You can now update the Dreadnode CLI directly from the terminal with dn update. (dreadnode/dreadnode-tiger#1438)

  • Admin-configurable featured catalog models. Platform admins can configure which models appear as featured in the catalog via a new admin UI, without code changes or redeployments. Manual entry of preview or unreleased model IDs is also supported. (dreadnode/dreadnode-tiger#1364, dreadnode/dreadnode-tiger#1369)

  • TUI inline report rendering with web deep-link. The TUI now renders report tool calls inline with full markdown content, a smart title, and a clickable “View in web” link that opens the platform Reports tab for the active session. (dreadnode/dreadnode-tiger#1358)

  • Session tool count and cost stats. Agent session views and the TUI footer now show tool call count and estimated USD cost alongside message count and token usage. (dreadnode/dreadnode-tiger#1348)

  • Capability hooks documentation. New documentation covers lifecycle events, observers, and gating patterns for building agents with capability hooks. (dreadnode/dreadnode-tiger#1401)
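
The tool-gating pattern behind GuardSessionPolicy (judge first, execute only on approval) can be sketched in a few lines. This is a generic illustration and not the Dreadnode SDK API: `ToolCall`, `Verdict`, `ToolCallBlocked`, `gated_execute`, and `keyword_judge` are all hypothetical names, and the keyword check merely stands in for an LLM judge.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolCall:
    name: str
    arguments: dict[str, Any]


@dataclass
class Verdict:
    allowed: bool
    reason: str


class ToolCallBlocked(Exception):
    """Raised when the judge rejects a tool call."""


# A judge receives the proposed call plus the rubric text and returns a verdict.
Judge = Callable[[ToolCall, str], Verdict]


def gated_execute(
    call: ToolCall,
    rubric: str,
    judge: Judge,
    tools: dict[str, Callable[..., Any]],
) -> Any:
    """Run the tool only if the judge allows it; otherwise raise."""
    verdict = judge(call, rubric)
    if not verdict.allowed:
        raise ToolCallBlocked(f"{call.name} blocked: {verdict.reason}")
    return tools[call.name](**call.arguments)


def keyword_judge(call: ToolCall, rubric: str) -> Verdict:
    """Stand-in for an LLM judge: rejects one obviously destructive command."""
    command = str(call.arguments.get("command", ""))
    if "rm -rf" in command:
        return Verdict(False, "destructive shell command")
    return Verdict(True, "no rubric violation found")
```

Because the judge is just a callable in this sketch, a real model-backed judge could be swapped in without changing the gating logic.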

Improvements

  • AIRT analytics performance. AIRT analytics pages now load in 4–5 seconds on first visit and under 1 second on repeat visits, where they previously hung in an infinite loop; per-section error indicators now surface when analytics data is malformed instead of silently rendering blank charts. (dreadnode/dreadnode-tiger#1422, dreadnode/dreadnode-tiger#1435)

  • ask_user structured questions and cancel flow. The ask_user tool now supports structured multi-question bundles, explicit cancel flows (raising UserCancelled), and a new interactive TUI widget with tab navigation, multi-select, and Esc-to-cancel. (dreadnode/dreadnode-tiger#1387)

  • Artifact deep-linking. Datasets, environments, models, and capabilities now have dedicated URLs you can share or bookmark directly. Training job selections are also reflected in the URL (?job=<id>) for the same purpose. (dreadnode/dreadnode-tiger#1353, dreadnode/dreadnode-tiger#1370)

  • ATIF trajectory source lineage. Exported ATIF trajectories now include an extra.dreadnode block with origin and {id, key} pairs for organization, workspace, and project, so downstream consumers can trace data back to its origin. (dreadnode/dreadnode-tiger#1450)

  • AIRT assessments page visual and UX refresh. The Assessments page now matches the Overview’s design system with shared UI primitives, semantic tokens, and URL-driven assessment selection (right-click to open in new tab works). (dreadnode/dreadnode-tiger#1409)

  • AIRT report template button labels clarified. Template save buttons now read “Save as template”, “Save as new template”, and “Update template” instead of the ambiguous “Save as new” / “Update”. (dreadnode/dreadnode-tiger#1406)

  • TUI session title in context bar. The TUI context bar now shows the session title (truncated at 40 chars) instead of the 8-character hex ID. (dreadnode/dreadnode-tiger#1352)

  • Legacy client-side PDF export removed. The “Export PDF Report” button on the AIRT overview page has been removed; the new Reports tab handles PDF and all other export formats. (dreadnode/dreadnode-tiger#1386)

  • Notification badges on login. Pending org invitation badges now appear in the nav bar and Account Settings immediately on login or page refresh. (dreadnode/dreadnode-tiger#1393)

  • AIRT guidance expanded in TUI and docs. TUI help text, CLI --help, and docs now surface all web app review paths — overview dashboard, per-assessment view, trace view, and custom report builder. (dreadnode/dreadnode-tiger#1445)

  • Brand accent color and notification badge component. Numeric notification badges (invitations, active filters, unread events) now use a consistent BadgeAlert component, and the brand accent color has been updated to #FF6B3D with improved WCAG AA contrast. (dreadnode/dreadnode-tiger#1375)
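
For downstream consumers of the ATIF source lineage described above, reading the extra.dreadnode block might look like the sketch below. The exact key names (`origin`, `organization`, `workspace`, `project`, `id`, `key`) are inferred from the wording of the note and should be treated as assumptions, as should the helper name `read_lineage` and every sample value.

```python
from typing import Any, Optional


def read_lineage(trajectory: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Hypothetical helper: pull source lineage out of an exported trajectory."""
    extra = trajectory.get("extra") or {}
    block = extra.get("dreadnode")
    if not isinstance(block, dict):
        # Trajectory was exported without lineage, or by another tool.
        return None

    lineage: dict[str, Any] = {"origin": block.get("origin")}
    for scope in ("organization", "workspace", "project"):
        pair = block.get(scope) or {}
        # Each scope is assumed to be an {id, key} pair.
        lineage[scope] = (pair.get("id"), pair.get("key"))
    return lineage


# Sample data; every value here is made up for illustration.
sample = {
    "extra": {
        "dreadnode": {
            "origin": "example-origin",
            "organization": {"id": "org-1", "key": "acme"},
            "workspace": {"id": "ws-1", "key": "main"},
            "project": {"id": "proj-1", "key": "demo"},
        }
    }
}
```

Returning None for a missing block, rather than raising, lets the same code handle trajectories exported before lineage was added.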

Fixes

  • AIRT analytics data accuracy. Attack success rates, transform usage, trial counts, and chart data now reflect accurate totals — a series of regressions caused rates to show as ~64% when the actual value was ~100%, and large projects (500+ assessments) to time out with empty dashboards. (dreadnode/dreadnode-tiger#1417, dreadnode/dreadnode-tiger#1418, dreadnode/dreadnode-tiger#1419, dreadnode/dreadnode-tiger#1420, dreadnode/dreadnode-tiger#1425)

  • SDK dependency conflict resolved (2.0.19). SDK versions 2.0.16–2.0.18 were uninstallable via uv tool install or pipx install due to a litellm/fastmcp dependency conflict; 2.0.19 resolves this. (dreadnode/dreadnode-tiger#1443)

  • AIRT PDF/DOCX report fidelity fixes. PDF reports now include ML metadata fields (transforms_applied, original_class, adversarial_class, distance_value) previously missing from exports; DOCX reports render in landscape layout with correct column widths; truncated-column footers point to CSV export and show the correct additional-column count. (dreadnode/dreadnode-tiger#1411, dreadnode/dreadnode-tiger#1415, dreadnode/dreadnode-tiger#1407, dreadnode/dreadnode-tiger#1408)

  • TUI context gauge shows correct token count. The context gauge now shows last-generation input tokens (e.g. 800k/1M) instead of a cumulative sum that incorrectly exceeded the model’s context limit. (dreadnode/dreadnode-tiger#1451)

  • AIRT --project flag now scopes correctly. The AIRT CLI --project flag correctly scopes assessments and trace exports to the specified project instead of silently resolving to default. (dreadnode/dreadnode-tiger#1452)

  • AIRT compliance coverage table fixed. The Attacks Used and Trials columns now display real data; compliance framework rows expand and collapse correctly on every click; re-clicking the already-active project no longer flashes “not found”. (dreadnode/dreadnode-tiger#1414, dreadnode/dreadnode-tiger#1413, dreadnode/dreadnode-tiger#1412)

  • Self-improvement reflector hook restored. The self-improvement reflector hook now correctly calls Agent.run() instead of the removed Agent.chat(), restoring the feedback loop after failed agent turns. (dreadnode/dreadnode-tiger#1377)

  • TUI zero-balance error message. The TUI now shows “Insufficient credits — top up your org or switch to a BYOK model” instead of a misleading sign-in/provisioning error when the org has zero balance. (dreadnode/dreadnode-tiger#1402)

  • TUI model browser keyboard navigation. The TUI model browser now responds to arrow keys immediately on open — no mouse click required. (dreadnode/dreadnode-tiger#1403)

  • TUI /update false “version unchanged” warning fixed. The /update command no longer shows a false “version unchanged” warning after a successful update when multiple dn installations exist. (dreadnode/dreadnode-tiger#1396)

  • Cross-org public dataset access. Public datasets from other orgs now open and download correctly instead of returning 404 errors. (dreadnode/dreadnode-tiger#1405)

  • AIRT report download fallback for Safari iOS. The AIRT report toast now includes a Download link so users whose browser blocked the auto-download can retrieve their report without re-running generation. (dreadnode/dreadnode-tiger#1382)

  • AIRT “Total Findings” count corrected. The AIRT overview “Total Findings” card now shows the correct total count instead of the current page size (20). (dreadnode/dreadnode-tiger#1404)

  • Dataset pull command fixed. The dn dataset pull command now works correctly. (dreadnode/dreadnode-tiger#1390)

  • Members and Workspaces tables sort correctly. Clicking column headers in the Members and Workspaces settings tables now sorts rows as expected. (dreadnode/dreadnode-tiger#1440)

  • AIRT severity levels documented. The AIRT severity reference now documents all five levels (Critical, High, Medium, Low, Info) — previously only Critical and High were defined. (dreadnode/dreadnode-tiger#1444)

  • System prompts visible in session transcripts. Session transcripts now display system prompts in a collapsible disclosure panel. (dreadnode/dreadnode-tiger#1394)

  • Docs copy button alignment fixed. The copy button on terminal code blocks in the docs site now stays inside the code block. (dreadnode/dreadnode-tiger#1439)
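
To see why the TUI context gauge fix in this release matters: each model call typically re-sends the conversation so far, so summing input tokens across calls counts earlier messages repeatedly and can pass the context limit, while the most recent call's input tokens approximate what is actually in context. The sketch below illustrates the difference; the `Generation` record and all function names are hypothetical and not taken from the TUI source.

```python
from dataclasses import dataclass


@dataclass
class Generation:
    """Token usage reported for one model call (hypothetical shape)."""
    input_tokens: int
    output_tokens: int


def cumulative_input_tokens(history: list[Generation]) -> int:
    # Old behavior: a running sum that double-counts re-sent context.
    return sum(g.input_tokens for g in history)


def last_generation_input_tokens(history: list[Generation]) -> int:
    # Fixed behavior: only the most recent call's input size.
    return history[-1].input_tokens if history else 0


def _short(n: int) -> str:
    """Abbreviate a token count, e.g. 800000 as '800k'."""
    if n >= 1_000_000:
        return f"{n / 1_000_000:g}M"
    if n >= 1_000:
        return f"{n / 1_000:g}k"
    return str(n)


def format_gauge(used: int, limit: int) -> str:
    return f"{_short(used)}/{_short(limit)}"


history = [
    Generation(300_000, 1_000),
    Generation(550_000, 2_000),
    Generation(800_000, 500),
]
```

With this history and a 1M-token limit, the cumulative sum is 1,650,000 tokens, which is over the limit, while the last-generation value renders as 800k/1M.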