June 11, 2026

Session and Trace Views Now Display Agent-Captured Images, Audio, and Video

11 new 14 improved 16 fixed

Sandbox runtime limits now resolve through a four-level hierarchy configurable by platform admins, org owners, and individual users — alongside a concentrated push to eliminate evaluation sandbox timeouts that were causing widespread ‘sandbox was not found’ failures.

New

  • Session and Trace Views Now Display Agent-Captured Images, Audio, and Video. Analysts can now view the images, audio, and video an agent logged during a run directly in the session transcript and trace details, rather than as raw base64. Each item stays behind an explicit Show media click, so potentially hostile artifacts are never decoded until an analyst chooses to open them.
  • Sandbox runtime limit hierarchy. Sandbox runtime limits now resolve through a four-level hierarchy (platform max → org max → org default → user-requested), configurable by platform admins, org owners, and individual users across the UI, API, SDK, and TUI. (dreadnode/dreadnode-tiger#1651)
  • Attack surface management capability. A new attack surface management capability combines BBOT scanning, Shodan enrichment, Neo4j graph analysis, screenshot triage, and a multi-agent ASM pipeline. (dreadnode/capabilities#43)
  • HTTP/2 WAF bypass skill. The new h2-waf-bypass skill in the web-security capability covers six bypass classes: delayed DATA frame timing, body size truncation, Extended CONNECT, ForwardAuth body stripping, path normalization, and JSON content-type gaps. (dreadnode/capabilities#38)
  • GraphQL penetration testing skill. New graphql-pentest skill covers endpoint discovery, introspection mining, resource abuse (CWE-400, CWE-674), content-type CSRF, and capability matrix testing. (dreadnode/capabilities#39)
  • Vulnerability assessment methodology capability. A new standalone vuln-assessment-methodology capability provides a reusable severity matrix, disprove-first rules, and reporting standards that any security capability can load. (dreadnode/capabilities#40)
  • Dashboard workspace toggle. The dashboard now has a toggle to filter activity between your personal workspace and the full organization view. (dreadnode/dreadnode-tiger#1619)
  • --judge-model override for evaluation creation. New --judge-model flag on dn evaluation create lets you override the task-level judge model at eval creation time without modifying task.yaml. (dreadnode/dreadnode-tiger#1662)
  • Pinned smoke solution image for dn task validate. dn task validate --smoke now runs solution and verify scripts in a pinned container that mirrors the production agent runtime, so smoke results match production behavior. (dreadnode/dreadnode-tiger#1659)
  • ProcessJudge prefix caching. ProcessJudge now supports prefix caching (opt-in cache flag), cutting LLM costs on long judge sessions by billing repeated calls at cache-read rates; GuardSessionPolicy enables it by default. (dreadnode/dreadnode-tiger#1655)
  • Automated Cloud Assessments Now Measure the Impact of Exposed AWS Credentials. Security teams can now confirm how far an exposed AWS credential or instance-metadata finding could be taken, with the platform’s agents running authorized AWS exploitation checks to validate real-world impact.
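
The sandbox runtime limit hierarchy above (platform max → org max → org default → user-requested) can be pictured with a short sketch. This is illustrative only: the function name, the parameter names, and the exact behavior are assumptions, not the platform's implementation. In particular, the sketch assumes that an explicit request above the ceiling is rejected rather than clamped, and that a missing request falls back to the org default.

```python
from typing import Optional


def resolve_runtime_limit(
    platform_max: int,
    org_max: Optional[int],
    org_default: Optional[int],
    requested: Optional[int],
) -> int:
    """Return an effective sandbox runtime limit in seconds (hypothetical helper)."""
    # The ceiling is the tighter of the platform and org maximums.
    ceiling = platform_max if org_max is None else min(platform_max, org_max)

    if requested is not None:
        # Assumption: an explicit request above the ceiling is an error.
        if requested > ceiling:
            raise ValueError(
                f"requested limit {requested}s exceeds ceiling {ceiling}s"
            )
        return requested

    # No explicit request: use the org default, capped by the ceiling.
    if org_default is not None:
        return min(org_default, ceiling)
    return ceiling
```

Under these assumptions, a platform max of 7200 seconds, an org max of 3600, and an org default of 1800 resolve to 1800 when nothing is requested and to 2400 when the user requests 2400.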

Improvements

  • Secrets and chat models moved to Account Settings. Secrets and chat model settings have moved from org settings pages into the Account Settings modal, with deep-link support via URL hash (e.g. #account-settings/secrets). (dreadnode/dreadnode-tiger#1677)
  • Sandboxes view shows workload context. The sandboxes view now shows evaluation, task, and agent context for each sandbox, making it easier to trace which workload a sandbox belongs to. (dreadnode/dreadnode-tiger#1656)
  • Default agent timeout doubled to 1 hour. Default agent timeout for evaluations increased from 30 minutes to 1 hour, giving agents more time to complete complex multi-step tasks like Cybench challenges. (dreadnode/dreadnode-tiger#1676)
  • In-flight eval samples pinned to top. In-flight eval samples now stay pinned at the top of the samples pane during concurrent runs, so active samples no longer get buried in the list. (dreadnode/dreadnode-tiger#1670)
  • Actor chip in activity feed. The activity feed on the home page now shows an actor chip identifying who performed each action. (dreadnode/dreadnode-tiger#1685)
  • Session starter visibility in agent sessions. The agent sessions list now shows who or what started each session, making it easier to identify the origin of a run at a glance. (dreadnode/dreadnode-tiger#1687)
  • ASR by Category chart in Assessment Reporting. The ASR by Category card in Assessment Reporting now renders a bar chart in Charts mode, matching the ASR by Attack and ASR by Transform cards. (dreadnode/dreadnode-tiger#1691)
  • Job-status discs now have accessible labels. Job-status discs across evaluations, training, optimization, and worlds now show hover tooltips and ARIA labels so their color and animation meaning is clear to all users. (dreadnode/dreadnode-tiger#1654)
  • /agents TUI listing cleaned up. The /agents command in the TUI now groups agents under capability headings with aligned columns, an active-agent marker, and a footer hint — and correctly displays on the home screen. (dreadnode/dreadnode-tiger#1628)
  • ‘Add Custom Secret’ button moved to top. The ‘Add Custom Secret’ button on the Secrets settings page now appears at the top of the section, so you no longer need to scroll past existing secrets to reach it. (dreadnode/dreadnode-tiger#1664)
  • Smoke test now validates multi-file task solutions. The smoke test runner now stages the full task directory into the agent sandbox, so multi-file solutions referencing sibling scripts or assets validate correctly. (dreadnode/dreadnode-tiger#1673)
  • dn task validate --smoke covers judge-backed verification methods. dn task validate --smoke now runs the mechanical verification steps (solution + verify) for script_and_judge and flag_and_judge tasks instead of silently skipping them. (dreadnode/dreadnode-tiger#1652)
  • Project selector persists across navigation. The project selector now persists across pages — navigating from agent sessions to Charts or Analytics no longer resets your selected project. (dreadnode/dreadnode-tiger#1689)
  • Expired Verification Links Can Now Be Resent in One Click. Users whose email verification link has expired can now request a fresh one in a single click, directly from the recovery page.
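
The Account Settings deep links mentioned above use a URL hash such as #account-settings/secrets. As a rough illustration of how such a fragment splits into a modal name and a tab, here is a minimal sketch. The helper name and the example host are hypothetical, and the real front end may parse the hash differently.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def parse_settings_deep_link(url: str) -> Tuple[str, Optional[str]]:
    """Split a fragment like 'account-settings/secrets' into (modal, tab)."""
    # urlsplit returns the text after '#', without the '#' itself.
    fragment = urlsplit(url).fragment
    # Everything before the first '/' is the modal; the rest is the tab.
    modal, _, tab = fragment.partition("/")
    return modal, (tab or None)


# Hypothetical host; only the fragment shape comes from the release notes.
modal, tab = parse_settings_deep_link(
    "https://example.invalid/home#account-settings/secrets"
)
```

Here `modal` is "account-settings" and `tab` is "secrets". A URL without a tab segment yields `None` for the tab.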

Fixes

  • Evaluation sandbox timeouts eliminated. A cluster of fixes ensures eval sandboxes now correctly honour their configured runtime limit: the rebalancer no longer resets timeouts to the org default, org ceiling validation is bypassed for eval-managed sandboxes, and explicit runtime limits are passed through from evaluation service to sandbox. (dreadnode/dreadnode-tiger#1680, dreadnode/dreadnode-tiger#1681, dreadnode/dreadnode-tiger#1682, dreadnode/dreadnode-tiger#1683)
  • Cybench eval port connectivity fixed. Cybench evaluations no longer return 502 ‘port not open’ errors: the sandbox timeout is now 30 minutes, eval concurrency is raised to 10, and cold Docker Compose builds no longer expire the sandbox mid-run. (dreadnode/dreadnode-tiger#1669, dreadnode/dreadnode-tiger#1671)
  • Evaluation queue starvation and sandbox provisioning failures fixed. Evaluation runs no longer queue-starve for hours behind large evals in the same org; env sandbox provisioning timeout raised to 300s to prevent ~30% of runs from failing. (dreadnode/dreadnode-tiger#1638)
  • AI red-teaming SDK config regression and analytics restored. AI red-teaming attack workflows no longer fail with a misleading SDK config error, and analytics results are now correctly persisted and surfaced instead of reporting false failures. (dreadnode/capabilities#34)
  • generate_category_attack accepts all input formats. The skill no longer fails with cryptic ‘Unknown attack: t’ errors when passing attack names as strings — string, list, and comma-separated inputs all now work. (dreadnode/capabilities#35)
  • dn task validate and API now enforce the same rules. Tasks that pass local validation will no longer be rejected with an opaque 400 on upload. (dreadnode/dreadnode-tiger#1631)
  • OAuth buttons respond on first click. The “Continue with Google” and “Continue with GitHub” buttons now respond on the first click; a hydration race on the sign-in/sign-up form previously left them unresponsive. (dreadnode/dreadnode-tiger#1645)
  • Training runs honour the epochs parameter. The epochs parameter for Tinker SFT is no longer silently ignored — jobs previously always ran 100 steps regardless of dataset size or epoch setting. (dreadnode/dreadnode-tiger#1640)
  • Training run charts no longer show gaps in validation loss. The SDK no longer emits a None-padded val_loss series, and historical runs are handled gracefully. (dreadnode/dreadnode-tiger#1636)
  • Gemini models no longer fail with authentication errors. Models returning null cache usage fields from LiteLLM (e.g. gemini/gdm-eval-model-lis) no longer fail with a spurious authentication error. (dreadnode/dreadnode-tiger#1641)
  • read tool returns images as vision input. The read tool now returns image files as ContentImageUrl so VLMs receive proper vision input instead of a raw base64 text blob. (dreadnode/dreadnode-tiger#1653)
  • Catalog page filter buttons load immediately. Filter buttons on Environments, Capabilities, Datasets, and Models catalog pages now appear immediately instead of taking minutes to load. (dreadnode/dreadnode-tiger#1696)
  • Queued and running eval statuses now show distinct colors. Queued/pending returns to gray while running stays blue — previously both showed the same blue. (dreadnode/dreadnode-tiger#1661)
  • Cancelled evaluations no longer leave samples spinning. Sample rows in a cancelled evaluation no longer spin indefinitely; their statuses now correctly reflect the cancelled state. (dreadnode/dreadnode-tiger#1672)
  • Legacy runtime timeout no longer triggers ceiling errors. Evaluations no longer fail with ‘requested_runtime_limit_seconds exceeds platform ceiling’ when you didn’t explicitly set a runtime limit. (dreadnode/dreadnode-tiger#1663)
  • Internal timeouts no longer mistaken for user runtime requests. The platform no longer treats internal timeout values as explicit user-requested runtime limits. (dreadnode/dreadnode-tiger#1663)
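
The generate_category_attack fix above mentions errors like ‘Unknown attack: t’ when attack names were passed as strings. One plausible cause, offered as an assumption rather than a confirmed diagnosis, is that iterating over a bare string yields single characters. The sketch below shows one way to normalize string, list, and comma-separated inputs; the helper name and the attack names are hypothetical placeholders, not the skill's actual code.

```python
from typing import Iterable, List, Union


def normalize_attack_names(value: Union[str, Iterable[str]]) -> List[str]:
    """Turn a string, comma-separated string, or iterable into a list of names."""
    if isinstance(value, str):
        # Split explicitly: looping over a str would yield 't', 'a', 'p', ...
        parts = value.split(",")
    else:
        parts = list(value)
    # Trim whitespace and drop empty entries such as a trailing comma.
    return [p.strip() for p in parts if p.strip()]


# "tap" and "crescendo" are placeholder names used only for illustration.
single = normalize_attack_names("tap")
combined = normalize_attack_names("tap, crescendo")
as_list = normalize_attack_names(["tap", "crescendo"])
```

All three call shapes produce a list of whole names, so downstream lookups see "tap" rather than the single character "t".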