July 2, 2026

Claude Code engine — run Claude's agentic loop as a governed, session-backed agent

6 new 5 improved 14 fixed

Agents get a pluggable execution engine this week, with Claude Code shipping as the first built-in option. It arrives together with transcript-fidelity and eval-reliability fixes that make it production-ready.

New

Pluggable agent engine + Claude Code harness. Agents now support swappable execution engines; the built-in Claude Code engine runs Claude’s agentic loop as a first-class, governed, session-backed agent with full trace viewer support. (dreadnode/dreadnode-tiger#1768, dreadnode/dreadnode-tiger#1816, dreadnode/dreadnode-tiger#1819, dreadnode/dreadnode-tiger#1820, dreadnode/dreadnode-tiger#1822)
Task Sets. Tasks can now be grouped into named, versioned collections and run as a single evaluation target via the API, dn task-set CLI commands, or the Environments UI. (dreadnode/dreadnode-tiger#1826)
User-controlled task-environment models. You can now choose, per role, which model a task’s environment (defender) uses, independently of the solver and judge models. Set it with --env-model role=id in the CLI, model_overrides in evaluation.yaml, or the API. (dreadnode/dreadnode-tiger#1839)
Evaluation sample search. You can now search across all samples in an evaluation by free text — matches are highlighted with per-sample counts, and a “Matches only” filter collapses non-matching samples. (dreadnode/dreadnode-tiger#1810)
Blind SQL injection extraction skill. A new blind SQLi extraction skill and tool in the web-security capability supports boolean/timing-based oracle identification, WAF bypass patterns, and automated char-by-char extraction. (dreadnode/capabilities#61)
Blind SSRF chains with attacker infrastructure provisioning. The blind-SSRF-chains skill now guides agents to detect available cloud CLIs and provision attacker infrastructure (S3 buckets, redirect servers) for SSRF proof-of-concept evidence. (dreadnode/capabilities#65)
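Automated char-by-char extraction typically works by asking a boolean oracle yes/no questions about each character of a hidden value. The sketch below is illustrative only and is not the web-security capability's code. The oracle is a hypothetical local function that answers "is the character code at position pos greater than threshold?". It stands in for one request whose response differs depending on whether the injected condition is true or false. The sketch also assumes the value's length is already known. A binary search over printable ASCII then recovers each character in at most 7 oracle calls.

```python
from typing import Callable

# Hypothetical oracle signature: returns True when the character code at
# `pos` (0-based) of the hidden value is strictly greater than `threshold`.
# In a real boolean-based blind test this would be one request; here it is simulated.
Oracle = Callable[[int, int], bool]


def extract_char(oracle: Oracle, pos: int, lo: int = 32, hi: int = 126) -> int:
    """Binary-search the printable-ASCII code at `pos` (at most 7 oracle calls)."""
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle(pos, mid):
            lo = mid + 1  # code > mid
        else:
            hi = mid      # code <= mid
    return lo


def extract_string(oracle: Oracle, length: int) -> str:
    """Recover a value of known length one character at a time."""
    return "".join(chr(extract_char(oracle, i)) for i in range(length))


def make_simulated_oracle(secret: str) -> tuple[Oracle, list[int]]:
    """Local stand-in for a target; also counts how many 'requests' were made."""
    calls = [0]

    def oracle(pos: int, threshold: int) -> bool:
        calls[0] += 1
        return ord(secret[pos]) > threshold

    return oracle, calls


if __name__ == "__main__":
    oracle, calls = make_simulated_oracle("admin")
    print(extract_string(oracle, 5))  # admin
```

In practice the length is usually recovered first with the same binary-search idea. A timing-based oracle replaces the boolean response check with a response-delay threshold.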

Improvements

Task-set provenance on evaluation detail. When an eval was run from a task set, the evaluation detail page now shows the linked task set’s org/name and a collapsible list of skipped members with the reason each was skipped. (dreadnode/dreadnode-tiger#1851)
Usage summary in headless print mode. dn --print now writes a summary of the model, tokens, tool calls, and cost to stderr after each run, matching the cost visibility available in the TUI. (dreadnode/dreadnode-tiger#1801)
Sub-agent costs in TUI footer. The TUI footer now shows sub-agent LLM costs as a separate “subagents $X.XX” segment so total spend is no longer understated when using spawned sub-agents. (dreadnode/dreadnode-tiger#1834)
Bare --resume opens session picker. dn --resume (or -r) with no argument now opens the session picker instead of erroring, matching the UX of Codex and Claude. (dreadnode/dreadnode-tiger#1836)
Chat runtime reliability overhaul. Sandbox state management is simplified, and the chat page now handles the runtime lifecycle more reliably, reducing spurious disconnects and state loss. (dreadnode/dreadnode-tiger#1809)
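To show why a separate sub-agent cost segment matters, the sketch below sums usage from a main run and its spawned sub-agents. It is hypothetical: the Usage fields, the footer layout, and the summary wording are assumptions for illustration and may not match what dn actually prints. The summary goes to stderr, as dn --print does, so stdout stays clean for piping.

```python
import sys
from dataclasses import dataclass
from typing import Optional, TextIO


@dataclass
class Usage:
    # Hypothetical per-agent usage record; field names are illustrative only.
    model: str
    input_tokens: int
    output_tokens: int
    tool_calls: int
    cost_usd: float


def footer(main: Usage, subagents: list[Usage]) -> str:
    """TUI-style footer with sub-agent spend broken out as its own segment."""
    sub_cost = sum(u.cost_usd for u in subagents)
    parts = [f"${main.cost_usd:.2f}"]
    if subagents:
        parts.append(f"subagents ${sub_cost:.2f}")
    parts.append(f"total ${main.cost_usd + sub_cost:.2f}")
    return " | ".join(parts)


def format_summary(main: Usage, subagents: list[Usage]) -> str:
    """One-line run summary covering model, tokens, tool calls, and cost."""
    runs = [main, *subagents]
    tokens = sum(u.input_tokens + u.output_tokens for u in runs)
    tool_calls = sum(u.tool_calls for u in runs)
    cost = sum(u.cost_usd for u in runs)
    return f"model: {main.model}  tokens: {tokens}  tool calls: {tool_calls}  cost: ${cost:.2f}"


def print_summary(main: Usage, subagents: list[Usage], stream: Optional[TextIO] = None) -> None:
    # Default to stderr so the agent's actual output on stdout can be piped cleanly.
    print(format_summary(main, subagents), file=stream or sys.stderr)
```

With a main run costing $1.25 and two sub-agents costing $0.50 and $0.25, footer() returns "$1.25 | subagents $0.75 | total $2.00". Reporting only the main-agent figure would understate the run's spend by $0.75.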

Fixes

AIRT assessments no longer stuck in pending/running. Assessments stuck indefinitely are now automatically finalized after a configurable timeout (default 2h), and a one-time migration clears the existing backlog. (dreadnode/dreadnode-tiger#1813)
AIRT overview page 500 errors resolved. The AIRT project summary endpoint no longer intermittently returns 500 errors (previously ~58/hr) caused by a closed database transaction. (dreadnode/dreadnode-tiger#1811)
Eval task timeout now respects org ceiling. Evals with an unset --task-timeout-sec no longer fail at provisioning by exceeding the org’s 6h ceiling; the runtime is now capped to your org’s ceiling, and an explicit over-ceiling value returns a clear 400 error. (dreadnode/dreadnode-tiger#1815)
dn/* model IDs work in AIRT runs. All three AIRT model roles (attacker, judge, and target) now correctly route through the LiteLLM proxy instead of failing with “LLM Provider NOT provided.” (dreadnode/capabilities#66, dreadnode/dreadnode-tiger#1840, dreadnode/dreadnode-tiger#1845)
TUI conversation scrolling is smooth. The viewport no longer jumps when new messages arrive, and scroll performance is significantly improved for long conversations (from ~129 ms to ~47 ms per step). (dreadnode/dreadnode-tiger#1824)
TypeScript files no longer sent as video. The SDK read tool no longer crashes generation when reading .ts/.mts files, which were treated as video because those extensions are also used for MPEG transport streams. They’re now correctly identified as text. (dreadnode/dreadnode-tiger#1797)
Image content no longer fails text-only models. Text-only models (e.g. via OpenRouter) no longer fail with a 404 when a tool returns image content — the SDK substitutes a textual description and retries automatically. (dreadnode/dreadnode-tiger#1804)
MCP tool calls retry transient failures. MCP tool calls now retry once on dropped connections or subprocess crashes instead of immediately marking the server unavailable. (dreadnode/dreadnode-tiger#1823, dreadnode/capabilities#62)
MCP shutdown no longer emits noisy warnings. The MCP client no longer logs “Task exception was never retrieved” on shutdown when the subprocess or remote server has already exited. (dreadnode/dreadnode-tiger#1803)
Unique browser session names required for web-security agents. Concurrent web-security agents no longer clobber each other’s browser state — the --session <name> flag is now required. (dreadnode/capabilities#64)
Agents page defaults to no session grouping. The sessions page now defaults to no grouping instead of grouping by workflow instance; workflow grouping remains available in the selector. (dreadnode/dreadnode-tiger#1854)
Claude Code engine transcript fidelity. Transcripts now show the initiating user message and correctly link tool call outputs to their calls; orphaned tool results and lost calls from burst-emit engines are fixed. (dreadnode/dreadnode-tiger#1820, dreadnode/dreadnode-tiger#1822)
Engine-override eval sessions no longer hang. Evaluations using --engine claude-code no longer wait indefinitely for tool approval that never arrives. (dreadnode/dreadnode-tiger#1819)
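The eval task timeout behavior can be summarized as a small resolution rule. This is a minimal sketch under assumptions: the function name, the default_sec parameter, the 24h default used in the example, and the TimeoutRejected exception (standing in for the API's 400 response) are all hypothetical, and the real service may apply the check at a different layer.

```python
from typing import Optional


class TimeoutRejected(Exception):
    """Hypothetical stand-in for the API's HTTP 400 response."""

    status = 400


def resolve_task_timeout(requested_sec: Optional[int], org_ceiling_sec: int, default_sec: int) -> int:
    """Return the effective task runtime in seconds (hypothetical helper)."""
    if requested_sec is None:
        # Unset: use the default, capped at the org ceiling instead of failing later at provisioning.
        return min(default_sec, org_ceiling_sec)
    if requested_sec > org_ceiling_sec:
        # Explicit over-ceiling request: reject up front with a clear message.
        raise TimeoutRejected(
            f"task timeout {requested_sec}s exceeds the org ceiling of {org_ceiling_sec}s"
        )
    return requested_sec


SIX_HOURS = 6 * 60 * 60
# Hypothetical 24h default: capped to the 6h ceiling rather than failing.
print(resolve_task_timeout(None, SIX_HOURS, default_sec=24 * 60 * 60))  # 21600
```

Capping the unset case while rejecting an explicit over-ceiling value keeps the common path working and surfaces a real misconfiguration immediately, instead of at provisioning time.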