Custom engines

Run your own Claude Agent SDK loop on Dreadnode — wrap your orchestration as an engine and keep sessions, scoring, and evaluation.

A built-in engine like claude-code runs a standard Claude Code agent that the platform configures. When you already have your own agent loop — multi-phase orchestration, custom tool dispatch, logic between turns, written against the Claude Agent SDK — you wrap that loop as a custom engine instead. Your code keeps running; sessions, scoring, and evaluation ride on the events it emits.

A custom engine is an AgentEngine whose run_loop runs your code and yields native events as it goes.
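The key property is that run_loop is an async generator: it yields events while your code is still working, not in one batch at the end. Below is a minimal sketch of that shape. Ctx, AgentStart, Step, AgentEnd, ToyEngine, and collect are hypothetical stand-ins, not Dreadnode's classes. The real run_loop receives an EngineContext and sends each event through ctx.dispatch.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List, Union


@dataclass
class Ctx:  # hypothetical stand-in for EngineContext
    goal: str
    agent_id: str


@dataclass
class AgentStart:  # hypothetical stand-in for the native start event
    agent_id: str


@dataclass
class Step:  # hypothetical stand-in for GenerationStep / ToolStep
    text: str


@dataclass
class AgentEnd:  # hypothetical stand-in for the native terminal event
    agent_id: str
    status: str


Event = Union[AgentStart, Step, AgentEnd]


class ToyEngine:
    async def run_loop(self, ctx: Ctx) -> AsyncIterator[Event]:
        yield AgentStart(agent_id=ctx.agent_id)
        for word in ctx.goal.split():  # "your code": one unit of work per word
            await asyncio.sleep(0)  # stand-in for an awaited model or tool call
            yield Step(text=word)  # surfaced as it happens
        yield AgentEnd(agent_id=ctx.agent_id, status="finished")


async def collect(engine: ToyEngine, ctx: Ctx) -> List[Event]:
    # In practice the runtime consumes the generator; here we just drain it.
    return [ev async for ev in engine.run_loop(ctx)]
```

Because the generator is consumed incrementally, the platform can observe each step while the run is still in progress.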

Your existing loop

Say you have a two-phase pentest agent. The phase logic between turns is your code:

import claude_agent_sdk as sdk


async def run_pentest(target_url: str) -> str:
    recon = []
    async for msg in sdk.query(
        prompt=f"Recon {target_url}",
        options=sdk.ClaudeAgentOptions(allowed_tools=["Bash", "WebFetch"]),
    ):
        recon.append(msg)

    endpoints = parse_endpoints(recon)   # your logic
    target = pick_target(endpoints)      # your logic

    out = ""
    async for msg in sdk.query(prompt=f"Exploit {target}", options=sdk.ClaudeAgentOptions()):
        out = final_text(msg)   # your logic
    return out

The same loop, wrapped as an engine

Subclass ClaudeCodeEngine to reuse its message translation. Your orchestration moves into run_loop; the new lines open and close the run and turn each SDK message into native events that are dispatched through the agent:

import claude_agent_sdk as sdk

from dreadnode.agents.engines import (
    ClaudeCodeEngine,
    ClaudeCodeTranslationState,
    EngineContext,
    register_engine,
)
from dreadnode.agents.events import AgentEnd, AgentStart


@register_engine
class PentestEngine(ClaudeCodeEngine):
    name = "acme-pentest"

    async def run_loop(self, ctx: EngineContext):
        state = ClaudeCodeTranslationState()

        async for ev in ctx.dispatch(
            AgentStart(agent_id=ctx.agent.agent_id, agent_name=ctx.agent.name)
        ):
            yield ev

        # --- recon phase ---
        recon = []
        async for msg in sdk.query(
            prompt=f"Recon {ctx.goal}",
            options=sdk.ClaudeAgentOptions(allowed_tools=["Bash", "WebFetch"]),
        ):
            recon.append(msg)
            for ev in self.translate(ctx, msg, state):   # SDK message -> native events
                async for out in ctx.dispatch(ev):        # scorers/hooks run here
                    yield out

        endpoints = parse_endpoints(recon)   # your logic, unchanged
        target = pick_target(endpoints)      # your logic, unchanged

        # --- exploit phase ---
        async for msg in sdk.query(prompt=f"Exploit {target}", options=sdk.ClaudeAgentOptions()):
            for ev in self.translate(ctx, msg, state):
                async for out in ctx.dispatch(ev):
                    yield out

        async for ev in ctx.dispatch(
            AgentEnd(agent_id=ctx.agent.agent_id, status="finished", stop_reason="finished")
        ):
            yield ev

parse_endpoints, pick_target, and your phase structure are untouched. The new code is the run_loop shell plus self.translate(...) and ctx.dispatch(...) around each message.
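Both phases repeat the same translate-then-dispatch loop, so you may prefer to factor it into a helper. The following is a minimal sketch of that refactor using stand-ins: State, translate, dispatch, pump, fake_query, and run are all hypothetical and are not part of the Dreadnode API. It also shows why a single translation state is created once per run: step numbering continues across phases.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Iterator, List


@dataclass
class State:
    """Hypothetical stand-in for ClaudeCodeTranslationState."""
    steps: int = 0


def translate(msg: str, state: State) -> Iterator[str]:
    # Stand-in for self.translate(ctx, msg, state): one message -> one event.
    state.steps += 1
    yield f"step{state.steps}:{msg}"


async def dispatch(event: str) -> AsyncIterator[str]:
    # Stand-in for ctx.dispatch(event): hooks and scorers would run here.
    yield event


async def pump(messages: AsyncIterator[str], state: State, sink: List[str]) -> AsyncIterator[str]:
    """Translate and dispatch every message in one phase, keeping the raw messages."""
    async for msg in messages:
        sink.append(msg)
        for ev in translate(msg, state):
            async for out in dispatch(ev):
                yield out


async def fake_query(prompt: str) -> AsyncIterator[str]:
    # Stand-in for sdk.query(...): yields two messages per call.
    for part in ("a", "b"):
        await asyncio.sleep(0)
        yield f"{prompt}/{part}"


async def run(goal: str) -> List[str]:
    state = State()  # one state for the whole run, shared by both phases
    recon: List[str] = []
    emitted: List[str] = []
    async for ev in pump(fake_query(f"Recon {goal}"), state, recon):
        emitted.append(ev)
    target = recon[-1]  # stand-in for pick_target(parse_endpoints(recon))
    async for ev in pump(fake_query(f"Exploit {target}"), state, []):
        emitted.append(ev)
    return emitted
```

Python does not allow `yield from` inside an async generator, so even with a helper the caller still needs an `async for ... yield` loop around each phase.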

What each piece does

ctx (EngineContext) carries the run: ctx.goal is the task input, ctx.agent is the agent config, and ctx.dispatch(event) runs an event through the agent’s hooks.
self.translate(ctx, msg, state) turns one Claude Agent SDK message into native events: assistant text and tool calls become GenerationStep, tool results become ToolStep, and reasoning and token usage are carried along. ClaudeCodeTranslationState holds per-run state (step counter, pending tool calls, token totals), so create one per run and pass the same instance to every phase. It is named for the harness whose messages it parses; a future engine for another harness, such as codex, would define its own state class.
ctx.dispatch(event) is where scoring happens. Each event flows through the agent’s hooks, so the scorers attached to the agent — “did discovery surface the right endpoints,” “did validation over-filter” — see every step your loop emits.
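The sketch below shows the bookkeeping described above. A translation state tracks the step counter, pending tool calls, and token totals. A dispatch step runs every hook, including a toy scorer, on every event. All classes and functions here are hypothetical stand-ins written for illustration. They are not the SDK's message types or Dreadnode's translate or dispatch, and the real dispatch is async.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional, Tuple, Union


# Hypothetical stand-ins for Claude Agent SDK messages (not the SDK's classes).
@dataclass
class AssistantMsg:
    text: str
    tool_call: Optional[Tuple[str, str]] = None  # (tool_use_id, tool_name)
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class ToolResultMsg:
    tool_use_id: str
    output: str


# Hypothetical stand-ins for native events.
@dataclass
class GenerationStep:
    step: int
    text: str
    tool_calls: List[str]


@dataclass
class ToolStep:
    step: int
    tool_name: str
    output: str


Event = Union[GenerationStep, ToolStep]


@dataclass
class TranslationState:
    step: int = 0
    pending: Dict[str, str] = field(default_factory=dict)  # tool_use_id -> tool name
    input_tokens: int = 0
    output_tokens: int = 0


def translate(msg: Union[AssistantMsg, ToolResultMsg], state: TranslationState) -> Iterator[Event]:
    state.step += 1
    if isinstance(msg, AssistantMsg):
        state.input_tokens += msg.input_tokens
        state.output_tokens += msg.output_tokens
        calls: List[str] = []
        if msg.tool_call is not None:
            call_id, name = msg.tool_call
            state.pending[call_id] = name  # held until the matching result arrives
            calls.append(name)
        yield GenerationStep(state.step, msg.text, calls)
    else:
        name = state.pending.pop(msg.tool_use_id, "unknown")
        yield ToolStep(state.step, name, msg.output)


Hook = Callable[[Event], None]


def dispatch(event: Event, hooks: List[Hook]) -> Iterator[Event]:
    for hook in hooks:  # every hook, including scorers, sees every event
        hook(event)
    yield event


def run_demo() -> Tuple[List[Event], List[str], TranslationState]:
    seen: List[Event] = []
    endpoints: List[str] = []

    def endpoint_scorer(ev: Event) -> None:
        # Toy scorer: collect /api paths that discovery surfaced in tool output.
        if isinstance(ev, ToolStep):
            endpoints.extend(w for w in ev.output.split() if w.startswith("/api"))

    state = TranslationState()
    messages = [
        AssistantMsg("Fetching sitemap", ("t1", "WebFetch"), 120, 30),
        ToolResultMsg("t1", "/api/login /about /api/users"),
        AssistantMsg("Found two API endpoints", None, 200, 15),
    ]
    for msg in messages:
        for ev in translate(msg, state):
            # A real engine would yield each dispatched event to the runtime.
            seen.extend(dispatch(ev, [endpoint_scorer]))
    return seen, endpoints, state
```

Here the tool result is matched to its call through the pending map, so the ToolStep knows it came from WebFetch even though the result message only carries an id.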

run_loop must yield exactly one terminal AgentEnd. Everything between AgentStart and AgentEnd becomes the trajectory.
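A quick way to check this contract in your own tests is to collect the events your loop yields and validate them. The following is a minimal sketch with hypothetical stand-in event classes. The docs state the single terminal AgentEnd rule; requiring exactly one AgentStart is an added assumption in this sketch.

```python
from dataclasses import dataclass
from typing import List, Sequence


# Hypothetical stand-ins for native events.
@dataclass
class AgentStart:
    agent_id: str


@dataclass
class Step:
    name: str


@dataclass
class AgentEnd:
    status: str


def extract_trajectory(events: Sequence[object]) -> List[object]:
    """Enforce the single terminal AgentEnd rule and return the events in between."""
    ends = [i for i, ev in enumerate(events) if isinstance(ev, AgentEnd)]
    if len(ends) != 1:
        raise ValueError(f"expected exactly one AgentEnd, got {len(ends)}")
    if ends[0] != len(events) - 1:
        raise ValueError("AgentEnd must be the terminal event")
    starts = [i for i, ev in enumerate(events) if isinstance(ev, AgentStart)]
    if len(starts) != 1:  # assumption: one AgentStart per run
        raise ValueError(f"expected exactly one AgentStart, got {len(starts)}")
    return list(events[starts[0] + 1 : ends[0]])
```

A run that forgets AgentEnd, emits it twice, or keeps yielding after it fails fast here instead of producing a truncated or ambiguous trajectory.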

Declare it on an agent

Reference the engine by module:Class in the agent frontmatter — the runtime imports it:

---
name: pentest-agent
model: claude-sonnet-4-5
engine: acme.engines:PentestEngine
---

No platform change is needed: the engine: field accepts either a built-in name, such as claude-code, or a module:Class import reference.
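One plausible way to resolve those two forms is sketched below: a colon means an import reference, and anything else is looked up in a registry of built-ins. The registry, register_engine, resolve_engine, and BuiltinStandIn here are hypothetical illustrations. They are not Dreadnode's implementation, only a sketch built on the standard library's importlib.

```python
import importlib
from typing import Dict

# Hypothetical registry; dreadnode's register_engine plays a similar role,
# but this is not its implementation.
_REGISTRY: Dict[str, type] = {}


def register_engine(cls: type) -> type:
    _REGISTRY[cls.name] = cls  # keyed by the class's `name` attribute
    return cls


def resolve_engine(spec: str) -> type:
    """Resolve an `engine:` value: a registered name or a 'module:Class' reference."""
    if ":" in spec:
        module_name, _, attr = spec.partition(":")
        module = importlib.import_module(module_name)  # runs the module's top-level code
        return getattr(module, attr)
    try:
        return _REGISTRY[spec]
    except KeyError:
        raise LookupError(f"unknown engine {spec!r}") from None


@register_engine
class BuiltinStandIn:  # hypothetical built-in engine
    name = "claude-code"
```

With this design, resolving acme.engines:PentestEngine imports acme.engines, which also runs any @register_engine decorators in that module. The module therefore has to be importable from wherever the runtime executes.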

Declare what it can govern

A foreign loop cannot enforce everything the native loop can, so an engine declares its enforcement surface and the runtime reconciles it against the session policy (see Engines → Governance). Subclassing ClaudeCodeEngine inherits a default that reflects what that loop can actually do: autonomy and step budget enforced, tool approval bridged, mid-loop steering observe-only. Override describe_enforcement if your loop enforces more or less. If a policy needs something your engine cannot enforce, the runtime refuses the session rather than running it under a policy it cannot honor.
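The reconciliation step can be pictured as a check of required capabilities against the declared surface. The sketch below uses the default surface described above. The capability keys, Mode, GOVERNABLE, SessionRefused, and reconcile are hypothetical names, and treating "bridged" as governable is an assumption. It does not show the actual return type of describe_enforcement.

```python
from enum import Enum
from typing import Dict, Iterable


class Mode(str, Enum):
    ENFORCED = "enforced"
    BRIDGED = "bridged"
    OBSERVE_ONLY = "observe-only"


# The default the docs describe for ClaudeCodeEngine subclasses;
# the capability keys are hypothetical names.
DEFAULT_SURFACE: Dict[str, Mode] = {
    "autonomy": Mode.ENFORCED,
    "step_budget": Mode.ENFORCED,
    "tool_approval": Mode.BRIDGED,
    "steering": Mode.OBSERVE_ONLY,
}

# Assumption: a bridged capability still counts as governable.
GOVERNABLE = {Mode.ENFORCED, Mode.BRIDGED}


class SessionRefused(Exception):
    """Raised instead of starting a session the engine cannot govern."""


def reconcile(surface: Dict[str, Mode], required: Iterable[str]) -> None:
    # Anything missing from the surface, or only observable, blocks the session.
    missing = sorted(c for c in required if surface.get(c) not in GOVERNABLE)
    if missing:
        raise SessionRefused("engine cannot enforce: " + ", ".join(missing))
```

For example, an engine whose loop checks for steering messages between turns might report `{**DEFAULT_SURFACE, "steering": Mode.ENFORCED}`, and a policy that requires steering would then pass instead of being refused.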

Where it runs

A custom engine runs your code inside the runtime, so treat it like any code you ship. An engine referenced by module:Class that is not an audited built-in runs out-of-process inside the runtime sandbox, which is the platform’s isolation boundary. Keep secrets and untrusted input on the sandbox side of that line.