Skip to content

Worker API

Worker construction, lifecycle states, transition rules, standalone entry points, and the RuntimeClient method index.

Reference companion to the Workers guide. The guide covers what each decorator does; this page covers the lifecycle state machine, the standalone entry points, the EventEnvelope shape, and the RuntimeClient surface.

from dreadnode.capabilities.worker import Worker
worker = Worker(name="bridge")

Construct at module level. When loaded via a capability manifest, the manifest key is authoritative; if name is provided it must match the key. Workers run as a standalone process (worker.run()) must provide name explicitly.

A plain dict for worker-owned state. Set keys in on_startup, read them in event and task handlers, clean them up in on_shutdown. No lock — guard concurrent mutation yourself (see the State and concurrency section of the guide).

Worker.run() and Worker.arun() bootstrap the framework inside a subprocess or a one-off Python entry point. Both read DREADNODE_RUNTIME_* env vars (see environment variables), open a RuntimeClient, install signal handlers, and drive the same runner used for in-process workers.

if __name__ == "__main__":
worker.run() # blocking — asyncio.run()
# or inside an existing event loop
await worker.arun()

A non-zero exit indicates an error state — the parent subprocess supervisor re-raises the originating error message.

StateMeaning
loadingRuntime is importing the module or preparing the subprocess
startingon_startup is running, or the subprocess is spawning
runningNormal dispatch; subprocess is alive
stoppingon_shutdown is running, or the subprocess received SIGTERM
stoppedClean exit. on_shutdown exceptions land here with the error on health.
errorStartup failed, all supervised tasks crashed, or subprocess exited non-zero
gated_offwhen: predicate evaluated false — never started
  • Startup: loading → starting → running. Exception in on_startuperror.
  • Shutdown: running → stopping → stopped. Exception in on_shutdown still lands in stopped with the error attached to the worker’s health entry.
  • Subprocess exit while running: exit 0 → stopped, non-zero → error. No auto-restart of the worker process itself.
  • Task crash loop: every @worker.task supervisor exhausted (see backoff below) → error.
  • Restart: error and stopped workers restart via the TUI capability manager or client.restart_worker(capability, name). Gated workers require flipping the controlling flag.

@worker.task handlers restart with exponential backoff starting at 1 second, doubling up to 5 minutes. A task that runs stably for 60 seconds resets the backoff counter. A worker is declared in error only when every registered task supervisor has exhausted its retries.

@worker.every accepts exactly one of seconds, minutes, or cron. Any other combination raises ValueError at decoration time. Cron expressions use the standard 5-field format.

Every handler must be async def. Synchronous handlers raise TypeError at decoration time.

Multiple handlers can register for the same on_event kind — all of them dispatch. Handlers for the same kind can be invoked concurrently.

Delivered to every @worker.on_event handler and returned from client.subscribe(...).

AttributeTypeNotes
kindstrEvent kind; matches the string passed to @worker.on_event(...).
session_idstr | NoneSet for session-scoped events; None for runtime-scope.
turn_idstr | NoneSet for turn-lifecycle events.
seqintMonotonic per-session sequence.
payloaddict[str, Any]Event-specific body. See event kinds.
timestampdatetimeUTC time the envelope was created.
event_idstrEnvelope identity (UUID hex).
terminalboolTrue on the last event of a turn (turn.completed/failed/cancelled).
replayboolTrue when the event is being replayed from a buffer.
from dreadnode.capabilities.worker import (
Worker,
EventEnvelope,
RuntimeClient,
TurnCancelledError,
TurnFailedError,
)

EventEnvelope and RuntimeClient are available for type annotations without pulling the full server or client packages at import time. TurnCancelledError / TurnFailedError are raised by client.run_turn(...) on terminal failures.

Every handler receives a RuntimeClient — the worker’s channel back to the runtime. The same client is what worker.run() constructs from env, what the TUI uses, and what standalone scripts use. Method groups:

MethodPurpose
create_session(capability, agent, ..., session_id=...)Create a session. Idempotent on session_id — reuse to dedupe across external entities.
list_sessions(include_platform=False)List active sessions.
fetch_session_messages(session_id)Read the full message history for a session.
set_session_title(session_id, title)Rename a session.
set_session_policy(session_id, ...)Hot-swap a session’s policy (interactive ↔ headless).
reset_session(session_id)Clear the session’s message history in place.
compact_session(session_id, guidance="")Trigger context compaction for the session.
cancel_session(session_id)Cancel the active turn (queued turns still run).
delete_session(session_id)Remove a session and its resources.
MethodPurpose
stream_chat(session_id, message, model=..., agent=..., ...)Start a turn and yield an async iterator of envelopes. Discarding events is fine.
run_turn(...)Like stream_chat but collects into a completed turn object. Raises TurnFailedError / TurnCancelledError on terminal failure.
send_permission_response(session_id, request_id, decision)Respond to a permission prompt (prompt.required).
send_human_input_response(session_id, response)Respond to a human-input prompt.
MethodPurpose
publish(kind, payload, session_id=None)Emit a custom event onto the runtime bus. Reserved prefixes are rejected.
notify(title, body=None, severity='info', source=None, session_id=None)Push a user-facing notification — renders in the TUI. source defaults to capability.<name> for worker-hosted clients.
subscribe(*kinds)Open an event stream for ad-hoc consumption. Async iterator; close to unsubscribe. Reconnects automatically on transport loss.
subscribe_session(session_id)Subscribe to one session’s events.
unsubscribe_session(session_id)Drop that subscription.
MethodPurpose
fetch_runtime_info()Read current health for capabilities, MCP servers, workers, and the runtime itself.
fetch_tools() / fetch_skills()Enumerate registered tools and skills.
fetch_skill_content(name)Read the body of a skill by name.
fetch_mcp_detail(capability, server_name)Read detail + recent stderr for an MCP server.
fetch_worker_detail(capability, worker_name)Read detail + recent output + log path for a subprocess worker.
MethodPurpose
reload_capabilities()Re-discover capabilities on disk. Stops and restarts every worker.
reconnect_mcp_server(capability, server_name)Force a fresh connection to a capability’s MCP server.
restart_worker(capability, worker_name)Restart a worker. Works from an error or stopped state; gated workers require a flag flip.
MethodPurpose
list_files(path=None, depth=10)List files the runtime can see.
read_file(path)Read a file’s content.
execute_shell(command, cwd=None, timeout=30)Run a shell command on the runtime host.