Training

Fine-tune models with hosted SFT and RL jobs.

$ dn train <command>
sft

$ dn train sft --model <str> --capability <str>

Submit a hosted SFT training job.
Options
--model (Required) — Model identifier
--capability (Required) — Capability ref in NAME@VERSION form
--dataset — Training dataset ref in NAME@VERSION form
--trajectory-dataset — Trajectory dataset ref in NAME@VERSION form (repeatable)
--eval-dataset — Evaluation dataset ref in NAME@VERSION form
--name — Optional training job name
--project-ref — Project reference for tracking
--run-ref — Run reference for tracking
--tag — Tag for the job (repeatable)
--max-sequence-length — Maximum sequence length
--batch-size — Training batch size
--gradient-accumulation-steps — Gradient accumulation steps
--learning-rate — Learning rate
--steps — Number of training steps
--epochs — Number of training epochs
--lora-rank — LoRA rank
--lora-alpha — LoRA alpha
--checkpoint-interval — Steps between checkpoints
--wait (default: False)
--poll-interval-sec (default: 5.0) — Polling interval in seconds
--timeout-sec — Timeout in seconds for waiting
--json (default: False)
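As an illustration, a minimal SFT submission might look like the following. The model, capability, and dataset refs here are hypothetical placeholders, not real identifiers:

```sh
# Submit an SFT job against a labeled dataset and block until it finishes.
# "my-model", "summarize@1", and "sft-demos@3" are placeholder refs.
$ dn train sft \
    --model my-model \
    --capability summarize@1 \
    --dataset sft-demos@3 \
    --learning-rate 1e-5 \
    --epochs 2 \
    --lora-rank 16 \
    --tag demo \
    --wait
```

With --wait, the command polls until the job reaches a terminal state instead of returning immediately after submission.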
rl

$ dn train rl --model <str> --capability <str> --algorithm <importance_sampling|ppo>

Submit a hosted RL training job.
Options
--model (Required) — Model identifier
--capability (Required) — Capability ref in NAME@VERSION form
--algorithm (Required) — [choices: importance_sampling, ppo]
--prompt-dataset — Prompt dataset ref in NAME@VERSION form
--trajectory-dataset — Trajectory dataset ref in NAME@VERSION form (repeatable)
--world-manifest-id — World manifest ID for environment
--world-runtime-id — World runtime ID
--world-agent-name — Agent name in the world
--world-goal — Goal for world-based training
--task — Task ref
--reward-recipe — Reward recipe name
--reward-params — Reward recipe parameters as JSON
--world-reward — World reward policy name
--world-reward-params — World reward policy parameters as JSON
--execution-mode (default: sync) — [choices: sync, one_step_off_async, fully_async]
--prompt-split — Dataset split for prompts
--name — Optional training job name
--project-ref — Project reference for tracking
--run-ref — Run reference for tracking
--tag — Tag for the job (repeatable)
--steps — Number of training steps
--lora-rank — LoRA rank
--max-turns — Maximum conversation turns
--max-episode-steps — Maximum steps per episode
--num-rollouts — Number of rollouts per step
--batch-size — Training batch size
--learning-rate — Learning rate
--weight-sync-interval — Steps between weight syncs
--max-steps-off-policy — Maximum off-policy steps
--max-new-tokens — Maximum new tokens per generation
--temperature — Sampling temperature
--stop — Stop sequence (repeatable)
--checkpoint-interval — Steps between checkpoints
--wait (default: False)
--poll-interval-sec (default: 5.0) — Polling interval in seconds
--timeout-sec — Timeout in seconds for waiting
--json (default: False)
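For example, a prompt-dataset RL run with PPO might be submitted as follows. Every ref, recipe name, and parameter value here is a hypothetical placeholder:

```sh
# Submit a PPO job sampling prompts from a dataset and scoring rollouts
# with a reward recipe. "my-model", "dialog@2", "prompts@1", and
# "length_penalty" are placeholder names, not real identifiers.
$ dn train rl \
    --model my-model \
    --capability dialog@2 \
    --algorithm ppo \
    --prompt-dataset prompts@1 \
    --reward-recipe length_penalty \
    --reward-params '{"max_len": 512}' \
    --num-rollouts 8 \
    --steps 500 \
    --json
```

Passing --json prints the created job record as JSON, which is convenient for capturing the job ID in scripts.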
list

$ dn train list

List hosted training jobs.
Options
--page (default: 1)
--page-size (default: 20)
--status — [choices: queued, running, completed, failed, cancelled]
--backend — [choices: tinker]
--trainer-type — [choices: sft, rl]
--project-ref — Project reference filter
--json (default: False)
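Filters can be combined. For instance, to page through the running RL jobs in one project (the project ref below is a placeholder):

```sh
# Second page of running RL jobs, filtered to one project.
$ dn train list --status running --trainer-type rl --project-ref my-project --page 2
```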
get

$ dn train get <job-id>

Get a hosted training job.
Options
<job-id>, --job-id (Required)
--json (default: False)
wait

$ dn train wait <job-id>

Wait for a hosted training job to reach a terminal state.
Options
<job-id>, --job-id (Required)
--poll-interval-sec (default: 5.0) — Polling interval in seconds
--timeout-sec — Timeout in seconds for waiting
--json (default: False)
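A typical pattern is to submit without --wait and block later, once you need the result (the job ID below is a placeholder):

```sh
# Poll every 10 seconds, giving up after one hour.
$ dn train wait job-123 --poll-interval-sec 10 --timeout-sec 3600
```

The command returns once the job is completed, failed, or cancelled, or when the timeout elapses.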
logs

$ dn train logs <job-id>

Show hosted training logs.
Options
<job-id>, --job-id (Required)
--json (default: False)
artifacts

$ dn train artifacts <job-id>

Show hosted training artifacts.
Options
<job-id>, --job-id (Required)
--json (default: False)
cancel

$ dn train cancel <job-id>

Cancel a hosted training job.
Options
<job-id>, --job-id (Required)
--json (default: False)