Training

Fine-tune models with hosted SFT and RL jobs.

$ dn train <command>

$ dn train sft --model <str> --capability <str>

Submit a hosted SFT training job.

Options

  • --model (Required) — Model identifier
  • --capability (Required) — Capability ref in NAME@VERSION form
  • --dataset — Training dataset ref in NAME@VERSION form
  • --trajectory-dataset — Trajectory dataset ref in NAME@VERSION form (repeatable)
  • --eval-dataset — Evaluation dataset ref in NAME@VERSION form
  • --name — Optional training job name
  • --project-ref — Project reference for tracking
  • --run-ref — Run reference for tracking
  • --tag — Tag for the job (repeatable)
  • --max-sequence-length — Maximum sequence length
  • --batch-size — Training batch size
  • --gradient-accumulation-steps — Gradient accumulation steps
  • --learning-rate — Learning rate
  • --steps — Number of training steps
  • --epochs — Number of training epochs
  • --lora-rank — LoRA rank
  • --lora-alpha — LoRA alpha
  • --checkpoint-interval — Steps between checkpoints
  • --wait (default False)
  • --poll-interval-sec (default 5.0) — Polling interval in seconds
  • --timeout-sec — Timeout in seconds for waiting
  • --json (default False)
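
As an illustration of how these options combine, the sketch below wraps an SFT submission in a shell function. Only the flag names come from the option list above; the function name, the refs passed in, and the hyperparameter values are placeholders. It also assumes that `--wait` is a bare boolean switch and that repeatable options such as `--tag` are given once per value.

```shell
# Hypothetical wrapper: submit an SFT job and block until it finishes.
# $1 = model identifier, $2 = capability ref (NAME@VERSION),
# $3 = training dataset ref (NAME@VERSION). All values are placeholders.
submit_sft() {
  dn train sft \
    --model "$1" \
    --capability "$2" \
    --dataset "$3" \
    --epochs 3 \
    --learning-rate 0.0001 \
    --lora-rank 16 \
    --tag baseline \
    --tag sft \
    --wait \
    --timeout-sec 3600
}
```

A call would look like `submit_sft my-model my-capability@1.0.0 my-dataset@1.0.0`, where all three arguments are made-up names. Quoting each positional parameter keeps refs intact even if they contain characters the shell would otherwise interpret.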

$ dn train rl --model <str> --capability <str> --algorithm <importance_sampling|ppo>

Submit a hosted RL training job.

Options

  • --model (Required) — Model identifier
  • --capability (Required) — Capability ref in NAME@VERSION form
  • --algorithm (Required) [choices: importance_sampling, ppo]
  • --prompt-dataset — Prompt dataset ref in NAME@VERSION form
  • --trajectory-dataset — Trajectory dataset ref in NAME@VERSION form (repeatable)
  • --world-manifest-id — World manifest ID for environment
  • --world-runtime-id — World runtime ID
  • --world-agent-name — Agent name in the world
  • --world-goal — Goal for world-based training
  • --task — Task ref
  • --reward-recipe — Reward recipe name
  • --reward-params — Reward recipe parameters as JSON
  • --world-reward — World reward policy name
  • --world-reward-params — World reward policy parameters as JSON
  • --execution-mode (default sync) [choices: sync, one_step_off_async, fully_async]
  • --prompt-split — Dataset split for prompts
  • --name — Optional training job name
  • --project-ref — Project reference for tracking
  • --run-ref — Run reference for tracking
  • --tag — Tag for the job (repeatable)
  • --steps — Number of training steps
  • --lora-rank — LoRA rank
  • --max-turns — Maximum conversation turns
  • --max-episode-steps — Maximum steps per episode
  • --num-rollouts — Number of rollouts per step
  • --batch-size — Training batch size
  • --learning-rate — Learning rate
  • --weight-sync-interval — Steps between weight syncs
  • --max-steps-off-policy — Maximum off-policy steps
  • --max-new-tokens — Maximum new tokens per generation
  • --temperature — Sampling temperature
  • --stop — Stop sequence (repeatable)
  • --checkpoint-interval — Steps between checkpoints
  • --wait (default False)
  • --poll-interval-sec (default 5.0) — Polling interval in seconds
  • --timeout-sec — Timeout in seconds for waiting
  • --json (default False)
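
Because `--reward-params` takes JSON, quoting matters when the command is built in a shell. The sketch below shows one way to pass it as a single argument. The function name, the recipe name passed in, and the `threshold` key are hypothetical; this reference does not say which recipes or parameters exist, or which combination of dataset and world options a job needs.

```shell
# Hypothetical wrapper: submit a PPO job with a JSON reward configuration.
# $1 = model identifier, $2 = capability ref, $3 = prompt dataset ref,
# $4 = reward recipe name. All values are placeholders.
submit_rl() {
  # Single quotes keep the JSON literal; the key below is invented.
  reward_params='{"threshold": 0.5}'

  dn train rl \
    --model "$1" \
    --capability "$2" \
    --algorithm ppo \
    --prompt-dataset "$3" \
    --reward-recipe "$4" \
    --reward-params "$reward_params" \
    --execution-mode one_step_off_async \
    --num-rollouts 8 \
    --steps 100
}
```

Expanding `"$reward_params"` inside double quotes delivers the whole JSON document as one argument, spaces included. Without the quotes the shell would split it into several words.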
$ dn train list

List hosted training jobs.

Options

  • --page (default 1)
  • --page-size (default 20)
  • --status [choices: queued, running, completed, failed, cancelled]
  • --backend [choices: tinker]
  • --trainer-type [choices: sft, rl]
  • --project-ref — Project reference filter
  • --json (default False)
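
The filters above can be combined in one call. As a minimal sketch, the function below lists failed RL jobs one page at a time. The function name and the page size of 50 are arbitrary choices, and the shape of the `--json` output is not described in this reference, so the sketch does not try to parse it.

```shell
# Hypothetical helper: list failed RL jobs as JSON.
# $1 = page number (defaults to 1, matching the --page default).
list_failed_rl() {
  dn train list \
    --status failed \
    --trainer-type rl \
    --page "${1:-1}" \
    --page-size 50 \
    --json
}
```

Calling `list_failed_rl` with no argument requests the first page; `list_failed_rl 2` requests the second.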
$ dn train get <job-id>

Get a hosted training job.

Options

  • <job-id>, --job-id (Required)
  • --json (default False)
$ dn train wait <job-id>

Wait for a hosted training job to reach a terminal state.

Options

  • <job-id>, --job-id (Required)
  • --poll-interval-sec (default 5.0) — Polling interval in seconds
  • --timeout-sec — Timeout in seconds for waiting
  • --json (default False)
$ dn train logs <job-id>

Show hosted training logs.

Options

  • <job-id>, --job-id (Required)
  • --json (default False)
$ dn train artifacts <job-id>

Show hosted training artifacts.

Options

  • <job-id>, --job-id (Required)
  • --json (default False)
$ dn train cancel <job-id>

Cancel a hosted training job.

Options

  • <job-id>, --job-id (Required)
  • --json (default False)