Training

Fine-tune models with hosted SFT and RL jobs.

$ dn train <command>
sft

$ dn train sft --model <str> --capability <str>

Submit a hosted SFT training job.
Options
--model (Required) — Model identifier
--capability (Required) — Capability ref in NAME@VERSION form
--dataset — Training dataset ref in NAME@VERSION form
--trajectory-dataset — Trajectory dataset ref in NAME@VERSION form (repeatable)
--eval-dataset — Evaluation dataset ref in NAME@VERSION form
--name — Optional training job name
--project-ref — Project reference for tracking
--run-ref — Run reference for tracking
--tag — Tag for the job (repeatable)
--max-sequence-length — Maximum sequence length
--batch-size — Training batch size
--gradient-accumulation-steps — Gradient accumulation steps
--learning-rate — Learning rate
--steps — Number of training steps
--epochs — Number of training epochs
--lora-rank — LoRA rank
--lora-alpha — LoRA alpha
--checkpoint-interval — Steps between checkpoints
--wait (default: False)
--poll-interval-sec (default: 5.0) — Polling interval in seconds
--timeout-sec — Timeout in seconds for waiting
--json (default: False)
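As an illustration, a minimal SFT submission might look like the following. The model, capability, and dataset refs here are hypothetical placeholders, not real identifiers:

```sh
# Submit an SFT job against a labeled dataset and block until it finishes.
# "my-model", "summarize@1", and "sft-demos@3" are placeholder refs.
$ dn train sft \
    --model my-model \
    --capability summarize@1 \
    --dataset sft-demos@3 \
    --learning-rate 1e-5 \
    --epochs 2 \
    --lora-rank 16 \
    --tag demo \
    --wait
```

With --wait, the command polls until the job reaches a terminal state instead of returning immediately after submission.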
rl

$ dn train rl --model <str> --capability <str> --algorithm <importance_sampling|ppo>

Submit a hosted RL training job.
Options
--model (Required) — Model identifier
--capability (Required) — Capability ref in NAME@VERSION form
--algorithm (Required) — [choices: importance_sampling, ppo]
--prompt-dataset — Prompt dataset ref in NAME@VERSION form
--trajectory-dataset — Trajectory dataset ref in NAME@VERSION form (repeatable)
--world-manifest-id — World manifest ID for environment
--world-runtime-id — World runtime ID
--world-agent-name — Agent name in the world
--world-goal — Goal for world-based training
--task — Task ref
--reward-recipe — Reward recipe name
--reward-params — Reward recipe parameters as JSON
--world-reward — World reward policy name
--world-reward-params — World reward policy parameters as JSON
--execution-mode (default: sync) — [choices: sync, one_step_off_async, fully_async]
--prompt-split — Dataset split for prompts
--name — Optional training job name
--project-ref — Project reference for tracking
--run-ref — Run reference for tracking
--tag — Tag for the job (repeatable)
--steps — Number of training steps
--lora-rank — LoRA rank
--max-turns — Maximum conversation turns
--max-episode-steps — Maximum steps per episode
--num-rollouts — Number of rollouts per step
--batch-size — Training batch size
--learning-rate — Learning rate
--weight-sync-interval — Steps between weight syncs
--max-steps-off-policy — Maximum off-policy steps
--max-new-tokens — Maximum new tokens per generation
--temperature — Sampling temperature
--stop — Stop sequence (repeatable)
--checkpoint-interval — Steps between checkpoints
--wait (default: False)
--poll-interval-sec (default: 5.0) — Polling interval in seconds
--timeout-sec — Timeout in seconds for waiting
--json (default: False)
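For example, a prompt-dataset RL run with PPO might be submitted as follows. Every ref, recipe name, and parameter value here is a hypothetical placeholder:

```sh
# Submit a PPO job sampling prompts from a dataset and scoring rollouts
# with a reward recipe. "my-model", "dialog@2", "prompts@1", and
# "length_penalty" are placeholder names, not real identifiers.
$ dn train rl \
    --model my-model \
    --capability dialog@2 \
    --algorithm ppo \
    --prompt-dataset prompts@1 \
    --reward-recipe length_penalty \
    --reward-params '{"max_len": 512}' \
    --num-rollouts 8 \
    --steps 500 \
    --json
```

Passing --json prints the created job record as JSON, which is convenient for capturing the job ID in scripts.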
list

$ dn train list

List hosted training jobs.
Options
--page (default: 1)
--page-size (default: 20)
--status — [choices: queued, running, completed, failed, cancelled]
--backend — [choices: tinker]
--trainer-type — [choices: sft, rl]
--project-ref — Project reference filter
--json (default: False)
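Filters can be combined. For instance, to page through the running RL jobs in one project (the project ref below is a placeholder):

```sh
# Second page of running RL jobs, filtered to one project.
$ dn train list --status running --trainer-type rl --project-ref my-project --page 2
```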
get

$ dn train get <job-id>

Get a hosted training job.
Options
<job-id>, --job-id (Required)
--json (default: False)
wait

$ dn train wait <job-id>

Wait for a hosted training job to reach a terminal state.
Options
<job-id>, --job-id (Required)
--poll-interval-sec (default: 5.0) — Polling interval in seconds
--timeout-sec — Timeout in seconds for waiting
--json (default: False)
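A typical pattern is to submit without --wait and block later, once you need the result (the job ID below is a placeholder):

```sh
# Poll every 10 seconds, giving up after one hour.
$ dn train wait job-123 --poll-interval-sec 10 --timeout-sec 3600
```

The command returns once the job is completed, failed, or cancelled, or when the timeout elapses.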
logs

$ dn train logs <job-id>

Show hosted training logs.
Options
<job-id>, --job-id (Required)
--json (default: False)
artifacts

$ dn train artifacts <job-id>

Show hosted training artifacts.
Options
<job-id>, --job-id (Required)
--json (default: False)
cancel

$ dn train cancel <job-id>

Cancel a hosted training job.
Options
<job-id>, --job-id (Required)
--json (default: False)