Supervised fine-tuning
Adapt a model from demonstration data — normal supervised datasets or Worlds trajectory datasets.
Reach for supervised fine-tuning (SFT) when you already have examples of the behavior you want. The trainer converts each example into a chat-formatted conversation, scaffolds it with the capability’s system prompt, and runs cross-entropy training over the resulting tokens.
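As an illustration of that conversion, here is a minimal sketch of how a demonstration row might be scaffolded into a chat-formatted conversation. The helper name and row shape are assumptions for illustration, not the trainer's actual internals:

```python
# Illustrative sketch only: the row fields ("prompt"/"response") and the
# helper name are assumed. The real trainer performs this conversion
# internally before running cross-entropy over the resulting tokens.
def to_chat_example(row: dict, system_prompt: str) -> list[dict]:
    return [
        {"role": "system", "content": system_prompt},       # capability's system prompt
        {"role": "user", "content": row["prompt"]},         # demonstration input
        {"role": "assistant", "content": row["response"]},  # target of the loss
    ]

messages = to_chat_example(
    {"prompt": "Reset my password", "response": "Sure, here are the steps..."},
    system_prompt="You are a support agent.",
)
```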
```sh
dn train sft \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --steps 100 \
  --batch-size 8 \
  --learning-rate 1e-4 \
  --lora-rank 16 \
  --wait
```

Pick an input shape

SFT accepts two kinds of training data. Pass one — or both, for ETL-merged training.
| Input | Flag | Use it when |
|---|---|---|
| Supervised dataset | --dataset NAME@VERSION | Rows are prompt/response (or chat-shaped) demonstrations. |
| Worlds trajectory datasets | --trajectory-dataset NAME@VERSION (repeatable) | Demonstrations are agent rollouts collected via Worlds. |
Both resolve against the published Datasets registry. Trajectory datasets are converted into SFT conversations on the worker side — you don’t flatten them yourself.
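Conceptually, that worker-side conversion flattens each rollout into a single conversation. A rough sketch of the idea, where the step fields (`observation`/`action`) are assumptions about the trajectory schema, not the worker's actual code:

```python
# Rough sketch of flattening an agent rollout into one SFT conversation.
# The step schema here is assumed for illustration; the real conversion
# happens on the worker, so user code never does this.
def trajectory_to_conversation(system_prompt: str, steps: list[dict]) -> list[dict]:
    messages = [{"role": "system", "content": system_prompt}]
    for step in steps:
        messages.append({"role": "user", "content": step["observation"]})
        messages.append({"role": "assistant", "content": step["action"]})  # trained on
    return messages

conv = trajectory_to_conversation(
    "You are a support agent.",
    [
        {"observation": "ticket #1 opened", "action": "ask for details"},
        {"observation": "user replies with logs", "action": "propose a fix"},
    ],
)
```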
--eval-dataset NAME@VERSION is optional. When set, the trainer runs an eval pass after
training and records the eval loss alongside the per-step training loss.
Tuning knobs
The full list lives in the manifest reference. The flags below are the ones SFT tuning usually touches:
| Flag | Does |
|---|---|
| --steps <n> / --epochs <n> | Bound the inner loop — optimizer steps or passes over data. |
| --batch-size <n> | Per-step batch size. |
| --gradient-accumulation-steps <n> | Effective batch size without more GPU memory. |
| --learning-rate <float> | Optimizer LR. |
| --max-sequence-length <n> | Tokenization cap per example. |
| --lora-rank <n>, --lora-alpha <n> | LoRA adapter shape. Smaller rank = faster, less capacity. |
| --checkpoint-interval <n> | Save a checkpoint every N optimizer steps. |
Full CLI surface: dn train.
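The interaction between --batch-size and --gradient-accumulation-steps is simple arithmetic; a quick worked sketch (the dataset size is made up for illustration):

```python
# Standard gradient-accumulation arithmetic, not trainer-specific code:
# gradients are summed over `grad_accum` micro-batches before one optimizer step.
batch_size = 8   # --batch-size: examples per micro-batch
grad_accum = 4   # --gradient-accumulation-steps

effective_batch = batch_size * grad_accum  # 32 examples per optimizer step

dataset_rows = 6_400                                # hypothetical dataset size
steps_per_epoch = dataset_rows // effective_batch   # 200 optimizer steps per pass
print(effective_batch, steps_per_epoch)
```

Raising --gradient-accumulation-steps grows the effective batch at fixed GPU memory, at the cost of more forward/backward passes per optimizer step.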
From the SDK
Submit the same job programmatically when the CLI isn’t the right place — a notebook, a CI pipeline, or a larger Python workflow.
```python
from dreadnode.app.api.client import ApiClient
from dreadnode.app.api.models import (
    CapabilityRef,
    CreateTinkerSFTJobRequest,
    DatasetRef,
    TinkerSFTJobConfig,
)

client = ApiClient("https://app.dreadnode.io", api_key="dn_...")

job = client.create_training_job(
    "acme",
    "research",
    CreateTinkerSFTJobRequest(
        model="meta-llama/Llama-3.1-8B-Instruct",
        capability_ref=CapabilityRef(name="support-agent", version="1.0.0"),
        config=TinkerSFTJobConfig(
            dataset_ref=DatasetRef(name="support-demos", version="0.1.0"),
            eval_dataset_ref=DatasetRef(name="support-eval", version="0.1.0"),
            steps=100,
            batch_size=8,
            learning_rate=1e-4,
            lora_rank=16,
        ),
    ),
)

print(job.id, job.status)
```

TinkerSFTJobConfig requires either dataset_ref or at least one trajectory_dataset_refs entry. All other fields are optional — unset fields fall back to backend defaults.
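That either/or requirement can be checked client-side before submitting. A hypothetical pre-flight check (this helper is not part of the SDK; it just mirrors the documented rule):

```python
# Hypothetical pre-flight check, not an SDK function. It encodes the
# documented rule: a config needs dataset_ref, or at least one
# trajectory_dataset_refs entry.
def validate_sft_inputs(dataset_ref=None, trajectory_dataset_refs=None) -> None:
    if dataset_ref is None and not trajectory_dataset_refs:
        raise ValueError(
            "TinkerSFTJobConfig needs dataset_ref or at least one "
            "trajectory_dataset_refs entry"
        )

validate_sft_inputs(dataset_ref={"name": "support-demos", "version": "0.1.0"})  # ok
```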
For trajectory-backed SFT, swap the dataset ref for a list of trajectories:

```python
config=TinkerSFTJobConfig(
    trajectory_dataset_refs=[
        DatasetRef(name="dreadnode/worlds-trajectories-a", version="0.1.0"),
        DatasetRef(name="dreadnode/worlds-trajectories-b", version="0.1.0"),
    ],
    steps=50,
    lora_rank=16,
),
```

CapabilityRef pins the capability at submission; the resolved snapshot is persisted on the job alongside the resolved runtime digest.
After the job starts
Submit is only the first step. See running training jobs for the lifecycle surface — list, get, wait, logs, cancel, retry — and outputs for what the trainer emits when it completes.