Supervised fine-tuning

Adapt a model from demonstration data — normal supervised datasets or Worlds trajectory datasets.

Reach for supervised fine-tuning (SFT) when you already have examples of the behavior you want. The trainer converts each example into a chat-formatted conversation, scaffolds it with the capability’s system prompt, and runs cross-entropy training over the resulting tokens.
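For intuition, that conversion looks roughly like the sketch below. The field names (`prompt`, `response`) and the system prompt are illustrative, not the trainer's actual schema:

```python
def to_chat_example(row: dict, system_prompt: str) -> list[dict]:
    """Wrap one prompt/response demonstration in a chat-format conversation.

    The system prompt comes from the capability; the assistant turn is the
    supervised target that cross-entropy loss is computed over.
    """
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": row["prompt"]},
        {"role": "assistant", "content": row["response"]},
    ]


example = to_chat_example(
    {"prompt": "Reset my password", "response": "Sure, here's how..."},
    system_prompt="You are a support agent.",
)
print([m["role"] for m in example])  # → ['system', 'user', 'assistant']
```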

```sh
dn train sft \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --capability [email protected] \
  --dataset [email protected] \
  --eval-dataset [email protected] \
  --steps 100 \
  --batch-size 8 \
  --learning-rate 1e-4 \
  --lora-rank 16 \
  --wait
```

SFT accepts two kinds of training data. Pass one — or both, for ETL-merged training.

| Input | Flag | Use it when |
| --- | --- | --- |
| Supervised dataset | `--dataset NAME@VERSION` | Rows are prompt/response (or chat-shaped) demonstrations. |
| Worlds trajectory datasets | `--trajectory-dataset NAME@VERSION` (repeatable) | Demonstrations are agent rollouts collected via Worlds. |

Both resolve against the published Datasets registry. Trajectory datasets are converted into SFT conversations on the worker side — you don’t flatten them yourself.
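For intuition only, flattening a trajectory into an SFT conversation looks something like this sketch. The field names (`observation`, `action`) and the alternating-turn shape are assumptions for illustration; the worker's real conversion is internal:

```python
def trajectory_to_conversation(trajectory: list[dict], system_prompt: str) -> list[dict]:
    """Flatten an agent rollout into alternating chat turns (illustrative only)."""
    messages = [{"role": "system", "content": system_prompt}]
    for step in trajectory:
        # Each rollout step becomes a user turn (what the agent saw)
        # followed by an assistant turn (what the agent did).
        messages.append({"role": "user", "content": step["observation"]})
        messages.append({"role": "assistant", "content": step["action"]})
    return messages


conv = trajectory_to_conversation(
    [{"observation": "login page loaded", "action": "click('sign in')"}],
    system_prompt="You are a web agent.",
)
print(len(conv))  # → 3
```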

--eval-dataset NAME@VERSION is optional. When set, the trainer runs an eval pass after training and records the eval loss alongside the per-step training loss.

The full list lives in the manifest reference. The flags below are the ones SFT tuning usually touches:

| Flag | Does |
| --- | --- |
| `--steps <n>` / `--epochs <n>` | Bound the inner loop: optimizer steps or passes over the data. |
| `--batch-size <n>` | Per-step batch size. |
| `--gradient-accumulation-steps <n>` | Raise the effective batch size without more GPU memory. |
| `--learning-rate <float>` | Optimizer learning rate. |
| `--max-sequence-length <n>` | Tokenization cap per example. |
| `--lora-rank <n>`, `--lora-alpha <n>` | LoRA adapter shape. Smaller rank = faster, less capacity. |
| `--checkpoint-interval <n>` | Save a checkpoint every N optimizer steps. |
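As a rule of thumb, the effective batch size is `--batch-size` times `--gradient-accumulation-steps`: gradients from several micro-batches are summed before each optimizer step, so the optimizer sees more examples per update than fit in GPU memory at once. A one-line sketch of the arithmetic:

```python
def effective_batch_size(batch_size: int, grad_accum_steps: int) -> int:
    # grad_accum_steps micro-batches contribute gradients to one
    # optimizer step, so each update covers this many examples.
    return batch_size * grad_accum_steps


print(effective_batch_size(8, 4))  # → 32
```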

Full CLI surface: dn train.

Submit the same job programmatically when the CLI isn’t the right place — a notebook, a CI pipeline, or a larger Python workflow.

```python
from dreadnode.app.api.client import ApiClient
from dreadnode.app.api.models import (
    CapabilityRef,
    CreateTinkerSFTJobRequest,
    DatasetRef,
    TinkerSFTJobConfig,
)

client = ApiClient("https://app.dreadnode.io", api_key="dn_...")

job = client.create_training_job(
    "acme",
    "research",
    CreateTinkerSFTJobRequest(
        model="meta-llama/Llama-3.1-8B-Instruct",
        capability_ref=CapabilityRef(name="support-agent", version="1.0.0"),
        config=TinkerSFTJobConfig(
            dataset_ref=DatasetRef(name="support-demos", version="0.1.0"),
            eval_dataset_ref=DatasetRef(name="support-eval", version="0.1.0"),
            steps=100,
            batch_size=8,
            learning_rate=1e-4,
            lora_rank=16,
        ),
    ),
)
print(job.id, job.status)
```

TinkerSFTJobConfig requires either dataset_ref or at least one trajectory_dataset_refs entry. All other fields are optional — unset fields fall back to backend defaults.

For trajectory-backed SFT, swap the dataset ref for a list of trajectories:

```python
config=TinkerSFTJobConfig(
    trajectory_dataset_refs=[
        DatasetRef(name="dreadnode/worlds-trajectories-a", version="0.1.0"),
        DatasetRef(name="dreadnode/worlds-trajectories-b", version="0.1.0"),
    ],
    steps=50,
    lora_rank=16,
),
```

CapabilityRef pins the capability at submission time; the resolved snapshot is persisted on the job alongside the resolved runtime digest, so later edits to the capability don't affect the job.

Submit is only the first step. See running training jobs for the lifecycle surface — list, get, wait, logs, cancel, retry — and outputs for what the trainer emits when it completes.