Supervised fine-tuning

Adapt a model from demonstration data — normal supervised datasets or Worlds trajectory datasets.

Reach for supervised fine-tuning (SFT) when you already have examples of the behavior you want. The trainer converts each example into a chat-formatted conversation, scaffolds it with the capability’s system prompt, and runs cross-entropy training over the resulting tokens.
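For intuition, that conversion looks roughly like the sketch below. The field names (`prompt`, `response`) and the system prompt are illustrative, not the trainer's actual schema:

```python
def to_chat_example(row: dict, system_prompt: str) -> list[dict]:
    """Wrap one prompt/response demonstration in a chat-format conversation.

    The system prompt comes from the capability; the assistant turn is the
    supervised target that cross-entropy loss is computed over.
    """
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": row["prompt"]},
        {"role": "assistant", "content": row["response"]},
    ]


example = to_chat_example(
    {"prompt": "Reset my password", "response": "Sure, here's how..."},
    system_prompt="You are a support agent.",
)
print([m["role"] for m in example])  # → ['system', 'user', 'assistant']
```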

```sh
dn train sft \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --capability [email protected] \
  --dataset [email protected] \
  --eval-dataset [email protected] \
  --steps 100 \
  --batch-size 8 \
  --learning-rate 1e-4 \
  --lora-rank 16 \
  --wait
```

SFT accepts two kinds of training data. Pass one — or both, for ETL-merged training.

| Input | Flag | Use it when |
| --- | --- | --- |
| Supervised dataset | `--dataset NAME@VERSION` | Rows are prompt/response (or chat-shaped) demonstrations. |
| Worlds trajectory datasets | `--trajectory-dataset NAME@VERSION` (repeatable) | Demonstrations are agent rollouts collected via Worlds. |

Both resolve against the published Datasets registry. Trajectory datasets are converted into SFT conversations on the worker side — you don’t flatten them yourself.
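For intuition only, flattening a trajectory into an SFT conversation looks something like this sketch. The field names (`observation`, `action`) and the alternating-turn shape are assumptions for illustration; the worker's real conversion is internal:

```python
def trajectory_to_conversation(trajectory: list[dict], system_prompt: str) -> list[dict]:
    """Flatten an agent rollout into alternating chat turns (illustrative only)."""
    messages = [{"role": "system", "content": system_prompt}]
    for step in trajectory:
        # Each rollout step becomes a user turn (what the agent saw)
        # followed by an assistant turn (what the agent did).
        messages.append({"role": "user", "content": step["observation"]})
        messages.append({"role": "assistant", "content": step["action"]})
    return messages


conv = trajectory_to_conversation(
    [{"observation": "login page loaded", "action": "click('sign in')"}],
    system_prompt="You are a web agent.",
)
print(len(conv))  # → 3
```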

--eval-dataset NAME@VERSION is optional. When set, the trainer runs an eval pass after training and records the eval loss alongside the per-step training loss.

The full list lives in the manifest reference. The flags below are the ones SFT tuning usually touches:

| Flag | Does |
| --- | --- |
| `--steps <n>` / `--epochs <n>` | Bound the inner loop: optimizer steps or passes over the data. |
| `--batch-size <n>` | Per-step batch size. |
| `--gradient-accumulation-steps <n>` | Raise the effective batch size without more GPU memory. |
| `--learning-rate <float>` | Optimizer learning rate. |
| `--max-sequence-length <n>` | Tokenization cap per example. |
| `--lora-rank <n>`, `--lora-alpha <n>` | LoRA adapter shape. Smaller rank = faster, less capacity. |
| `--checkpoint-interval <n>` | Save a checkpoint every N optimizer steps. |
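As a rule of thumb, the effective batch size is `--batch-size` times `--gradient-accumulation-steps`: gradients from several micro-batches are summed before each optimizer step, so the optimizer sees more examples per update than fit in GPU memory at once. A one-line sketch of the arithmetic:

```python
def effective_batch_size(batch_size: int, grad_accum_steps: int) -> int:
    # grad_accum_steps micro-batches contribute gradients to one
    # optimizer step, so each update covers this many examples.
    return batch_size * grad_accum_steps


print(effective_batch_size(8, 4))  # → 32
```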

Full CLI surface: dn train.

Submit the same job programmatically when the CLI isn’t the right place — a notebook, a CI pipeline, or a larger Python workflow.

```python
from dreadnode.app.api.client import ApiClient
from dreadnode.app.api.models import (
    CapabilityRef,
    CreateTinkerSFTJobRequest,
    DatasetRef,
    TinkerSFTJobConfig,
)

client = ApiClient("https://app.dreadnode.io", api_key="dn_...")

job = client.create_training_job(
    "acme",
    "research",
    CreateTinkerSFTJobRequest(
        model="meta-llama/Llama-3.1-8B-Instruct",
        capability_ref=CapabilityRef(name="support-agent", version="1.0.0"),
        config=TinkerSFTJobConfig(
            dataset_ref=DatasetRef(name="support-demos", version="0.1.0"),
            eval_dataset_ref=DatasetRef(name="support-eval", version="0.1.0"),
            steps=100,
            batch_size=8,
            learning_rate=1e-4,
            lora_rank=16,
        ),
    ),
)
print(job.id, job.status)
```

TinkerSFTJobConfig requires either dataset_ref or at least one trajectory_dataset_refs entry. All other fields are optional — unset fields fall back to backend defaults.

For trajectory-backed SFT, swap the dataset ref for a list of trajectories:

```python
config=TinkerSFTJobConfig(
    trajectory_dataset_refs=[
        DatasetRef(name="dreadnode/worlds-trajectories-a", version="0.1.0"),
        DatasetRef(name="dreadnode/worlds-trajectories-b", version="0.1.0"),
    ],
    steps=50,
    lora_rank=16,
),
```

CapabilityRef pins the capability at submission time; the resolved snapshot is persisted on the job alongside the resolved runtime digest, so later edits to the capability don't affect the job.

Submit is only the first step. See running training jobs for the lifecycle surface — list, get, wait, logs, cancel, retry — and outputs for what the trainer emits when it completes.