# Training
Fine-tune a model or LoRA adapter on your own data and publish it as a new capability-ready checkpoint.
Training answers the question: “Can I adapt this model’s weights to ship a better version for my task?”
You pick a base model, a published capability, and one source of training data — supervised examples, prompt datasets, or Worlds trajectories. The platform provisions training compute, runs the job, streams logs and metrics into the App, and leaves you with a checkpoint or LoRA adapter you can publish to the Models registry.
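The job setup described above can be sketched as a small spec. This is a hypothetical illustration only: the class and field names (`TrainingJob`, `base_model`, `capability`, `dataset`, `method`) and the `@version` identifiers are assumptions, not the platform's actual API.

```python
from dataclasses import dataclass

# Hypothetical job spec mirroring the flow above: one base model, one
# published capability, one pinned data source. Names are illustrative.
@dataclass
class TrainingJob:
    base_model: str       # base model whose weights you adapt
    capability: str       # published capability version to train against
    dataset: str          # one source: supervised rows, prompts, or Worlds trajectories
    method: str = "lora"  # produce a LoRA adapter (vs. a full fine-tune)

# Placeholder identifiers; a real job would reference published versions.
job = TrainingJob(
    base_model="my-base@v3",
    capability="support-agent@v7",
    dataset="support-chats@v12",
)
```

Submitting such a spec is what provisions compute, streams logs and metrics into the App, and ultimately yields the checkpoint or adapter you publish to the Models registry.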
Don’t reach for training until prompt and instruction optimization stops paying off. If the dataset, task, or reward is still unstable, tighten the problem first with optimization or evaluations; training on a moving target just burns compute.
## Two shapes
| Shape | Reach for it when | Primary input |
|---|---|---|
| Supervised fine-tuning (SFT) | You have demonstrations of the behavior you want. | A supervised dataset of prompt/response (or chat) rows, or one or more Worlds trajectory datasets — converted to chat at the worker. |
| Reinforcement learning (RL) | You have a reward function, a verifier, or a live environment. | A prompt dataset, one or more trajectory datasets, or a Worlds manifest. |
Both run on the Tinker backend today. They share the same job record, lifecycle, artifact surface, and App view — what changes is the input data and the inner loop.
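The two input shapes in the table can be sketched in a few lines. Both functions are hedged illustrations: the row schema (`"prompt"`/`"response"` keys), the chat message format, and the verifier signature are assumptions for the sketch, not a documented platform contract.

```python
def row_to_chat(row: dict) -> list[dict]:
    """SFT side: convert one supervised prompt/response row into a
    chat-format example, roughly the worker-side conversion."""
    return [
        {"role": "user", "content": row["prompt"]},
        {"role": "assistant", "content": row["response"]},
    ]

def reward(completion: str, verifier) -> float:
    """RL side: score a sampled completion with a user-supplied verifier."""
    return 1.0 if verifier(completion) else 0.0

# Example usage with placeholder data and a trivial verifier.
messages = row_to_chat({"prompt": "Reset my password.",
                        "response": "Open Settings, then Security."})
score = reward("4", lambda c: c.strip() == "4")
```

The shared inner-loop contrast is visible here: SFT consumes fixed demonstrations, while RL needs only a way to score what the model samples.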
## Where to go next
### Related topics
- Capabilities hold the policy scaffold every training job adapts. Publish the capability version before you train against it.
- Datasets holds the training and eval corpora. Publish with explicit versions — training against a moving dataset is not reproducible.
- Worlds produces the trajectory datasets and manifests that back offline and live RL.
- Optimization changes prompt and instruction text. Training changes the model. Use optimization first when you can.