Manifest reference

Every Tinker SFT and RL config field, validation rule, and default.

This page is the exhaustive reference for every training-job request and config field. CLI flags map onto these fields one-for-one; the CLI surface itself is documented on the auto-generated dn train page.

Every hosted training request carries the same base fields:

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| name | str \| None | null | Optional job display name. |
| model | str | | Required. Base model or adapter target. |
| project_ref | str \| None | null | Workspace project key. Defaults to the workspace default. |
| run_ref | str \| None | null | Optional run association for lineage. |
| capability_ref | CapabilityRef | | Required. Versioned capability snapshot to train against. |
| tags | list[str] | [] | Optional tag list. |
| backend | literal | | tinker or ray; set by the request class. |
| trainer_type | literal | | sft or rl; set by the request class. |
| config | trainer config | | Required. Trainer-specific config object (tables below). |

CapabilityRef and DatasetRef are { name, version } shapes; RewardRecipe and WorldRewardPolicy are { name, params? } shapes:

| Model | Fields |
| --- | --- |
| CapabilityRef | name: str, version: str |
| DatasetRef | name: str, version: str |
| RewardRecipe | name: str, params: dict (default {}) |
| WorldRewardPolicy | name: str, params: dict (default {}). See reward recipes for preset names and component composition. |
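As a rough illustration of how the base fields and ref shapes fit together, the sketch below models two of the ref shapes as dataclasses and assembles a request body as a plain dict. Only the field names come from the tables above; the dataclasses, the dict layout, and every value (model name, capability name, dataset name) are hypothetical and are not the SDK's own types.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class CapabilityRef:
    """{ name, version } shape, as listed in the table above."""
    name: str
    version: str


@dataclass(frozen=True)
class RewardRecipe:
    """{ name, params? } shape; params defaults to an empty dict."""
    name: str
    params: dict[str, Any] = field(default_factory=dict)


# Hypothetical values throughout; only the keys mirror the documented fields.
request = {
    "name": "demo-sft",
    "model": "example-base-model",
    "capability_ref": asdict(CapabilityRef(name="example-capability", version="1.0.0")),
    "tags": ["demo"],
    "backend": "tinker",
    "trainer_type": "sft",
    "config": {"dataset_ref": {"name": "example-dataset", "version": "1"}},
}

# A recipe created without params serializes with the documented default.
recipe = asdict(RewardRecipe(name="example-recipe"))
```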

Hosted SFT on the Tinker backend. backend is "tinker", trainer_type is "sft", config is TinkerSFTJobConfig (below).

| Field | Type | Default | Constraint | Notes |
| --- | --- | --- | --- | --- |
| dataset_ref | DatasetRef \| None | null | | Supervised training dataset. |
| trajectory_dataset_refs | list[DatasetRef] | [] | | Worlds trajectory datasets. Repeatable. |
| eval_dataset_ref | DatasetRef \| None | null | | Optional eval corpus; enables post-training eval loss. |
| max_sequence_length | int \| None | null | >= 1 | Tokenization cap per example. |
| batch_size | int \| None | null | >= 1 | Per-step batch size. |
| gradient_accumulation_steps | int \| None | null | >= 1 | Optimizer accumulation steps. |
| learning_rate | float \| None | null | > 0 | Optimizer learning rate. |
| steps | int \| None | null | >= 1 | Maximum optimizer steps. |
| epochs | int \| None | null | >= 1 | Maximum passes over the training set. |
| lora_rank | int \| None | null | >= 1 | LoRA rank override. |
| lora_alpha | int \| None | null | >= 1 | LoRA alpha override. |
| checkpoint_interval | int \| None | null | >= 1 | Checkpoint every N optimizer steps. |

Validation (at submit time):

  • At least one source is required: dataset_ref, one or more trajectory_dataset_refs, or both. The trainer ETL-merges the inputs when both are set.
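The source rule above can be mirrored client-side before submitting. This is a minimal sketch that assumes the config is held as a plain dict keyed by the documented field names; the function name and error message are hypothetical, and the server's own validation remains authoritative.

```python
from typing import Any


def validate_sft_sources(config: dict[str, Any]) -> None:
    """Raise ValueError unless the config names at least one training source."""
    has_dataset = config.get("dataset_ref") is not None
    # An empty list (the documented default) counts as "no trajectory datasets".
    has_trajectories = bool(config.get("trajectory_dataset_refs"))
    if not (has_dataset or has_trajectories):
        raise ValueError("set dataset_ref, trajectory_dataset_refs, or both")
```

Calling it with only `dataset_ref`, only `trajectory_dataset_refs`, or both passes silently; an empty config raises.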

Hosted RL on the Tinker backend. backend is "tinker", trainer_type is "rl", config is TinkerRLJobConfig (below).

| Field | Type | Default | Constraint | Notes |
| --- | --- | --- | --- | --- |
| algorithm | "importance_sampling" \| "ppo" | | | Required. |
| task_ref | str \| None | null | | name for latest or name@version for a pinned version. |
| world_manifest_id | str \| None | null | | Worlds manifest for live rollouts. |
| world_runtime_id | str \| None | null | | Runtime whose capability binding provides the rollout agent. |
| world_agent_name | str \| None | null | | Agent selection inside the runtime-bound capability. |
| world_goal | str \| None | null | | Goal prompt override for live rollouts. |
| prompt_dataset_ref | DatasetRef \| None | null | | Prompt dataset for verifier-driven RL. |
| trajectory_dataset_refs | list[DatasetRef] | [] | | Worlds trajectory datasets for offline RL. |
| reward_recipe | RewardRecipe \| None | null | | Server-side completion reward. See reward recipes. |
| world_reward | WorldRewardPolicy \| None | null | | SDK-side trajectory shaping for live Worlds rollouts. |
| execution_mode | "sync" \| "one_step_off_async" \| "fully_async" | sync | | Rollout-group scheduler mode. |
| prompt_split | str \| None | null | | Dataset split used for prompt sampling. |
| steps | int \| None | null | >= 1 | Number of optimizer steps. |
| lora_rank | int \| None | null | >= 1 | LoRA rank override. |
| max_turns | int \| None | null | >= 1 | Maximum agent turns per episode. |
| max_episode_steps | int \| None | null | >= 1 | Maximum environment steps per episode. |
| num_rollouts | int \| None | null | >= 1 | Rollouts per training window. |
| batch_size | int \| None | null | >= 1 | Training batch size. |
| learning_rate | float \| None | null | > 0 | Optimizer learning rate. |
| weight_sync_interval | int \| None | null | >= 1 | Sampler weight sync, in optimizer steps. |
| max_steps_off_policy | int \| None | null | >= 1 | Rollout staleness budget for async modes. |
| max_new_tokens | int \| None | null | >= 1 | Per-completion sampling cap. |
| temperature | float \| None | null | >= 0 | Sampling temperature. |
| stop | list[str] \| None | null | | Stop sequences. |
| checkpoint_interval | int \| None | null | >= 1 | Checkpoint every N optimizer steps. |
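The task_ref field accepts either a bare name or name@version. The sketch below shows one way to split such a string on the client; it assumes task names never contain "@", which the reference does not state, and the function name is hypothetical.

```python
from __future__ import annotations


def parse_task_ref(task_ref: str) -> tuple[str, str | None]:
    """Split 'name' or 'name@version' into (name, version-or-None).

    Assumption: task names contain no '@', so the first '@' is the separator.
    """
    name, sep, version = task_ref.partition("@")
    if not name or (sep and not version):
        raise ValueError(f"malformed task_ref: {task_ref!r}")
    # No separator means "latest"; represent that as None.
    return name, (version if sep else None)
```

A result of `(name, None)` stands for the latest version, and `(name, version)` for a pinned one.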

Validation (at submit time):

  • At least one input required: prompt_dataset_ref, world_manifest_id, or one or more trajectory_dataset_refs.
  • world_runtime_id requires world_manifest_id.
  • world_agent_name requires world_runtime_id.
  • execution_mode != "sync" requires max_steps_off_policy.
  • execution_mode == "one_step_off_async" forces max_steps_off_policy == 1.
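The five rules above can be checked locally before a job is submitted. The following is a sketch only: it assumes the config is a plain dict keyed by the documented field names, treats "forces max_steps_off_policy == 1" as "any other value is rejected" (the server might instead overwrite the value), and uses a hypothetical function name and error messages.

```python
from typing import Any


def validate_rl_config(config: dict[str, Any]) -> list[str]:
    """Return a list of rule violations; an empty list means the checks pass."""
    errors: list[str] = []

    # Rule 1: at least one input source.
    has_input = (
        config.get("prompt_dataset_ref") is not None
        or config.get("world_manifest_id") is not None
        or bool(config.get("trajectory_dataset_refs"))
    )
    if not has_input:
        errors.append("one of prompt_dataset_ref, world_manifest_id, "
                      "trajectory_dataset_refs is required")

    # Rules 2 and 3: the world_* fields form a dependency chain.
    if config.get("world_runtime_id") is not None and config.get("world_manifest_id") is None:
        errors.append("world_runtime_id requires world_manifest_id")
    if config.get("world_agent_name") is not None and config.get("world_runtime_id") is None:
        errors.append("world_agent_name requires world_runtime_id")

    # Rules 4 and 5: async modes need a staleness budget.
    mode = config.get("execution_mode", "sync")  # documented default
    budget = config.get("max_steps_off_policy")
    if mode != "sync" and budget is None:
        errors.append("async execution_mode requires max_steps_off_policy")
    if mode == "one_step_off_async" and budget is not None and budget != 1:
        errors.append("one_step_off_async requires max_steps_off_policy == 1")

    return errors
```

Returning a list rather than raising on the first failure lets a caller report every violation at once.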

Ray-backed GRPO. backend is "ray", trainer_type is "rl", config is RayGRPOJobConfig.

| Field | Type | Default | Constraint | Notes |
| --- | --- | --- | --- | --- |
| algorithm | "grpo" | "grpo" | | Only GRPO is modelled on this config. |
| task_ref | str | | | Required. |
| prompt_dataset_ref | DatasetRef | | | Required. |
| reward_recipe | RewardRecipe \| None | null | | See reward recipes. |
| execution_mode | "async" \| "colocated" \| "distributed" | async | | Ray scheduling mode. |
| max_turns | int \| None | null | >= 1 | Maximum agent turns per episode. |
| max_episode_steps | int \| None | null | >= 1 | Environment-step cap per episode. |
| num_rollouts | int \| None | null | >= 1 | Rollouts per training window. |
| batch_size | int \| None | null | >= 1 | Training batch size. |
| learning_rate | float \| None | null | > 0 | Optimizer learning rate. |
| num_rollout_workers | int \| None | null | >= 1 | Ray rollout workers. |
| buffer_size | int \| None | null | >= 1 | Experience-buffer capacity. |
| checkpoint_interval | int \| None | null | >= 1 | Checkpoint every N learner steps. |
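The required fields, literal values, and numeric constraints in the table above lend themselves to a small pre-submit check. This sketch assumes a plain-dict config keyed by the documented field names; the function and constant names are hypothetical, and it does not replace server-side validation.

```python
from typing import Any

# Fields the table constrains to ">= 1".
POSITIVE_INT_FIELDS = (
    "max_turns", "max_episode_steps", "num_rollouts", "batch_size",
    "num_rollout_workers", "buffer_size", "checkpoint_interval",
)
EXECUTION_MODES = ("async", "colocated", "distributed")


def check_ray_grpo_config(config: dict[str, Any]) -> list[str]:
    """Return constraint violations for a RayGRPOJobConfig-shaped dict."""
    errors: list[str] = []
    for key in ("task_ref", "prompt_dataset_ref"):
        if config.get(key) is None:
            errors.append(f"{key} is required")
    if config.get("algorithm", "grpo") != "grpo":
        errors.append('algorithm must be "grpo"')
    if config.get("execution_mode", "async") not in EXECUTION_MODES:
        errors.append(f"execution_mode must be one of {EXECUTION_MODES}")
    for key in POSITIVE_INT_FIELDS:
        value = config.get(key)
        if value is None:
            continue  # null means "use the backend default"
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            errors.append(f"{key} must be an int >= 1")
    rate = config.get("learning_rate")
    if rate is not None and not rate > 0:
        errors.append("learning_rate must be > 0")
    return errors
```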

TrainingJobResponse is the wire shape returned by every hosted-training endpoint. The SDK exposes the same fields under the type name TrainingJob.

| Field | Type | Notes |
| --- | --- | --- |
| id | str | Training-job identifier. |
| organization_id | str | Owning organization. |
| workspace_id | str | Owning workspace. |
| status | "pending" \| "queued" \| "running" \| "completed" \| "failed" \| "cancelled" | Current lifecycle state. |
| name | str \| null | Optional display name from the create request. |
| backend | "tinker" \| "ray" | |
| trainer_type | "sft" \| "rl" | |
| algorithm | "grpo" \| "importance_sampling" \| "ppo" \| null | Set on RL jobs; null for SFT. |
| model | str | Base model identifier. |
| capability | TrainingCapabilitySnapshot | Resolved capability snapshot: name, version, runtime digest. |
| metrics | dict[str, Any] | Scalar + series metrics. See outputs. |
| artifacts | dict[str, Any] | Artifact references. See outputs. |
| tags | list[str] | Tags carried from the create request. |
| error | str \| null | Top-level error string when the job settled to failed. |
| created_at | str | ISO-8601 submission time. |
| started_at | str \| null | ISO-8601 worker start time. |
| completed_at | str \| null | ISO-8601 terminal-state time. |
| cancel_requested_at | str \| null | ISO-8601. Set when a running job is asked to stop. |

Plus the resolved refs from the create request: dataset_ref, trajectory_dataset_refs, task_ref, world_manifest_id, world_runtime_id, world_agent_name, world_goal, prompt_dataset_ref, project_ref, run_ref.
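When polling a job, two questions usually come up: has it settled, and how long did it run? The sketch below answers both from a TrainingJobResponse-shaped dict. Treating completed, failed, and cancelled as the terminal states is an inference from the field notes, not something the reference states outright, and the helper names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Any

# Assumption: these three statuses are terminal; the rest are in-flight.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})


def is_terminal(job: dict[str, Any]) -> bool:
    """True once the job has settled and will not change status again."""
    return job["status"] in TERMINAL_STATUSES


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO-8601 string; a trailing 'Z' is normalized for older Pythons."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def run_duration(job: dict[str, Any]) -> timedelta | None:
    """Time between worker start and terminal state, or None if either is unset."""
    started, completed = job.get("started_at"), job.get("completed_at")
    if started is None or completed is None:
        return None
    return parse_timestamp(completed) - parse_timestamp(started)
```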

TrainingJobLogEntry:

| Field | Type | Notes |
| --- | --- | --- |
| timestamp | str | ISO-8601. |
| level | "debug" \| "info" \| "warning" \| "error" | |
| message | str | Human-readable line. |
| data | dict[str, Any] | Optional structured payload. |
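Log entries with this shape are easy to filter and print. The sketch below assumes entries arrive as plain dicts and that the four levels follow the conventional severity order debug < info < warning < error, which the reference implies but does not spell out; the helper names are hypothetical.

```python
import json
from typing import Any

# Assumed severity order for the four documented levels.
LEVEL_ORDER = {"debug": 0, "info": 1, "warning": 2, "error": 3}


def filter_logs(entries: list[dict[str, Any]], min_level: str = "warning") -> list[dict[str, Any]]:
    """Keep entries at or above min_level, preserving their original order."""
    threshold = LEVEL_ORDER[min_level]
    return [e for e in entries if LEVEL_ORDER[e["level"]] >= threshold]


def format_entry(entry: dict[str, Any]) -> str:
    """Render one entry as a single line, appending the payload when present."""
    data = entry.get("data") or {}
    suffix = f" {json.dumps(data, sort_keys=True)}" if data else ""
    return f'{entry["timestamp"]} [{entry["level"].upper()}] {entry["message"]}{suffix}'
```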