Manifest reference

Every Tinker SFT, Tinker RL, and Ray GRPO config field, validation rule, and default.

Exhaustive reference for every training-job request and config field. CLI flags map onto these fields one-for-one; the CLI surface itself is documented on the auto-generated dn train page.

Request wrapper

Every hosted training request carries the same base fields:

Field	Type	Default	Notes
`name`	`str \| None`	`null`	Optional job display name.
`model`	`str`	—	Required. Base model or adapter target.
`project_ref`	`str \| None`	`null`	Workspace project key. Defaults to the workspace default.
`run_ref`	`str \| None`	`null`	Optional run association for lineage.
`capability_ref`	`CapabilityRef`	—	Required. Versioned capability snapshot to train against.
`tags`	`list[str]`	`[]`	Optional tag list.
`backend`	literal	—	`tinker` or `ray` — set by the request class.
`trainer_type`	literal	—	`sft` or `rl` — set by the request class.
`config`	trainer config	—	Required. Trainer-specific config object (tables below).

CapabilityRef and DatasetRef are { name, version } shapes; RewardRecipe and WorldRewardPolicy are { name, params? } shapes:

Model	Fields
`CapabilityRef`	`name: str`, `version: str`
`DatasetRef`	`name: str`, `version: str`
`RewardRecipe`	`name: str`, `params: dict` (default `{}`)
`WorldRewardPolicy`	`name: str`, `params: dict` (default `{}`)

See reward recipes for preset names and component composition.
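
To make the two ref shapes concrete, here is a minimal sketch that models them as Python dataclasses. These classes are illustrative stand-ins written for this page, not the SDK's own types, and the `example-recipe` name is a placeholder.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class CapabilityRef:
    name: str
    version: str


@dataclass(frozen=True)
class DatasetRef:
    name: str
    version: str


@dataclass(frozen=True)
class RewardRecipe:
    name: str
    # params is optional and defaults to an empty dict, as in the table above.
    params: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class WorldRewardPolicy:
    name: str
    params: dict[str, Any] = field(default_factory=dict)


# Placeholder name, for illustration only.
recipe = RewardRecipe(name="example-recipe")
recipe_as_dict = asdict(recipe)
print(recipe_as_dict)
```

Omitting `params` yields the documented `{}` default, so a recipe can be written with a name alone.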

`CreateTinkerSFTJobRequest`

Hosted SFT on the Tinker backend. backend is "tinker", trainer_type is "sft", config is TinkerSFTJobConfig (below).

`TinkerSFTJobConfig`

Field	Type	Default	Constraint	Notes
`dataset_ref`	`DatasetRef \| None`	`null`	—	Supervised training dataset.
`trajectory_dataset_refs`	`list[DatasetRef]`	`[]`	—	Worlds trajectory datasets. Repeatable.
`eval_dataset_ref`	`DatasetRef \| None`	`null`	—	Optional eval corpus; enables post-training eval loss.
`max_sequence_length`	`int \| None`	`null`	`>= 1`	Tokenization cap per example.
`batch_size`	`int \| None`	`null`	`>= 1`	Per-step batch size.
`gradient_accumulation_steps`	`int \| None`	`null`	`>= 1`	Optimizer accumulation steps.
`learning_rate`	`float \| None`	`null`	`> 0`	Optimizer learning rate.
`steps`	`int \| None`	`null`	`>= 1`	Maximum optimizer steps.
`epochs`	`int \| None`	`null`	`>= 1`	Maximum passes over the training set.
`lora_rank`	`int \| None`	`null`	`>= 1`	LoRA rank override.
`lora_alpha`	`int \| None`	`null`	`>= 1`	LoRA alpha override.
`checkpoint_interval`	`int \| None`	`null`	`>= 1`	Checkpoint every N optimizer steps.

Validation (at submit time):

At least one source is required: dataset_ref, one or more trajectory_dataset_refs, or both. The trainer ETL-merges the inputs when both are set.
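
The source rule above can be pre-checked on the client before submitting. The sketch below is a hypothetical helper, not part of the SDK; it assumes the config is held as a plain dict keyed by the field names in the table.

```python
from typing import Any, Optional


def validate_sft_sources(config: dict[str, Any]) -> Optional[str]:
    """Return an error message if no training source is set, else None."""
    has_dataset = config.get("dataset_ref") is not None
    # An empty list counts as "not set", matching the `[]` default.
    has_trajectories = bool(config.get("trajectory_dataset_refs"))
    if not (has_dataset or has_trajectories):
        return "set dataset_ref, trajectory_dataset_refs, or both"
    return None
```

A config with only `dataset_ref`, only `trajectory_dataset_refs`, or both passes; a config with neither is rejected.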

`CreateTinkerRLJobRequest`

Hosted RL on the Tinker backend. backend is "tinker", trainer_type is "rl", config is TinkerRLJobConfig (below).

`TinkerRLJobConfig`

Field	Type	Default	Constraint	Notes
`algorithm`	`"importance_sampling" \| "ppo"`	—	—	Required.
`task_ref`	`str \| None`	`null`	—	`name` for latest or `name@version` for a pinned version.
`world_manifest_id`	`str \| None`	`null`	—	Worlds manifest for live rollouts.
`world_runtime_id`	`str \| None`	`null`	—	Runtime whose capability binding provides the rollout agent.
`world_agent_name`	`str \| None`	`null`	—	Agent selection inside the runtime-bound capability.
`world_goal`	`str \| None`	`null`	—	Goal prompt override for live rollouts.
`prompt_dataset_ref`	`DatasetRef \| None`	`null`	—	Prompt dataset for verifier-driven RL.
`trajectory_dataset_refs`	`list[DatasetRef]`	`[]`	—	Worlds trajectory datasets for offline RL.
`reward_recipe`	`RewardRecipe \| None`	`null`	—	Server-side completion reward. See reward recipes.
`world_reward`	`WorldRewardPolicy \| None`	`null`	—	SDK-side trajectory shaping for live Worlds rollouts.
`execution_mode`	`"sync" \| "one_step_off_async" \| "fully_async"`	`sync`	—	Rollout-group scheduler mode.
`prompt_split`	`str \| None`	`null`	—	Dataset split used for prompt sampling.
`steps`	`int \| None`	`null`	`>= 1`	Number of optimizer steps.
`lora_rank`	`int \| None`	`null`	`>= 1`	LoRA rank override.
`max_turns`	`int \| None`	`null`	`>= 1`	Maximum agent turns per episode.
`max_episode_steps`	`int \| None`	`null`	`>= 1`	Maximum environment steps per episode.
`num_rollouts`	`int \| None`	`null`	`>= 1`	Rollouts per training window.
`batch_size`	`int \| None`	`null`	`>= 1`	Training batch size.
`learning_rate`	`float \| None`	`null`	`> 0`	Optimizer learning rate.
`weight_sync_interval`	`int \| None`	`null`	`>= 1`	Sampler weight sync, in optimizer steps.
`max_steps_off_policy`	`int \| None`	`null`	`>= 1`	Rollout staleness budget for async modes.
`max_new_tokens`	`int \| None`	`null`	`>= 1`	Per-completion sampling cap.
`temperature`	`float \| None`	`null`	`>= 0`	Sampling temperature.
`stop`	`list[str] \| None`	`null`	—	Stop sequences.
`checkpoint_interval`	`int \| None`	`null`	`>= 1`	Checkpoint every N optimizer steps.
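
The `task_ref` format in the table (`name` for latest, `name@version` for a pinned version) can be split as sketched below. `parse_task_ref` is a hypothetical helper, and it assumes the task name itself never contains `@`; the server's own parsing rules may differ.

```python
from typing import Optional, Tuple


def parse_task_ref(task_ref: str) -> Tuple[str, Optional[str]]:
    """Split a task_ref into (name, version); version is None for latest."""
    # Assumption: the first "@" separates name from version.
    name, sep, version = task_ref.partition("@")
    if not name or (sep and not version):
        raise ValueError(f"malformed task_ref: {task_ref!r}")
    return name, (version if sep else None)
```

With placeholder values, `parse_task_ref("example-task")` gives a `None` version (latest), while `parse_task_ref("example-task@1.2.0")` pins `1.2.0`.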

Validation (at submit time):

At least one input required: prompt_dataset_ref, world_manifest_id, or one or more trajectory_dataset_refs.
world_runtime_id requires world_manifest_id.
world_agent_name requires world_runtime_id.
execution_mode != "sync" requires max_steps_off_policy.
execution_mode == "one_step_off_async" forces max_steps_off_policy == 1.
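
The five rules above can be mirrored in a client-side pre-check. This is a sketch under stated assumptions, not the server's validator: the function name is hypothetical, the config is a plain dict, and "forces max_steps_off_policy == 1" is read as "any other value is rejected" (the server might instead overwrite the value).

```python
from typing import Any


def validate_rl_config(config: dict[str, Any]) -> list[str]:
    """Collect violations of the submit-time rules listed above."""
    errors: list[str] = []

    has_input = (
        config.get("prompt_dataset_ref") is not None
        or config.get("world_manifest_id") is not None
        or bool(config.get("trajectory_dataset_refs"))
    )
    if not has_input:
        errors.append("no input: set prompt_dataset_ref, world_manifest_id, "
                      "or trajectory_dataset_refs")

    if config.get("world_runtime_id") is not None and config.get("world_manifest_id") is None:
        errors.append("world_runtime_id requires world_manifest_id")

    if config.get("world_agent_name") is not None and config.get("world_runtime_id") is None:
        errors.append("world_agent_name requires world_runtime_id")

    mode = config.get("execution_mode", "sync")  # documented default
    budget = config.get("max_steps_off_policy")
    if mode != "sync" and budget is None:
        errors.append("async execution modes require max_steps_off_policy")
    # Assumption: a value other than 1 is an error rather than silently reset.
    if mode == "one_step_off_async" and budget is not None and budget != 1:
        errors.append("one_step_off_async requires max_steps_off_policy == 1")

    return errors
```

Returning a list rather than raising on the first failure lets a caller report every problem in a config at once.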

`CreateRayGRPOJobRequest`

Ray-backed GRPO. backend is "ray", trainer_type is "rl", config is RayGRPOJobConfig.

`RayGRPOJobConfig`

Field	Type	Default	Constraint	Notes
`algorithm`	`"grpo"`	`"grpo"`	—	Only GRPO is modelled on this config.
`task_ref`	`str`	—	—	Required.
`prompt_dataset_ref`	`DatasetRef`	—	—	Required.
`reward_recipe`	`RewardRecipe \| None`	`null`	—	See reward recipes.
`execution_mode`	`"async" \| "colocated" \| "distributed"`	`async`	—	Ray scheduling mode.
`max_turns`	`int \| None`	`null`	`>= 1`	Maximum agent turns per episode.
`max_episode_steps`	`int \| None`	`null`	`>= 1`	Environment-step cap per episode.
`num_rollouts`	`int \| None`	`null`	`>= 1`	Rollouts per training window.
`batch_size`	`int \| None`	`null`	`>= 1`	Training batch size.
`learning_rate`	`float \| None`	`null`	`> 0`	Optimizer learning rate.
`num_rollout_workers`	`int \| None`	`null`	`>= 1`	Ray rollout workers.
`buffer_size`	`int \| None`	`null`	`>= 1`	Experience-buffer capacity.
`checkpoint_interval`	`int \| None`	`null`	`>= 1`	Checkpoint every N learner steps.
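
The required fields and the constraint column of this table translate directly into checks. The sketch below is a hypothetical client-side helper over a plain dict; it covers only what the table states and is not the service's validator.

```python
from typing import Any

# Integer fields documented with a `>= 1` constraint.
MIN_ONE_FIELDS = (
    "max_turns", "max_episode_steps", "num_rollouts", "batch_size",
    "num_rollout_workers", "buffer_size", "checkpoint_interval",
)
EXECUTION_MODES = ("async", "colocated", "distributed")


def check_ray_grpo_config(config: dict[str, Any]) -> list[str]:
    """Collect violations of the RayGRPOJobConfig table above."""
    errors: list[str] = []
    for required in ("task_ref", "prompt_dataset_ref"):
        if config.get(required) is None:
            errors.append(f"{required} is required")
    if config.get("algorithm", "grpo") != "grpo":
        errors.append("algorithm must be 'grpo'")
    if config.get("execution_mode", "async") not in EXECUTION_MODES:
        errors.append("execution_mode must be async, colocated, or distributed")
    for name in MIN_ONE_FIELDS:
        value = config.get(name)
        if value is not None and value < 1:
            errors.append(f"{name} must be >= 1")
    learning_rate = config.get("learning_rate")
    if learning_rate is not None and learning_rate <= 0:
        errors.append("learning_rate must be > 0")
    return errors
```

Fields left as `null` are skipped, since every bounded field in the table is optional.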

Job response shape

TrainingJobResponse is the wire shape returned by every hosted-training endpoint. The SDK exposes the same fields under the type name TrainingJob.

Field	Type	Notes
`id`	`str`	Training-job identifier.
`organization_id`	`str`	Owning organization.
`workspace_id`	`str`	Owning workspace.
`status`	`"pending" \| "queued" \| "running" \| "completed" \| "failed" \| "cancelled"`	Current lifecycle state.
`name`	`str \| null`	Optional display name from the create request.
`backend`	`"tinker" \| "ray"`	Backend that runs the job.
`trainer_type`	`"sft" \| "rl"`	Trainer type of the job.
`algorithm`	`"grpo" \| "importance_sampling" \| "ppo" \| null`	Set on RL jobs; null for SFT.
`model`	`str`	Base model identifier.
`capability`	`TrainingCapabilitySnapshot`	Resolved capability snapshot — name, version, runtime digest.
`metrics`	`dict[str, Any]`	Scalar + series metrics. See outputs.
`artifacts`	`dict[str, Any]`	Artifact references. See outputs.
`tags`	`list[str]`	Tags carried from the create request.
`error`	`str \| null`	Top-level error string when the job settled to `failed`.
`created_at`	`str`	ISO-8601 submission time.
`started_at`	`str \| null`	ISO-8601 worker start time.
`completed_at`	`str \| null`	ISO-8601 terminal-state time.
`cancel_requested_at`	`str \| null`	ISO-8601. Set when a running job is asked to stop.

The response also carries the resolved refs from the create request: dataset_ref, trajectory_dataset_refs, task_ref, world_manifest_id, world_runtime_id, world_agent_name, world_goal, prompt_dataset_ref, project_ref, run_ref.
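
As a sketch of how a client might read the lifecycle fields, the helpers below work on a response held as a plain dict. They are hypothetical, not SDK functions, and they assume that `completed`, `failed`, and `cancelled` are the terminal states and that timestamps are ISO-8601 strings with an offset or a trailing `Z`.

```python
from datetime import datetime
from typing import Any, Optional

# Assumption: these three statuses are terminal; the others are in flight.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def parse_iso(timestamp: str) -> datetime:
    # Python versions before 3.11 reject a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def is_terminal(job: dict[str, Any]) -> bool:
    return job["status"] in TERMINAL_STATUSES


def run_seconds(job: dict[str, Any]) -> Optional[float]:
    """Seconds between started_at and completed_at, or None if either is unset."""
    started, completed = job.get("started_at"), job.get("completed_at")
    if started is None or completed is None:
        return None
    return (parse_iso(completed) - parse_iso(started)).total_seconds()
```

A polling loop could stop once `is_terminal(job)` is true and then read `error`, `metrics`, and `artifacts` from the same response.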

Log entry shape

TrainingJobLogEntry:

Field	Type	Notes
`timestamp`	`str`	ISO-8601.
`level`	`"debug" \| "info" \| "warning" \| "error"`	Log severity.
`message`	`str`	Human-readable line.
`data`	`dict[str, Any]`	Optional structured payload.
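
Log entries with this shape can be filtered by severity on the client. The sketch below is a hypothetical helper; it assumes the conventional ordering debug < info < warning < error, which this page does not state explicitly, and the sample entries are made up for illustration.

```python
from typing import Any

# Assumed severity ordering, lowest to highest.
LEVEL_ORDER = {"debug": 0, "info": 1, "warning": 2, "error": 3}


def filter_logs(entries: list[dict[str, Any]], min_level: str = "warning") -> list[dict[str, Any]]:
    """Keep entries whose level is at or above min_level, preserving order."""
    threshold = LEVEL_ORDER[min_level]
    return [entry for entry in entries if LEVEL_ORDER[entry["level"]] >= threshold]


# Made-up entries, for illustration only.
sample = [
    {"timestamp": "2024-01-01T00:00:00Z", "level": "info", "message": "step 1", "data": {}},
    {"timestamp": "2024-01-01T00:00:05Z", "level": "error", "message": "worker lost", "data": {}},
]
important = filter_logs(sample)
print([entry["message"] for entry in important])
```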