# Manifest reference
Every Tinker SFT and RL config field, validation rule, and default.
Exhaustive reference for every training-job request and config field. CLI flags map onto these fields one-for-one; the CLI surface lives on the auto-generated `dn train` page.
## Request wrapper

Every hosted training request carries the same base fields:
| Field | Type | Default | Notes |
|---|---|---|---|
| `name` | `str \| None` | `null` | Optional job display name. |
| `model` | `str` | — | Required. Base model or adapter target. |
| `project_ref` | `str \| None` | `null` | Workspace project key. Defaults to the workspace default. |
| `run_ref` | `str \| None` | `null` | Optional run association for lineage. |
| `capability_ref` | `CapabilityRef` | — | Required. Versioned capability snapshot to train against. |
| `tags` | `list[str]` | `[]` | Optional tag list. |
| `backend` | literal | — | `tinker` or `ray`; set by the request class. |
| `trainer_type` | literal | — | `sft` or `rl`; set by the request class. |
| `config` | trainer config | — | Required. Trainer-specific config object (tables below). |
`CapabilityRef`, `DatasetRef`, `RewardRecipe`, and `WorldRewardPolicy` are all small reference shapes of the form `{ name, version }` or `{ name, params? }`:

| Model | Fields |
|---|---|
| `CapabilityRef` | `name: str`, `version: str` |
| `DatasetRef` | `name: str`, `version: str` |
| `RewardRecipe` | `name: str`, `params: dict` (default `{}`) |
| `WorldRewardPolicy` | `name: str`, `params: dict` (default `{}`) — see reward recipes for preset names and component composition. |
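Putting the wrapper fields and ref shapes together, a request body can be sketched as a plain dict. This is an illustrative sketch only — field names come from the tables above, but the model and capability values are hypothetical placeholders, not real identifiers:

```python
# Hypothetical request body assembled from the wrapper fields above.
# "example/base-model" and "example-capability" are placeholders.
request = {
    "name": "sft-demo",                 # optional display name
    "model": "example/base-model",      # required base model or adapter target
    "capability_ref": {                 # required CapabilityRef: { name, version }
        "name": "example-capability",
        "version": "1.0.0",
    },
    "tags": [],                         # defaults to an empty list
    "backend": "tinker",                # set by the request class
    "trainer_type": "sft",              # set by the request class
    "config": {},                       # trainer-specific config (tables below)
}
```

The request classes pin `backend` and `trainer_type` themselves, so client code normally supplies only the remaining fields.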
## CreateTinkerSFTJobRequest

Hosted SFT on the Tinker backend. `backend` is `"tinker"`, `trainer_type` is `"sft"`, and `config` is `TinkerSFTJobConfig` (below).
### TinkerSFTJobConfig

| Field | Type | Default | Constraint | Notes |
|---|---|---|---|---|
| `dataset_ref` | `DatasetRef \| None` | `null` | — | Supervised training dataset. |
| `trajectory_dataset_refs` | `list[DatasetRef]` | `[]` | — | Worlds trajectory datasets. Repeatable. |
| `eval_dataset_ref` | `DatasetRef \| None` | `null` | — | Optional eval corpus; enables post-training eval loss. |
| `max_sequence_length` | `int \| None` | `null` | >= 1 | Tokenization cap per example. |
| `batch_size` | `int \| None` | `null` | >= 1 | Per-step batch size. |
| `gradient_accumulation_steps` | `int \| None` | `null` | >= 1 | Optimizer accumulation steps. |
| `learning_rate` | `float \| None` | `null` | > 0 | Optimizer learning rate. |
| `steps` | `int \| None` | `null` | >= 1 | Maximum optimizer steps. |
| `epochs` | `int \| None` | `null` | >= 1 | Maximum passes over the training set. |
| `lora_rank` | `int \| None` | `null` | >= 1 | LoRA rank override. |
| `lora_alpha` | `int \| None` | `null` | >= 1 | LoRA alpha override. |
| `checkpoint_interval` | `int \| None` | `null` | >= 1 | Checkpoint every N optimizer steps. |
Validation (at submit time):

- At least one source is required: `dataset_ref`, one or more `trajectory_dataset_refs`, or both. The trainer ETL-merges the inputs when both are set.
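The rule above can be sketched as a standalone check. This is an illustrative helper, not the SDK's actual validator; it only assumes the field names from the table:

```python
def check_sft_sources(config: dict) -> None:
    """Reject SFT configs with neither a supervised dataset nor trajectory datasets."""
    has_dataset = config.get("dataset_ref") is not None
    has_trajectories = bool(config.get("trajectory_dataset_refs"))
    if not (has_dataset or has_trajectories):
        raise ValueError(
            "TinkerSFTJobConfig needs dataset_ref, trajectory_dataset_refs, or both"
        )

# Either source alone passes; both together also pass (the trainer ETL-merges them).
check_sft_sources({"dataset_ref": {"name": "sft-set", "version": "1"}})
check_sft_sources({"trajectory_dataset_refs": [{"name": "traj", "version": "1"}]})
```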
## CreateTinkerRLJobRequest

Hosted RL on the Tinker backend. `backend` is `"tinker"`, `trainer_type` is `"rl"`, and `config` is `TinkerRLJobConfig` (below).
### TinkerRLJobConfig

| Field | Type | Default | Constraint | Notes |
|---|---|---|---|---|
| `algorithm` | `"importance_sampling" \| "ppo"` | — | — | Required. |
| `task_ref` | `str \| None` | `null` | — | `name` for latest or `name@version` for a pinned version. |
| `world_manifest_id` | `str \| None` | `null` | — | Worlds manifest for live rollouts. |
| `world_runtime_id` | `str \| None` | `null` | — | Runtime whose capability binding provides the rollout agent. |
| `world_agent_name` | `str \| None` | `null` | — | Agent selection inside the runtime-bound capability. |
| `world_goal` | `str \| None` | `null` | — | Goal prompt override for live rollouts. |
| `prompt_dataset_ref` | `DatasetRef \| None` | `null` | — | Prompt dataset for verifier-driven RL. |
| `trajectory_dataset_refs` | `list[DatasetRef]` | `[]` | — | Worlds trajectory datasets for offline RL. |
| `reward_recipe` | `RewardRecipe \| None` | `null` | — | Server-side completion reward. See reward recipes. |
| `world_reward` | `WorldRewardPolicy \| None` | `null` | — | SDK-side trajectory shaping for live Worlds rollouts. |
| `execution_mode` | `"sync" \| "one_step_off_async" \| "fully_async"` | `sync` | — | Rollout-group scheduler mode. |
| `prompt_split` | `str \| None` | `null` | — | Dataset split used for prompt sampling. |
| `steps` | `int \| None` | `null` | >= 1 | Number of optimizer steps. |
| `lora_rank` | `int \| None` | `null` | >= 1 | LoRA rank override. |
| `max_turns` | `int \| None` | `null` | >= 1 | Maximum agent turns per episode. |
| `max_episode_steps` | `int \| None` | `null` | >= 1 | Maximum environment steps per episode. |
| `num_rollouts` | `int \| None` | `null` | >= 1 | Rollouts per training window. |
| `batch_size` | `int \| None` | `null` | >= 1 | Training batch size. |
| `learning_rate` | `float \| None` | `null` | > 0 | Optimizer learning rate. |
| `weight_sync_interval` | `int \| None` | `null` | >= 1 | Sampler weight sync, in optimizer steps. |
| `max_steps_off_policy` | `int \| None` | `null` | >= 1 | Rollout staleness budget for async modes. |
| `max_new_tokens` | `int \| None` | `null` | >= 1 | Per-completion sampling cap. |
| `temperature` | `float \| None` | `null` | >= 0 | Sampling temperature. |
| `stop` | `list[str] \| None` | `null` | — | Stop sequences. |
| `checkpoint_interval` | `int \| None` | `null` | >= 1 | Checkpoint every N optimizer steps. |
Validation (at submit time):

- At least one input is required: `prompt_dataset_ref`, `world_manifest_id`, or one or more `trajectory_dataset_refs`.
- `world_runtime_id` requires `world_manifest_id`.
- `world_agent_name` requires `world_runtime_id`.
- `execution_mode != "sync"` requires `max_steps_off_policy`.
- `execution_mode == "one_step_off_async"` forces `max_steps_off_policy == 1`.
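The chained requirements above can be expressed as one validator sketch. Again, this is illustrative only — the field names follow the table, but the function itself is hypothetical, not the SDK's implementation:

```python
def check_rl_config(config: dict) -> None:
    """Mirror the submit-time validation rules listed above."""
    has_input = (
        config.get("prompt_dataset_ref") is not None
        or config.get("world_manifest_id") is not None
        or bool(config.get("trajectory_dataset_refs"))
    )
    if not has_input:
        raise ValueError(
            "need prompt_dataset_ref, world_manifest_id, or trajectory_dataset_refs"
        )
    if config.get("world_runtime_id") and not config.get("world_manifest_id"):
        raise ValueError("world_runtime_id requires world_manifest_id")
    if config.get("world_agent_name") and not config.get("world_runtime_id"):
        raise ValueError("world_agent_name requires world_runtime_id")
    mode = config.get("execution_mode", "sync")
    if mode != "sync" and config.get("max_steps_off_policy") is None:
        raise ValueError("async execution modes require max_steps_off_policy")
    if mode == "one_step_off_async" and config.get("max_steps_off_policy") != 1:
        raise ValueError("one_step_off_async forces max_steps_off_policy == 1")

# A live-rollout config in fully_async mode with a staleness budget passes.
check_rl_config({
    "world_manifest_id": "wm-demo",          # placeholder manifest id
    "execution_mode": "fully_async",
    "max_steps_off_policy": 4,
})
```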
## CreateRayGRPOJobRequest

Ray-backed GRPO. `backend` is `"ray"`, `trainer_type` is `"rl"`, and `config` is `RayGRPOJobConfig`.
### RayGRPOJobConfig

| Field | Type | Default | Constraint | Notes |
|---|---|---|---|---|
| `algorithm` | `"grpo"` | `"grpo"` | — | Only GRPO is modelled on this config. |
| `task_ref` | `str` | — | — | Required. |
| `prompt_dataset_ref` | `DatasetRef` | — | — | Required. |
| `reward_recipe` | `RewardRecipe \| None` | `null` | — | See reward recipes. |
| `execution_mode` | `"async" \| "colocated" \| "distributed"` | `async` | — | Ray scheduling mode. |
| `max_turns` | `int \| None` | `null` | >= 1 | Maximum agent turns per episode. |
| `max_episode_steps` | `int \| None` | `null` | >= 1 | Environment-step cap per episode. |
| `num_rollouts` | `int \| None` | `null` | >= 1 | Rollouts per training window. |
| `batch_size` | `int \| None` | `null` | >= 1 | Training batch size. |
| `learning_rate` | `float \| None` | `null` | > 0 | Optimizer learning rate. |
| `num_rollout_workers` | `int \| None` | `null` | >= 1 | Ray rollout workers. |
| `buffer_size` | `int \| None` | `null` | >= 1 | Experience-buffer capacity. |
| `checkpoint_interval` | `int \| None` | `null` | >= 1 | Checkpoint every N learner steps. |
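Unlike the Tinker RL config, this one has two hard-required fields (`task_ref` and `prompt_dataset_ref`). A minimal config body might look like this — field names follow the table; the task and dataset values are hypothetical placeholders:

```python
# Sketch of a minimal RayGRPOJobConfig body; "example-task" and
# "example-prompts" are placeholder names, not real refs.
grpo_config = {
    "algorithm": "grpo",             # only GRPO is modelled on this config
    "task_ref": "example-task",      # required
    "prompt_dataset_ref": {          # required DatasetRef: { name, version }
        "name": "example-prompts",
        "version": "1",
    },
    "execution_mode": "async",       # or "colocated" / "distributed"
    "num_rollout_workers": 4,        # >= 1 when set
    "checkpoint_interval": 100,      # checkpoint every N learner steps
}
```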
## Job response shape

`TrainingJobResponse` is the wire shape returned by every hosted-training endpoint. The SDK exposes the same fields under the type name `TrainingJob`.
| Field | Type | Notes |
|---|---|---|
| `id` | `str` | Training-job identifier. |
| `organization_id` | `str` | Owning organization. |
| `workspace_id` | `str` | Owning workspace. |
| `status` | `"pending" \| "queued" \| "running" \| "completed" \| "failed" \| "cancelled"` | Current lifecycle state. |
| `name` | `str \| null` | Optional display name from the create request. |
| `backend` | `"tinker" \| "ray"` | — |
| `trainer_type` | `"sft" \| "rl"` | — |
| `algorithm` | `"grpo" \| "importance_sampling" \| "ppo" \| null` | Set on RL jobs; `null` for SFT. |
| `model` | `str` | Base model identifier. |
| `capability` | `TrainingCapabilitySnapshot` | Resolved capability snapshot: name, version, runtime digest. |
| `metrics` | `dict[str, Any]` | Scalar and series metrics. See outputs. |
| `artifacts` | `dict[str, Any]` | Artifact references. See outputs. |
| `tags` | `list[str]` | Tags carried from the create request. |
| `error` | `str \| null` | Top-level error string when the job settled to `failed`. |
| `created_at` | `str` | ISO-8601 submission time. |
| `started_at` | `str \| null` | ISO-8601 worker start time. |
| `completed_at` | `str \| null` | ISO-8601 terminal-state time. |
| `cancel_requested_at` | `str \| null` | ISO-8601. Set when a running job is asked to stop. |

Plus the resolved refs from the create request: `dataset_ref`, `trajectory_dataset_refs`, `task_ref`, `world_manifest_id`, `world_runtime_id`, `world_agent_name`, `world_goal`, `prompt_dataset_ref`, `project_ref`, `run_ref`.
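Since `status` is the field that moves through the lifecycle, a client typically only needs to distinguish terminal from non-terminal states. A small sketch under those assumptions — `summarize` is a hypothetical helper, and the job dicts below are made-up examples shaped like the table:

```python
# The three states from which a job never moves again, per the status column.
TERMINAL = {"completed", "failed", "cancelled"}

def summarize(job: dict) -> str:
    """One-line summary of a TrainingJobResponse-shaped dict."""
    status = job["status"]
    if status in TERMINAL:
        suffix = f" ({job['error']})" if job.get("error") else ""
        return f"{job['id']}: {status}{suffix}"
    return f"{job['id']}: still {status}"

print(summarize({"id": "job-1", "status": "running", "error": None}))
print(summarize({"id": "job-2", "status": "failed", "error": "sample error"}))
```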
## Log entry shape

`TrainingJobLogEntry`:

| Field | Type | Notes |
|---|---|---|
| `timestamp` | `str` | ISO-8601. |
| `level` | `"debug" \| "info" \| "warning" \| "error"` | — |
| `message` | `str` | Human-readable line. |
| `data` | `dict[str, Any]` | Optional structured payload. |
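Entries of this shape are easy to filter client-side. For instance, keeping only warning and error lines, as a monitor might — the field names follow the table; the entries themselves are made up:

```python
# Made-up log entries shaped like TrainingJobLogEntry.
entries = [
    {"timestamp": "2025-01-01T00:00:00Z", "level": "info", "message": "step 10"},
    {"timestamp": "2025-01-01T00:00:01Z", "level": "error",
     "message": "loss is NaN", "data": {"step": 11}},
]

# Keep only the levels worth alerting on.
problems = [e for e in entries if e["level"] in ("warning", "error")]
```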