Optimization
Submit and monitor optimization jobs for capabilities and evaluation datasets, then promote improved instructions into new capability versions.
Optimization is the hosted control plane for iterative prompt and capability improvement.
In the App IA, this is a top-level workflow surface for iterative improvement and promotion.
The page is scoped by the current workspace and project. It lists existing jobs, streams live progress for the selected run, and can open the submission form automatically when the current scope has no jobs yet.
What the page is for
Use optimization when you need to:
- submit a new optimization job against a capability and dataset
- monitor live status, logs, metrics, and best score
- compare training and validation behavior for one run
- promote improved instructions back into a new capability version
Job inputs
The submission flow is built around these inputs:
- target model
- capability and pinned capability version
- agent name inside that capability
- primary dataset and pinned dataset version
- optional validation dataset and version
- reward recipe
- optional objective, run reference, reflection model, seed, max metric calls, and tags
If you omit a project, the control plane resolves the workspace default project and persists that association on the job. It also bootstraps the project's first runtime if the project does not already have one, but optimization jobs remain project-scoped records rather than runtime-selected sessions.
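As a concrete sketch, the inputs above map naturally onto a submission payload. Every field name and value below is an illustrative assumption, not the actual API schema; consult the CLI or SDK for the real shape:

```python
# Hypothetical submission payload for a hosted optimization job.
# Field names are assumptions for illustration, not the real schema.
job_request = {
    "model": "my-target-model",          # target model (assumed identifier)
    "capability": "support-triage",      # capability name
    "capability_version": "v3",          # pinned capability version
    "agent": "classifier",               # agent name inside that capability
    "dataset": "triage-cases",           # primary dataset
    "dataset_version": "v7",             # pinned dataset version
    "validation_dataset": None,          # optional validation dataset
    "reward_recipe": "exact_match_v1",   # one of the current reward recipes
    # Optional knobs:
    "seed": 42,
    "max_metric_calls": 200,
    "tags": ["weekly-tune"],
    # "project" omitted on purpose: the control plane resolves the
    # workspace default project and persists it on the job record.
}

# Sanity-check the required inputs before submitting.
required = {"model", "capability", "capability_version", "agent",
            "dataset", "dataset_version", "reward_recipe"}
missing = required - job_request.keys()
print(sorted(missing))  # → []
```

Pinning both the capability version and the dataset version in the request is what keeps the run reproducible later.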
Current reward recipes include:
- exact_match_v1
- contains_v1
- row_reward_v1
- trajectory_imitation_v1
Today the hosted path is intentionally constrained:
- backend: gepa
- target kind: capability_agent
- optimized surface: agent instructions
That narrow scope is a feature, not a bug. It keeps the hosted path reproducible and promotion-safe.
Hosted job pipeline
The normal hosted workflow is:
- resolve a published capability version and one or two published dataset versions
- create a queued optimization job
- let the worker provision compute and stream logs, metric points, and summary updates
- inspect the frontier, best score, and candidate summary
- promote the best instructions into a new capability version if the result is worth shipping
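The lifecycle those steps imply can be simulated in a few lines. The state names and the shape of the result are assumptions inferred from this page, not the control plane's actual state machine:

```python
# Minimal simulation of the hosted job lifecycle sketched above.
# States and transitions are illustrative assumptions.
LIFECYCLE = ["queued", "provisioning", "running", "completed"]

def run_job(metric_points):
    """Walk a job through the lifecycle, tracking the best streamed score."""
    events = []
    best = None
    for state in LIFECYCLE:
        events.append(state)
        if state == "running":
            for score in metric_points:          # streamed metric series
                best = score if best is None else max(best, score)
    return {"status": events[-1], "events": events, "best_score": best}

result = run_job([0.41, 0.55, 0.52, 0.63])
print(result["status"], result["best_score"])    # completed 0.63
```

The point of the sketch is the split the next paragraph describes: the status transitions are the control plane, while the metric points are the evidence you inspect afterwards.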
This is the important split:
- the job record is the control plane
- the logs, metric series, and artifacts are the evidence
- promotion is the release handoff back into the capability registry
Job monitoring
The main page is split into a job list and a selected-job detail panel.
The detail view exposes the data operators usually need:
- status and last event time
- best score
- frontier size
- training and validation sizes
- logs
- metric sparkline cards
- candidate summary and promotion preview
Jobs can be refreshed, cancelled, retried, or watched live from the same surface.
Cancelling a queued job ends it immediately. Cancelling a running job records a cancellation request and asks the worker sandbox to stop. Retry only applies to terminal jobs and requeues the same saved setup with cleared metrics and artifacts.
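Those cancel and retry rules can be expressed as a small state check. The state names and the definition of "terminal" here are assumptions for illustration:

```python
# Sketch of the cancel/retry semantics described above.
# State names and the terminal set are illustrative assumptions.
TERMINAL = {"completed", "failed", "cancelled"}

def cancel(job):
    if job["status"] == "queued":
        job["status"] = "cancelled"       # queued jobs end immediately
    elif job["status"] == "running":
        job["cancel_requested"] = True    # worker sandbox is asked to stop
    return job

def retry(job):
    if job["status"] not in TERMINAL:
        raise ValueError("retry only applies to terminal jobs")
    # Requeue the same saved setup with cleared metrics and artifacts.
    return {**job, "status": "queued", "metrics": [], "artifacts": [],
            "cancel_requested": False}

print(cancel({"status": "queued"})["status"])   # cancelled
```

Note the asymmetry: cancelling a running job only records a request, so its status does not flip until the worker actually stops.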
Promotion
Promotion is the part that turns optimization output into something reusable.
The page computes a promotion preview from the selected job and shows the difference between source instructions and optimized instructions before you publish a new capability version.
Promotion is intentionally gated. It only works for completed jobs whose best candidate actually
contains promotable instructions.
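That gate reduces to two checks, sketched below. The job and candidate shapes are illustrative assumptions, not the actual record format:

```python
# Sketch of the promotion gate: only completed jobs whose best candidate
# actually carries promotable instructions can publish a new capability
# version. Record shapes are illustrative assumptions.
def can_promote(job):
    if job.get("status") != "completed":
        return False
    candidate = job.get("best_candidate") or {}
    return bool(candidate.get("instructions"))

print(can_promote({"status": "completed",
                   "best_candidate": {"instructions": "Be concise."}}))  # True
print(can_promote({"status": "running"}))                                # False
```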
That means the optimization page is not just a metrics dashboard. It is also a release surface for improved capability behavior.
What agents should assume
- Optimization jobs are workspace and project scoped operational records, not registry artifacts.
- Reproducibility still depends on pinned capability and dataset versions.
- Promotion is the handoff point where optimized instructions become a new capability version.
- The page is a hosted control plane. For submission automation, prefer the CLI or SDK once you know the exact inputs.
For automation, use CLI Optimization or SDK Optimization. Promotion writes back into Capabilities, and the pinned inputs come from Datasets.