Optimization
Submit and monitor optimization jobs for capabilities and evaluation datasets, then promote improved instructions into new capability versions.
Optimization is the hosted control plane for iterative prompt and capability improvement.
In the App IA, this is a top-level workflow surface for iterative improvement and promotion.
The page is scoped by the current workspace and project. It lists existing jobs, streams live progress for the selected run, and can open the submission form automatically when the current scope has no jobs yet.
What the page is for
Use optimization when you need to:
- submit a new optimization job against a capability and dataset
- monitor live status, logs, metrics, and best score
- compare training and validation behavior for one run
- promote improved instructions back into a new capability version
Job inputs
The submission flow is built around these inputs:
- target model
- capability and pinned capability version
- agent name inside that capability
- primary dataset and pinned dataset version
- optional validation dataset and version
- reward recipe
- optional objective, run reference, reflection model, seed, max metric calls, and tags
If you omit a project, the control plane resolves the workspace default project and persists that association on the job. It also bootstraps the project's first runtime if the project does not already have one, but optimization jobs remain project-scoped records rather than runtime-selected sessions.
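As a concrete sketch, the inputs above map naturally onto a submission payload. Every field name and value below is an illustrative assumption, not the actual API schema; consult the CLI or SDK for the real shape:

```python
# Hypothetical submission payload for a hosted optimization job.
# Field names are assumptions for illustration, not the real schema.
job_request = {
    "model": "my-target-model",          # target model (assumed identifier)
    "capability": "support-triage",      # capability name
    "capability_version": "v3",          # pinned capability version
    "agent": "classifier",               # agent name inside that capability
    "dataset": "triage-cases",           # primary dataset
    "dataset_version": "v7",             # pinned dataset version
    "validation_dataset": None,          # optional validation dataset
    "reward_recipe": "exact_match_v1",   # one of the current reward recipes
    # Optional knobs:
    "seed": 42,
    "max_metric_calls": 200,
    "tags": ["weekly-tune"],
    # "project" omitted on purpose: the control plane resolves the
    # workspace default project and persists it on the job record.
}

# Sanity-check the required inputs before submitting.
required = {"model", "capability", "capability_version", "agent",
            "dataset", "dataset_version", "reward_recipe"}
missing = required - job_request.keys()
print(sorted(missing))  # → []
```

Pinning both the capability version and the dataset version in the request is what keeps the run reproducible later.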
Current reward recipes include:
- exact_match_v1
- contains_v1
- row_reward_v1
- trajectory_imitation_v1
Today the hosted path is intentionally constrained:
- backend: gepa
- target kind: capability_agent
- optimized surface: agent instructions
That narrow scope is a feature, not a bug. It keeps the hosted path reproducible and promotion-safe.
Hosted job pipeline
The normal hosted workflow is:
- resolve a published capability version and one or two published dataset versions
- create a queued optimization job
- let the worker provision compute and stream logs, metric points, and summary updates
- inspect the frontier, best score, and candidate summary
- promote the best instructions into a new capability version if the result is worth shipping
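The lifecycle those steps imply can be simulated in a few lines. The state names and the shape of the result are assumptions inferred from this page, not the control plane's actual state machine:

```python
# Minimal simulation of the hosted job lifecycle sketched above.
# States and transitions are illustrative assumptions.
LIFECYCLE = ["queued", "provisioning", "running", "completed"]

def run_job(metric_points):
    """Walk a job through the lifecycle, tracking the best streamed score."""
    events = []
    best = None
    for state in LIFECYCLE:
        events.append(state)
        if state == "running":
            for score in metric_points:          # streamed metric series
                best = score if best is None else max(best, score)
    return {"status": events[-1], "events": events, "best_score": best}

result = run_job([0.41, 0.55, 0.52, 0.63])
print(result["status"], result["best_score"])    # completed 0.63
```

The point of the sketch is the split the next paragraph describes: the status transitions are the control plane, while the metric points are the evidence you inspect afterwards.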
This is the important split:
- the job record is the control plane
- the logs, metric series, and artifacts are the evidence
- promotion is the release handoff back into the capability registry
Job monitoring
The main page is split into a job list and a selected-job detail panel.
The detail view exposes the data operators usually need:
- status and last event time
- best score
- frontier size
- training and validation sizes
- logs
- metric sparkline cards
- candidate summary and promotion preview
Jobs can be refreshed, cancelled, retried, or watched live from the same surface.
Cancelling a queued job ends it immediately. Cancelling a running job records a cancellation request and asks the worker sandbox to stop. Retry only applies to terminal jobs and requeues the same saved setup with cleared metrics and artifacts.
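Those cancel and retry rules can be expressed as a small state check. The state names and the definition of "terminal" here are assumptions for illustration:

```python
# Sketch of the cancel/retry semantics described above.
# State names and the terminal set are illustrative assumptions.
TERMINAL = {"completed", "failed", "cancelled"}

def cancel(job):
    if job["status"] == "queued":
        job["status"] = "cancelled"       # queued jobs end immediately
    elif job["status"] == "running":
        job["cancel_requested"] = True    # worker sandbox is asked to stop
    return job

def retry(job):
    if job["status"] not in TERMINAL:
        raise ValueError("retry only applies to terminal jobs")
    # Requeue the same saved setup with cleared metrics and artifacts.
    return {**job, "status": "queued", "metrics": [], "artifacts": [],
            "cancel_requested": False}

print(cancel({"status": "queued"})["status"])   # cancelled
```

Note the asymmetry: cancelling a running job only records a request, so its status does not flip until the worker actually stops.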
Promotion
Promotion is the part that turns optimization output into something reusable.
The page computes a promotion preview from the selected job and shows the difference between source instructions and optimized instructions before you publish a new capability version.
Promotion is intentionally gated. It only works for completed jobs whose best candidate actually
contains promotable instructions.
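That gate reduces to two checks, sketched below. The job and candidate shapes are illustrative assumptions, not the actual record format:

```python
# Sketch of the promotion gate: only completed jobs whose best candidate
# actually carries promotable instructions can publish a new capability
# version. Record shapes are illustrative assumptions.
def can_promote(job):
    if job.get("status") != "completed":
        return False
    candidate = job.get("best_candidate") or {}
    return bool(candidate.get("instructions"))

print(can_promote({"status": "completed",
                   "best_candidate": {"instructions": "Be concise."}}))  # True
print(can_promote({"status": "running"}))                                # False
```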
That means the optimization page is not just a metrics dashboard. It is also a release surface for improved capability behavior.
What agents should assume
- Optimization jobs are workspace and project scoped operational records, not registry artifacts.
- Reproducibility still depends on pinned capability and dataset versions.
- Promotion is the handoff point where optimized instructions become a new capability version.
- The page is a hosted control plane. For submission automation, prefer the CLI or SDK once you know the exact inputs.
For automation, use CLI Optimization or SDK Optimization. Promotion writes back into Capabilities, and the pinned inputs come from Datasets.