
Capability Optimization Loop

Improve a capability with a pinned dataset, monitor optimization jobs, and promote the best result into a new version.

Use this recipe when a published capability underperforms and you already have a pinned dataset that defines what “better” means. The loop is simple: freeze the inputs, run the hosted job, inspect the candidate, then promote only if the result survives a sanity check.

  • you need to improve a published capability rather than a local draft
  • you have a repeatable dataset that defines success
  • you want a new capability version as the output, not just a one-off experiment

You will need:

  • a published Capability reference pinned as org/name@version
  • the exact agent name inside that capability
  • a published Dataset version, plus an optional validation dataset
  • a reward recipe and target model
| Input | Why it must be pinned |
| --- | --- |
| capability ref | you need to know exactly which instructions are being improved |
| dataset ref | optimization should not drift as new samples are published |
| validation dataset | use it when training metrics alone are not enough |
| workspace and project | this is where the job, logs, and follow-on evaluations will live |

Before you submit anything:

  • pin the source capability as org/name@version
  • pin the dataset version instead of relying on latest
  • choose the exact agent name if the capability has more than one agent
  • add a validation dataset if you need stronger confidence than one training metric
```shell
dn optimize submit \
  --model openai/gpt-4o-mini \
  --capability acme/[email protected] \
  --agent-name analyst \
  --dataset acme/[email protected] \
  --val-dataset acme/[email protected] \
  --reward-recipe exact_match_v1 \
  --objective "Find higher-signal recon plans without increasing noise." \
  --wait
```

Use the app when you want the submission form and promotion preview in one place. Use `dn optimize` or the SDK (/sdk/optimization/) when the inputs are already known and you want a scriptable run.

Check:

  • live status
  • best score and frontier size
  • logs and artifacts
  • whether training and validation behavior disagree

From the CLI:

```shell
dn optimize list                  # jobs in the current workspace and project
dn optimize get <job-id>          # live status, best score, frontier size
dn optimize wait <job-id>         # block until the job finishes
dn optimize logs <job-id>         # job logs
dn optimize artifacts <job-id>    # job artifacts
```
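
A minimal wrapper around the monitoring commands above might look like the following. It is a sketch: the JOB_ID environment variable and the `tail` trim are this page's additions, not `dn` conventions, and the commands only run when a job ID is supplied.

```shell
# Sketch of a post-submission check; JOB_ID handling is an assumption.
JOB_ID="${JOB_ID:-}"   # e.g. the ID of the job you submitted

if [ -n "$JOB_ID" ]; then
  dn optimize wait "$JOB_ID"                 # block until the job finishes
  dn optimize get "$JOB_ID"                  # best score and frontier size
  dn optimize logs "$JOB_ID" | tail -n 50    # most recent log lines
  dn optimize artifacts "$JOB_ID"            # pull down candidate artifacts
else
  echo "set JOB_ID before running this check" >&2
fi
```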

If the run is obviously wrong, cancel or retry it before you think about promotion.

Before promoting:

  • verify the winning candidate improves the metric you actually care about
  • check validation behavior, not just training behavior
  • read the changed instructions and make sure they are understandable instead of overfit noise
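
The overfitting check can be made mechanical. In this sketch the three score variables are placeholders you would fill in from the job summary; `dn` does not set or define these names.

```shell
# Hypothetical promotion gate; all scores below are placeholder values.
BASELINE_VAL=0.71      # validation score of the current published version
CANDIDATE_TRAIN=0.93   # winning candidate, training metric
CANDIDATE_VAL=0.74     # winning candidate, validation metric

# A large train/validation gap suggests the optimized instructions are
# overfit noise rather than a real improvement.
train_val_gap=$(awk "BEGIN { print $CANDIDATE_TRAIN - $CANDIDATE_VAL }")

if awk "BEGIN { exit !($CANDIDATE_VAL > $BASELINE_VAL) }"; then
  echo "validation improved; now read the diff (train/val gap: $train_val_gap)"
else
  echo "hold: candidate does not beat the baseline on validation" >&2
fi
```

The gate is deliberately on validation, not training: a candidate that only moves the training metric fails the second bullet above.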

The promotion preview is the release gate. Use it to review the diff between the source instructions and the optimized candidate.

After a successful review:

  • publish the candidate as a new capability version
  • rerun the relevant evaluation workflows against that promoted version
  • update downstream automation to use the new pinned capability reference

Record for later traceability:

  • the source capability ref and dataset refs
  • the optimization job ID
  • the winning candidate summary and diff
  • the promoted capability version and the follow-on evaluation ID

Edge cases:

  • if the inputs are still changing, do not optimize yet; first pin the capability and dataset
  • if a completed job does not produce a candidate worth promoting, treat it as a failed search, not a partial rollout
  • retry is useful when you want to reuse the saved inputs but clear the worker state, summary, metrics, and artifacts
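
One way to keep the paper trail this page lists is a small record file. The file name, JSON shape, and angle-bracket placeholders are arbitrary choices for illustration, not a `dn` convention; substitute your real refs and IDs.

```shell
# Capture the traceability items as a single JSON record (sketch only).
cat > promotion-record.json <<'EOF'
{
  "source_capability": "acme/[email protected]",
  "dataset": "acme/[email protected]",
  "validation_dataset": "acme/[email protected]",
  "optimization_job_id": "<job-id>",
  "winning_candidate": "<candidate summary and link to diff>",
  "promoted_capability": "acme/recon-planner@<new-version>",
  "followup_evaluation_id": "<evaluation-id>"
}
EOF
echo "recorded $(wc -c < promotion-record.json) bytes"
```

Keeping this next to the job makes the "which dataset produced which version" question answerable months later.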