# Capability Optimization Loop

Improve a capability with a pinned dataset, monitor optimization jobs, and promote the best result into a new version.
Use this recipe when a published capability underperforms and you already have a pinned dataset that defines what “better” means. The loop is simple: freeze the inputs, run the hosted job, inspect the candidate, then promote only if the result survives a sanity check.
## When to use this workflow

- you need to improve a published capability rather than a local draft
- you have a repeatable dataset that defines success
- you want a new capability version as the output, not just a one-off experiment
## What you need before you start

- a published Capability reference pinned as `org/name@version`
- the exact agent name inside that capability
- a published Dataset version, plus an optional validation dataset
- a reward recipe and target model
| Input | Why it must be pinned |
|---|---|
| capability ref | you need to know exactly which instructions are being improved |
| dataset ref | optimization should not drift as new samples are published |
| validation dataset | use it when training metrics alone are not enough |
| workspace and project | this is where the job, logs, and follow-on evaluations will live |
## Recipe

### 1. Freeze the inputs

Before you submit anything:
- pin the source capability as `org/name@version`
- pin the dataset version instead of relying on `latest`
- choose the exact agent name if the capability has more than one agent
- add a validation dataset if you need stronger confidence than one training metric
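The freeze step can be captured in a few shell variables plus a format check. A minimal sketch: the refs `acme/analyst@1.2.0` and `acme/recon-plans@3` and the helper `is_pinned_ref` are hypothetical names for illustration, not part of the product.

```shell
# Reject any ref that is not fully pinned as org/name@version,
# and treat an explicit "latest" as unpinned.
is_pinned_ref() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+@[^@[:space:]]+$' \
    && [ "${1##*@}" != "latest" ]
}

CAPABILITY_REF="acme/analyst@1.2.0"   # hypothetical pinned capability
DATASET_REF="acme/recon-plans@3"      # hypothetical pinned dataset

for ref in "$CAPABILITY_REF" "$DATASET_REF"; do
  is_pinned_ref "$ref" || { echo "not pinned: $ref" >&2; exit 1; }
done
```

Running a check like this at the top of any automation makes a drifting input fail loudly before a job is ever submitted.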
### 2. Submit the hosted job

```shell
dn optimize submit \
  --model openai/gpt-4o-mini \
  --agent-name analyst \
  --reward-recipe exact_match_v1 \
  --objective "Find higher-signal recon plans without increasing noise." \
  --wait
```

Use the app when you want the submission form and promotion preview in one place. Use `dn optimize` or /sdk/optimization/ when the inputs are already known and you want a scriptable run.
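For a scriptable run, it can help to assemble the invocation first and echo it for review before executing. A sketch using only the flags shown above; the dry-run echo is a local convention, not a `dn` feature:

```shell
# Build the submission as positional parameters so quoting stays intact.
set -- dn optimize submit \
  --model "openai/gpt-4o-mini" \
  --agent-name "analyst" \
  --reward-recipe "exact_match_v1" \
  --objective "Find higher-signal recon plans without increasing noise." \
  --wait

# Review the exact invocation before running "$@" for real.
CMD="$*"
echo "$CMD"
```

This keeps the frozen inputs and the submission in one reviewable place, which matters when the job will be rerun later against the same pins.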
### 3. Monitor the job like a job

Check:
- live status
- best score and frontier size
- logs and artifacts
- whether training and validation behavior disagree
From the CLI:
```shell
dn optimize list
dn optimize get <job-id>
dn optimize wait <job-id>
dn optimize logs <job-id>
dn optimize artifacts <job-id>
```

If the run is obviously wrong, cancel or retry before you think about promotion.
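The training/validation disagreement is the check most worth scripting. A sketch, assuming you have already exported the two best scores from the job's metrics; the numbers and the 0.1 threshold are illustrative:

```shell
TRAIN_BEST=0.82   # hypothetical best training score
VAL_BEST=0.61     # hypothetical best validation score

# Flag the run when validation lags training by more than a chosen margin.
gap=$(awk -v t="$TRAIN_BEST" -v v="$VAL_BEST" 'BEGIN { printf "%.2f", t - v }')
diverged=$(awk -v g="$gap" 'BEGIN { print (g > 0.1) ? "yes" : "no" }')
echo "train-validation gap: $gap (diverged: $diverged)"
```

A large gap is exactly the signal that the winning candidate may be overfit noise rather than a real improvement, so it belongs in the monitoring loop rather than the promotion review.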
### 4. Compare the candidate before promotion

Before promoting:
- verify the winning candidate improves the metric you actually care about
- check validation behavior, not just training behavior
- read the changed instructions and make sure they are understandable guidance rather than overfit noise
The promotion preview is the release gate. Use it to review the diff between the source instructions and the optimized candidate.
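Outside the app, the same review can be done with a plain unified diff. The files below are hypothetical exports of the source and candidate instructions, created inline so the sketch is self-contained:

```shell
# Hypothetical exports of the two instruction sets.
printf 'Plan recon steps.\nKeep noise low.\n' > source_instructions.txt
printf 'Plan recon steps.\nKeep noise low and cite evidence.\n' > candidate_instructions.txt

# diff exits 1 when the files differ; capture the hunks for review.
diff_output=$(diff -u source_instructions.txt candidate_instructions.txt || true)
echo "$diff_output"
```

Attaching this diff to the job record gives reviewers the same release gate the promotion preview provides.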
### 5. Promote and re-evaluate

After a successful review:
- publish the candidate as a new capability version
- rerun the relevant evaluation workflows against that promoted version
- update downstream automation to use the new pinned capability reference
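Updating downstream automation can be as small as rewriting the pinned reference. A sketch where the env file, variable name, and versions are all hypothetical:

```shell
# Hypothetical automation config that pins the capability.
printf 'CAPABILITY_REF=acme/analyst@1.2.0\n' > automation.env

# Point automation at the newly promoted version.
NEW_REF="acme/analyst@1.3.0"
sed -i.bak "s|^CAPABILITY_REF=.*|CAPABILITY_REF=${NEW_REF}|" automation.env
cat automation.env
```

Because the reference is fully pinned, the old `.bak` copy doubles as a one-line rollback if the follow-on evaluations regress.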
## What to keep

- the source capability ref and dataset refs
- the optimization job ID
- the winning candidate summary and diff
- the promoted capability version and the follow-on evaluation ID
## Branches and decisions

- if the inputs are still changing, do not optimize yet; first pin the capability and dataset
- if a completed job does not produce a candidate worth promoting, treat it as a failed search, not a partial rollout
- `retry` is useful when you want to reuse the saved inputs but clear the worker state, summary, metrics, and artifacts