
Capability Optimization Loop

Improve a capability with a pinned dataset, monitor optimization jobs, and promote the best result into a new version.

Use this recipe when a published capability underperforms and you already have a pinned dataset that defines what “better” means. The loop is simple: freeze the inputs, run the hosted job, inspect the candidate, then promote only if the result survives a sanity check.

  • you need to improve a published capability rather than a local draft
  • you have a repeatable dataset that defines success
  • you want a new capability version as the output, not just a one-off experiment

You will need:

  • a published Capability reference pinned as org/name@version
  • the exact agent name inside that capability
  • a published Dataset version, plus an optional validation dataset
  • a reward recipe and target model
| Input | Why it must be pinned |
| --- | --- |
| capability ref | you need to know exactly which instructions are being improved |
| dataset ref | optimization should not drift as new samples are published |
| validation dataset | use it when training metrics alone are not enough |
| workspace and project | this is where the job, logs, and follow-on evaluations will live |

Before you submit anything:

  • pin the source capability as org/name@version
  • pin the dataset version instead of relying on latest
  • choose the exact agent name if the capability has more than one agent
  • add a validation dataset if you need stronger confidence than one training metric
```shell
dn optimize submit \
  --model openai/gpt-4o-mini \
  --capability acme/[email protected] \
  --agent-name analyst \
  --dataset acme/[email protected] \
  --val-dataset acme/[email protected] \
  --reward-recipe exact_match_v1 \
  --objective "Find higher-signal recon plans without increasing noise." \
  --wait
```

Use the app when you want the submission form and promotion preview in one place. Use `dn optimize` or the SDK (/sdk/optimization/) when the inputs are already known and you want a scriptable run.

Check:

  • live status
  • best score and frontier size
  • logs and artifacts
  • whether training and validation behavior disagree

From the CLI:

```shell
dn optimize list                  # jobs in the current workspace and project
dn optimize get <job-id>          # live status, best score, frontier size
dn optimize wait <job-id>         # block until the job finishes
dn optimize logs <job-id>         # job logs
dn optimize artifacts <job-id>    # job artifacts
```
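
A minimal wrapper around the monitoring commands above might look like the following. It is a sketch: the JOB_ID environment variable and the `tail` trim are this page's additions, not `dn` conventions, and the commands only run when a job ID is supplied.

```shell
# Sketch of a post-submission check; JOB_ID handling is an assumption.
JOB_ID="${JOB_ID:-}"   # e.g. the ID of the job you submitted

if [ -n "$JOB_ID" ]; then
  dn optimize wait "$JOB_ID"                 # block until the job finishes
  dn optimize get "$JOB_ID"                  # best score and frontier size
  dn optimize logs "$JOB_ID" | tail -n 50    # most recent log lines
  dn optimize artifacts "$JOB_ID"            # pull down candidate artifacts
else
  echo "set JOB_ID before running this check" >&2
fi
```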

If the run is obviously wrong, cancel or retry it before you think about promotion.

Before promoting:

  • verify the winning candidate improves the metric you actually care about
  • check validation behavior, not just training behavior
  • read the changed instructions and make sure they are understandable instead of overfit noise
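
The overfitting check can be made mechanical. In this sketch the three score variables are placeholders you would fill in from the job summary; `dn` does not set or define these names.

```shell
# Hypothetical promotion gate; all scores below are placeholder values.
BASELINE_VAL=0.71      # validation score of the current published version
CANDIDATE_TRAIN=0.93   # winning candidate, training metric
CANDIDATE_VAL=0.74     # winning candidate, validation metric

# A large train/validation gap suggests the optimized instructions are
# overfit noise rather than a real improvement.
train_val_gap=$(awk "BEGIN { print $CANDIDATE_TRAIN - $CANDIDATE_VAL }")

if awk "BEGIN { exit !($CANDIDATE_VAL > $BASELINE_VAL) }"; then
  echo "validation improved; now read the diff (train/val gap: $train_val_gap)"
else
  echo "hold: candidate does not beat the baseline on validation" >&2
fi
```

The gate is deliberately on validation, not training: a candidate that only moves the training metric fails the second bullet above.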

The promotion preview is the release gate. Use it to review the diff between the source instructions and the optimized candidate.

After a successful review:

  • publish the candidate as a new capability version
  • rerun the relevant evaluation workflows against that promoted version
  • update downstream automation to use the new pinned capability reference

Record for later traceability:

  • the source capability ref and dataset refs
  • the optimization job ID
  • the winning candidate summary and diff
  • the promoted capability version and the follow-on evaluation ID

Edge cases:

  • if the inputs are still changing, do not optimize yet; first pin the capability and dataset
  • if a completed job does not produce a candidate worth promoting, treat it as a failed search, not a partial rollout
  • retry is useful when you want to reuse the saved inputs but clear the worker state, summary, metrics, and artifacts
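
One way to keep the paper trail this page lists is a small record file. The file name, JSON shape, and angle-bracket placeholders are arbitrary choices for illustration, not a `dn` convention; substitute your real refs and IDs.

```shell
# Capture the traceability items as a single JSON record (sketch only).
cat > promotion-record.json <<'EOF'
{
  "source_capability": "acme/[email protected]",
  "dataset": "acme/[email protected]",
  "validation_dataset": "acme/[email protected]",
  "optimization_job_id": "<job-id>",
  "winning_candidate": "<candidate summary and link to diff>",
  "promoted_capability": "acme/recon-planner@<new-version>",
  "followup_evaluation_id": "<evaluation-id>"
}
EOF
echo "recorded $(wc -c < promotion-record.json) bytes"
```

Keeping this next to the job makes the "which dataset produced which version" question answerable months later.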