Optimization
Submit, inspect, wait on, and retry hosted optimization jobs from the dn CLI.
`dn optimize ...` is the hosted optimization control-plane surface. Use it when the capability,
dataset, and reward recipe already exist and you want the platform to run the job.
If you are still iterating on a local capability, use `dn capability improve` first. That command
optimizes capability-owned local files against local datasets and leaves behind a candidate bundle
plus ledger. `dn optimize ...` is for the published, hosted path.
What hosted CLI optimization is for
Today the hosted CLI path is intentionally narrow:
- backend: `gepa`
- target kind: `capability_agent`
- optimized component: agent `instructions`
That is useful when you want platform-managed prompt or instruction improvement, not arbitrary local search.
Before you submit an optimization job
Hosted optimization is intentionally opinionated. The cleanest way to think about it is:
- pick a published capability
- pick the agent inside that capability whose instructions should change
- pick a published dataset
- pick a hosted reward recipe that scores the outputs
If any of those ingredients are still unstable, the SDK is usually a better place to experiment first.
That is the main boundary between the two CLI surfaces:
- `dn capability improve` is local, stack-aware, and capability-scoped
- `dn capability improve` can optionally use a proposer capability to suggest edits while the CLI still owns scoring and acceptance
- `dn optimize submit` is hosted, published-artifact-based, and instruction-only today
Submit an optimization job
```shell
dn optimize submit \
  --server http://127.0.0.1:8000 \
  --api-key "$DREADNODE_API_KEY" \
  --organization dreadnode \
  --workspace localdev \
  --project default \
  --model openai/gpt-4o-mini \
  --agent-name assistant \
  --reward-recipe exact_match_v1 \
  --objective "Improve instruction quality without increasing verbosity." \
  --max-metric-calls 100 \
  --max-trials 10 \
  --max-trials-without-improvement 3 \
  --max-runtime-sec 1800 \
  --reflection-lm gpt-5-mini \
  --wait \
  --json
```

What that command is doing:
- it optimizes the selected agent's `instructions`, not model weights
- `--dataset` is the training set used during search
- `--val-dataset` is the held-out set for checking whether the improvement generalizes
- `--reward-recipe` defines how each candidate is scored
- `--reflection-lm` controls the model used during reflection steps, which can be different from the target model being improved
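The dataset flags do not appear in the example above. A variant that pins training and validation data explicitly might look like this, using only flags from the reference table below; the artifact names and versions are placeholders:

```shell
dn optimize submit \
  --capability my-capability@1.0.0 \
  --agent-name assistant \
  --dataset train-set@1.0.0 \
  --val-dataset holdout-set@1.0.0 \
  --reward-recipe exact_match_v1 \
  --max-metric-calls 100 \
  --wait
```

Supplying `--val-dataset` is what lets you later judge whether an improvement generalizes rather than overfitting to the training set.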
The flags that matter most
| Flag | Description |
|---|---|
| `--capability NAME@VERSION` | capability artifact containing the target agent |
| `--agent-name <name>` | required when the capability exports multiple agents |
| `--dataset NAME@VERSION` | training dataset used during optimization |
| `--val-dataset NAME@VERSION` | optional held-out validation dataset |
| `--reward-recipe <name>` | declarative hosted reward recipe |
| `--reward-params <json>` | JSON params passed to the reward recipe |
| `--seed <n>` | deterministic optimization seed |
| `--max-metric-calls <n>` | metric-call budget |
| `--max-trials <n>` | hard stop after this many trials |
| `--max-trials-without-improvement <n>` | stop after this many finished trials without a new best |
| `--max-runtime-sec <n>` | outer hosted sandbox lifetime override |
| `--reflection-lm <model>` | reflection model override; defaults to `--model` |
| `--no-capture-traces` | disable trajectory capture for reflection |
| `--wait` | poll until terminal state |
| `--json` | print the full job payload |
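As an illustration of `--reward-params`, recipe parameters travel as inline JSON. The keys are defined by the recipe itself, so the one below is hypothetical, as are the artifact names:

```shell
dn optimize submit \
  --capability my-capability@1.0.0 \
  --agent-name assistant \
  --dataset train-set@1.0.0 \
  --reward-recipe exact_match_v1 \
  --reward-params '{"case_sensitive": false}' \
  --seed 42
```

Pairing `--reward-params` with `--seed` is a reasonable habit: it makes a scoring configuration reproducible when you rerun or retry the job.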
How to think about the stopping controls
These three flags solve different problems:
- `--max-metric-calls` limits scoring budget
- `--max-trials` limits search length
- `--max-trials-without-improvement` stops stagnant jobs that keep looping without a better result
If the job is already near-perfect but still iterating, `--max-trials-without-improvement` is
usually the most useful brake.
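The interaction between the trial budget and the stale-trial brake is easiest to see as a sketch. The loop below is illustrative only, not the platform's implementation; the scores are fabricated stand-ins for reward-recipe results:

```shell
#!/usr/bin/env bash
# Illustrative only: how a stale-trial counter stops a stagnant search.
scores=(0.50 0.72 0.72 0.71 0.72 0.90)
max_trials=10   # analogous to --max-trials
max_stale=3     # analogous to --max-trials-without-improvement
best=0
stale=0
trial=0

for score in "${scores[@]}"; do
  trial=$((trial + 1))
  if (( trial > max_trials )); then
    echo "stopped: trial budget exhausted"
    break
  fi
  # floating-point compare via awk (bash arithmetic is integer-only)
  if [ "$(awk -v s="$score" -v b="$best" 'BEGIN { print (s > b) }')" = "1" ]; then
    best=$score
    stale=0
  else
    stale=$((stale + 1))
  fi
  if (( stale >= max_stale )); then
    echo "stopped: $stale trials without a new best"
    break
  fi
done

echo "best=$best"
```

Note that the stale counter resets on every new best, so it only fires when the search genuinely plateaus; here the job stops on the fifth trial and never reaches the 0.90 score, which is exactly the trade-off the brake makes.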
After the job starts
Once the job exists, use the control-plane commands for different layers of inspection:
```shell
dn optimize list
dn optimize get <job-id>
dn optimize wait <job-id> --json
dn optimize logs <job-id>
dn optimize artifacts <job-id>
dn optimize cancel <job-id> --json
dn optimize retry <job-id>
```

Use them like this:

- `list` finds old or in-flight jobs
- `get` shows the saved config and top-level status
- `wait` is the simplest way to block until a terminal outcome
- `logs` tells you what the optimization loop is currently doing
- `artifacts` is where to look for outputs worth reusing
- `retry` reruns a terminal job when you want the same setup again
`dn optimize wait` exits non-zero if the job ends in `failed` or `cancelled`.
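Because of that exit-code behavior, `dn optimize wait` composes directly into scripts. A minimal sketch, with `<job-id>` as a placeholder:

```shell
if dn optimize wait "<job-id>" --json; then
  dn optimize artifacts "<job-id>"   # terminal success: pull anything worth reusing
else
  dn optimize logs "<job-id>"        # failed or cancelled: see what the loop was doing
fi
```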
Read the result, not just the status
A completed job only tells you that the hosted loop finished. It does not tell you whether the result is useful.
After a successful run, check:
- whether the best score actually improved
- whether validation stayed strong, not just training
- whether the artifacts contain instructions you would really want to ship
When sandboxes matter
Hosted optimization runs inside real sandboxes. If the job state and the underlying compute seem out of sync, inspect the compute directly:
```shell
dn sandbox list --state running
dn sandbox get <provider-sandbox-id>
dn sandbox logs <provider-sandbox-id>
dn sandbox delete --yes <provider-sandbox-id>
```

See /cli/sandboxes/ for the compute view.
Practical rule
Use `dn optimize submit` only after:
- the capability is already published
- the dataset is already published
- the reward recipe is already known
If you are still iterating locally on the metric or the candidate shape, the SDK is usually the better place to experiment first.