Optimization
Improve prompts, agent instructions, and capability behavior with local searches or hosted GEPA jobs.
Optimization answers the question: “Can I make this agent measurably better at this task?”
You hold the task, dataset, and scorer fixed, then let a search loop propose better prompts, instructions, or configurations and score each candidate against the metric you already trust. The output is a candidate you can ship — a new prompt, a new set of agent instructions, or a new capability version.
Don’t start optimizing until you trust the thing that measures quality. If your dataset or scorer is still moving, optimization will just fit to the noise.
Pick a mode
| Mode | Reach for it when | Driver |
|---|---|---|
| Local search | You’re iterating on a prompt, scorer, or dataset in a notebook. | dn.optimize_anything(...) in your SDK. |
| Capability improvement | You have a local capability directory and want a promotable candidate. | dn capability improve — CLI, on-machine. |
| Hosted jobs | The capability and dataset are published and you want platform-managed runs. | dn optimize submit — CLI + hosted GEPA. |
All three share the same vocabulary (candidate, trial, sampler, evaluator) and the same GEPA backend for instruction search. What changes is where the loop runs and how stable the inputs have to be before you commit.
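The shared vocabulary maps onto a simple loop: a sampler proposes candidates, each candidate gets a trial, and an evaluator scores it against a fixed dataset. As an illustrative sketch in plain Python (this is not the dn SDK or GEPA; the sampler, scorer, and dataset shape here are all hypothetical stand-ins):

```python
# Illustrative only: a toy hill-climbing search loop using the shared
# vocabulary (candidate, trial, sampler, evaluator). Not the dn SDK.
import random

def sampler(current_best: str) -> str:
    """Propose a candidate by mutating the current best prompt (hypothetical)."""
    suffixes = [" Be concise.", " Think step by step.", " Cite sources."]
    return current_best + random.choice(suffixes)

def evaluator(candidate: str, dataset: list[dict]) -> float:
    """Score a candidate against a fixed dataset with a fixed scorer."""
    # Toy scorer: reward candidates that mention each row's required keyword.
    hits = sum(1 for row in dataset if row["keyword"] in candidate)
    return hits / len(dataset)

def optimize(seed: str, dataset: list[dict], trials: int = 20) -> tuple[str, float]:
    best, best_score = seed, evaluator(seed, dataset)
    for _ in range(trials):              # each iteration is one trial
        candidate = sampler(best)
        score = evaluator(candidate, dataset)
        if score > best_score:           # keep only candidates that improve the metric
            best, best_score = candidate, score
    return best, best_score

dataset = [{"keyword": "concise"}, {"keyword": "step"}]
prompt, score = optimize("Answer the question.", dataset)
```

The point of the sketch is the fixed-points rule: the dataset and evaluator never change during the run, so a higher score is meaningful and the winning candidate is shippable.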
Scoring against a dataset vs scoring against a live sandbox
A fourth axis cuts across the modes above: what the reward is actually measured against.
- Dataset scoring — the agent produces text, a reward recipe (or a scorer you wrote) grades that text against the dataset row. All three modes above default to this.
- Sandbox scoring — each trial provisions a fresh task environment, the agent runs against it, and a scorer reads the sandbox (flag file, service state, files on disk) to decide whether the trial passed. Use this when “better” is a property of the environment, not the agent’s text output. The SDK entry point is CapabilityEnvAdapter; the hosted entry point is a target_kind="capability_env" job. The task-environment optimization guide walks the local-to-hosted scenario end to end.
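A sandbox scorer inspects environment state after the agent runs rather than grading text. A minimal self-contained sketch (the flag-file convention and `sandbox_scorer` function are hypothetical, standing in for whatever CapabilityEnvAdapter would actually surface):

```python
# Illustrative only: a sandbox-style scorer that reads state left on disk
# (a flag file) instead of grading the agent's text output.
import tempfile
from pathlib import Path

def sandbox_scorer(sandbox_root: Path) -> bool:
    """Pass the trial iff the expected flag file exists with the right contents."""
    flag = sandbox_root / "flag.txt"
    return flag.exists() and flag.read_text().strip() == "DONE"

# Simulate one trial: provision a fresh sandbox, run the "agent", then score.
with tempfile.TemporaryDirectory() as root:
    sandbox = Path(root)
    # Stand-in for the agent run: it writes the flag as a side effect.
    (sandbox / "flag.txt").write_text("DONE\n")
    passed = sandbox_scorer(sandbox)
```

Because each trial gets a fresh directory, the scorer measures only what this candidate's run changed, not leftovers from earlier trials.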
Where to go next
Related topics
Optimization builds on work from neighboring topics:
- Scorers and datasets define what “better” means. Build them before you optimize, not after.
- Capabilities hold the agent and instructions that hosted jobs and dn capability improve promote into a new version.
- Training takes over when prompt and instruction optimization stops paying off and you need to change model weights.