Optimization

Improve prompts, agent instructions, and capability behavior with local searches or hosted GEPA jobs.

Optimization answers the question: “Can I make this agent measurably better at this task?”

You hold the task, dataset, and scorer fixed, then let a search loop propose better prompts, instructions, or configurations and score each candidate against the metric you already trust. The output is a candidate you can ship — a new prompt, a new set of agent instructions, or a new capability version.
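In miniature, that loop is: propose a candidate, score it with the fixed metric, keep it only if it improves. The sketch below is illustrative, not an SDK API; `optimize`, `toy_score`, and `toy_propose` are made-up names, and the toy scorer just rewards prompts that mention a set of required keywords.

```python
import random

REQUIRED = {"cite", "sources", "concise"}  # toy metric: coverage of required keywords

def toy_score(prompt, dataset):
    # dataset is unused in this toy; a real scorer grades output against dataset rows
    return sum(kw in prompt for kw in REQUIRED) / len(REQUIRED)

def toy_propose(prompt, rng):
    # mutate the current best by appending a random required keyword
    return prompt + " " + rng.choice(sorted(REQUIRED))

def optimize(seed, propose, score, dataset, n_trials=20):
    """Hold the task, dataset, and scorer fixed; search over candidates."""
    rng = random.Random(0)
    best, best_score = seed, score(seed, dataset)
    for _ in range(n_trials):
        candidate = propose(best, rng)   # propose a variation of the best so far
        s = score(candidate, dataset)    # score with the metric you already trust
        if s > best_score:               # keep only strict improvements
            best, best_score = candidate, s
    return best, best_score

best, best_score = optimize("Answer the question.", toy_propose, toy_score, dataset=None)
```

The output is exactly the shippable artifact described above: the best-scoring candidate, plus the score that justifies promoting it.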

Don’t start optimizing until you trust the thing that measures quality. If your dataset or scorer is still moving, optimization will just fit to the noise.

  • Local search — reach for it when you’re iterating on a prompt, scorer, or dataset in a notebook. Driver: dn.optimize_anything(...) in your SDK.
  • Capability improvement — reach for it when you have a local capability directory and want a promotable candidate. Driver: dn capability improve (CLI, on-machine).
  • Hosted jobs — reach for it when the capability and dataset are published and you want platform-managed runs. Driver: dn optimize submit (CLI + hosted GEPA).

All three share the same vocabulary (candidate, trial, sampler, evaluator) and the same GEPA backend for instruction search. What changes is where the loop runs and how stable the inputs have to be before you commit.
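One way to picture how those four terms fit together (illustrative types only, not the SDK's actual classes; `run_search`, `Candidate`, and `Trial` are hypothetical names):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Candidate:
    """A concrete thing to score: a prompt, instruction set, or config."""
    instructions: str

@dataclass
class Trial:
    """One scored evaluation of one candidate."""
    candidate: Candidate
    score: float

Sampler = Callable[[List[Trial]], Candidate]  # proposes the next candidate from history
Evaluator = Callable[[Candidate], float]      # scores it against the fixed metric

def run_search(sampler: Sampler, evaluator: Evaluator, budget: int) -> Trial:
    history: List[Trial] = []
    for _ in range(budget):
        cand = sampler(history)
        history.append(Trial(cand, evaluator(cand)))
    return max(history, key=lambda t: t.score)

# Toy wiring: a sampler that grows the candidate each trial, an evaluator by length.
grow = lambda history: Candidate("a" * (len(history) + 1))
length = lambda cand: float(len(cand.instructions))
best = run_search(grow, length, budget=5)
```

Swapping where `run_search` executes (notebook, local CLI, hosted job) is the only thing that changes across the three modes; the sampler/evaluator contract stays the same.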

Scoring against a dataset vs scoring against a live sandbox

A separate axis cuts across the three modes above: what the reward is actually measured against.

  • Dataset scoring — the agent produces text, a reward recipe (or a scorer you wrote) grades that text against the dataset row. All three modes above default to this.
  • Sandbox scoring — each trial provisions a fresh task environment, the agent runs against it, and a scorer reads the sandbox (flag file, service state, files on disk) to decide if the trial passed. Use this when “better” is a property of the environment, not the agent’s text output. The SDK entry point is CapabilityEnvAdapter; the hosted entry point is a target_kind="capability_env" job. The task-environment optimization guide walks the local-to-hosted scenario end to end.
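The sandbox-scoring pattern can be sketched in a few lines, using a temp directory as a stand-in for a real task environment. This does not show the actual CapabilityEnvAdapter interface; `provision_sandbox`, `sandbox_scorer`, and `run_trial` are all hypothetical names, and the "flag file" check mirrors the flag-file example above.

```python
import pathlib
import shutil
import tempfile

def provision_sandbox() -> pathlib.Path:
    """Each trial gets a fresh environment (here: an empty temp directory)."""
    return pathlib.Path(tempfile.mkdtemp(prefix="trial-"))

def sandbox_scorer(sandbox: pathlib.Path) -> float:
    """Pass/fail is a property of the environment, not the agent's text:
    the trial passes iff the expected flag file was left behind."""
    flag = sandbox / "flag.txt"
    return 1.0 if flag.exists() and flag.read_text().strip() == "DONE" else 0.0

def run_trial(agent) -> float:
    sandbox = provision_sandbox()
    try:
        agent(sandbox)                  # the agent acts on the environment
        return sandbox_scorer(sandbox)  # the scorer inspects the environment
    finally:
        shutil.rmtree(sandbox, ignore_errors=True)

# A toy agent that completes the task; a no-op agent would score 0.0.
def good_agent(sandbox: pathlib.Path):
    (sandbox / "flag.txt").write_text("DONE\n")
```

The key property is isolation: because each trial provisions and tears down its own sandbox, trials can't contaminate each other's scores.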

Optimization builds on work from neighboring topics:

  • Scorers and datasets define what “better” means. Build them before you optimize, not after.
  • Capabilities hold the agent and instructions that hosted jobs and dn capability improve promote into a new version.
  • Training takes over when prompt and instruction optimization stops paying off and you need to change model weights.