# Reward recipes

The hosted reward recipes, what each scores, their parameters, and the dataset fields they expect.

Hosted optimization jobs use a reward recipe to turn each trial’s completion into a score. Pick one by name when you submit a job:

```sh
dn optimize submit ... --reward-recipe exact_match_v1
```

Pass params as a JSON object when the recipe needs configuration:

```sh
dn optimize submit ... --reward-recipe contains_v1 \
  --reward-params '{"needle": "Dreadnode", "reward_if_true": 1.0, "reward_if_false": 0.0}'
```

Every recipe receives the completion text plus the dataset row for the current trial. A recipe returns a single float reward the optimizer maximizes.

## exact_match_v1

Scores 1.0 when the completion exactly matches the expected answer (after stripping whitespace), 0.0 otherwise.

| Field | Type | Source |
| --- | --- | --- |
| `params.expected` | string | Optional global expected value. Falls back to the row's `expected_output`. |

Dataset column: `expected_output` — required when `params.expected` is not set.

Use this when every row has a single ground-truth answer and partial matches shouldn’t count.
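The rule amounts to a whitespace-stripped string comparison. A minimal Python sketch (the hosted implementation is internal, so the function name and signature here are illustrative):

```python
def exact_match_reward(completion: str, row: dict, params: dict) -> float:
    # A global params.expected takes precedence; otherwise read the row.
    expected = params.get("expected") or row.get("expected_output", "")
    # Whitespace-stripped, all-or-nothing comparison.
    return 1.0 if completion.strip() == expected.strip() else 0.0
```

Note the comparison is case-sensitive: a completion that differs only in casing scores 0.0.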

## contains_v1

Scores based on whether a fixed substring appears anywhere in the completion.

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `params.needle` | string | — | Required. The substring to look for. |
| `params.reward_if_true` | float | 1.0 | Returned when the substring is present. |
| `params.reward_if_false` | float | 0.0 | Returned when the substring is absent. |

The needle is global to the run — it does not read per-row fields. Reach for this when “did the agent mention this term?” is the entire metric.
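Sketched in Python (illustrative names only; the hosted recipe is not user code):

```python
def contains_reward(completion: str, params: dict) -> float:
    # The needle is fixed for the whole run; dataset rows are not consulted.
    present = params["needle"] in completion
    return params.get("reward_if_true", 1.0) if present else params.get("reward_if_false", 0.0)
```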

## row_reward_v1

Passes through a per-row reward value you’ve pre-computed and stored in the dataset.

| Field | Type | Source |
| --- | --- | --- |
| `params.default` | float | Fallback used when a row has no `reward`. Defaults to 0.0. |

Dataset column: `reward` — the per-row numeric reward the optimizer receives directly.

Use this when the metric already lives in your dataset — human labels, reward-model scores, or anything you’ve computed offline. The recipe adds nothing on top; it routes the row’s reward into the search loop.
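The pass-through behavior amounts to the following sketch (illustrative; the completion plays no part in the score):

```python
def row_reward(completion: str, row: dict, params: dict) -> float:
    # The completion is ignored; the row's stored reward is the score.
    value = row.get("reward")
    return float(value) if value is not None else params.get("default", 0.0)
```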

## trajectory_imitation_v1

Returns the row’s reward when the completion matches the expected output; otherwise returns a fallback value.

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `params.expected` | string | — | Optional global expected value. Falls back to the row's `expected_output`. |
| `params.reward_if_true` | float | 1.0 | Used when the match succeeds and the row has no `reward`. |
| `params.reward_if_false` | float | 0.0 | Used when the completion doesn’t match. |

Dataset rows need expected_output (required) and may carry a per-row reward used when the match succeeds.

Use this when you want the optimizer to imitate known-good outputs but weight rows differently (e.g. harder examples carry more reward). Rows without a stored reward fall back to reward_if_true.
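A sketch of the combined logic, mirroring the fallback order described above (illustrative names; the hosted implementation is internal):

```python
def trajectory_imitation_reward(completion: str, row: dict, params: dict) -> float:
    # Same match rule as exact matching: stripped, all-or-nothing.
    expected = params.get("expected") or row.get("expected_output", "")
    if completion.strip() == expected.strip():
        # On a match, prefer the row's own reward; fall back to reward_if_true.
        value = row.get("reward")
        return float(value) if value is not None else params.get("reward_if_true", 1.0)
    return params.get("reward_if_false", 0.0)
```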

## task_verifier_v1

Scores against a task’s declared `verification.hash` — the sha256 of a known-good flag. The recipe hashes the stripped completion with sha256 and returns `reward_if_true` (default 1.0) on a match, `reward_if_false` (default 0.0) otherwise.

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `params.reward_if_true` | float | 1.0 | Returned when the sha256 matches. |
| `params.reward_if_false` | float | 0.0 | Returned on mismatch. |

Task field: `task.verification.method` must be `"flag"` and `task.verification.hash` must start with `sha256:`.

Use this when the task itself carries the ground truth — CTF-style tasks with a flag the agent has to produce. It does not read dataset columns; it reads the task the trial was invoked against.
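The hash comparison can be sketched as follows (illustrative; the real recipe reads the task object the trial was invoked against, and the `task` dict shape here is an assumption based on the fields above):

```python
import hashlib

def task_verifier_reward(completion: str, task: dict, params: dict) -> float:
    verification = task["verification"]
    if verification["method"] != "flag" or not verification["hash"].startswith("sha256:"):
        raise ValueError("task must declare verification.method 'flag' with a sha256: hash")
    # Hash the stripped completion and compare against the declared digest.
    digest = hashlib.sha256(completion.strip().encode()).hexdigest()
    matched = verification["hash"] == f"sha256:{digest}"
    return params.get("reward_if_true", 1.0) if matched else params.get("reward_if_false", 0.0)
```

Because only the digest is stored, the task definition never exposes the flag itself.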

## Choosing a recipe

| You have… | Reach for |
| --- | --- |
| Ground-truth answers per row | `exact_match_v1` |
| A single target phrase the agent should produce | `contains_v1` |
| Pre-computed rewards already in the dataset | `row_reward_v1` |
| Ground-truth outputs plus per-row weights | `trajectory_imitation_v1` |
| Flag-verified tasks (CTFs) | `task_verifier_v1` |

For anything more complex — LLM-as-judge, multi-metric composition, graders — use local search with a custom evaluator or `DreadnodeAgentAdapter` wired to your own scorers.