# Reward recipes

The hosted reward recipes: what each scores, their parameters, and the dataset fields they expect.
Hosted optimization jobs use a reward recipe to turn each trial’s completion into a score. Pick one by name when you submit a job:

```shell
dn optimize submit ... --reward-recipe exact_match_v1
```

Pass params as a JSON object when the recipe needs configuration:

```shell
dn optimize submit ... --reward-recipe contains_v1 \
  --reward-params '{"needle": "Dreadnode", "reward_if_true": 1.0, "reward_if_false": 0.0}'
```

Every recipe receives the completion text plus the dataset row for the current trial. A recipe returns a single float reward that the optimizer maximizes.
## exact_match_v1

Scores 1.0 when the completion exactly matches the expected answer (after whitespace is stripped), 0.0 otherwise.
| Field | Type | Source |
|---|---|---|
| params.expected | string | Optional global expected value. Falls back to the row’s expected_output. |
| Dataset column | — | expected_output — required when params.expected is not set. |
Use this when every row has a single ground-truth answer and partial matches shouldn’t count.
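As an illustrative sketch (not the hosted implementation), the scoring rule can be written as a small function; the signature and dict-based row access here are assumptions for demonstration:

```python
def exact_match_v1(completion: str, row: dict, params: dict) -> float:
    # Global override wins; otherwise fall back to the row's ground truth.
    expected = params.get("expected") or row["expected_output"]
    # Whitespace-stripped exact comparison: all-or-nothing score.
    return 1.0 if completion.strip() == expected.strip() else 0.0
```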
## contains_v1

Scores based on whether a fixed substring appears anywhere in the completion.
| Field | Type | Default | Notes |
|---|---|---|---|
| params.needle | string | — | Required. The substring to look for. |
| params.reward_if_true | float | 1.0 | Returned when the substring is present. |
| params.reward_if_false | float | 0.0 | Returned when the substring is absent. |
The needle is global to the run — it does not read per-row fields. Reach for this when “did the agent mention this term?” is the entire metric.
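A minimal sketch of the same logic, assuming the dict-based parameter shape shown in the table (this is an illustration, not the hosted code):

```python
def contains_v1(completion: str, row: dict, params: dict) -> float:
    # The needle is a run-level param; the row is ignored entirely.
    present = params["needle"] in completion
    return params.get("reward_if_true", 1.0) if present else params.get("reward_if_false", 0.0)
```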
## row_reward_v1

Passes through a per-row reward value you’ve pre-computed and stored in the dataset.
| Field | Type | Source |
|---|---|---|
| params.default | float | Fallback used when a row has no reward. Defaults to 0.0. |
| Dataset column | — | reward — the per-row numeric reward the optimizer receives directly. |
Use this when the metric already lives in your dataset — human labels, reward-model scores, or anything you’ve computed offline. The recipe adds nothing on top; it routes the row’s reward into the search loop.
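The pass-through behavior can be sketched in a few lines; the function signature and row shape are assumptions for illustration:

```python
def row_reward_v1(completion: str, row: dict, params: dict) -> float:
    # The completion is ignored: the dataset already carries the score.
    reward = row.get("reward")
    return float(reward) if reward is not None else float(params.get("default", 0.0))
```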
## trajectory_imitation_v1

Returns the row’s reward when the completion matches the expected output; otherwise returns a fallback value.
| Field | Type | Default | Source |
|---|---|---|---|
| params.expected | string | — | Optional global expected value. Falls back to the row’s expected_output. |
| params.reward_if_true | float | 1.0 | Used when the match succeeds and the row has no reward. |
| params.reward_if_false | float | 0.0 | Used when the completion doesn’t match. |
Dataset rows need expected_output (required) and may carry a per-row reward used when the
match succeeds.
Use this when you want the optimizer to imitate known-good outputs but weight rows differently
(e.g. harder examples carry more reward). Rows without a stored reward fall back to
reward_if_true.
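Putting the table and fallback rules together, an illustrative sketch might look like the following; it assumes the same whitespace-stripped comparison as exact_match_v1, which the source does not explicitly state:

```python
def trajectory_imitation_v1(completion: str, row: dict, params: dict) -> float:
    expected = params.get("expected") or row["expected_output"]
    if completion.strip() == expected.strip():
        # Match: prefer the row's stored weight, else reward_if_true.
        reward = row.get("reward")
        return float(reward) if reward is not None else float(params.get("reward_if_true", 1.0))
    # Mismatch: flat fallback regardless of the row's stored reward.
    return float(params.get("reward_if_false", 0.0))
```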
## task_verifier_v1

Scores against a task’s declared verification.hash, the sha256 of a known-good flag. The recipe hashes the stripped completion with sha256 and returns reward_if_true (default 1.0) on a match, reward_if_false (default 0.0) otherwise.
| Field | Type | Default | Notes |
|---|---|---|---|
| params.reward_if_true | float | 1.0 | Returned when the sha256 matches. |
| params.reward_if_false | float | 0.0 | Returned on mismatch. |
| Task field | — | — | task.verification.method must be "flag" and task.verification.hash must start with sha256:. |
Use this when the task itself carries the ground truth — CTF-style tasks with a flag the agent has to produce. It does not read dataset columns; it reads the task the trial was invoked against.
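The hash check described above can be sketched as follows; the dict shape of the task object is an assumption, but the sha256-of-stripped-completion comparison mirrors the description:

```python
import hashlib

def task_verifier_v1(completion: str, task: dict, params: dict) -> float:
    verification = task["verification"]
    # The recipe requires a flag-style task with a sha256-prefixed hash.
    if verification.get("method") != "flag" or not verification["hash"].startswith("sha256:"):
        raise ValueError("task_verifier_v1 requires verification.method 'flag' and a sha256: hash")
    expected = verification["hash"][len("sha256:"):]
    digest = hashlib.sha256(completion.strip().encode()).hexdigest()
    return float(params.get("reward_if_true", 1.0)) if digest == expected else float(params.get("reward_if_false", 0.0))
```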
## Picking a recipe

| You have… | Reach for |
|---|---|
| Ground-truth answers per row. | exact_match_v1 |
| A single target phrase the agent should produce. | contains_v1 |
| Pre-computed rewards already in the dataset. | row_reward_v1 |
| Ground-truth outputs plus per-row weights. | trajectory_imitation_v1 |
| Flag-verified tasks (CTFs). | task_verifier_v1 |
For anything more complex — LLM-as-judge, multi-metric composition, graders — use
local search with a custom evaluator or
DreadnodeAgentAdapter wired to your
own scorers.