# Reward recipes

The hosted reward recipes, what each scores, their parameters, and the dataset fields they expect.

Hosted optimization jobs use a reward recipe to turn each trial’s completion into a score. Pick one by name when you submit a job:

```sh
dn optimize submit ... --reward-recipe exact_match_v1
```

Pass params as a JSON object when the recipe needs configuration:

```sh
dn optimize submit ... --reward-recipe contains_v1 \
  --reward-params '{"needle": "Dreadnode", "reward_if_true": 1.0, "reward_if_false": 0.0}'
```

Every recipe receives the completion text plus the dataset row for the current trial. A recipe returns a single float reward the optimizer maximizes.

## exact_match_v1

Scores 1.0 when the completion exactly matches the expected answer (after stripping whitespace), 0.0 otherwise.

| Field | Type | Source |
| --- | --- | --- |
| `params.expected` | string | Optional global expected value. Falls back to the row's `expected_output`. |

Dataset column: `expected_output` — required when `params.expected` is not set.

Use this when every row has a single ground-truth answer and partial matches shouldn’t count.
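The rule amounts to a whitespace-stripped string comparison. A minimal Python sketch (the hosted implementation is internal, so the function name and signature here are illustrative):

```python
def exact_match_reward(completion: str, row: dict, params: dict) -> float:
    # A global params.expected takes precedence; otherwise read the row.
    expected = params.get("expected") or row.get("expected_output", "")
    # Whitespace-stripped, all-or-nothing comparison.
    return 1.0 if completion.strip() == expected.strip() else 0.0
```

Note the comparison is case-sensitive: a completion that differs only in casing scores 0.0.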

## contains_v1

Scores based on whether a fixed substring appears anywhere in the completion.

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `params.needle` | string | — | Required. The substring to look for. |
| `params.reward_if_true` | float | 1.0 | Returned when the substring is present. |
| `params.reward_if_false` | float | 0.0 | Returned when the substring is absent. |

The needle is global to the run — it does not read per-row fields. Reach for this when “did the agent mention this term?” is the entire metric.
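Sketched in Python (illustrative names only; the hosted recipe is not user code):

```python
def contains_reward(completion: str, params: dict) -> float:
    # The needle is fixed for the whole run; dataset rows are not consulted.
    present = params["needle"] in completion
    return params.get("reward_if_true", 1.0) if present else params.get("reward_if_false", 0.0)
```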

## row_reward_v1

Passes through a per-row reward value you’ve pre-computed and stored in the dataset.

| Field | Type | Source |
| --- | --- | --- |
| `params.default` | float | Fallback used when a row has no `reward`. Defaults to 0.0. |

Dataset column: `reward` — the per-row numeric reward the optimizer receives directly.

Use this when the metric already lives in your dataset — human labels, reward-model scores, or anything you’ve computed offline. The recipe adds nothing on top; it routes the row’s reward into the search loop.
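The pass-through behavior amounts to the following sketch (illustrative; the completion plays no part in the score):

```python
def row_reward(completion: str, row: dict, params: dict) -> float:
    # The completion is ignored; the row's stored reward is the score.
    value = row.get("reward")
    return float(value) if value is not None else params.get("default", 0.0)
```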

## trajectory_imitation_v1

Returns the row’s reward when the completion matches the expected output; otherwise returns a fallback value.

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `params.expected` | string | — | Optional global expected value. Falls back to the row's `expected_output`. |
| `params.reward_if_true` | float | 1.0 | Used when the match succeeds and the row has no `reward`. |
| `params.reward_if_false` | float | 0.0 | Used when the completion doesn’t match. |

Dataset rows need expected_output (required) and may carry a per-row reward used when the match succeeds.

Use this when you want the optimizer to imitate known-good outputs but weight rows differently (e.g. harder examples carry more reward). Rows without a stored reward fall back to reward_if_true.
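A sketch of the combined logic, mirroring the fallback order described above (illustrative names; the hosted implementation is internal):

```python
def trajectory_imitation_reward(completion: str, row: dict, params: dict) -> float:
    # Same match rule as exact matching: stripped, all-or-nothing.
    expected = params.get("expected") or row.get("expected_output", "")
    if completion.strip() == expected.strip():
        # On a match, prefer the row's own reward; fall back to reward_if_true.
        value = row.get("reward")
        return float(value) if value is not None else params.get("reward_if_true", 1.0)
    return params.get("reward_if_false", 0.0)
```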

## task_verifier_v1

Scores against a task’s declared `verification.hash` — the sha256 of a known-good flag. The recipe hashes the stripped completion with sha256 and returns `reward_if_true` (default 1.0) on a match, `reward_if_false` (default 0.0) otherwise.

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `params.reward_if_true` | float | 1.0 | Returned when the sha256 matches. |
| `params.reward_if_false` | float | 0.0 | Returned on mismatch. |

Task field: `task.verification.method` must be `"flag"` and `task.verification.hash` must start with `sha256:`.

Use this when the task itself carries the ground truth — CTF-style tasks with a flag the agent has to produce. It does not read dataset columns; it reads the task the trial was invoked against.
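The hash comparison can be sketched as follows (illustrative; the real recipe reads the task object the trial was invoked against, and the `task` dict shape here is an assumption based on the fields above):

```python
import hashlib

def task_verifier_reward(completion: str, task: dict, params: dict) -> float:
    verification = task["verification"]
    if verification["method"] != "flag" or not verification["hash"].startswith("sha256:"):
        raise ValueError("task must declare verification.method 'flag' with a sha256: hash")
    # Hash the stripped completion and compare against the declared digest.
    digest = hashlib.sha256(completion.strip().encode()).hexdigest()
    matched = verification["hash"] == f"sha256:{digest}"
    return params.get("reward_if_true", 1.0) if matched else params.get("reward_if_false", 0.0)
```

Because only the digest is stored, the task definition never exposes the flag itself.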

## Choosing a recipe

| You have… | Reach for |
| --- | --- |
| Ground-truth answers per row | `exact_match_v1` |
| A single target phrase the agent should produce | `contains_v1` |
| Pre-computed rewards already in the dataset | `row_reward_v1` |
| Ground-truth outputs plus per-row weights | `trajectory_imitation_v1` |
| Flag-verified tasks (CTFs) | `task_verifier_v1` |

For anything more complex — LLM-as-judge, multi-metric composition, graders — use local search with a custom evaluator or `DreadnodeAgentAdapter` wired to your own scorers.