# Inputs
Configure what an evaluation runs on — a flat list of task references (`task_names`) or rows with per-item parameters (`dataset`).
Every evaluation needs to know which tasks to run and with what per-item context. Pick one of two inputs:
- `task_names` — a flat list. Each entry becomes one evaluation item.
- `dataset` — rows with per-item parameters. Each row becomes one evaluation item.
Use `task_names` when every run of the task should be identical. Use `dataset` when you need per-row inputs — different tenants, difficulties, input URLs — fed into the task through instruction templates.
## task_names — flat list
Each entry is a task reference, optionally pinned to a version:

```yaml
name: nightly-regression
model: openai/gpt-4.1-mini
task_names:
  - flag-file-http
```

An unpinned name like `flag-file-http` resolves to the latest visible version when the worker loads the task. Use `name@version` when you need a stable regression target.
## dataset — per-row parameters
A dataset is a list of rows. Each row must include `task_name`; anything else is a per-row field the task instruction can reference:

```yaml
name: regression-by-tenant
model: openai/gpt-4.1-mini
concurrency: 4
dataset:
  rows:
    - task_name: flag-file-http
      tenant: acme
      difficulty: 1
    - task_name: flag-file-http
      tenant: bravo
      difficulty: 2
    - task_name: flag-file-http
      tenant: acme
      difficulty: 3
```

In the task's instruction, `{{tenant}}` and `{{difficulty}}` fill at evaluation time. Only string, int, and null row values become template variables — see Instruction templates for the resolution rules.
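To make the substitution rule concrete, here is a minimal sketch of the semantics described above — `fill_instruction` is a hypothetical helper, not the platform's actual template engine, and it assumes unknown placeholders are left untouched:

```python
import re


def fill_instruction(template: str, row: dict) -> str:
    # Collect only the row values that qualify as template variables:
    # strings, ints, and null (None), per the documented rule.
    variables = {
        key: "" if value is None else str(value)
        for key, value in row.items()
        if value is None or isinstance(value, (str, int))
    }
    # Replace each {{name}} with its value; leave unknown names as-is.
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: variables.get(m.group(1), m.group(0)),
        template,
    )


row = {"task_name": "flag-file-http", "tenant": "acme", "difficulty": 1}
print(fill_instruction("Scan {{tenant}} at difficulty {{difficulty}}", row))
# Scan acme at difficulty 1
```

A row value of another type (a list, say) would simply never become a variable, which is why the resolution rules matter when shaping rows.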
The CLI does not expose row data directly; use `--file evaluation.yaml` for dataset-backed runs.
## Rules you can’t work around
Two asymmetries matter:

- `task_names` wins. If both `task_names` and `dataset` appear in the same request, the worker uses `task_names` and ignores the dataset. Pick one.
- Every dataset row needs `task_name`. There is no mode where `task_names` picks the tasks and `dataset` supplies per-row inputs. A dataset-backed run must carry the task reference on every row.
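Both asymmetries can be caught before submitting a manifest. A sketch of a hypothetical pre-flight check (`validate_inputs` is not part of the CLI — it just encodes the two rules above against a manifest dict):

```python
def validate_inputs(manifest: dict) -> list[str]:
    """Return a problem message for each input-selection rule violated."""
    problems = []
    has_tasks = bool(manifest.get("task_names"))
    rows = (manifest.get("dataset") or {}).get("rows", [])

    # Asymmetry 1: task_names wins, so a dataset alongside it is dead weight.
    if has_tasks and rows:
        problems.append("both task_names and dataset set; dataset will be ignored")

    # Asymmetry 2: every row must carry its own task reference.
    for i, row in enumerate(rows):
        if "task_name" not in row:
            problems.append(f"row {i} is missing task_name")
    return problems


print(validate_inputs({"dataset": {"rows": [{"tenant": "acme"}]}}))
# ['row 0 is missing task_name']
```

Running a check like this locally is cheaper than discovering a silently ignored dataset after the worker has already started.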
## Using a registry dataset as input
Registry datasets are pulled and shaped into the manifest — there’s no direct ref resolution for the `dataset:` field today. The common pattern:
```python
import yaml

import dreadnode as dn
from dreadnode.datasets import Dataset

dn.pull_package(["dataset://acme/regression-inputs:1.0.0"])
ds = Dataset("acme/regression-inputs", version="1.0.0")

rows = ds.to_pandas().to_dict(orient="records")
manifest = {
    "name": "regression",
    "model": "openai/gpt-4.1-mini",
    "dataset": {"rows": rows},
}
with open("evaluation.yaml", "w") as f:
    yaml.safe_dump(manifest, f)
```

```sh
dn evaluation create --file evaluation.yaml --wait
```

See Datasets → Using in code for the full registry-consumer mechanics.