Scorers
Turn outputs into metrics with built-in scorers, composition algebra, and custom scoring functions.
```python
from dreadnode.scorers import contains, detect_pii, system_prompt_leaked

mentions_platform = contains("dreadnode")
pii_risk = detect_pii()
prompt_leak = system_prompt_leaked()
```

A scorer turns an output into a Metric. Use them to check that the agent's response contains the required content, doesn't leak secrets or PII, meets a pass/fail gate, or rolls up to a single quality-and-safety number you can compare across runs.
Scorers are Python-first and live in the SDK. They plug into local evaluations, agent hooks, and optimization studies — the same scorer can serve as a metric in one context and a gate in another.
Built-in scorers
The Python SDK ships with 100+ scorers across categories such as security, PII detection, exfiltration, MCP/agentic safety, reasoning, and IDE workflows. Use built-ins first: they are easier to compare across evaluations and less likely to drift than one-off local scoring logic.
Composition algebra
Combine scorers with operators and helpers:

- `&` / `|` / `~` for logical composition
- `+` / `-` / `*` for arithmetic composition
- `>>` / `//` to rename scorers (log all vs. log primary)
- `threshold()`, `normalize()`, `invert()`, `remap_range()`, `scale()`, `clip()`, `weighted_avg()`
```python
import dreadnode as dn
from dreadnode.scorers import contains, detect_pii, normalize, weighted_avg

mentions = contains("agent")
quality = normalize(mentions, known_max=1.0)
safety = ~detect_pii()

overall = weighted_avg((quality, 0.6), (safety, 0.4)) >> "overall_score"
combined = (quality & safety) // "quality_and_safety"
```

The usual pattern is:
- build a few narrow scorers
- normalize them onto a comparable scale
- combine them into one or two rollout metrics that are easy to reason about
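The pattern above can be sketched in plain Python, independent of the SDK. The scorer names (`score_keywords`, `score_length`) and the `normalize`/`weighted_avg` bodies here are illustrative stand-ins, not the SDK's implementations:

```python
def score_keywords(text: str) -> float:
    # Narrow scorer: fraction of required keywords present, already in [0, 1].
    keywords = ["agent", "tool", "result"]
    return sum(k in text for k in keywords) / len(keywords)

def score_length(text: str) -> float:
    # Narrow scorer: raw character count, unbounded.
    return float(len(text))

def normalize(value: float, known_max: float) -> float:
    # Clamp onto [0, 1] so scorers are comparable before combining.
    return min(value / known_max, 1.0)

def weighted_avg(*pairs: tuple[float, float]) -> float:
    # Combine normalized scorers into one rollout metric.
    total = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total

text = "The agent used a tool and returned a result."
quality = score_keywords(text)
length = normalize(score_length(text), known_max=200)
overall = weighted_avg((quality, 0.6), (length, 0.4))
```

Because each narrow scorer lands on the same [0, 1] scale before combining, the final `overall` number stays interpretable across runs.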
Threshold conditions for hooks
Use scorer thresholds in agent hooks and conditions with `.above()`, `.below()`, or `.as_condition()`:

```python
from dreadnode.scorers import contains

quality = contains("well-structured")
must_pass = quality.above(0.5)
just_record = quality.as_condition()
```

Thresholds are especially useful when you want one scorer to do double duty:
- as a numeric metric in evaluations
- as a gate in hooks, reactions, or stop conditions
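The metric-vs-gate split can be illustrated in plain Python. The `contains_scorer` and `above` helpers below are hypothetical stand-ins that mimic the idea, not the SDK's `.above()` implementation:

```python
from typing import Callable

def contains_scorer(needle: str) -> Callable[[str], float]:
    # Numeric form: returns a score in [0, 1].
    return lambda text: 1.0 if needle in text else 0.0

def above(scorer: Callable[[str], float], threshold: float) -> Callable[[str], bool]:
    # Gate form: the same scorer, collapsed to a pass/fail outcome.
    return lambda text: scorer(text) > threshold

quality = contains_scorer("well-structured")
must_pass = above(quality, 0.5)
```

The same underlying check feeds both uses: `quality(text)` logs a number for evaluations, while `must_pass(text)` decides whether a hook fires.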
Build a custom scorer
```python
import dreadnode as dn

@dn.scorer(name="length_bonus")
def length_bonus(text: str) -> float:
    return 1.0 if len(text) > 120 else 0.0

metric = await length_bonus.score("Short response.")
print(metric.value)
```

Good custom scorers are:
- deterministic
- cheap enough to run repeatedly
- clearly bounded or normalized when they will be combined with other metrics
- named in a way that will still make sense in logs and evaluation summaries
If a scorer is intended to be a hard pass/fail condition, either wrap it with `threshold(...)` or use `assert_scores` in the evaluation layer so the outcome is explicit.
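What threshold wrapping buys you can be sketched in plain Python. The `threshold` helper here is an illustrative stand-in for the SDK helper, applied to the `length_bonus` scorer from the example above:

```python
def length_bonus(text: str) -> float:
    # Custom scorer: deterministic, cheap, bounded to {0.0, 1.0}.
    return 1.0 if len(text) > 120 else 0.0

def threshold(scorer, cutoff: float):
    # Collapse a numeric scorer into an explicit 0/1 pass-fail metric,
    # so the gate outcome is visible in logs rather than implied.
    return lambda text: 1.0 if scorer(text) >= cutoff else 0.0

gate = threshold(length_bonus, 0.5)
```

Wrapping keeps the raw scorer reusable as a metric while making the hard gate a separate, explicitly named check.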