Scorers Reference
130+ scorers across 34 modules for detecting jailbreaks, data leakage, tool abuse, reasoning attacks, and compliance violations.
Scorers evaluate attack outcomes - did the target jailbreak? Did it leak PII? Did an agent execute a poisoned tool? Every attack uses scorers automatically, and you can compose custom scoring pipelines for specialized detection.
Agentic workflow (15 scorers)
Section titled “Agentic workflow (15 scorers)”Module: dreadnode.scorers.agentic_workflow
Detect attacks against agent workflow orchestration.
| Scorer | What it detects |
|---|---|
phase_bypass_detected | Attempts to bypass phase transition approval |
phase_downgrade_detected | Downgrade from post-exploitation to exploitation |
tool_restriction_bypass_detected | Bypass of tool access restrictions |
sql_injection_via_nlp_detected | SQL injection through NLP processing |
cypher_injection_detected | Graph database query injection |
malformed_json_injection_detected | Malformed JSON injection |
mode_confusion_detected | Mode confusion attacks |
intent_manipulation_detected | Intent manipulation in workflows |
success_indicator_spoofing_detected | Spoofing of success indicators |
todo_list_manipulation_detected | Manipulation of task lists |
tool_priority_manipulation_detected | Tool priority/ordering manipulation |
session_state_poisoning_detected | Session state poisoning |
wordlist_exhaustion_detected | Wordlist exhaustion attacks |
workflow_disruption_detected | Disruption of workflow execution |
agentic_workflow_attack | Generic agentic workflow attack detector |
Agent security (9 scorers)
Section titled “Agent security (9 scorers)”Module: dreadnode.scorers.agent_security
Detect attacks against agent infrastructure and identity.
| Scorer | What it detects |
|---|---|
agent_identity_leaked | Leakage of agent identity and configuration |
skill_integrity_compromised | Compromised skill package integrity |
agent_config_tampered | Tampering with agent configuration |
heartbeat_manipulation | Manipulation of agent heartbeat mechanisms |
bootstrap_hook_injected | Injected bootstrap hooks in agent startup |
skill_supply_chain_attack | Skill supply chain attack indicators |
memory_injection_detected | False memory injection |
permission_escalation_detected | Permission escalation attempts |
workspace_poisoning_detected | Workspace file poisoning |
Agentic tool use (9 scorers)
Section titled “Agentic tool use (9 scorers)”Module: dreadnode.scorers.agentic
Score agent tool usage for safety and manipulation detection.
| Scorer | What it detects |
|---|---|
tool_invoked | Whether a specific tool was invoked |
any_tool_invoked | Whether any of specified tools were invoked |
tool_count | Number of tools invoked |
tool_selection_safety | Agent tool selection safety (OWASP ASI02) |
dangerous_tool_args | Dangerous argument patterns in tool calls |
tool_sequence | Whether tools were called in a specific sequence |
indirect_injection_success | Indirect injection payload influenced output |
cascade_propagation | Attack propagation distance through multi-agent system |
mcp_tool_manipulation | MCP tool description poisoning affected behavior |
Advanced jailbreak detection (8 scorers)
Section titled “Advanced jailbreak detection (8 scorers)”Module: dreadnode.scorers.advanced_jailbreak_detection
Detect sophisticated jailbreak techniques.
| Scorer | What it detects |
|---|---|
likert_exploitation_detected | Likert-scale evaluation framing to bypass filters |
prefill_bypass_detected | Prefill/affirmative-start bypass attempts |
fictional_framing_detected | Deep fictional immersion and nested role-play |
pipeline_manipulation_detected | LLM processing pipeline manipulation |
guardrail_dos_detected | Guardrail denial-of-service patterns |
invisible_character_detected | Invisible Unicode characters bypassing filters |
memory_poisoning_detected | Agent memory or persistent state poisoning |
tool_chain_attack_detected | Structured tool-chain escalation attacks |
MCP security (7 scorers)
Section titled “MCP security (7 scorers)”Module: dreadnode.scorers.mcp_security
Detect attacks against the Model Context Protocol layer.
| Scorer | What it detects |
|---|---|
tool_description_poisoned | Poisoned instructions in MCP tool descriptions |
cross_server_shadow_detected | Cross-server tool shadowing |
rug_pull_detected | MCP rug pull attacks |
tool_output_injected | Injection into tool output handling |
schema_poisoned | Poisoned tool schemas |
ansi_cloaking_detected | ANSI escape cloaking in tool descriptions |
sampling_injection_detected | Sampling parameter injection |
Multi-agent security (6 scorers)
Section titled “Multi-agent security (6 scorers)”Module: dreadnode.scorers.multi_agent_security
Detect inter-agent attacks and trust boundary violations.
| Scorer | What it detects |
|---|---|
prompt_infection_detected | Self-replicating prompt infection patterns |
agent_spoofing_detected | Agent spoofing/identity fraud |
consensus_poisoned | Consensus poisoning attacks |
delegation_exploit_detected | Delegation chain exploitation |
session_smuggling_detected | Session smuggling in agent-to-agent communication |
agent_config_overwrite_detected | Agent configuration overwriting |
Reasoning security (5 scorers)
Section titled “Reasoning security (5 scorers)”Module: dreadnode.scorers.reasoning_security
Detect attacks against chain-of-thought and reasoning models.
| Scorer | What it detects |
|---|---|
cot_backdoor_detected | Poisoned reasoning steps in chain-of-thought |
reasoning_hijack_detected | Reasoning hijacking attacks |
reasoning_dos_detected | Reasoning denial-of-service |
escalation_detected | Multi-turn escalation (Crescendo, FITD) |
goal_drift_detected | Goal drift in agent systems |
IDE security (5 scorers)
Section titled “IDE security (5 scorers)”Module: dreadnode.scorers.ide_security
Detect attacks targeting coding assistants and IDE integrations.
| Scorer | What it detects |
|---|---|
config_persistence | Configuration file modifications for persistence |
rug_pull_detection | Rug pull attacks on coding assistants |
shadowing_detection | Tool shadowing attacks |
tool_squatting | Tool squatting attacks |
covert_exfiltration | Covert exfiltration via IDE |
Documentation security (5 scorers)
Section titled “Documentation security (5 scorers)”Module: dreadnode.scorers.documentation_security
Detect documentation-based injection and exfiltration.
| Scorer | What it detects |
|---|---|
hidden_documentation_injection | Hidden instructions in docs targeting AI |
env_var_exfiltration | Environment variable exfiltration via docs |
favicon_exfiltration | Favicon-based data exfiltration |
resource_hint_exfil | Resource hint-based exfiltration |
package_readme_poisoning | Poisoning of package README files |
Text pattern detection (5 scorers)
Section titled “Text pattern detection (5 scorers)”Module: dreadnode.scorers.contains
Pattern-based content detection.
| Scorer | What it detects |
|---|---|
contains | Whether output contains a specific string or regex |
detect_refusal | Refusal patterns in model output |
detect_ansi_escapes | ANSI escape codes in output |
detect_unsafe_shell_content | Unsafe shell commands/paths |
detect_sensitive_keywords | Sensitive keywords (passwords, API keys) |
Exfiltration detection (4 scorers)
Section titled “Exfiltration detection (4 scorers)”Module: dreadnode.scorers.exfiltration_detection
Detect data exfiltration through covert channels.
| Scorer | What it detects |
|---|---|
markdown_exfil_detected | Markdown image/link-based exfiltration |
unicode_exfil_detected | Unicode-based exfiltration |
dns_exfil_detected | DNS tunneling exfiltration |
ssrf_exfil_detected | SSRF-based exfiltration |
PII and credentials (3 scorers)
Section titled “PII and credentials (3 scorers)”Modules: dreadnode.scorers.pii, dreadnode.scorers.credentials
| Scorer | What it detects |
|---|---|
detect_pii | PII patterns (email, phone, IP, SSN) via regex |
detect_pii_with_presidio | PII using Microsoft Presidio analyzer |
credential_leakage | API keys, tokens, passwords (16 credential patterns) |
System prompt leakage (1 scorer)
Section titled “System prompt leakage (1 scorer)”Module: dreadnode.scorers.prompt_leak
| Scorer | What it detects |
|---|---|
system_prompt_leaked | System prompt content in model output |
LLM-based scoring (2 scorers)
Section titled “LLM-based scoring (2 scorers)”Modules: dreadnode.scorers.judge, dreadnode.scorers.harm
| Scorer | What it detects |
|---|---|
llm_judge | LLM-based semantic judgment (configurable criteria) |
detect_harm_with_openai | Harmful content via OpenAI moderation API |
Text classification (2 scorers)
Section titled “Text classification (2 scorers)”Module: dreadnode.scorers.classification
| Scorer | What it detects |
|---|---|
zero_shot_classification | Zero-shot text classification |
detect_refusal_with_zero_shot | Refusal detection via zero-shot classifier |
Attack outcome (4 scorers)
Section titled “Attack outcome (4 scorers)”Module: dreadnode.scorers.attack_outcome
Evaluate the practical impact of successful attacks.
| Scorer | What it detects |
|---|---|
malicious_intent_fulfilled | Whether the model’s output fulfills the attacker’s malicious intent |
practical_outcome | Whether the output has practical real-world utility for harm |
cumulative_harm | Cumulative harm across multi-turn conversations |
resilience_gap | Gap between model’s intended safety and actual behavior |
Judge ensemble (3 scorers)
Section titled “Judge ensemble (3 scorers)”Module: dreadnode.scorers.judge_ensemble
Multi-judge and rubric-based scoring for more reliable evaluation.
| Scorer | What it detects |
|---|---|
multi_judge_consensus | Consensus scoring across multiple LLM judges |
rubric_judge | Rubric-based scoring with structured evaluation criteria |
agent_as_judge | Agent-based evaluation with tool access |
Structural detection (4 scorers)
Section titled “Structural detection (4 scorers)”Module: dreadnode.scorers.structural_detection
Detect structural exploit patterns in model outputs.
| Scorer | What it detects |
|---|---|
template_exploit_detected | Template-based exploit patterns |
m2s_reformatting_detected | Multi-step to single-step reformatting attacks |
echo_chamber_detected | Echo chamber / completion bias exploitation |
stego_acrostic_detected | Steganographic acrostic patterns |
Supply chain detection (3 scorers)
Section titled “Supply chain detection (3 scorers)”Module: dreadnode.scorers.supply_chain_detection
Detect supply chain attack indicators.
| Scorer | What it detects |
|---|---|
package_hallucination | Hallucinated package names that could be registered by attackers |
merge_backdoor_detected | Backdoor indicators in model merge outputs |
skill_poisoning_detected | Skill/plugin poisoning patterns |
Similarity and text analysis
Section titled “Similarity and text analysis”| Module | Scorers | Description |
|---|---|---|
similarity | 5 | Semantic similarity (sentence transformers, TF-IDF, LiteLLM, BLEU) |
sentiment | 2 | Sentiment analysis, Perspective API |
length | 3 | Text length targeting, ratio, range |
format | 2 | JSON/XML validation |
readability | 1 | Text readability level |
lexical | 1 | Type-token ratio (vocabulary diversity) |
consistency | 1 | Character-level consistency |
memorization | 1 | Training data memorization |
Composition operators
Section titled “Composition operators”Module: dreadnode.core.scorer
Combine scorers with logical and arithmetic operators:
from dreadnode.scorers import detect_pii, credential_leakage, system_prompt_leakedfrom dreadnode.core.scorer import or_, and_, avg, threshold, invert
# Score 1.0 if ANY leakage is detectedany_leak = or_(detect_pii(), credential_leakage(), system_prompt_leaked())
# Average of multiple scorerscombined = avg(detect_pii(), credential_leakage())
# Invert a score (1 - x)no_refusal = invert(detect_refusal())
# Apply thresholdjailbreak = threshold(llm_judge(criteria="..."), value=0.7)Available operators: add, and_, avg, clip, equals, forward, invert, normalize, not_, or_, remap_range, scale, subtract, threshold, weighted_avg