Scorers Reference

130+ scorers across 34 modules for detecting jailbreaks, data leakage, tool abuse, reasoning attacks, and compliance violations.

Scorers evaluate attack outcomes - did the target jailbreak? Did it leak PII? Did an agent execute a poisoned tool? Every attack uses scorers automatically, and you can compose custom scoring pipelines for specialized detection.

Agentic workflow (15 scorers)

Module: dreadnode.scorers.agentic_workflow

Detect attacks against agent workflow orchestration.

Scorer	What it detects
`phase_bypass_detected`	Attempts to bypass phase transition approval
`phase_downgrade_detected`	Downgrade from post-exploitation to exploitation
`tool_restriction_bypass_detected`	Bypass of tool access restrictions
`sql_injection_via_nlp_detected`	SQL injection through NLP processing
`cypher_injection_detected`	Graph database query injection
`malformed_json_injection_detected`	Malformed JSON injection
`mode_confusion_detected`	Mode confusion attacks
`intent_manipulation_detected`	Intent manipulation in workflows
`success_indicator_spoofing_detected`	Spoofing of success indicators
`todo_list_manipulation_detected`	Manipulation of task lists
`tool_priority_manipulation_detected`	Tool priority/ordering manipulation
`session_state_poisoning_detected`	Session state poisoning
`wordlist_exhaustion_detected`	Wordlist exhaustion attacks
`workflow_disruption_detected`	Disruption of workflow execution
`agentic_workflow_attack`	Generic agentic workflow attack detector

Agent security (9 scorers)

Module: dreadnode.scorers.agent_security

Detect attacks against agent infrastructure and identity.

Scorer	What it detects
`agent_identity_leaked`	Leakage of agent identity and configuration
`skill_integrity_compromised`	Compromised skill package integrity
`agent_config_tampered`	Tampering with agent configuration
`heartbeat_manipulation`	Manipulation of agent heartbeat mechanisms
`bootstrap_hook_injected`	Injected bootstrap hooks in agent startup
`skill_supply_chain_attack`	Skill supply chain attack indicators
`memory_injection_detected`	False memory injection
`permission_escalation_detected`	Permission escalation attempts
`workspace_poisoning_detected`	Workspace file poisoning

Agentic tool use (9 scorers)

Module: dreadnode.scorers.agentic

Score agent tool usage for safety and manipulation detection.

Scorer	What it detects
`tool_invoked`	Whether a specific tool was invoked
`any_tool_invoked`	Whether any of specified tools were invoked
`tool_count`	Number of tools invoked
`tool_selection_safety`	Agent tool selection safety (OWASP ASI02)
`dangerous_tool_args`	Dangerous argument patterns in tool calls
`tool_sequence`	Whether tools were called in a specific sequence
`indirect_injection_success`	Indirect injection payload influenced output
`cascade_propagation`	Attack propagation distance through multi-agent system
`mcp_tool_manipulation`	MCP tool description poisoning affected behavior

Advanced jailbreak detection (8 scorers)

Module: dreadnode.scorers.advanced_jailbreak_detection

Detect sophisticated jailbreak techniques.

Scorer	What it detects
`likert_exploitation_detected`	Likert-scale evaluation framing to bypass filters
`prefill_bypass_detected`	Prefill/affirmative-start bypass attempts
`fictional_framing_detected`	Deep fictional immersion and nested role-play
`pipeline_manipulation_detected`	LLM processing pipeline manipulation
`guardrail_dos_detected`	Guardrail denial-of-service patterns
`invisible_character_detected`	Invisible Unicode characters bypassing filters
`memory_poisoning_detected`	Agent memory or persistent state poisoning
`tool_chain_attack_detected`	Structured tool-chain escalation attacks

MCP security (7 scorers)

Module: dreadnode.scorers.mcp_security

Detect attacks against the Model Context Protocol layer.

Scorer	What it detects
`tool_description_poisoned`	Poisoned instructions in MCP tool descriptions
`cross_server_shadow_detected`	Cross-server tool shadowing
`rug_pull_detected`	MCP rug pull attacks
`tool_output_injected`	Injection into tool output handling
`schema_poisoned`	Poisoned tool schemas
`ansi_cloaking_detected`	ANSI escape cloaking in tool descriptions
`sampling_injection_detected`	Sampling parameter injection

Multi-agent security (6 scorers)

Module: dreadnode.scorers.multi_agent_security

Detect inter-agent attacks and trust boundary violations.

Scorer	What it detects
`prompt_infection_detected`	Self-replicating prompt infection patterns
`agent_spoofing_detected`	Agent spoofing/identity fraud
`consensus_poisoned`	Consensus poisoning attacks
`delegation_exploit_detected`	Delegation chain exploitation
`session_smuggling_detected`	Session smuggling in agent-to-agent communication
`agent_config_overwrite_detected`	Agent configuration overwriting

Reasoning security (5 scorers)

Module: dreadnode.scorers.reasoning_security

Detect attacks against chain-of-thought and reasoning models.

Scorer	What it detects
`cot_backdoor_detected`	Poisoned reasoning steps in chain-of-thought
`reasoning_hijack_detected`	Reasoning hijacking attacks
`reasoning_dos_detected`	Reasoning denial-of-service
`escalation_detected`	Multi-turn escalation (Crescendo, FITD)
`goal_drift_detected`	Goal drift in agent systems

IDE security (5 scorers)

Module: dreadnode.scorers.ide_security

Detect attacks targeting coding assistants and IDE integrations.

Scorer	What it detects
`config_persistence`	Configuration file modifications for persistence
`rug_pull_detection`	Rug pull attacks on coding assistants
`shadowing_detection`	Tool shadowing attacks
`tool_squatting`	Tool squatting attacks
`covert_exfiltration`	Covert exfiltration via IDE

Documentation security (5 scorers)

Module: dreadnode.scorers.documentation_security

Detect documentation-based injection and exfiltration.

Scorer	What it detects
`hidden_documentation_injection`	Hidden instructions in docs targeting AI
`env_var_exfiltration`	Environment variable exfiltration via docs
`favicon_exfiltration`	Favicon-based data exfiltration
`resource_hint_exfil`	Resource hint-based exfiltration
`package_readme_poisoning`	Poisoning of package README files

Text pattern detection (5 scorers)

Module: dreadnode.scorers.contains

Pattern-based content detection.

Scorer	What it detects
`contains`	Whether output contains a specific string or regex
`detect_refusal`	Refusal patterns in model output
`detect_ansi_escapes`	ANSI escape codes in output
`detect_unsafe_shell_content`	Unsafe shell commands/paths
`detect_sensitive_keywords`	Sensitive keywords (passwords, API keys)

Exfiltration detection (4 scorers)

Module: dreadnode.scorers.exfiltration_detection

Detect data exfiltration through covert channels.

Scorer	What it detects
`markdown_exfil_detected`	Markdown image/link-based exfiltration
`unicode_exfil_detected`	Unicode-based exfiltration
`dns_exfil_detected`	DNS tunneling exfiltration
`ssrf_exfil_detected`	SSRF-based exfiltration

PII and credentials (3 scorers)

Modules: dreadnode.scorers.pii, dreadnode.scorers.credentials

Scorer	What it detects
`detect_pii`	PII patterns (email, phone, IP, SSN) via regex
`detect_pii_with_presidio`	PII using Microsoft Presidio analyzer
`credential_leakage`	API keys, tokens, passwords (16 credential patterns)

System prompt leakage (1 scorer)

Module: dreadnode.scorers.prompt_leak

Scorer	What it detects
`system_prompt_leaked`	System prompt content in model output

LLM-based scoring (2 scorers)

Modules: dreadnode.scorers.judge, dreadnode.scorers.harm

Scorer	What it detects
`llm_judge`	LLM-based semantic judgment (configurable criteria)
`detect_harm_with_openai`	Harmful content via OpenAI moderation API

Text classification (2 scorers)

Module: dreadnode.scorers.classification

Scorer	What it detects
`zero_shot_classification`	Zero-shot text classification
`detect_refusal_with_zero_shot`	Refusal detection via zero-shot classifier

Attack outcome (4 scorers)

Module: dreadnode.scorers.attack_outcome

Evaluate the practical impact of successful attacks.

Scorer	What it detects
`malicious_intent_fulfilled`	Whether the model’s output fulfills the attacker’s malicious intent
`practical_outcome`	Whether the output has practical real-world utility for harm
`cumulative_harm`	Cumulative harm across multi-turn conversations
`resilience_gap`	Gap between model’s intended safety and actual behavior

Judge ensemble (3 scorers)

Module: dreadnode.scorers.judge_ensemble

Multi-judge and rubric-based scoring for more reliable evaluation.

Scorer	What it detects
`multi_judge_consensus`	Consensus scoring across multiple LLM judges
`rubric_judge`	Rubric-based scoring with structured evaluation criteria
`agent_as_judge`	Agent-based evaluation with tool access

Structural detection (4 scorers)

Module: dreadnode.scorers.structural_detection

Detect structural exploit patterns in model outputs.

Scorer	What it detects
`template_exploit_detected`	Template-based exploit patterns
`m2s_reformatting_detected`	Multi-step to single-step reformatting attacks
`echo_chamber_detected`	Echo chamber / completion bias exploitation
`stego_acrostic_detected`	Steganographic acrostic patterns

Supply chain detection (3 scorers)

Module: dreadnode.scorers.supply_chain_detection

Detect supply chain attack indicators.

Scorer	What it detects
`package_hallucination`	Hallucinated package names that could be registered by attackers
`merge_backdoor_detected`	Backdoor indicators in model merge outputs
`skill_poisoning_detected`	Skill/plugin poisoning patterns

Similarity and text analysis

Module	Scorers	Description
`similarity`	5	Semantic similarity (sentence transformers, TF-IDF, LiteLLM, BLEU)
`sentiment`	2	Sentiment analysis, Perspective API
`length`	3	Text length targeting, ratio, range
`format`	2	JSON/XML validation
`readability`	1	Text readability level
`lexical`	1	Type-token ratio (vocabulary diversity)
`consistency`	1	Character-level consistency
`memorization`	1	Training data memorization

Composition operators

Module: dreadnode.core.scorer

Combine scorers with logical and arithmetic operators:

from dreadnode.scorers import detect_pii, credential_leakage, system_prompt_leaked
from dreadnode.core.scorer import or_, and_, avg, threshold, invert

# Score 1.0 if ANY leakage is detected
any_leak = or_(detect_pii(), credential_leakage(), system_prompt_leaked())

# Average of multiple scorers
combined = avg(detect_pii(), credential_leakage())

# Invert a score (1 - x)
no_refusal = invert(detect_refusal())

# Apply threshold
jailbreak = threshold(llm_judge(criteria="..."), value=0.7)

Available operators: add, and_, avg, clip, equals, forward, invert, normalize, not_, or_, remap_range, scale, subtract, threshold, weighted_avg