- LLM-as-Judge - Use evaluator models with custom rubrics to score nuanced qualities like harmfulness, compliance, or extraction
- Rule-Based Scoring - Apply regex patterns, JSON path checks, and custom functions for fast, deterministic evaluation
- Multi-Objective Optimization - Balance competing goals like success, stealth, and efficiency with weighted objectives
For complete scorer API details, see the Scorers SDK Reference.
The Scoring Problem
In CTF-style challenges, success is obvious—either the flag appears or it doesn’t. In production red teaming, you face harder questions:- Did the model reveal sensitive information? How much?
- Did it refuse appropriately, or did it comply partially?
- Is this response harmful? To what degree?
- Did the guardrails fire correctly?
LLM-as-Judge Scoring
The most flexible approach uses a separate model to evaluate responses against a rubric you define.Basic Judge
Multi-Dimensional Scoring
Real attacks often have competing objectives. You can define multiple scorers:Rule-Based Scoring
When you know specific patterns to look for, rule-based scoring is faster and more consistent than LLM judges.Pattern Matching
Custom Functions
Any function that returns a float works as a scorer:String Distance
Measure how close a response is to a target string using Levenshtein edit distance (normalized to 0-1 range):Scorer Operations
AIRT provides operations to modify and compose scorers.Negation (~)
Invert a scorer’s result. Useful when you want to maximize the absence of something:
Adaptation (.adapt)
Transform the output before scoring. Useful when your scorer expects a different format:
Binding (.bind)
Bind a scorer to the task input instead of output. Essential for measuring perturbation distance:
Naming (>>)
Give a scorer a name for logging and result analysis:
Combining Scorers
Weighted Objectives
When you have multiple competing goals, use directions to balance them:Constraints vs. Objectives
Use constraints for hard requirements that must pass. Use objectives for soft goals to optimize:Scoring Without Output
Sometimes you need to score based on the input itself, not just the output.Input Distance
For adversarial perturbation attacks, measure how far the adversarial input is from the original:Input Characteristics
Score properties of the attack input itself:Hill-Climbing with Partial Feedback
Many targets provide partial feedback even when the attack doesn’t fully succeed. Use this to guide the search:Calibrating Rubrics
LLM judges can be inconsistent. Calibrate by testing on known examples:Best Practices
- Start with LLM judges for exploratory work, then add rule-based scorers for known patterns
- Use multiple objectives to capture different aspects of success
- Calibrate judges on known examples before running expensive attacks
- Log everything — scores that seem wrong during review can reveal rubric issues
- Iterate on rubrics — your first rubric is rarely optimal

