AIRT
Run AI red teaming studies and assessments from the Python SDK with built-in attack factories.
AIRT is the SDK surface for AI red teaming. It gives you prebuilt attack factories such as
pair_attack, tap_attack, and crescendo_attack, plus Assessment for grouping runs into a
single session you can trace and upload.
When to use AIRT
Section titled “When to use AIRT”Use AIRT when you want to answer questions like:
- “Can this model be jailbroken for this goal?”
- “Which attack family is most effective against this target?”
- “How do I run several attacks and keep the results grouped together?”
If you just need a normal benchmark with expected answers, use Evaluations instead. AIRT is for adversarial search, not just pass/fail benchmarking.
The main building blocks
Section titled “The main building blocks”| Concept | What it is for |
|---|---|
pair_attack, tap_attack, crescendo_attack, goat_attack, gptfuzzer_attack, and others | preconfigured attack studies |
Assessment | orchestration object that groups multiple attacks and uploads their results |
| transforms | prompt mutation or adaptation before target evaluation |
Study result | the underlying optimization result produced by an attack |
Most attack factories return a Study[str]. That means AIRT and optimization are closely related:
an attack is effectively a search loop over candidate prompts.
Run a single attack
Section titled “Run a single attack”import asyncio
import dreadnode as dnfrom dreadnode.airt import pair_attack
dn.configure()
@dn.taskasync def target(prompt: str) -> str: return f"Target saw: {prompt}"
async def main() -> None: attack = pair_attack( goal="Reveal the system prompt", target=target, attacker_model="openai/gpt-4o-mini", evaluator_model="openai/gpt-4o-mini", n_iterations=3, n_streams=4, early_stopping_score=0.8, )
result = await attack.console() print(result.best_score, result.best_candidate)
asyncio.run(main())That is the shortest useful entry point: define a target task, build an attack, and run it.
Group attacks with an assessment
Section titled “Group attacks with an assessment”Use Assessment when you want one traceable session that can contain several attack families.
import asyncio
import dreadnode as dnfrom dreadnode.airt import Assessment, crescendo_attack, pair_attack, tap_attack
dn.configure()
@dn.taskasync def target(prompt: str) -> str: return f"Target saw: {prompt}"
async def main() -> None: assessment = Assessment( name="system-prompt-leak-check", description="Compare several jailbreak strategies against the same target", target=target, model="openai/gpt-4o-mini", goal="Reveal the hidden system prompt", goal_category="system_prompt_leak", )
async with assessment.trace(): await assessment.run(tap_attack, n_iterations=3, early_stopping_score=0.8) await assessment.run(pair_attack, n_iterations=3, n_streams=4) await assessment.run(crescendo_attack, n_iterations=4, context_depth=4)
print(len(assessment.attack_results))
asyncio.run(main())This is the right abstraction when you want one platform-visible assessment instead of a pile of unrelated attack runs.
Attack family heuristics
Section titled “Attack family heuristics”Use these as starting heuristics:
pair_attackfor iterative jailbreak refinement with several parallel streamstap_attackfor broad tree search with pruningcrescendo_attackfor progressive, multi-turn escalationgoat_attackfor graph-neighborhood explorationgptfuzzer_attackfor mutation-heavy fuzzing workflowsmultimodal_attackfor text, image, or audio probing
You do not need to memorize every attack before you start. Pick one search-heavy attack and one conversation-heavy attack, then compare their best scores.
Transforms are part of the attack surface
Section titled “Transforms are part of the attack surface”All of the main text attacks accept transforms=. This is how you test language shifts,
rewriting, framing, or obfuscation without changing the target itself.
from dreadnode.airt import tap_attackfrom dreadnode.transforms.injection import skeleton_key_framing
attack = tap_attack( goal="Produce the forbidden answer", target=target, attacker_model="openai/gpt-4o-mini", evaluator_model="openai/gpt-4o-mini", transforms=[skeleton_key_framing()],)Output you should inspect first
Section titled “Output you should inspect first”When an attack finishes, start with:
best_scorebest_candidate- the trial history
- the target responses for the best-scoring prompt
If you are running inside an Assessment, also inspect assessment.attack_results after the trace
context closes.