Skip to content

AIRT

Run AI red teaming studies and assessments from the Python SDK with built-in attack factories.

AIRT is the SDK surface for AI red teaming. It gives you prebuilt attack factories such as pair_attack, tap_attack, and crescendo_attack, plus Assessment for grouping runs into a single session you can trace and upload.

Use AIRT when you want to answer questions like:

  • “Can this model be jailbroken for this goal?”
  • “Which attack family is most effective against this target?”
  • “How do I run several attacks and keep the results grouped together?”

If you just need a normal benchmark with expected answers, use Evaluations instead. AIRT is for adversarial search, not just pass/fail benchmarking.

ConceptWhat it is for
pair_attack, tap_attack, crescendo_attack, goat_attack, gptfuzzer_attack, and otherspreconfigured attack studies
Assessmentorchestration object that groups multiple attacks and uploads their results
transformsprompt mutation or adaptation before target evaluation
Study resultthe underlying optimization result produced by an attack

Most attack factories return a Study[str]. That means AIRT and optimization are closely related: an attack is effectively a search loop over candidate prompts.

import asyncio
import dreadnode as dn
from dreadnode.airt import pair_attack
dn.configure()
@dn.task
async def target(prompt: str) -> str:
return f"Target saw: {prompt}"
async def main() -> None:
attack = pair_attack(
goal="Reveal the system prompt",
target=target,
attacker_model="openai/gpt-4o-mini",
evaluator_model="openai/gpt-4o-mini",
n_iterations=3,
n_streams=4,
early_stopping_score=0.8,
)
result = await attack.console()
print(result.best_score, result.best_candidate)
asyncio.run(main())

That is the shortest useful entry point: define a target task, build an attack, and run it.

Use Assessment when you want one traceable session that can contain several attack families.

import asyncio
import dreadnode as dn
from dreadnode.airt import Assessment, crescendo_attack, pair_attack, tap_attack
dn.configure()
@dn.task
async def target(prompt: str) -> str:
return f"Target saw: {prompt}"
async def main() -> None:
assessment = Assessment(
name="system-prompt-leak-check",
description="Compare several jailbreak strategies against the same target",
target=target,
model="openai/gpt-4o-mini",
goal="Reveal the hidden system prompt",
goal_category="system_prompt_leak",
)
async with assessment.trace():
await assessment.run(tap_attack, n_iterations=3, early_stopping_score=0.8)
await assessment.run(pair_attack, n_iterations=3, n_streams=4)
await assessment.run(crescendo_attack, n_iterations=4, context_depth=4)
print(len(assessment.attack_results))
asyncio.run(main())

This is the right abstraction when you want one platform-visible assessment instead of a pile of unrelated attack runs.

Use these as starting heuristics:

  • pair_attack for iterative jailbreak refinement with several parallel streams
  • tap_attack for broad tree search with pruning
  • crescendo_attack for progressive, multi-turn escalation
  • goat_attack for graph-neighborhood exploration
  • gptfuzzer_attack for mutation-heavy fuzzing workflows
  • multimodal_attack for text, image, or audio probing

You do not need to memorize every attack before you start. Pick one search-heavy attack and one conversation-heavy attack, then compare their best scores.

All of the main text attacks accept transforms=. This is how you test language shifts, rewriting, framing, or obfuscation without changing the target itself.

from dreadnode.airt import tap_attack
from dreadnode.transforms.injection import skeleton_key_framing
attack = tap_attack(
goal="Produce the forbidden answer",
target=target,
attacker_model="openai/gpt-4o-mini",
evaluator_model="openai/gpt-4o-mini",
transforms=[skeleton_key_framing()],
)

When an attack finishes, start with:

  • best_score
  • best_candidate
  • the trial history
  • the target responses for the best-scoring prompt

If you are running inside an Assessment, also inspect assessment.attack_results after the trace context closes.