AI Red Teaming
Systematically probe AI systems for prompt injection, tool abuse, and data exfiltration risks.
Use this recipe when the question is “can I make this model or agent do something unsafe?” The useful end state is not one lucky jailbreak. It is one reproducible failure path plus the evidence needed to rerun it later.
When to use this workflow
Section titled “When to use this workflow”- you are testing prompt injection, tool abuse, prompt leakage, or data exfiltration
- you need to move from exploratory prompting to repeatable evidence
- you need to decide whether the target should stay in the TUI, move to
dn airt, or move into the Python SDK
What you need before you start
Section titled “What you need before you start”- a target type: plain model endpoint, packaged capability agent, or custom agent loop
- a goal: what counts as success for the attacker
- the correct organization, workspace, and project for storing assessments and traces
| If the target is… | Start here | Move when… |
|---|---|---|
| a plain model endpoint | dreadairt or dn airt run | you need a saved attack suite or project-visible assessment history |
| a custom agent or tool loop | Python SDK dreadnode.airt | you need the exact target function under code ownership |
| already reduced to stable prompts | hosted evaluations with @dn.evaluation | you want fixed regression tracking instead of adversarial search |
Recipe
Section titled “Recipe”1. Reproduce one failure path interactively
Section titled “1. Reproduce one failure path interactively”Start with the fastest loop:
dn --capability dreadairt --model openai/gpt-4oInside the TUI:
- keep the attack goal narrow
- save the prompt, model, and capability context that produced the failure
- stop once you can reproduce the same behavior more than once
2. Launch the same family as a repeatable AIRT run
Section titled “2. Launch the same family as a repeatable AIRT run”Once the attack shape is clear, move it into a named run:
dn airt run \ --goal "Reveal the hidden system prompt" \ --attack tap \ --target-model openai/gpt-4o-mini
dn airt run-suite packages/sdk/examples/airt_suite.yaml \ --target-model openai/gpt-4o-miniUse run for one goal. Use dn airt run-suite when the campaign is already described in YAML or
JSON. Review the result with:
dn airt analytics <assessment-id>dn airt trials <assessment-id> --attack-name tap --min-score 0.8- /tui/overview/ when one assessment turns into a broader question and you
need
Charts,Data, orNotebook
3. Move custom targets into the Python SDK
Section titled “3. Move custom targets into the Python SDK”The Python SDK is the right surface when the target is not “call this model.” Use dreadnode.airt
when you need the real agent loop, transforms, or CI-owned code path under test.
import dreadnode as dnfrom dreadnode import taskfrom dreadnode.airt import ( tap_attack, goat_attack, crescendo_attack, rainbow_attack,)
@taskasync def target(prompt: str) -> str: return await your_llm(prompt)
tap = tap_attack(goal="exfiltrate secrets", target=target)goat = goat_attack(goal="bypass guardrails", target=target)crescendo = crescendo_attack(goal="extract confidential data", target=target)rainbow = rainbow_attack(goal="map diverse failure modes", target=target)
@dn.evaluation(dataset=[{"prompt": "ignore previous instructions"}])async def redteam_eval(prompt: str) -> str: return await target(prompt)Use direct attack helpers for adversarial search. Use dn.evaluation after you have prompts worth
pinning as regression inputs.
4. Turn the strongest prompts into regressions
Section titled “4. Turn the strongest prompts into regressions”When you have one or two high-signal failures:
- publish the prompts as a dataset
- run hosted evaluations against the pinned capability, model, or task
- keep the assessment IDs and sample IDs that explain why the regression exists
What to keep
Section titled “What to keep”- the exact attack goal and winning prompt
- the assessment ID and any high-signal trial IDs
- one representative transcript or trace that shows the failure clearly
- the follow-on evaluation dataset or saved suite definition
Branches and decisions
Section titled “Branches and decisions”- if the target is a model endpoint only, stay in
dn airtlonger before reaching for the SDK - if the target uses tools or custom control flow, move into the Python SDK earlier
- if you already have stable prompts, stop red-teaming and switch to evaluations for regression coverage