Welcome to Crucible#
Crucible is your AI hacking playground. It's designed to serve as an educational experience to practice, learn, and exploit vulnerabilities in AI/ML systems.
Crucible has served as a CTF platform for previous public events such as:
Challenges#
Challenges are created and developed by the Dreadnode team featuring a variety of ML hacking capture-the-flag (CTF) events across different threat categories. These Challenges range from easy to hard and require participants to demonstrate their ability to hack a model endpoint in order to successfully capture the flag.
Whether you're well-versed in the world of AI and ML, or are new to the game, we aim to deliver something suitable for everyone regardless of experience or educational background. The current Challenges available cover the following categories of hacking techniques:
- Adversarial audio: Manipulate audio inputs to trick speech recognition or audio classification models into producing incorrect outputs.
- Adversarial image: Create subtle modifications to images that cause vision models to misclassify or fail to detect objects.
- Data analysis or data science: Exploit patterns in training data or model behavior to reveal sensitive information or system vulnerabilities.
- Model evasion: Craft inputs that bypass ML model detection or classification systems while maintaining malicious functionality.
- Model extraction: Steal or reconstruct a model's architecture and weights through repeated queries and analysis of outputs.
- Model fingerprinting: Identify specific ML models or their training sources through analysis of their responses and behaviors.
- Model inversion: Reconstruct training data by exploiting a model's outputs to reveal private or sensitive information used in training.
- Prompt injection: Manipulate input prompts to make language models bypass safety controls or reveal unintended information.
Often, you may also need to chain attacks to perform a successful exploit.
Model outputs are interpreted based on task goals, evaluation metrics, and real-world relevance. Proper logging and systematic testing ensure outputs are meaningful and actionable for their intended applications. When considering model outputs like next-token predictions or image generations, key details include:
- Output Context: Analyze how outputs change under adversarial inputs designed to manipulate the model’s predictions. Evaluate the impact of slight perturbations or crafted inputs on the accuracy or reliability of results.
- Model Configuration: Track settings like decision thresholds or preprocessing steps, as they can influence the model’s susceptibility to attacks.
- Evaluation Metrics: Measure robustness using metrics like accuracy under attack, perturbation distance (e.g., L2 or L∞ norms), or attack success rates.
- Bias and Fairness: Examine whether adversarial inputs exploit biases in the model or exacerbate fairness issues in predictions.
- Reproducibility: Save and version adversarial examples and corresponding outputs to enable consistent testing and analysis of defenses.
- Use Case Requirements: Interpret outputs with respect to the specific attack scenario, such as misclassification in evasion attacks or deception in phishing content generation.
Are you ready to take on a Challenge? Log in to Crucible to try for yourself, or continue reading the our documentation to continue learning.