Overview
Model Integrity Auditing is the process of verifying that a machine learning model file has not been altered or tampered with. It ensures that the model being used in production is the same one that was trained, tested, and approved.
Why it Matters
A tampered model represents a significant security and safety risk. An attacker with the ability to modify a model file could:
- Create Backdoors: Alter the model’s weights so that specific, attacker-chosen inputs trigger hidden behavior.
- Degrade Performance: Subtly change parameters so that accuracy drops in ways that evade casual testing.
- Introduce Bias: Modify the model to produce biased or unfair outcomes for a specific sub-population.
Technical Mechanics & Foundations
The mechanics of an audit depend on the model’s file format. Models are often saved in structured, human-readable formats such as JSON, or as opaque serialized objects (e.g., Python pickles).
The Audit Process
- Establish a Baseline: An audit requires a “golden copy” of the known-good model file, or at minimum a cryptographic hash of it recorded at approval time.
- Inspection: The suspect model file is compared against the baseline, byte-for-byte or hash-to-hash, as in the first sketch below.
- Functional Testing: The most reliable method is to re-run the model against a validation dataset for which the expected outputs are known. If the suspect model’s predictions diverge from the baseline’s, the file has almost certainly been altered; the second sketch below shows one way to check.
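A minimal sketch of steps 1 and 2, assuming the approved model’s SHA-256 digest was recorded at release time (the file names model.sha256 and model.json are placeholders):

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Placeholder artifacts: the digest recorded when the model was approved,
# and the model file currently deployed.
baseline_digest = open("model.sha256").read().strip()
suspect_digest = sha256_of_file("model.json")

if suspect_digest == baseline_digest:
    print("Hash match: the file is byte-identical to the baseline.")
else:
    print("Hash mismatch: the model file differs from the approved copy.")
```

A hash check is cheap and catches any byte-level change, but it says nothing about what was changed; that is where functional testing comes in.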
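For step 3, a sketch of a functional check, assuming hypothetical files suspect_model.json (the model under audit), X_val.npy (validation features), and baseline_predictions.npy (outputs recorded from the known-good model):

```python
import numpy as np
import xgboost as xgb

# Load the model under audit.
booster = xgb.Booster()
booster.load_model("suspect_model.json")

# Validation inputs and the predictions the approved model produced on them.
X_val = np.load("X_val.npy")
expected = np.load("baseline_predictions.npy")

actual = booster.predict(xgb.DMatrix(X_val))

# An untampered model should reproduce the baseline outputs up to
# floating-point tolerance on the same hardware.
mismatched = ~np.isclose(actual, expected, atol=1e-6)
if mismatched.any():
    print(f"{int(mismatched.sum())} of {len(actual)} predictions diverge: possible tampering.")
else:
    print("Predictions match the baseline within tolerance.")
```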
Challenge Arena
audit: Analyze a provided XGBoost model file and training data to identify which parameter has been maliciously altered.
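One way to begin such an audit, assuming a known-good copy of the model is available for comparison (the file names baseline_model.json and suspect_model.json are hypothetical; XGBoost’s save_config() returns the learner configuration as a JSON string):

```python
import json
import xgboost as xgb

def learner_config(path: str) -> dict:
    """Load an XGBoost model and return its configuration as a dict."""
    booster = xgb.Booster()
    booster.load_model(path)
    return json.loads(booster.save_config())

def diff_config(a: dict, b: dict, prefix: str = "") -> None:
    """Recursively print keys whose values differ between two configs."""
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            diff_config(va, vb, f"{prefix}{key}.")
        elif va != vb:
            print(f"{prefix}{key}: baseline={va!r} suspect={vb!r}")

# Placeholder file names for the challenge artifacts.
diff_config(learner_config("baseline_model.json"),
            learner_config("suspect_model.json"))
```

If no golden copy is provided, retraining on the supplied data with the documented settings is one way to reconstruct a baseline before diffing.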