Glossary

A#

access control#

Access control is the mechanism used to restrict access to resources or systems. Access control is often applied to ensure that only authorized users or systems can interact with a model, its data, or its API.

adversarial attack#

An adversarial attack is a deliberate attempt to manipulate a machine learning model by providing carefully crafted inputs (known as adversarial examples) designed to cause the model to make incorrect predictions or classifications. These attacks exploit weaknesses or vulnerabilities in the model, often by introducing small, imperceptible perturbations to the input data that humans might not notice but which confuse the model.

adversarial training#

Adversarial training is a technique used to make machine learning models more robust to adversarial attacks. It involves training the model on a mixture of regular data and adversarial examples (inputs that have been intentionally perturbed to mislead the model). This helps the model learn to recognize and resist such attacks.

adversary#

An adversary is the entity or individual that performs an adversarial attack on a machine learning model. The adversary is often a malicious actor attempting to exploit vulnerabilities in the model or its training data. Adversaries can be individuals, organizations, or even automated scripts designed to test or break the system.

artificial intelligence (AI)#

AI is the broad field of creating machines or software that can mimic human intelligence, making decisions and solving problems based on learned patterns.

anomaly detection#

Anomaly detection is the process of identifying unusual patterns or outliers in data that do not conform to expected behavior. It is often used in applications like fraud detection, network security, or fault detection. Anomaly detection models learn the normal behavior of data and flag instances that deviate significantly from this behavior.

application programming interface (API)#

An API is a set of rules and protocols that allows one software application to interact with another. It defines the methods and data formats that software programs can use to communicate with each other, serving as an intermediary that enables different systems or components to exchange information and request services.

B#

bias-variance tradeoff#

The bias-variance tradeoff is a fundamental concept in machine learning that describes the tradeoff between two sources of error that affect model performance: * Bias refers to errors caused by overly simplistic models that make strong assumptions about the data (leading to underfitting). High bias means the model fails to capture the true underlying patterns in the data. * Variance refers to errors caused by models that are too complex and highly sensitive to small fluctuations in the training data (leading to overfitting). High variance means the model fits the noise in the data rather than the true patterns.

C#

challenge#

A Challenge is a capture-the-flag (CTF) task that is hosted on the Crucible AI hacking platform at https://crucible.dreadnode.io.

classification model#

A classification model is a type of machine learning model that is used to predict the categorical label or class of a given input. In other words, it sorts data into predefined categories based on input features. These models are commonly used for tasks like image recognition, spam detection, and sentiment analysis.

classification#

Classification is the task of predicting a discrete label or category for an input data point. In machine learning, classification problems involve categorizing data into predefined classes or categories. These models are used when the output is a discrete variable or a class label.

context filtering#

Context filtering is a technique used in AI systems, particularly in natural language processing and retrieval-augmented generation (RAG) models, to narrow the scope of input data or retrieved information based on the context of the task at hand. The purpose of context filtering is to ensure that only relevant information is provided to the model, reducing the risk of irrelevant or harmful data influencing the generated outputs.

context window exploitation#

Context window exploitation is the manipulation of the "context window"—the set of tokens or data points that an AI model uses as context to generate predictions or answers. In models like transformers (e.g., GPT, BERT), the context window determines how much of the input data is visible to the model at any given time. By exploiting this context window, an attacker can control the information seen by the model or cause the model to generate outputs that are based on incomplete, incorrect, or intentionally crafted inputs.

contextual awareness#

Contextual awareness is a system's ability to understand and interpret the environment, situation, or relevant factors surrounding a given input or interaction. It involves the model's capability to consider both explicit and implicit information about the context in which data is provided, and to use this understanding to improve decision-making, predictions, or responses. In essence, contextual awareness enables an AI or machine learning model to adjust its behavior based on the context in which it operates, allowing for more accurate, relevant, and appropriate outputs.

Crucible API key#

Your free CRUCIBLE_API_KEY. The API Key is located on each Challenge page and in your Settings. If you need to regenerate your API Key, navigate to /account.

D#

data exfiltration#

Data exfiltration is the unauthorized transfer of sensitive data from a target system to an external entity. Data exfiltration typically refers to the leakage of private training data or internal model parameters. This could involve an adversary querying a model or interacting with a system to gather and exfiltrate confidential or proprietary information.

data provenance#

Data provenance is the tracking of the origin, history, and lifecycle of data. In AI/ML, this involves keeping a record of where data comes from, how it was processed, and any transformations applied. This helps ensure transparency, reproducibility, and accountability in machine learning workflows.

Dataset (Spyglass)#

In Spyglass, a Dataset is a collection of data designed to mimic a specific type of attack. For example, a Dataset might be designed to evaluate how an AI model responds to adversarial text input or to detect vulnerabilities in its decision-making logic.

data versioning#

Data versioning is the practice of tracking and managing changes to datasets over time. In machine learning, data versioning helps ensure reproducibility and consistency by keeping track of which dataset was used to train a particular model. This is especially important for maintaining models in production and debugging.

denoising#

Denoising is the process of removing noise or irrelevant information from data, which can interfere with a model’s ability to learn or make predictions. In the context of image or text processing, denoising is used to improve data quality.

detection models#

Detection models are designed to identify the presence of specific patterns, objects, events, or anomalies in data. The primary goal of a detection model is to "detect" something of interest, whether that be a particular class in classification, an anomaly in anomaly detection, or an object in an image.

detection systems#

Detection systems are designed to identify and respond to adversarial attacks or other threats to the integrity of machine learning models. These systems typically monitor inputs to the model and attempt to distinguish between normal data and adversarial data. The goal is to prevent adversarial attacks from negatively affecting model performance or security.

direct access#

Direct access is the ability to interact with a system, service, or model without intermediary steps or restrictions. This usually means that the user has direct access to a model’s underlying parameters, weights, or functionality, often in a way that allows them to modify, retrain, or extract data.

direct injection#

Direct injection is the process of adding or introducing input data directly into a system, often with the intent to manipulate or bypass the system's normal processing flow. In the context of adversarial attacks, direct injection can mean to inject adversarially crafted inputs into a model to cause misclassification.

F#

fine-tuning#

Fine-tuning is the process of taking a pre-trained model (usually trained on a large dataset) and adapting it for a specific task by training it further on a smaller, task-specific dataset. Fine-tuning is a form of transfer learning where the model leverages knowledge learned from a general task to improve performance on a specialized task.

flagged#

When a model or dataset is flagged, it typically means that it has been marked for further review due to an issue, such as a potential error, anomaly, or bias. Flagging can be part of the quality assurance or monitoring process.

flag#

A flag is a string that is unique to you and secret in terms of the Challenge. It is your objective to capture it to successfully complete the Challenge. The final flag you capture in a Challenge begins with gAAAAA ....

G#

generative AI systems#

Generative AI systems are models designed to generate new data that is similar to the data they were trained on. These models learn the underlying distribution of the training data and can create new, synthetic examples, such as images, text, music, etc.

I#

image classifiers#

Image classifiers are machine learning models trained to assign labels to images based on their content. They typically use deep learning techniques, especially Convolutional Neural Networks (CNNs), to classify images into predefined categories.

image-based models#

Image-based models are machine learning models specifically designed to process and analyze visual data, like images or videos. These models use techniques like convolutional neural networks (CNNs) to automatically learn features from raw pixel data, making them particularly effective for image recognition, object detection, and classification tasks.

imaging pre-processing#

Image pre-processing is a set of techniques applied to raw images to prepare them for use in machine learning models. The goal is to improve the quality of the image, enhance important features, and make the image more suitable for analysis.

indirect access#

Indirect access refers to interacting with a model or system through a limited interface, such as an API, or using pre-defined input/output formats, rather than having access to the underlying parameters or source code.

indirect injection#

Indirect injection refers to feeding data into a model indirectly, often through some intermediate process or channel, such as modifying features or input conditions that affect the model’s output, without directly modifying the input itself.

inference#

Inference is the process where a trained model makes predictions or generates outputs based on new data. Inference contrasts with training, where the model learns from data; here, it uses its learned parameters to provide meaningful responses to unseen inputs, making it central to deploying machine learning models in practical applications.

input sanitization#

Input sanitization is the practice of cleaning or preprocessing input data to ensure that it is safe, valid, and does not contain malicious content before being processed by a machine learning model or system. Input sanitization helps prevent adversarial inputs, malformed data, or attempts to manipulate the system’s behavior.

L#

learned parameters#

Learned parameters are values that a model adjusts during the training process in order to minimize error or improve its predictions. These parameters are essentially the internal settings or coefficients of the model that allow it to make decisions, classify data, predict outputs, or generate responses based on the inputs it receives.

M#

machine learning#

Machine Learning (ML) is a branch of AI in which algorithms learn from data to improve at tasks over time without being explicitly programmed, often used for predictions and classifications.

manual auditing#

Manual auditing is the human-led review process of a machine learning model's performance, behavior, or data usage. This process is often used for ensuring compliance with regulations, verifying model fairness, or checking for unintended bias or errors in the model.

misclassification#

Misclassification occurs when a machine learning model incorrectly predicts the class or category of an input. In classification tasks, this means the model has assigned the wrong label to a data point. Misclassification can happen due to errors in the model, the data, or the choice of features used for training.

model architecture#

Model architecture is the specific design or structure of a machine learning model, including the arrangement and types of layers or components in the model. In deep learning, model architecture is the layers of neurons are connected and how data flows through them. Different architectures are suited for different types of problems, such as convolutional neural networks (CNNs) for image recognition or recurrent neural networks (RNNs) for time series data.

model evasion#

Model evasion is an adversarial attack aimed at bypassing or evading a machine learning model's defenses, usually to make it produce incorrect outputs or behave in ways that favor the attacker.

model extraction#

Model extraction is an attack where an adversary tries to replicate or steal the functionality of a machine learning model by querying it and using the outputs to build a copy of the original model.

model fingerprinting#

Model fingerprinting is a technique used to identify or verify the specific instance or type of AI model being used. This process involves analyzing the model's responses to carefully crafted inputs to determine unique characteristics or behavioral patterns that can distinguish it from other models. Model fingerprinting can be used for both defensive purposes (verifying model authenticity) and offensive purposes (identifying target models for attacks).

model hardening#

Model hardening is the process of making AI models more robust and resistant to adversarial attacks, manipulation, or other forms of exploitation. The goal of model hardening is to improve the model's security and performance under adversarial conditions by incorporating defenses against potential threats, such as adversarial examples or data poisoning.

model inversion#

Model inversion is a set of techniques in machine learning where an attacker tries to extract confidential information from a trained AI model by interacting with it in specific ways, often through extensive querying.

model outputs#

Model outputs, such as next-token predictions in NLP or generated images, are critical in ML security and hacking as they can reveal vulnerabilities, biases, or unintended behaviors that adversaries could exploit.

model poisoning#

Model poisoning is an attack on machine learning models where an adversary intentionally injects harmful data into the training set to corrupt how a model learns.

model repository#

A model repository is a centralized system for storing, versioning, and managing machine learning models, akin to a code repository in the Software Development Life Cycle (SDLC), ensuring reproducibility, collaboration, and streamlined deployment. See dreadnode projects such as Dyana on why this is an important attack vector to consider and secure in an MLOPS pipeline.

model safeguards#

Model safeguards are the strategies, practices, or mechanisms implemented to ensure that models operate safely, ethically, and reliably. These safeguards are designed to minimize risks such as bias, harmful outcomes, misuse, and unintended consequences that may arise from deploying AI/ML models in real-world applications.

model watermarking#

Model watermarking is a technique used to embed a unique, invisible identifier (watermark) into a machine learning model. This watermark can be used to track ownership or detect unauthorized use of the model. It can be used to assert intellectual property rights or prevent model theft.

multi-chain exploitation#

Multi-chain exploitation is an attack type that leverages multiple stages or "chains" of processes within a larger AI system. These "chains" could involve different components of the system (e.g., data retrieval, pre-processing, model inference, post-processing). In multi-chain exploitation, an attacker attempts to compromise or manipulate multiple parts of the pipeline to cause cascading failures or unexpected outputs.

N#

normalization#

Normalization is the process of scaling input data to a standard range or distribution. This can help improve the stability and performance of machine learning algorithms, especially when dealing with features of different scales or units.

O#

obfuscation#

Obfuscation are techniques that make it difficult to understand or reverse-engineer the inner workings of a machine learning model. It can be used to protect intellectual property or sensitive model details. Obfuscation techniques can involve altering the model's code, inputs, or outputs to obscure its decision-making process.

overfitting#

Overfitting occurs when a machine learning model learns not only the underlying patterns in the training data but also the noise and random fluctuations, which makes it perform very well on the training set but poorly on new, unseen data (i.e., the model lacks generalization).

P#

perturbation#

Perturbation is the small, intentional change or modification made to the input data to test the robustness of a machine learning model, particularly in adversarial settings. In the context of adversarial attacks, perturbation is typically used to subtly alter the input in ways that cause a model to make incorrect predictions.

prediction#

Prediction is the process of forecasting a future value or estimating an unknown quantity based on existing data. In machine learning, prediction typically involves regression models, where the output is a continuous value. The goal is to estimate a numerical value based on patterns learned from historical or training data.

Project (Spyglass)#

In Spyglass, a Project is a collection of scoped Targets, Datasets, and Runs.

prompt injection#

Prompt injection is a critical vulnerability in Large Language Models (LLMs), where malicious users manipulate model behavior by crafting inputs that override, bypass, or exploit how the model follows instructions.

R#

recommendation model#

A recommendation model is a machine learning system designed to suggest items or actions to users based on their past behavior, preferences, or other data. These models are widely used in e-commerce, streaming services, and social media platforms to recommend products, movies, music, etc.

reflection attacks#

Reflection attacks are a type of network-based or denial-of-service (DoS) attack in which an attacker exploits third-party servers or services to amplify or reflect malicious traffic back to a target. Essentially, the attacker sends requests to these intermediary systems, which then respond by sending the requests (or amplified versions of the requests) to the intended victim, often overwhelming them with traffic or unwanted data.

regression model#

A regression model is a type of machine learning model used to predict a continuous output variable (numeric values) based on input features. The goal is to estimate a relationship between the dependent variable (the one being predicted) and the independent variables (the input features).

regularization#

Regularization is the technique used to prevent a machine learning model from overfitting by penalizing overly complex models. It introduces additional terms (penalties) to the loss function that encourage simpler models, reducing the likelihood of capturing noise in the data.

Retrieval-Augmented Generation Attacks#

Retrieval-Augmented Generation (RAG) is an advanced machine learning framework that combines retrieval-based methods (fetching relevant information from a large corpus of documents) with generative models (creating new text based on the retrieved information). The goal of RAG is to enhance the performance of generative models by providing them with external, contextually relevant information to improve the quality and accuracy of their outputs, especially in tasks like question answering, summarization, and conversational agents.

retrieval manipulation#

Retrieval manipulation is the intentional modification or tampering with the retrieval phase of AI models, particularly in retrieval-augmented generation (RAG) systems, where information is retrieved from an external database or corpus before being fed into a generative model. The goal of this manipulation is to influence the results or outputs of the model by altering the documents or data retrieved in response to a query.

reverse engineering#

Reverse engineering is the process of analyzing a model or system to understand its underlying structure, functionality, or behavior. This can involve extracting information such as the model's architecture, parameters, or training data. It is often used in model stealing or intellectual property theft, where an attacker tries to replicate or understand a proprietary model.

Run (Spyglass)#

In Spyglass, a Run is the process of sending Datasets or generated data to targets. The purpose of a Run is to simulate various attack vectors and to test the model’s robustness against them.

S#

Scorer (Spyglass)#

In Spyglass, a Scorer is a method of evaluating or measuring the success of an attack.

shadow model#

A shadow model is a copy or approximation of a target machine learning model, typically used in the context of attacks or model evaluation. It is trained to behave in a similar manner to the target model, without necessarily having access to the target model's internal parameters or architecture.

Spyglass#

Dreadnode Spyglass is a platform for evaluating AI applications and gaining actionable insights into vulnerabilities and risks. Spyglass enables you to probe, attack, and analyze AI model endpoints and weights to uncover known vulnerabilities, identify security risks, and take action. Spyglass helps you identify and fix the latest AI vulnerabilities, both during the development of an AI application and after deployment into an operational setting.

supervised learning#

Supervised learning is a type of machine learning where the model is trained on labeled data. Each training example in the dataset consists of an input and a corresponding correct output (label). The model learns to map inputs to outputs based on this labeled data, and its performance is evaluated based on how well it predicts the correct label for new, unseen data.

surrogate model#

A surrogate model is a model that is used to approximate the behavior of a more complex or computationally expensive model. It serves as a substitute or stand-in for the primary model, allowing for more efficient analysis, optimization, or decision-making.

T#

Target (Spyglass)#

In Spyglass, a Target is the interface used to query a model, often in the form of an API endpoint.

text processing#

Text processing is the technique and method used to manipulate, clean, and structure text data so that it can be used by machine learning models. It involves tasks like tokenization, stemming, lemmatization, removing stopwords, and more.

text-based models#

Text-based models are machine learning models that focus on processing and understanding textual data. These models use techniques like natural language processing (NLP) and are used for tasks such as sentiment analysis, text classification, language translation, and named entity recognition.

tokenization#

Tokenization is the process of splitting text into smaller units, such as words, phrases, or subwords, called tokens. It's a crucial step in natural language processing (NLP) that allows models to process text efficiently.

U#

underfitting#

Underfitting occurs when a machine learning model is too simple to capture the underlying patterns in the data, leading to poor performance on both the training and test sets. It typically happens when the model is not complex enough or the training data is insufficient.

Unicode manipulation#

Unicode manipulation is the process of handling and modifying text data that uses the Unicode standard for encoding characters. Unicode is a universal character encoding standard designed to represent text in multiple writing systems, including alphabets, ideograms, punctuation marks, and symbols, across different languages and platforms. In the context of Unicode manipulation, you're typically dealing with operations that allow you to work with, process, or modify Unicode-encoded text, whether it's for formatting, searching, encoding conversions, or handling special characters. The term can encompass a wide range of activities in programming, text processing, or data transformation.

V#

vulnerability#

A vulnerability is a weakness or flaw in a machine learning model or system that can be exploited by adversarial attacks or other malicious behavior. These vulnerabilities can arise from various factors, such as insufficient training data, model design flaws, or lack of robustness to noisy or adversarial inputs.