Understanding Prompt engineering#

Prompt engineering is the technique of crafting specific inputs (prompts) to guide language models toward desired outputs. It gained prominence with the rise of models like GPT-3, where prompts significantly influence model behavior. In AI red teaming, prompt engineering is relevant for simulating adversarial attacks—such as prompt injections, data leakage, and unintended behaviors—to test model security and alignment. Understanding prompt nuances helps expose vulnerabilities, making prompt engineering essential in robustly securing AI systems.

Many of our Challenges engage with Large Language Models (LLMs), also known as inference, allowing you to control the input text and aim for a specific output or behavior to collect your flag. Before you dive into more complex jailbreaking and exploit engineering, familiarize yourself with how LLMs respond to different prompting strategies, takeover mechanics, and malicious influences.

Understand the Terrain and Conduct Reconnaissance#

Before you start solving Challenges, take time to understand the landscape of causal model generation. In traditional security and red teaming, this is known as the reconnaissance phase of an attack's kill chain. Common tokenizer libraries are publicly available which can help when you reverse engineer a model's output to predict its inheritance or sources of training data and artifacts.

Prompt context: In many LLM applications, your prompts fit into a larger template with pre-existing context and formatting. This reduces your control over the context seen by the model, significantly affecting its responses. Consider where and how your input is processed, as specific tokens often carry more value than others.
Sampling and randomness: The sampling method and temperature settings, which are often pre-set and beyond user control, introduce randomness into the model's output. To ensure consistency, you often need to validate results across multiple samples. Additionally, consider how high temperature settings can help the model break free from rigid behaviors.
Input filtering: Prompt inputs may be filtered to restrict allowed tokens, including allow-listing and block-listing certain terms. Understanding these constraints is essential for crafting effective prompts.
Output constraints: LLM outputs face various constraints, such as token limits, perplexity thresholds, unusual token filtering, and guard scanners that detect unsafe content. Be mindful of these limitations when tackling challenges and understand how they might hinder your guidance during generation.

Craft Your Prompts#

Your first approach to crafting a prompt often involves manually exploring model behaviors and iteratively typing each prompt. This is a viable early strategy, but don’t hesitate to use scripting to modify your prompts, gather large datasets on behaviors, and store information for analysis. Over time, you'll discover that certain tokens, lengths, and prompt structures elicit consistent behaviors from the model.

Approach Challenges with an experimental mindset:

Form a hypothesis: Based on your understanding of the LLM's architecture and constraints, develop a hypothesis for how specific prompts or techniques influence the model's behavior.
Collect data points: Design experiments to test your hypothesis by generating diverse prompts and collecting the model's responses. Automate this process with scripts to build a large dataset.
Analyze behaviors: Examine the collected data to identify patterns, correlations, and anomalies in the model's responses. Use statistical analysis and visualization to validate your insights and refine your hypotheses.
Iterate and optimize: Based on your findings, refine your prompting strategies to exploit the model's vulnerabilities and achieve your desired outcomes.

Don't be afraid to automate approaches, use your own LLMs to adapt your prompts, or reference public datasets for ideas.