Message Transforms - Dreadnode Documentation

Transforms are Rigging’s message processing pipeline. They let you modify messages and generation parameters before they reach the model, then optionally post-process the results. This supports custom message formats, training data preparation, and compatibility with models that expect a particular conversation layout.

Think of a transform as a bidirectional converter: it can reshape a conversation for a specific model, inject context dynamically, or apply formatting rules, and then restore or further process the results after generation. Rigging uses transforms internally (especially for tool calling modes), and you can build custom transforms tailored to your own requirements, such as adapting conversations for fine-tuning, implementing custom tool calling formats, or building preprocessing pipelines.

Basic Usage

Every transform follows a consistent three-phase protocol, which makes transforms predictable and composable.

Transform Protocol:

Pre-processing - Modify messages and generation parameters
Generation - Model processes the transformed content
Post-processing - Optional cleanup or further transformation of results

The transform function signature is straightforward - it receives the current messages and parameters, then returns the modified versions along with an optional post-processing function:

import rigging as rg

async def my_transform(
    messages: list[rg.Message],
    params: rg.GenerateParams
) -> tuple[list[rg.Message], rg.GenerateParams, rg.PostTransform | None]:  # (1)!
    suffix = " - Talk like a pirate!"

    # Modify messages before generation
    for message in [m for m in messages if m.role == "user"]:  # (2)!
        message.content = f"{message.content}{suffix}"

    # Optional: return a function to undo changes
    async def post_transform(chat: rg.Chat) -> rg.Chat:  # (3)!
        print(f"Pre:\n{chat.conversation}")

        for message in chat.all:
            message.content = message.content.replace(suffix, "")

        print(f"Post:\n{chat.conversation}")
        return chat

    return messages, params, post_transform  # (4)!

# Apply in a chat pipeline
chat = (
    await rg.get_generator("gpt-4")
    .chat("Hello, how are you?")
    .transform(my_transform)  # (5)!
    .run()
)

# Pre:
# [user]: Hello, how are you? - Talk like a pirate!

# [assistant]: Arr matey! I be shipshape and Bristol fashion, ...

# Post:
# [user]: Hello, how are you?

# [assistant]: Arr matey! I be shipshape and Bristol fashion, ...

(1) Transform signature: takes messages and params, returns modified versions plus optional post-processor
(2) Pre-processing: modify only user messages in this example
(3) Post-processing function: receives the complete Chat after generation
(4) Return the modified messages, params, and post-processor
(5) Apply the transform to the pipeline - it runs automatically during generation
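
To make the three phases concrete, here is a minimal sketch that runs a transform by hand. It does not use Rigging itself: `Message`, `fake_generate`, and `run_with_transform` are hypothetical stand-ins, and a plain list of messages stands in for `rg.Chat`. How Rigging actually schedules these phases internally may differ.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Message:  # hypothetical stand-in for rg.Message
    role: str
    content: str


SUFFIX = " - Talk like a pirate!"


async def pirate_transform(messages, params):
    # Phase 1: pre-processing edits user messages before generation
    for message in messages:
        if message.role == "user":
            message.content += SUFFIX

    async def post_transform(chat):
        # Phase 3: post-processing strips the suffix from every message
        for message in chat:
            message.content = message.content.replace(SUFFIX, "")
        return chat

    return messages, params, post_transform


async def fake_generate(messages, params):
    # Phase 2: stand-in for the model call; it just echoes the last message
    return Message("assistant", f"echo: {messages[-1].content}")


async def run_with_transform(messages, params, transform):
    messages, params, post = await transform(messages, params)
    reply = await fake_generate(messages, params)
    chat = [*messages, reply]
    if post is not None:
        chat = await post(chat)
    return chat


chat = asyncio.run(
    run_with_transform([Message("user", "Hello")], {}, pirate_transform)
)
for message in chat:
    print(f"[{message.role}]: {message.content}")
```

The stand-in model sees the suffixed prompt, while the returned conversation no longer contains the suffix.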

Transform Mutability
Transforms can modify messages in-place (as shown above) or create new message objects. Both approaches work, but in-place modification is more memory efficient for large conversations.
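
The trade-off is that in-place edits are visible to any code that still holds a reference to the original messages. The sketch below contrasts the two approaches using a hypothetical `Message` dataclass rather than `rg.Message`; the helper names are illustrative only.

```python
import copy
from dataclasses import dataclass


@dataclass
class Message:  # hypothetical stand-in for rg.Message
    role: str
    content: str


def tag_in_place(messages):
    # Mutates the caller's objects: cheap, but the originals are changed
    for message in messages:
        message.content = f"[tagged] {message.content}"
    return messages


def tag_copy(messages):
    # Works on a deep copy: the caller's objects stay untouched
    copied = copy.deepcopy(messages)
    for message in copied:
        message.content = f"[tagged] {message.content}"
    return copied


original = [Message("user", "hi")]
result = tag_copy(original)
print(original[0].content, "|", result[0].content)
```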

Tool Transforms Under the Hood

When you use tools with the mode parameter, Rigging automatically applies transforms that handle the entire tool calling lifecycle. These built-in transforms convert between native Python function calls and model-specific text formats.

Tool Transform Process:

System Injection - Adds tool definitions and usage instructions to system messages
Format Conversion - Transforms native ToolCall objects to JSON/XML representations
Response Parsing - Converts model outputs back to structured ToolCall objects
State Restoration - Removes injected content and restores original message structure

import rigging as rg

@rg.tool
def get_weather(city: str) -> str:
    return f"Weather in {city}: 72°F and sunny"

# Rigging automatically uses transforms based on mode
chat = (
    await rg.get_generator("meta-llama/llama-3.1-8b")  # (1)!
    .chat("What's the weather in Paris?")
    .using(get_weather, mode="json-with-tag")  # (2)!
    .run()
)

(1) Models without native function calling need transform-based tool support
(2) Rigging automatically applies tools_to_json_with_tag_transform based on the mode

Transform-Based Tool Modes:

"json-with-tag" - JSON wrapped in XML tags (most reliable for open source models)
"json" - Raw JSON format (compact but can be harder to parse)
"json-in-xml" - XML structure with JSON parameters
"xml" - Pure XML format (verbose but very explicit)

# Model sees and generates:
{
  "name": "get_weather",
  "arguments": {"city": "Paris"}
}
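
The response-parsing step can be pictured with a small stand-alone sketch. The `<tool-call>` tag name and the `parse_tool_call` and `dispatch` helpers are assumptions made for illustration; Rigging's built-in transforms define their own tags and parsing logic.

```python
import json
import re

# Assumed tag name, for illustration only
TOOL_CALL_RE = re.compile(r"<tool-call>(.*?)</tool-call>", re.DOTALL)


def get_weather(city: str) -> str:
    return f"Weather in {city}: 72°F and sunny"


TOOLS = {"get_weather": get_weather}


def parse_tool_call(text):
    """Return (name, arguments) from the first tagged JSON call, or None."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return None
    payload = json.loads(match.group(1))
    return payload["name"], payload.get("arguments", {})


def dispatch(text):
    """Run the tool named in the model output, if there is one."""
    parsed = parse_tool_call(text)
    if parsed is None:
        return None
    name, arguments = parsed
    return TOOLS[name](**arguments)


model_output = (
    "Let me check. "
    '<tool-call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool-call>'
)
print(dispatch(model_output))
```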

Manual Transform Application

You can apply tool transforms manually to existing conversations, which is particularly useful for training data preparation or format standardization. This lets you convert conversations that used native function calling into text-based formats for fine-tuning:

import rigging as rg

def add_numbers(a: int, b: int) -> int:
    """Adds two numbers together."""
    return a + b

chat = (
    await rg.get_generator("openai/gpt-4o-mini")
    .chat("What is 2 + 3?")
    .using(add_numbers) # Pass the function directly
    .run()
)

# Convert the chat's native tool calls to the text-based JSON format
transformed = await chat.transform(  # (1)!
    rg.transform.tools_to_json_transform  # (2)!
)

print(transformed.conversation)

# [system]: # Tools

# You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.

# <tools>
# {"type":"function","function":{"name":"add_numbers","description":"Adds two numbers together.","parameters":{"additionalProperties":false,"properties":{"a":{"title":"A","type":"integer"},"b":{"title":"B","type":"integer"}},"required":["a","b"],"type":"object"}}}
# </tools>

# To call a function, respond with the following format:

# {"name": <function-name>, "arguments": <args-dict>}

# [user]: What is 2 + 3?

# [assistant]: {"name": "add_numbers", "parameters": {"a": 2, "b": 3}}

# [user]: <tool-response id="call_BzyKRU20SyyQvVjBjmsOqkji">5</tool-response>

# [assistant]: 2 + 3 equals 5.

(1) Apply transforms to existing chats after generation
(2) Use the built-in JSON transform for training data preparation

Transform vs Pipeline Application

Pipeline: .transform() on ChatPipeline applies during generation
Chat: .transform() on Chat applies to existing conversations
Both use the same transform functions but at different stages of the workflow

Custom Transforms

Creating your own transforms lets you shape exactly how messages and parameters are processed. Custom transforms enable use cases like dynamic context injection, format adaptation, parameter tuning, and preprocessing pipelines.

Common Transform Patterns:

Context Enhancement - Add dynamic information to messages
Format Adaptation - Convert between different message formats
Parameter Tuning - Adjust generation parameters based on content
Content Filtering - Remove or modify sensitive information
Training Preparation - Add special tokens or formatting for fine-tuning

Message Preprocessing

The most common transform pattern is modifying message content before it reaches the model:

import rigging as rg

async def add_context_transform(
    messages: list[rg.Message],
    params: rg.GenerateParams
) -> tuple[list[rg.Message], rg.GenerateParams, None]:
    # Add context to user messages
    for message in messages:
        if message.role == "user":  # (1)!
            context = get_current_context()  # (2)!
            message.content = f"Context: {context}\n\n{message.content}"  # (3)!

    return messages, params, None  # (4)!

chat = (
    await rg.get_generator("gpt-4")
    .chat("Help me with this task")
    .transform(add_context_transform)
    .run()
)

(1) Target only user messages for context injection
(2) Fetch dynamic context from your application (database, API, etc.)
(3) Prepend context while preserving original message structure
(4) No post-processing needed for simple content modification
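
The example calls `get_current_context()` without defining it, since that function belongs to your application. As a rough sketch of what such a helper might look like, the version below builds a context string from a user name and the current UTC time. Both the helper body and `with_context` are hypothetical; a real implementation would likely query a database or an API.

```python
from datetime import datetime, timezone


def get_current_context(user_name="anonymous", now=None):
    """Build a short context string (hypothetical application helper)."""
    now = now or datetime.now(timezone.utc)
    return f"user={user_name}; utc_time={now.isoformat(timespec='seconds')}"


def with_context(content, context):
    """Prepend the context block in the same format the transform uses."""
    return f"Context: {context}\n\n{content}"


print(with_context("Help me with this task", get_current_context("ada")))
```

Accepting `now` as a parameter keeps the helper deterministic in tests.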

Parameter Modification

Transforms can dynamically adjust generation parameters based on message content, enabling adaptive generation strategies:

async def creative_boost_transform(
    messages: list[rg.Message],
    params: rg.GenerateParams
) -> tuple[list[rg.Message], rg.GenerateParams, None]:
    # Boost creativity for brainstorming
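    if any("brainstorm" in msg.content.lower() for msg in messages):  # (1)!
        # Compare against None so an explicit temperature of 0.0 is not treated as unset
        base = 0.7 if params.temperature is None else params.temperature
        params.temperature = min(base + 0.3, 1.0)  # (2)!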

    return messages, params, None

(1) Check if any message contains brainstorming keywords
(2) Safely increase temperature while respecting the 1.0 maximum limit

Parameter Safety
When modifying generation parameters, always handle None values gracefully and respect parameter constraints. The original params object may have None for unset parameters, and valid ranges depend on the provider: this page assumes temperature ∈ [0, 1], while some APIs accept values up to 2.
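
A small helper can centralize the None handling and the clamping. The sketch below is illustrative and independent of Rigging; `boosted_temperature` and its default bounds are assumptions. It compares against `None` explicitly, because a temperature of 0.0 is falsy in Python and would otherwise be mistaken for an unset value.

```python
def boosted_temperature(current, boost=0.3, default=0.7, low=0.0, high=1.0):
    """Raise a temperature by `boost`, clamped to the range [low, high]."""
    # An explicit 0.0 is a valid setting, so only None counts as unset
    base = default if current is None else current
    return max(low, min(base + boost, high))


print(boosted_temperature(None))
print(boosted_temperature(0.0))
print(boosted_temperature(0.9))
```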

Reversible Transforms

One of the most powerful features of the transform system is bidirectional processing. Post-transforms let you restore original content, apply additional processing, or implement sophisticated clean-up logic. This is essential for privacy-preserving workflows, debugging, and training data preparation:

async def anonymize_transform(
    messages: list[rg.Message],
    params: rg.GenerateParams
) -> tuple[list[rg.Message], rg.GenerateParams, rg.PostTransform]:

    # Store mapping for restoration
    name_mapping = {}  # (1)!

    for message in messages:
        # Replace names with placeholders
        anonymized_content, mapping = replace_names_with_placeholders(message.content)  # (2)!
        name_mapping.update(mapping)
        message.content = anonymized_content

    async def restore_names(chat: rg.Chat) -> rg.Chat:  # (3)!
        # Put real names back in the response
        for message in chat.all:  # (4)!
            message.content = restore_names_from_mapping(message.content, name_mapping)
        return chat

    return messages, params, restore_names  # (5)!

(1) Store the mapping between real names and placeholders for later restoration
(2) Your custom function to detect and replace names (could use NER, regex, etc.)
(3) Post-transform function receives the complete Chat after generation
(4) Process all messages in the conversation, including the model’s response
(5) Return the post-transform function as the third element
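
The two helpers used above, `replace_names_with_placeholders` and `restore_names_from_mapping`, are left to the reader. One possible sketch is shown below, assuming a fixed list of known names (`KNOWN_NAMES` is hypothetical); a production system would more likely rely on NER. Placeholders are derived from each name's position in the list, so the same name maps to the same placeholder across messages.

```python
KNOWN_NAMES = ["Alice Smith", "Bob Jones"]  # hypothetical; replace with NER output


def replace_names_with_placeholders(text, names=KNOWN_NAMES):
    """Swap each known name for a stable placeholder; return text and mapping."""
    mapping = {}
    for index, name in enumerate(names):
        if name in text:
            placeholder = f"[PERSON_{index}]"
            text = text.replace(name, placeholder)
            mapping[placeholder] = name
    return text, mapping


def restore_names_from_mapping(text, mapping):
    """Put the real names back wherever a placeholder appears."""
    for placeholder, name in mapping.items():
        text = text.replace(placeholder, name)
    return text


anonymized, mapping = replace_names_with_placeholders("Email Alice Smith and Bob Jones.")
print(anonymized)
print(restore_names_from_mapping("Hi [PERSON_0]!", mapping))
```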

State Capture in Post-Transforms
Post-transform functions capture variables from their enclosing scope. Be careful with large objects or sensitive data that might be held in memory longer than expected.

Training Data Preparation

Transforms excel at preparing conversations for model fine-tuning and training. They can add special tokens, format conversations for specific architectures, and work seamlessly with Rigging’s tokenization system to preserve message slices throughout the process:

import rigging as rg

async def training_format_transform(
    messages: list[rg.Message],
    params: rg.GenerateParams
) -> tuple[list[rg.Message], rg.GenerateParams, None]:

    # Add special tokens for training
    for message in messages:
        if message.role == "user":  # (1)!
            message.content = f"<|user|>\n{message.content}<|end|>"
        elif message.role == "assistant":  # (2)!
            message.content = f"<|assistant|>\n{message.content}<|end|>"

    return messages, params, None

# Use with tokenization
tokenizer = rg.get_tokenizer("gpt2")
tokenized = await tokenizer.tokenize_chat(  # (3)!
    chat,
    transform=training_format_transform
)

(1) Add role-specific tokens for user messages
(2) Format assistant messages with appropriate tokens
(3) Tokenization preserves slices even after transformation

Transform + Tokenization Pipeline
When you apply transforms during tokenization, the system processes them in this order:

1. Apply transform to modify message content
2. Tokenize the transformed content
3. Map original message slices to token positions

This ensures that training signals and annotations survive the entire pipeline.
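
The last step, mapping character ranges to token positions, can be illustrated with a toy whitespace tokenizer. This is only a sketch of the general idea: real tokenizers split text differently, and `tokenize_with_offsets` and `char_span_to_token_span` are hypothetical helpers, not part of Rigging.

```python
import re


def tokenize_with_offsets(text):
    """Split on whitespace, returning (token, start, end) triples."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def char_span_to_token_span(tokens, start, end):
    """Return (first, last_exclusive) indices of tokens overlapping [start, end)."""
    hits = [
        index
        for index, (_, tok_start, tok_end) in enumerate(tokens)
        if tok_start < end and tok_end > start
    ]
    if not hits:
        return None
    return hits[0], hits[-1] + 1


text = "The answer is 42 today"
tokens = tokenize_with_offsets(text)
start = text.index("42")
print(char_span_to_token_span(tokens, start, start + 2))
```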

Slice Preservation

One of Rigging’s most sophisticated features is how transforms automatically preserve message slices when modifying content. The slice system intelligently updates positions to stay aligned with the transformed text, ensuring that parsed models, tool calls, and metadata remain properly linked:

import rigging as rg

async def slice_aware_transform(
    messages: list[rg.Message],
    params: rg.GenerateParams
) -> tuple[list[rg.Message], rg.GenerateParams, None]:

    for message in messages:
        # Check existing slices
        for slice_info in message.slices:  # (1)!
            slice_content = message.content[slice_info.slice_]
            print(f"Found {slice_info.type}: {slice_content}")

        # Modify content - slices automatically adjust
        message.content = f"[PROCESSED] {message.content}"  # (2)!

    return messages, params, None

(1) Access existing slices to understand what’s already annotated
(2) When content changes, slice positions automatically recalculate

Slice Position Updates
Slice positions automatically update when you modify message content. The system uses text matching to maintain the connection between slices and their content, so that parsed models, tool calls, and metadata stay linked to the right text even after transformation.
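
For the simplest case, prepending a prefix, the adjustment amounts to shifting every slice by the prefix length. The sketch below shows only that arithmetic with a hypothetical `Span` dataclass; it is not Rigging's actual algorithm, which per the description above relies on text matching and handles more general edits.

```python
from dataclasses import dataclass


@dataclass
class Span:  # hypothetical stand-in for a message slice
    label: str
    start: int
    stop: int


def prepend(content, spans, prefix):
    """Prepend `prefix` and shift every span so it still covers the same text."""
    offset = len(prefix)
    shifted = [Span(s.label, s.start + offset, s.stop + offset) for s in spans]
    return prefix + content, shifted


content = "Call add_numbers now"
spans = [Span("tool_call", 5, 16)]
new_content, new_spans = prepend(content, spans, "[PROCESSED] ")
print(new_content[new_spans[0].start:new_spans[0].stop])
```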

Use Cases and Best Practices

Transforms enable sophisticated message processing patterns across many domains:

Fine-Tuning Data Preparation

async def fine_tuning_transform(
    messages: list[rg.Message],
    params: rg.GenerateParams
) -> tuple[list[rg.Message], rg.GenerateParams, None]:

    # Add instruction formatting
    for message in messages:
        if message.role == "system":
            message.content = f"### Instruction\n{message.content}"
        elif message.role == "user":
            message.content = f"### Input\n{message.content}"
        elif message.role == "assistant":
            message.content = f"### Response\n{message.content}"

    return messages, params, None

# Prepare datasets with consistent formatting
training_chats = []
for raw_chat in dataset:
    formatted = await raw_chat.transform(fine_tuning_transform)
    training_chats.append(formatted)
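
Once conversations are formatted, they usually need to be serialized for a training job. The following sketch writes the same Instruction/Input/Response layout to JSON Lines using plain dictionaries. The `{"text": ...}` record shape and the helper names are assumptions; use whatever schema your training framework expects.

```python
import json

HEADERS = {
    "system": "### Instruction",
    "user": "### Input",
    "assistant": "### Response",
}


def format_example(messages):
    """Join role/content dicts into one training string with section headers."""
    parts = [
        f"{HEADERS[m['role']]}\n{m['content']}"
        for m in messages
        if m["role"] in HEADERS
    ]
    return "\n\n".join(parts)


def to_jsonl(conversations):
    """Serialize each conversation as one JSON object per line."""
    return "\n".join(json.dumps({"text": format_example(c)}) for c in conversations)


conversation = [
    {"role": "user", "content": "2+2?"},
    {"role": "assistant", "content": "4"},
]
print(to_jsonl([conversation]))
```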

Multi-Model Adaptation

async def model_specific_transform(
    messages: list[rg.Message],
    params: rg.GenerateParams,
    model_type: str
) -> tuple[list[rg.Message], rg.GenerateParams, None]:

    if model_type == "claude":
        # Claude prefers Human/Assistant formatting
        for message in messages:
            if message.role == "user":
                message.content = f"Human: {message.content}"
            elif message.role == "assistant":
                message.content = f"Assistant: {message.content}"

    elif model_type == "llama":
        # Llama uses specific instruction formatting
        for message in messages:
            if message.role == "system":
                message.content = f"<<SYS>>\n{message.content}\n<</SYS>>"

    return messages, params, None
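
Note that `model_specific_transform` takes a third argument, so it does not match the two-argument transform signature used elsewhere on this page. One way to bridge the gap is to bind `model_type` ahead of time with `functools.partial`, as sketched below with a hypothetical `Message` stand-in and a trimmed copy of the function. Whether Rigging accepts a `partial` object wherever a transform is expected is an assumption worth verifying; a small wrapper function is an alternative.

```python
import asyncio
from dataclasses import dataclass
from functools import partial


@dataclass
class Message:  # hypothetical stand-in for rg.Message
    role: str
    content: str


async def model_specific_transform(messages, params, model_type):
    # Trimmed to the Llama branch for brevity
    if model_type == "llama":
        for message in messages:
            if message.role == "system":
                message.content = f"<<SYS>>\n{message.content}\n<</SYS>>"
    return messages, params, None


# Bind the extra argument so the result takes only (messages, params)
llama_transform = partial(model_specific_transform, model_type="llama")

messages, params, post = asyncio.run(
    llama_transform([Message("system", "Be brief.")], {})
)
print(messages[0].content)
```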

Privacy and Security

import re

import rigging as rg

async def privacy_transform(
    messages: list[rg.Message],
    params: rg.GenerateParams
) -> tuple[list[rg.Message], rg.GenerateParams, rg.PostTransform]:

    sensitive_patterns = {
        r'\b\d{3}-\d{2}-\d{4}\b': '[SSN]',  # Social Security Numbers
        r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b': '[EMAIL]',  # Emails
        r'\b\d{16}\b': '[CARD]'  # Credit card numbers (16 contiguous digits only)
    }

    original_content = {}

    for message in messages:
        original_content[id(message)] = message.content
        for pattern, replacement in sensitive_patterns.items():
            message.content = re.sub(pattern, replacement, message.content)

    async def restore_content(chat: rg.Chat) -> rg.Chat:
        # Restore original content for storage/logging
        for message in chat.all:
            if id(message) in original_content:
                message.content = original_content[id(message)]
        return chat

    return messages, params, restore_content

Security Considerations
When implementing privacy transforms, be extra careful about:

Memory leaks - Sensitive data captured in closures
Logging - Ensure logs don’t contain sensitive information
Error handling - Exceptions might expose original content
State management - Clean up sensitive data after use
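
As one way to address the state-management point, the restoring step can clear its captured mapping as soon as it has run, even if restoration raises. The sketch below is independent of Rigging; `redact` and `restore_and_clear` are hypothetical helpers, and only SSN-like strings are handled.

```python
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text):
    """Replace SSN-like strings with numbered placeholders; return text and mapping."""
    mapping = {}

    def substitute(match):
        placeholder = f"[SSN_{len(mapping)}]"
        mapping[placeholder] = match.group()
        return placeholder

    return SSN_RE.sub(substitute, text), mapping


def restore_and_clear(text, mapping):
    """Restore the original values, then drop them from memory."""
    try:
        for placeholder, original in mapping.items():
            text = text.replace(placeholder, original)
        return text
    finally:
        # Runs even if restoration fails, so sensitive values are not retained
        mapping.clear()


redacted, mapping = redact("SSN 123-45-6789 on file")
print(redacted)
restored = restore_and_clear(redacted, mapping)
print(restored, mapping)
```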
