- Structured Pydantic models can be used interchangeably with unstructured text output.
- LiteLLM as the default generator, giving you instant access to a huge array of models.
- Define prompts as Python functions with type hints and docstrings.
- Simple tool use, even for models which don’t support them at the API level.
- Store different models and configs as simple connection strings just like databases.
- Integrated tracing support with Logfire to track activity.
- Chat templating, forking, continuations, generation parameter overloads, stripping segments, etc.
- Async batching and fast iterations for large scale generation.
- Metadata, callbacks, and data format conversions.
- Modern Python with type hints, async support, Pydantic validation, serialization, etc.
Basic Chats
Let’s start with a very basic generation example that doesn’t include any parsing features, continuations, etc. You want to chat with a model and collect its response. We first need to get a Generator object. We’ll use get_generator, which will resolve an identifier string to the underlying generator class object.
The default Rigging generator is LiteLLM, which wraps a large number of providers and models. We assume for these examples that you have API tokens set as environment variables for these models. You can refer to the LiteLLM docs for supported providers and their key format. If you’d like, you can change any of the model IDs we use and/or add ,api_key=sk-1234 to the end of any of the generator IDs to specify them inline.
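Here is a minimal sketch of that flow; the model ID and message contents are just placeholders:

```python
import rigging as rg

# Resolve the identifier string into a generator (LiteLLM by default)
generator = rg.get_generator("anthropic/claude-3-sonnet-20240229")

# Stage a chat pipeline with our starting messages
pipeline = generator.chat(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello!"},
    ]
)

# Trigger generation and collect the final Chat object
chat = await pipeline.run()
print(chat.last.content)
```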
- You’ll see us use this shorthand import syntax throughout our code; it’s totally optional but makes things look nice.
- This is actually shorthand for litellm!anthropic/claude-3-sonnet-20240229, where litellm is the provider. We just default to that generator so you don’t have to be explicit. You can find more information about this in the generators docs.
- From version 2 onwards, Rigging is fully async. You can use await to trigger generation and get your results, or use await_.
Next is the chat() method, which you’ll use to initiate conversations. You can supply messages in many different forms: dictionary objects, full Message (rigging.message.Message) classes, or a simple str which will be converted to a user message.
generator.chat is actually just a helper for chat(generator, ...); they do the same thing.
ChatPipeline vs Chat
You’ll notice we name the result of chat() as pipeline. The naming might be confusing, but chats go through 2 phases: we first stage them into a pipeline, where we operate on and prepare them before we actually trigger generation with run(). Calling .chat() doesn’t trigger any generation, but calling any of these run methods will:
- rigging.chat.ChatPipeline.run
- rigging.chat.ChatPipeline.run_many
- rigging.chat.ChatPipeline.run_batch
- rigging.chat.ChatPipeline.run_over
Finally, we await .run() to execute the generation process and collect our final Chat object.
IDE Setup
Rigging has been built with full type support which provides clear guidance on what methods return what types, and when they return those types. It’s recommended that you operate in a development environment which can take advantage of this information. Rigging will almost “fall” into place and you won’t be guessing about objects as you work.
Prompts
Operating chat pipelines manually is very flexible, but can feel a bit verbose. Rigging supports the concept of “prompt functions”, where you define the interaction with an LLM as a Python function signature and convert that into a callable object which abstracts the pipeline away from you.
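For example, a prompt function might look something like this (a rough sketch; the decorator arguments and model ID are illustrative):

```python
import rigging as rg

@rg.prompt(generator_id="gpt-4o")
async def summarize(text: str) -> str:
    """Summarize the text in a single sentence."""

# Calling the prompt runs the underlying pipeline for us
summary = await summarize("Rigging lets you define LLM interactions as typed functions.")
```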
Conversations
Both ChatPipeline and Chat objects provide freedom for forking off the current state of messages, or continuing a stream of messages after generation has occurred.
In general:
- ChatPipeline.fork will clone the current chat pipeline and let you maintain both the new and original object for continued processing.
- Chat.fork will produce a fresh ChatPipeline from all the messages prior to the previous generation (useful for “going back” in time).
- Chat.continue_ is similar to fork (actually a wrapper) which tells fork to include the generated messages as you move on (useful for “going forward” in time).
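Here is a rough sketch of what this looks like in practice (assuming .with_() accepts keyword generation params for overrides; the model ID and prompts are placeholders):

```python
import rigging as rg

generator = rg.get_generator("gpt-4o")
pipeline = generator.chat("Tell me about the ocean.")

# fork() clones the pipeline, so each path can diverge independently
factual = pipeline.fork("Keep it factual and concise.")
poetic = pipeline.fork("Answer as a short poem.").with_(temperature=1.2)

factual_chat = await factual.run()
poetic_chat = await poetic.run()

# continue_() carries the generated messages forward into a new pipeline
followup = await poetic_chat.continue_("Now add a second verse.").run()
```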
- Note the use of Chat.continue_ after each round of generation.
- In this case the temperature change will only be applied to the poetic path, because fork has created a clone of our chat pipeline.
- For convenience, we can usually just pass str objects in place of full messages, which underneath will be converted to a Message object with the user role.
Basic Parsing
Now let’s assume we want to ask the model for a piece of information, and we want to make sure this item conforms to a pre-defined structure. Underneath, Rigging uses pydantic-xml, which itself is built on Pydantic. We’ll cover more about constructing models in a later section, but don’t stress the details for now.

XML vs JSON
Rigging is opinionated about using XML to weave structured contents into the unstructured text an LLM generates, at least when it comes to raw text content. If you want to take advantage of the structured JSON parsing provided by model providers or inference tools, Tools are a great way to do that. You can read more about XML tag use from Anthropic, who have done extensive research with their models.
Let’s define a FunFact model which we’ll have the LLM fill in. Rigging exposes a Model base class which you should inherit from when defining structured inputs. This is a lightweight wrapper around pydantic-xml’s BaseXmlModel with some added features and functionality to make it easy for Rigging to manage. However, everything these models support (for the most part) is also supported in Rigging.
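Something like the following would do (a sketch; here we assume the single str field is treated as the element’s text content):

```python
import rigging as rg

class FunFact(rg.Model):
    fact: str
```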
Note the .xml_example() class method, which all models support. By default this will simply emit empty XML tags for our model:
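For the FunFact model above, that looks roughly like:

```python
print(FunFact.xml_example())
# <fun-fact></fun-fact>
```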
Customizing Model Tags
Tags for a model are auto-generated based on the name of the class. You are free to override these by passing tag="..." into your class definition like this:
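For instance, using pydantic-xml’s class keyword (the tag value here is just an example):

```python
import rigging as rg

class FunFact(rg.Model, tag="silly-fact"):
    fact: str
```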
We then call .parse() on the last message of our generated chat. This will process the contents of the message, extract the first matching model which parses successfully, and return it to us as a Python object.
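Putting it together, a sketch of asking for and parsing a fact (the model ID and prompt wording are placeholders):

```python
import rigging as rg

generator = rg.get_generator("gpt-4o")

chat = await generator.chat(
    f"Provide a fun fact about rockets between the following tags: {FunFact.xml_example()}"
).run()

fun_fact = chat.last.parse(FunFact)
print(fun_fact.fact)
```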
Since we pass FunFact as a class, the result of .parse() is a type-hinted object of that class. In our code, all the properties of FunFact will be available just as if we had created the object directly.
Notice that we don’t have to worry about the model being verbose in its response, as we’ve communicated that the text between the <fun-fact></fun-fact> tags is the relevant place to put its answer. If the model includes a thought process, supplemental information, or anything else, we can simply ignore it.
Strict Parsing
In the example above, we don’t handle the case where the model fails to properly conform to our desired output structure. If the last message content is invalid in some way, our call to .parse() will result in an exception from Rigging. Rigging is designed at its core to manage this process, and we have a few options:
- We can extend our chat pipeline with .until_parsed_as(), which will cause the run() function to internally check that parsing is succeeding before returning the chat back to you (as sketched below).
- We can make the parsing optional by switching to .try_parse(). The type of the return value will automatically switch to FunFact | None and you can handle cases where parsing failed.
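Roughly, those two options look like this (sketch only, reusing the generator and FunFact from above):

```python
# Option 1: keep re-generating until the output parses as a FunFact
chat = await (
    generator.chat(f"Provide a fun fact: {FunFact.xml_example()}")
    .until_parsed_as(FunFact)
    .run()
)
fun_fact = chat.last.parse(FunFact)

# Option 2: parse optionally and handle the failure case yourself
maybe_fact = chat.last.try_parse(FunFact)  # FunFact | None
if maybe_fact is None:
    print("No valid FunFact in the response")
```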
Double Parsing
We still have to call .parse() on the message despite using .until_parsed_as(). This is a limitation of type hinting, as we’d have to turn ChatPipeline and Chat into generic types to carry that information forward. It’s a small price we pay for big code-complexity savings. However, the use of .until_parsed_as() will cause the generated messages to have parsed models in their .parts, so if you don’t need to access the typed object immediately, you can be confident serializing the chat object and the model will be there when you need it.

Max Depth Concept
When control is passed into a chat pipeline with .until_parsed_as(), a .then() callback is registered internally to operate during generation. When model output is received, the callback will attempt to parse it, and if that fails, it will re-trigger generation. This process repeats until the model produces a valid output or the maximum “depth” is reached.
Often you might find yourself constantly getting MaxDepthError exceptions. This is usually a sign that the LLM doesn’t have enough information about the desired output, or that the complexity of your model is too high. You have a few options for gracefully handling these situations:
- You can adjust the max_depth to a higher value. This will allow the model to try more times before failing.
- Pass allow_failed to your run() method and check the .failed property after generation.
- Use a custom callback with .then() to get more external control over the process.
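A sketch of the first two options (exactly where max_depth is supplied is an assumption here; check where it is configured in your version):

```python
# Give the model more attempts before a MaxDepthError is raised
# (assumed keyword placement)
chat = await pipeline.until_parsed_as(FunFact, max_depth=5).run()

# Alternatively, allow failures and inspect them after generation
chat = await pipeline.until_parsed_as(FunFact).run(allow_failed=True)
if chat.failed:
    print("Generation never produced a valid FunFact")
```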
Parsing Multiple Models
Assuming we wanted to extend our example to produce a set of interesting facts, we have a couple of options:
- Simply use run_many() and generate N examples individually.
- Rework our code slightly and let the model provide us multiple facts at once.
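For example (parse_set as a helper for pulling every matching model out of a message is an assumption here; check the parsing helpers available in your version):

```python
# Option 1: run the pipeline several times and collect one fact per chat
chats = await (
    generator.chat(f"Provide a fun fact: {FunFact.xml_example()}")
    .until_parsed_as(FunFact)
    .run_many(3)
)
facts = [chat.last.parse(FunFact) for chat in chats]

# Option 2: ask for several facts in a single response and extract them all
chat = await generator.chat(
    f"Provide three fun facts, each wrapped in their own {FunFact.xml_example()} tags."
).run()
facts = chat.last.parse_set(FunFact)  # hypothetical helper for multiple matches
```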
Parsing with Prompts
The use of Prompt functions can make parsing even easier. We can refactor our previous example and have Rigging parse out FunFacts directly for us:
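A sketch of that refactor (the decorator arguments and model ID are illustrative):

```python
import rigging as rg

@rg.prompt(generator_id="gpt-4o")
async def fun_facts(topic: str) -> list[FunFact]:
    """Provide a few fun facts about the topic."""

# Rigging handles the pipeline and parsing behind the callable
facts = await fun_facts("rockets")
for fact in facts:
    print(fact.fact)
```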
Tools
Tools exposed to LLMs are super simple with Rigging. You can define a Python function and make it available straight in the chat pipeline. Tools can return simple values, Message objects, or even content parts for multi-modal generation (ContentImageUrl).
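A minimal sketch (assuming plain functions can be handed to .using(); the weather lookup and model ID are stand-ins):

```python
import rigging as rg

def get_weather(city: str) -> str:
    "Look up the current weather for a city."
    return f"The weather in {city} is sunny."  # stand-in for a real lookup

chat = await (
    rg.get_generator("gpt-4o")
    .chat("What's the weather like in Buenos Aires?")
    .using(get_weather)
    .run()
)
print(chat.last.content)
```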
Check out Tools for more information.
Tools + Prompts
You can combine prompts and tools to achieve “multi-agent” behavior. In the sketch below, the generate_jokes prompt will be presented as an available tool when gpt-4o is working on tasks, and Rigging will handle all the inference and type processing for you.