- Structured Pydantic models can be used interchangeably with unstructured text output.
- LiteLLM as the default generator, giving you instant access to a huge array of models.
- Define prompts as Python functions with type hints and docstrings.
- Simple tool use, even for models which don’t support them at the API level.
- Store different models and configs as simple connection strings just like databases.
- Integrated tracing support with Logfire to track activity.
- Chat templating, forking, continuations, generation parameter overloads, stripping segments, etc.
- Async batching and fast iterations for large scale generation.
- Metadata, callbacks, and data format conversions.
- Modern Python with type hints, async support, Pydantic validation, serialization, etc.
Basic Chats
Let’s start with a very basic generation example that doesn’t include any parsing features, continuations, etc. You want to chat with a model and collect its response. We first need to get a Generator object. We’ll use get_generator, which will resolve an identifier string to the underlying generator class object.
The default Rigging generator is LiteLLM, which wraps a large number of providers and models. We assume for these examples that you have API tokens set as environment variables for these models. You can refer to the LiteLLM docs for supported providers and their key format. If you’d like, you can change any of the model IDs we use and/or add ,api_key=sk-1234 to the end of any of the generator IDs to specify them inline.
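Here is a minimal sketch of that flow; the model ID and message contents are just placeholders:

```python
import rigging as rg

# Resolve the identifier string into a generator (LiteLLM by default)
generator = rg.get_generator("anthropic/claude-3-sonnet-20240229")

# Stage a chat pipeline with our starting messages
pipeline = generator.chat(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello!"},
    ]
)

# Trigger generation and collect the final Chat object
chat = await pipeline.run()
print(chat.last.content)
```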
- You’ll see us use this shorthand import syntax throughout our code; it’s totally optional but makes things look nice.
- This is actually shorthand for litellm!anthropic/claude-3-sonnet-20240229, where litellm is the provider. We just default to that generator so you don’t have to be explicit. You can find more information about this in the generators docs.
- From version 2 onwards, Rigging is fully async. You can use await to trigger generation and get your results, or use await_.
Next is the chat() method, which you’ll use to initiate conversations. You can supply messages in many different forms: dictionary objects, full Message (rigging.message.Message) classes, or a simple str which will be converted to a user message.
generator.chat is actually just a helper for chat(generator, ...); they do the same thing.
ChatPipeline vs Chat
You’ll notice we name the result of chat() as pipeline. The naming might be confusing, but chats go through 2 phases: we first stage them into a pipeline, where we operate on and prepare them before we actually trigger generation with run(). Calling .chat() doesn’t trigger any generation, but calling any of these run methods will:
- rigging.chat.ChatPipeline.run
- rigging.chat.ChatPipeline.run_many
- rigging.chat.ChatPipeline.run_batch
- rigging.chat.ChatPipeline.run_over
Finally, we await .run() to execute the generation process and collect our final Chat object.
IDE Setup
Rigging has been built with full type support which provides clear guidance on what methods return what types, and when they return those types. It’s recommended that you operate in a development environment which can take advantage of this information. Rigging will almost “fall” into place and you won’t be guessing about objects as you work.
Prompts
Operating chat pipelines manually is very flexible, but can feel a bit verbose. Rigging supports the concept of “prompt functions”, where you define the interaction with an LLM as a Python function signature and convert that into a callable object which abstracts the pipeline away from you.
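For example, a prompt function might look something like this (a rough sketch; the decorator arguments and model ID are illustrative):

```python
import rigging as rg

@rg.prompt(generator_id="gpt-4o")
async def summarize(text: str) -> str:
    """Summarize the text in a single sentence."""

# Calling the prompt runs the underlying pipeline for us
summary = await summarize("Rigging lets you define LLM interactions as typed functions.")
```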
Conversations
Both ChatPipeline and Chat objects provide freedom for forking off the current state of messages, or continuing a stream of messages after generation has occurred.
In general:
- ChatPipeline.fork will clone the current chat pipeline and let you maintain both the new and original object for continued processing.
- Chat.fork will produce a fresh ChatPipeline from all the messages prior to the previous generation (useful for “going back” in time).
- Chat.continue_ is similar to fork (actually a wrapper) which tells fork to include the generated messages as you move on (useful for “going forward” in time).
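Here is a rough sketch of what this looks like in practice (assuming .with_() accepts keyword generation params for overrides; the model ID and prompts are placeholders):

```python
import rigging as rg

generator = rg.get_generator("gpt-4o")
pipeline = generator.chat("Tell me about the ocean.")

# fork() clones the pipeline, so each path can diverge independently
factual = pipeline.fork("Keep it factual and concise.")
poetic = pipeline.fork("Answer as a short poem.").with_(temperature=1.2)

factual_chat = await factual.run()
poetic_chat = await poetic.run()

# continue_() carries the generated messages forward into a new pipeline
followup = await poetic_chat.continue_("Now add a second verse.").run()
```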
- Note the use of Chat.continue_ after each round of generation.
- In this case the temperature change will only be applied to the poetic path, because fork has created a clone of our chat pipeline.
- For convenience, we can usually just pass str objects in place of full messages, which underneath will be converted to a Message object with the user role.
Basic Parsing
Now let’s assume we want to ask the model for a piece of information, and we want to make sure this item conforms to a pre-defined structure. Underneath, Rigging uses pydantic-xml, which itself is built on Pydantic. We’ll cover more about constructing models in a later section, but don’t stress the details for now.

XML vs JSON
Rigging is opinionated about using XML to weave structured contents into the unstructured text an LLM generates, at least when it comes to raw text content. If you want to take advantage of the structured JSON parsing provided by model providers or inference tools, Tools are a great way to do that. You can read more about XML tag use from Anthropic, who have done extensive research with their models.
Let’s define a FunFact model which we’ll have the LLM fill in. Rigging exposes a Model base class which you should inherit from when defining structured inputs. This is a lightweight wrapper around pydantic-xml’s BaseXmlModel with some added features and functionality to make it easy for Rigging to manage. However, everything these models support (for the most part) is also supported in Rigging.
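Something like the following would do (a sketch; here we assume the single str field is treated as the element’s text content):

```python
import rigging as rg

class FunFact(rg.Model):
    fact: str
```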
Note the .xml_example() class method, which all models support. By default this will simply emit empty XML tags for our model:
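For the FunFact model above, that looks roughly like:

```python
print(FunFact.xml_example())
# <fun-fact></fun-fact>
```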
Customizing Model Tags
Tags for a model are auto-generated based on the name of the class. You are free to override these by passing tag="..." into your class definition like this:
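For instance, using pydantic-xml’s class keyword (the tag value here is just an example):

```python
import rigging as rg

class FunFact(rg.Model, tag="silly-fact"):
    fact: str
```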
We then call .parse() on the last message of our generated chat. This will process the contents of the message, extract the first matching model which parses successfully, and return it to us as a Python object.
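Putting it together, a sketch of asking for and parsing a fact (the model ID and prompt wording are placeholders):

```python
import rigging as rg

generator = rg.get_generator("gpt-4o")

chat = await generator.chat(
    f"Provide a fun fact about rockets between the following tags: {FunFact.xml_example()}"
).run()

fun_fact = chat.last.parse(FunFact)
print(fun_fact.fact)
```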
Since we pass FunFact as a class, the result of .parse() is a type-hinted object of that class. In our code, all the properties of FunFact will be available just as if we had created the object directly.
Notice that we don’t have to worry about the model being verbose in its response, as we’ve communicated that the text between the <fun-fact></fun-fact> tags is the relevant place to put its answer. If the model includes a thought process, supplemental information, or anything else, we can simply ignore it.
Strict Parsing
In the example above, we don’t handle the case where the model fails to properly conform to our desired output structure. If the last message content is invalid in some way, our call to .parse() will result in an exception from Rigging. Rigging is designed at its core to manage this process, and we have a few options:
- We can extend our chat pipeline with .until_parsed_as(), which will cause the run() function to internally check that parsing is succeeding before returning the chat back to you (as sketched below).
- We can make the parsing optional by switching to .try_parse(). The type of the return value will automatically switch to FunFact | None and you can handle cases where parsing failed.
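Roughly, those two options look like this (sketch only, reusing the generator and FunFact from above):

```python
# Option 1: keep re-generating until the output parses as a FunFact
chat = await (
    generator.chat(f"Provide a fun fact: {FunFact.xml_example()}")
    .until_parsed_as(FunFact)
    .run()
)
fun_fact = chat.last.parse(FunFact)

# Option 2: parse optionally and handle the failure case yourself
maybe_fact = chat.last.try_parse(FunFact)  # FunFact | None
if maybe_fact is None:
    print("No valid FunFact in the response")
```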
Double Parsing
We still have to call .parse() on the message despite using .until_parsed_as(). This is a limitation of type hinting, as we’d have to turn ChatPipeline and Chat into generic types to carry that information forward. It’s a small price we pay for big code-complexity savings. However, the use of .until_parsed_as() will cause the generated messages to have parsed models in their .parts, so if you don’t need to access the typed object immediately, you can be confident serializing the chat object and the model will be there when you need it.

Max Depth Concept
When control is passed into a chat pipeline with .until_parsed_as(), a .then() callback is registered internally to operate during generation. When model output is received, the callback will attempt to parse it, and if that fails, it will re-trigger generation. This process repeats until the model produces a valid output or the maximum “depth” is reached.
Often you might find yourself constantly getting MaxDepthError exceptions. This is usually a sign that the LLM doesn’t have enough information about the desired output, or that the complexity of your model is too high. You have a few options for gracefully handling these situations:
- You can adjust the max_depth to a higher value. This will allow the model to try more times before failing.
- Pass allow_failed to your run() method and check the .failed property after generation.
- Use a custom callback with .then() to get more external control over the process.
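A sketch of the first two options (exactly where max_depth is supplied is an assumption here; check where it is configured in your version):

```python
# Give the model more attempts before a MaxDepthError is raised
# (assumed keyword placement)
chat = await pipeline.until_parsed_as(FunFact, max_depth=5).run()

# Alternatively, allow failures and inspect them after generation
chat = await pipeline.until_parsed_as(FunFact).run(allow_failed=True)
if chat.failed:
    print("Generation never produced a valid FunFact")
```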
Parsing Multiple Models
Assuming we wanted to extend our example to produce a set of interesting facts, we have a couple of options:
- Simply use run_many() and generate N examples individually.
- Rework our code slightly and let the model provide us multiple facts at once.
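For example (parse_set as a helper for pulling every matching model out of a message is an assumption here; check the parsing helpers available in your version):

```python
# Option 1: run the pipeline several times and collect one fact per chat
chats = await (
    generator.chat(f"Provide a fun fact: {FunFact.xml_example()}")
    .until_parsed_as(FunFact)
    .run_many(3)
)
facts = [chat.last.parse(FunFact) for chat in chats]

# Option 2: ask for several facts in a single response and extract them all
chat = await generator.chat(
    f"Provide three fun facts, each wrapped in their own {FunFact.xml_example()} tags."
).run()
facts = chat.last.parse_set(FunFact)  # hypothetical helper for multiple matches
```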
Parsing with Prompts
The use of Prompt functions can make parsing even easier. We can refactor our previous example and have Rigging parse out FunFacts directly for us:
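A sketch of that refactor (the decorator arguments and model ID are illustrative):

```python
import rigging as rg

@rg.prompt(generator_id="gpt-4o")
async def fun_facts(topic: str) -> list[FunFact]:
    """Provide a few fun facts about the topic."""

# Rigging handles the pipeline and parsing behind the callable
facts = await fun_facts("rockets")
for fact in facts:
    print(fact.fact)
```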
Tools
Tools exposed to LLMs are super simple with Rigging. You can define a Python function and make it available straight in the chat pipeline. Tools can return simple values, Message objects, or even content parts for multi-modal generation (ContentImageUrl).
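A minimal sketch (assuming plain functions can be handed to .using(); the weather lookup and model ID are stand-ins):

```python
import rigging as rg

def get_weather(city: str) -> str:
    "Look up the current weather for a city."
    return f"The weather in {city} is sunny."  # stand-in for a real lookup

chat = await (
    rg.get_generator("gpt-4o")
    .chat("What's the weather like in Buenos Aires?")
    .using(get_weather)
    .run()
)
print(chat.last.content)
```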
Check out Tools for more information.
Tools + Prompts
You can combine prompts and tools to achieve “multi-agent” behavior. In the sketch below, the generate_jokes prompt will be presented as an available tool when gpt-4o is working on tasks, and Rigging will handle all the inference and type processing for you.