Callbacks and Mapping
Rigging is designed to give control over how the generation process works, and what occurs after. In fact, higher-level functions like .using() and .until_parsed_as() leverage a generic callback system underneath to guide generation. Let's walk through them.
Watch Callbacks
Pipelines, Prompts, and Generators hold a list of passive callbacks which will be passed Chat or Completion objects as they are generated. Watch callbacks are useful for logging, monitoring, or other passive actions that don't directly affect the generation process. Register them with any of the following:
- Generator.watch()
- ChatPipeline.watch()
- CompletionPipeline.watch()
- Prompt.watch()
We also provide various helpers in the rigging.watch module for writing to files or databases like elastic.
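As a rough sketch (assuming the async run API and that watch callbacks receive a list of Chat objects; the model id is only a placeholder), a simple logging callback might look like this:

```python
import asyncio

import rigging as rg

async def log_chats(chats: list[rg.Chat]) -> None:
    # Passive observation only -- nothing returned here affects generation
    for chat in chats:
        print(f"[watch] {chat.last.content[:60]!r}")

async def main() -> None:
    chat = await (
        rg.get_generator("gpt-4o-mini")  # placeholder model id
        .chat("Say hello.")
        .watch(log_chats)
        .run()
    )
    print(chat.last.content)

asyncio.run(main())
```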
Until Callbacks
If you want to gain control over the generation process before it completes, you can use the ChatPipeline.until() or CompletionPipeline.until() methods.
These allow you to register a callback function which participates in generation and can decide whether generation should proceed, and exactly how it does so. For chat interfaces, these functions also get fine-grained control over the contents of the chat while callbacks are resolving. This is how we can provide feedback to an LLM during generation, like validation errors when parsing fails (attempt_recovery).
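Here is a sketch of such a callback. It assumes the async run API and that until callbacks receive the latest Message and return a (retry, messages) tuple; check the signature for your version:

```python
import asyncio

import rigging as rg

class Joke(rg.Model):
    content: str

# Assumption: until callbacks take the latest message and return
# (should_retry, messages) to feed back into generation
def involves_a_cat(message: rg.Message) -> tuple[bool, list[rg.Message]]:
    if "cat" not in message.content.lower():
        # Retry, appending our feedback as an intermediate user message
        return True, [message, rg.Message("user", "Please include a cat in your joke.")]
    return False, [message]

async def main() -> None:
    chat = await (
        rg.get_generator("gpt-4o-mini")
        .chat("Tell me a joke about an animal.")
        .until_parsed_as(Joke)
        .until(involves_a_cat, attempt_recovery=True, drop_dialog=False)
        .run()
    )
    print(chat.conversation)

asyncio.run(main())
```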
- Returning True from this callback tells Rigging to go back to the generator with the supplied messages and rerun the generation step. Whether your appended messages are used depends on attempt_recovery=True on ChatPipeline.until(). In this instance our request to include a cat will be appended to the intermediate messages while generation completes, essentially providing feedback to the model about how it should satisfy the callback function.
- Our use of drop_dialog=False here allows us to see the intermediate steps of resolving our callbacks in the final Chat. It's up to you whether you want these intermediate messages included or not; the default is to drop them once the callbacks resolve.
note “Using .until on CompletionPipeline”
The interface for a CompletionPipeline is very similar to ChatPipeline, except that you are only allowed to state whether generation should retry. You cannot currently inject additional text as intermediate context while your callback is attempting to resolve.
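A minimal sketch, assuming the completion pipeline is created with generator.complete() and that the callback receives the generated text and returns True to trigger a retry:

```python
import asyncio

import rigging as rg

# Assumption: completion until callbacks receive the generated text and
# return True to retry, False to accept
def too_short(text: str) -> bool:
    return len(text.strip()) < 100

async def main() -> None:
    completion = await (
        rg.get_generator("gpt-4o-mini")
        .complete("Write a short story about a robot:")
        .until(too_short)
        .run()
    )
    print(completion.generated)

asyncio.run(main())
```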
Allowing Failures
If you want the generation process to avoid raising an exception when the maximum number of rounds is exhausted, you can configure on_failed on the pipeline, or pass it directly to the various run methods of a ChatPipeline or CompletionPipeline. For single runs, pass allow_failed=True to .run().
This breaks any guarantees about the validity of final chat objects, but you can check their status with the Chat.failed or Completion.failed properties.
In the case of on_failed='skip', the final outputs of any run method could be anywhere from an empty list to a complete list of the requested batch/many.
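Putting that together, a sketch might look like the following (the Joke model is illustrative, and the exact on_failed values may differ between versions):

```python
import asyncio

import rigging as rg

class Joke(rg.Model):
    content: str

def make_pipeline() -> rg.ChatPipeline:
    return (
        rg.get_generator("gpt-4o-mini")
        .chat("Give me a joke.")
        .until_parsed_as(Joke)
    )

async def main() -> None:
    # Single run: tolerate failure instead of raising, then check .failed
    chat = await make_pipeline().run(allow_failed=True)
    if chat.failed:
        print("Generation failed:", chat.error)

    # Many runs: 'skip' drops failed chats, so the list may come back shorter
    chats = await make_pipeline().run_many(5, on_failed="skip")
    print(f"{len(chats)} of 5 chats succeeded")

asyncio.run(main())
```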
Defining Failures
By default Rigging will catch ExhaustedMaxRoundsError and treat those exceptions as a soft failure you can configure with on_failed. However, you can also add other exceptions to a pipeline with .catch(), which will then be caught and treated as soft failures.
For instance, some APIs might raise exceptions if you cross some threshold for content moderation, and you don't want these exceptions to interrupt large-scale pipelines.
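A sketch of that configuration (this assumes .catch() accepts the same on_failed values as the run methods):

```python
import asyncio

import litellm
import rigging as rg

async def main() -> None:
    chats = await (
        rg.get_generator("gpt-4o-mini")
        .chat("Summarize this document for me.")
        # Treat litellm API errors as soft failures instead of crashing the run
        .catch(litellm.exceptions.APIError, on_failed="include")
        .run_many(10)
    )

    for chat in chats:
        if chat.failed:
            print("Soft failure:", chat.error)

asyncio.run(main())
```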
- Here we're adding a custom exception to the pipeline that will be caught and treated as a soft failure. In the case of litellm raising an APIError, those chats will be marked as failed and included in the final output. You can access the raised error with the Chat.error property.
Then Callbacks
You might prefer to have your callbacks execute after generation completes, and operate on the Chat/Completion objects from there. This is functionally very similar to ChatPipeline.until(), and might be preferred if you want to expose more of the parsing internals to your code as opposed to the opaque nature of other callback types. Use ChatPipeline.then() to register any number of callbacks before executing ChatPipeline.run().
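A minimal sketch, assuming then callbacks receive the finished Chat and may return a replacement Chat (or None to keep the original), and that Chat.meta()/Chat.metadata behave as shown:

```python
import asyncio

import rigging as rg

async def tag_short_replies(chat: rg.Chat) -> rg.Chat | None:
    # Runs after generation completes; return None to keep the chat unchanged
    if len(chat.last.content) < 20:
        return chat.meta(short_reply=True)
    return None

async def main() -> None:
    chat = await (
        rg.get_generator("gpt-4o-mini")
        .chat("Reply with a single word.")
        .then(tag_short_replies)
        .run()
    )
    print(chat.metadata)

asyncio.run(main())
```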
tip “Branching Chats”
A common use case for .then() is to branch the conversation based on the output of previous generations. You can continue to chain .then() and .run() calls to create a set of generations that collapse back to the final call when they complete, as shown in the sketch below.
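For example, a then callback might branch by continuing the chat and returning the new result (this assumes Chat.continue_() returns a pipeline you can run again):

```python
import asyncio

import rigging as rg

async def ask_why(chat: rg.Chat) -> rg.Chat | None:
    # Branch the conversation with a follow-up question when relevant
    if "cat" in chat.last.content.lower():
        return await chat.continue_("Why did you pick a cat?").run()
    return None

async def main() -> None:
    chat = await (
        rg.get_generator("gpt-4o-mini")
        .chat("Tell me a joke about an animal.")
        .then(ask_why)
        .run()
    )
    print(chat.conversation)

asyncio.run(main())
```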
Map Callbacks
Rigging also allows you to process a group of Chats all at once with map callbacks. This is particularly useful when using .run_many() and .run_batch().
You also might want to take certain actions depending on the state of a set of Chats all at once, for instance attempting re-generation if a certain percentage of Chats didn't meet some criteria.
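A sketch of that pattern is below. It assumes map callbacks receive and return a list of Chats, and that Chat.restart() rebuilds a pipeline from the original messages:

```python
import asyncio

import rigging as rg

async def retry_if_mostly_failed(chats: list[rg.Chat]) -> list[rg.Chat]:
    failed = [chat for chat in chats if chat.failed]
    if len(failed) <= len(chats) / 2:
        return chats
    # More than half failed: re-run just the failed chats and merge the results
    retried = [await chat.restart().run(allow_failed=True) for chat in failed]
    return [chat for chat in chats if not chat.failed] + retried

async def main() -> None:
    chats = await (
        rg.get_generator("gpt-4o-mini")
        .chat("Tell me a joke about an animal.")
        .map(retry_if_mostly_failed)
        .run_many(4, on_failed="include")
    )
    print(f"{sum(not c.failed for c in chats)} / {len(chats)} succeeded")

asyncio.run(main())
```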
note “Ordering”
map() callbacks are always executed before then() callbacks. Order is preserved based on when they were installed into the ChatPipeline.