
# Using in code

Download a published model, load weights and tokenizer with LocalModel, and feed it into a generator or evaluation.

The SDK gives you two entry points to a published model: downloading the artifact into local storage, and loading the weights and tokenizer through LocalModel or HuggingFace.

| Goal | Use |
| --- | --- |
| Download a registry model so code can load it | `dn.pull_package(["model://org/name:version"])` |
| Open a registry model already cached locally | `Model("org/name", version=...)` or `dn.load_package("model://...")` |
| Load a HuggingFace model into local storage | `dn.load_model("meta-llama/Llama-3.1-8B-Instruct", task=...)` |
| Publish a local source back to the registry | `dn.push_model("./path")` (see Publishing) |
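The `model://org/name:version` URIs above have a fixed shape. A hypothetical parser (not part of the SDK) makes the three components explicit:

```python
def parse_model_uri(uri: str) -> tuple[str, str, str]:
    """Split 'model://org/name:version' into (org, name, version).

    Illustrative helper only -- the SDK accepts these URIs directly.
    """
    rest = uri.removeprefix("model://")
    path, _, version = rest.partition(":")
    org, _, name = path.partition("/")
    return org, name, version
```

For example, `parse_model_uri("model://acme/support-assistant:1.2.0")` yields `("acme", "support-assistant", "1.2.0")`.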
```python
import dreadnode as dn
from dreadnode.models import Model

dn.pull_package(["model://acme/support-assistant:1.2.0"])
model = Model("acme/support-assistant", version="1.2.0")
```

`dn.load_package` is the alternate entry point when the package is already local:

```python
model = dn.load_package("model://acme/support-assistant:1.2.0")
```

Both return a Model — the published-artifact handle. Its properties (name, version, framework, task, architecture, files) read from the manifest without further network calls.

`Model.to_hf()` reconstructs the artifact directory on disk and hands it to HuggingFace `from_pretrained`:

```python
hf_model = model.to_hf()
tokenizer = model.tokenizer()
```

Extra keyword arguments are forwarded. Common ones:

```python
import torch

hf_model = model.to_hf(
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=False,
)
```

`to_hf()` dispatches to the right HF class based on `task` from the manifest (`AutoModelForCausalLM`, `AutoModelForSequenceClassification`, etc.). When `task` is missing, it falls back to `AutoModel`.

For raw filesystem access — serving with vLLM, converting the checkpoint, or running tools that expect a model directory — call `model_path()`:

```python
path = model.model_path()
# /tmp/dn_model_support-assistant_XXXXXX/
#   model.safetensors
#   config.json
#   tokenizer.json
#   ...
```
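Before handing the directory to an external tool, it can be worth a quick sanity check that the expected files are present. A minimal sketch, using file names from the listing above (adjust the required set for your artifact):

```python
from pathlib import Path


def missing_files(
    model_dir: str,
    required: tuple[str, ...] = ("config.json", "tokenizer.json"),
) -> list[str]:
    """Return required file names absent from a materialized model directory."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).exists()]
```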

The directory is materialized on first access and reused on subsequent calls against the same object.
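The materialize-once behavior amounts to lazy caching on the handle. An illustrative stand-in (not SDK code; `materialize` is a hypothetical callable doing the expensive extraction):

```python
from pathlib import Path
from typing import Callable


class LazyArtifactDir:
    """Toy stand-in for materialize-on-first-access behavior."""

    def __init__(self, materialize: Callable[[], Path]):
        self._materialize = materialize  # expensive: writes the artifact to disk
        self._path: Path | None = None

    def model_path(self) -> Path:
        if self._path is None:
            self._path = self._materialize()  # first access: extract
        return self._path  # later calls reuse the same directory
```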

## Load a HuggingFace model into local storage

```python
import dreadnode as dn

local_model = dn.load_model("meta-llama/Llama-3.1-8B-Instruct", task="text-generation")
hf_model = local_model.to_hf(torch_dtype="bfloat16", device_map="auto")
```

`load_model` caches the HuggingFace download in Dreadnode storage. The first call downloads; subsequent calls read from disk. Pass `model_name` to override the local storage name.
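The download-once semantics resemble a simple on-disk cache keyed by repo id. A minimal sketch, assuming a `download(dest)` callable that fetches into `dest` (none of these names are SDK API, and the key-derivation scheme is an assumption):

```python
from pathlib import Path
from typing import Callable


def load_cached(repo_id: str, cache_root: Path, download: Callable[[Path], None]) -> Path:
    """Return the cached directory for repo_id, downloading only if absent."""
    dest = cache_root / repo_id.replace("/", "--")
    if not dest.exists():
        dest.mkdir(parents=True)
        download(dest)  # first call: fetch from the hub
    return dest  # later calls: read from disk
```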

To publish that cached model back to the Dreadnode registry, re-emit it as a directory with a model.yaml and push — see Publishing.

Wrap the loaded weights in a `TransformersGenerator` to get a chat interface:

```python
from dreadnode.generators.generator.transformers_ import TransformersGenerator

gen = TransformersGenerator.from_obj(hf_model, tokenizer)
chat = await gen.chat("Summarize this ticket: ...").run()
print(chat.last.content)
```

See `dreadnode.generators` for the full generator-construction API.

Registry model artifacts are stored bytes, not inference endpoints. To run an evaluation against a published model, either serve it yourself (vLLM, Ray Serve, a managed endpoint) and pass the resulting model identifier to dn evaluation create --model ..., or load the weights locally and evaluate inline:

```python
from dreadnode.evaluations import Evaluation
from dreadnode.generators.generator.transformers_ import TransformersGenerator

hf_model = model.to_hf(torch_dtype="bfloat16", device_map="auto")
tokenizer = model.tokenizer()
gen = TransformersGenerator.from_obj(hf_model, tokenizer)

async def task(prompt: str) -> str:
    chat = await gen.chat(prompt).run()
    return chat.last.content

evaluation = Evaluation(task=task, dataset=rows)  # rows: your evaluation dataset
```

See Evaluations → Local for the SDK-side evaluation shape.

```python
model.name          # "acme/support-assistant"
model.version       # "1.2.0"
model.framework     # "safetensors"
model.task          # "text-generation" or None
model.architecture  # "LlamaForCausalLM" or None
model.files         # list of artifact paths inside the package
model.manifest      # ModelManifest (Pydantic)
```

These are all metadata reads; no network access is required after the initial pull.