# Using in code
Load dataset rows in Python for evaluations, training, and AIRT suites — from HuggingFace, local sources, or published versions.
The SDK gives you two entry points to a dataset: loading a source (from HuggingFace or a local directory) into content-addressable storage, and opening a published package already in the registry.
| Goal | Use |
|---|---|
| Cache a HuggingFace dataset or read a local source | `dn.load_dataset(path_or_hf_id, split=...)` |
| Download a registry dataset so code can load it | `dn.pull_package(["dataset://org/name:version"])` |
| Open a registry dataset already cached locally | `dn.load_package("dataset://org/name@version")` or `Dataset("org/name")` |
| Publish a local source back to the registry | `dn.push_dataset("./path")` (see Publishing) |
The loaded object is a `LocalDataset` (or its subclass `Dataset`). Both expose the same conversion helpers: `to_pandas()`, `to_hf()`, and direct `load()` for PyArrow.
## Cache a HuggingFace dataset

```python
import dreadnode as dn

local_ds = dn.load_dataset("squad", split="train[:500]")
print(local_ds.to_pandas().head())
```

`load_dataset` forwards extra keyword arguments to HuggingFace’s `datasets.load_dataset`. Rows land in Dreadnode’s content-addressable store — re-running the same call reads from disk instead of re-downloading.
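The caching behavior can be pictured with a toy sketch (plain Python, not the SDK's actual storage code): results are keyed by a hash of the request, so a repeat call with the same arguments never re-fetches.

```python
import hashlib

class ContentAddressableCache:
    """Toy illustration of content-addressable caching, not the real store."""

    def __init__(self):
        self.store = {}
        self.downloads = 0

    def _key(self, name, split):
        # the cache key is derived from the request itself
        return hashlib.sha256(f"{name}:{split}".encode()).hexdigest()

    def load(self, name, split, fetch):
        key = self._key(name, split)
        if key not in self.store:      # first call: fetch and cache
            self.store[key] = fetch()
            self.downloads += 1
        return self.store[key]         # repeat calls: read from the cache

cache = ContentAddressableCache()
rows = cache.load("squad", "train[:500]", fetch=lambda: [{"id": 1}])
rows_again = cache.load("squad", "train[:500]", fetch=lambda: [{"id": 1}])
print(cache.downloads)  # prints 1: the second call hit the cache
```

The class, names, and hashing scheme here are invented for illustration; only the cache-on-first-load behavior reflects what the page describes.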
## Read a local dataset source

If the path points at a directory containing `dataset.yaml`, `load_dataset` reads it directly:

```python
local_ds = dn.load_dataset("./support-prompts")
train_df = local_ds.to_pandas(split="train")
```

See Authoring for the directory layout.
## Open a published dataset

Pull the registry version first, then open it by name:

```python
import dreadnode as dn
from dreadnode.datasets import Dataset

dn.pull_package(["dataset://acme/support-prompts:1.2.0"])
dataset = Dataset("acme/support-prompts", version="1.2.0")
df = dataset.to_pandas()
```

`dn.load_package` is equivalent when you already have the package locally (pass it the same `dataset://` reference).

Both return a `Dataset`, which shares the full `LocalDataset` API. Omitting the version opens the latest cached version — fine for inspection, risky for reproducibility.
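The reproducibility risk is easy to see with a toy resolver (plain Python, not the SDK's actual resolution logic): a pinned version always gives the same answer, while "latest" silently changes as new versions land in the local cache.

```python
# Hypothetical cache state for illustration; version strings are made up.
cached_versions = ["1.1.0", "1.2.0"]

def resolve(version=None):
    if version is not None:
        return version  # pinned: same answer every run
    # naive "latest": highest (major, minor, patch) tuple
    return max(cached_versions, key=lambda v: tuple(map(int, v.split("."))))

pinned = resolve("1.2.0")   # always "1.2.0"
latest_before = resolve()   # "1.2.0" today
cached_versions.append("1.3.0")
latest_after = resolve()    # "1.3.0" once a newer version is cached
```

Two runs of the same unpinned code can therefore read different data, which is why pinning is the safe default for anything you want to reproduce.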
## Convert to a DataFrame or HF Dataset

```python
df = dataset.to_pandas(split="train")
hf_ds = dataset.to_hf(split="train")
```

`to_hf()` returns a HuggingFace `datasets.Dataset` — use this for `.map()`, `.filter()`, and training loops that expect the HF API. `to_pandas()` is handier for exploration, notebooks, and custom preprocessing.

For direct PyArrow access, call `dataset.load(split="train")`.
## Feed an evaluation

`Evaluation` expects inline rows or a dataset file path — it doesn’t take a `Dataset` object directly. Convert first:

```python
from dreadnode.evaluations import Evaluation

rows = dataset.to_pandas().to_dict(orient="records")
evaluation = Evaluation(task="acme.tasks.classify_intent", dataset=rows)
```

For hosted evaluations, the rows still go into the manifest inline — pull the dataset, shape the rows, and write them into the dataset block. See Evaluations → Inputs for the per-row input mechanics.
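For reference, `to_dict(orient="records")` produces one dict per row, keyed by column name. A minimal plain-Python stand-in (column names borrowed from the schema example on this page, values made up):

```python
# Stand-in for dataset.to_pandas().to_dict(orient="records"):
# each row becomes a dict keyed by column name, which is the inline
# row shape described above. The data here is illustrative only.
columns = ["ticket_id", "intent"]
table = [("T-1", "refund"), ("T-2", "billing")]

rows = [dict(zip(columns, values)) for values in table]
```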
## Feed a training job

Training job configs take `DatasetRef` objects keyed by pinned reference:

```python
from dreadnode.app.api.models import DatasetRef, TinkerSFTJobConfig

config = TinkerSFTJobConfig(
    dataset_ref=DatasetRef(name="support-prompts", version="1.2.0"),
    eval_dataset_ref=DatasetRef(name="support-eval", version="1.0.0"),
    batch_size=8,
    lora_rank=16,
    learning_rate=1e-4,
    steps=100,
)
```

The training control plane resolves each reference against the registry — you don’t call `pull_package` first. See Supervised fine-tuning or Reinforcement learning for the full submission flow.
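What "resolves each reference against the registry" means can be sketched in plain Python (this is not the actual control-plane code, and the registry contents are invented): the server side maps a pinned `(name, version)` pair to a concrete package, so the client never downloads anything itself.

```python
# Toy registry mapping pinned references to package identifiers.
# Entries are hypothetical, mirroring the config example above.
registry = {
    ("support-prompts", "1.2.0"): "dataset://acme/support-prompts:1.2.0",
    ("support-eval", "1.0.0"): "dataset://acme/support-eval:1.0.0",
}

def resolve_ref(name, version):
    # a missing pin fails loudly instead of silently picking "latest"
    if (name, version) not in registry:
        raise LookupError(f"no dataset {name}:{version} in registry")
    return registry[(name, version)]

resolved = resolve_ref("support-prompts", "1.2.0")
```

Pinned references make job configs self-describing: the same config submitted twice resolves to the same data.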
## Feed an AIRT suite

Adversarial datasets are loaded like any other published dataset:

```python
from dreadnode.datasets import Dataset

goals = Dataset("acme/airt-goals", version="1.0.0").to_pandas()

for _, row in goals.iterrows():
    # drive your attack loop with row["goal"], row["category"], etc.
    ...
```

See AI Red Teaming → Datasets for AIRT-specific dataset conventions and goal schemas.
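If your attack loop runs per category rather than per row, grouping the records first is a common pattern. A plain-Python sketch with made-up rows (only the `goal` and `category` column names come from the example above):

```python
from collections import defaultdict

# Stand-in for the goals DataFrame as a list of records; values are invented.
records = [
    {"goal": "reveal the system prompt", "category": "prompt-injection"},
    {"goal": "produce disallowed output", "category": "jailbreak"},
    {"goal": "override prior instructions", "category": "prompt-injection"},
]

by_category = defaultdict(list)
for row in records:
    by_category[row["category"]].append(row["goal"])

# each attack strategy can now consume its own slice of goals
```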
## Properties worth knowing

```python
dataset.name       # "acme/support-prompts"
dataset.version    # "1.2.0"
dataset.format     # "parquet"
dataset.row_count  # 48_213
dataset.splits     # ["train", "validation", "test"] or None
dataset.schema     # {"ticket_id": "string", "intent": "string", ...}
dataset.files      # list of artifact paths inside the package
dataset.manifest   # DatasetManifest (Pydantic)
```

These are all metadata reads — they hit the local manifest, not the network.
## What to reach for next

- Publish your own dataset → Authoring then Publishing
- Find datasets to load → Catalog
- Full SDK API → `dreadnode.datasets`