
Quickstart

Author a dataset directory, publish a version to your organization, and reference it from an evaluation.

Package a parquet file as a Dreadnode dataset, push it, and pin the result in an evaluation — all from the CLI.

Before you start, you need:

  • The Dreadnode CLI, authenticated with dn login (see Authentication)
  • Python with pyarrow and pandas installed
  • One dataset in tabular shape (parquet, csv, json, or jsonl)
Lay out the dataset as a directory holding a manifest and a data file:

support-prompts/
├── dataset.yaml
└── train.parquet

A minimal dataset.yaml:

dataset.yaml
name: support-prompts
version: 0.1.0
summary: Sampled support tickets for intent evaluation.
format: parquet

name and version are optional — the directory name fills in for name, and version defaults to 0.1.0. Fill them in anyway; the registry record is easier to read with them set. See the manifest reference for every field.
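The defaulting rule fits in a few lines. This is a minimal sketch, assuming the manifest has already been parsed into a dict; resolve_manifest is a hypothetical helper that mirrors the behavior described above, not the CLI's actual implementation.

```python
from pathlib import Path
from typing import Any

# Default version stated in this guide for a manifest that omits it.
DEFAULT_VERSION = "0.1.0"


def resolve_manifest(manifest: dict[str, Any], dataset_dir: Path) -> dict[str, Any]:
    """Fill in name and version as this guide describes (hypothetical helper)."""
    resolved = dict(manifest)  # copy so the caller's dict is left untouched
    # A missing name falls back to the directory name.
    resolved.setdefault("name", dataset_dir.name)
    # A missing version falls back to 0.1.0.
    resolved.setdefault("version", DEFAULT_VERSION)
    return resolved


print(resolve_manifest({"format": "parquet"}, Path("support-prompts")))
# {'format': 'parquet', 'name': 'support-prompts', 'version': '0.1.0'}
```

Fields you set explicitly always win over the fallbacks.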

Validate the directory locally before pushing:
dn dataset inspect ./support-prompts
format: parquet
rows: 1,234
Schema
┏━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Column     ┃ Type      ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ ticket_id  │ string    │
│ body       │ string    │
│ intent     │ string    │
└────────────┴───────────┘

inspect reads dataset.yaml, loads each artifact to confirm it parses, and infers schema and row count when the manifest omits them. Use it as your local pre-flight — if this fails, the push will too.

Push the directory:
dn dataset push ./support-prompts
Pushed acme/support-prompts@0.1.0 (sha256:9ab81fc1...)

The version goes to your organization (acme here) and is visible only to that org by default. The qualified name is org/name@version.
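If a script needs the parts of a qualified name, splitting on the last @ and the first / is enough for the org/name@version form. parse_qualified_name is a hypothetical helper; it handles only this form, not the dataset:// URI passed to pull_package.

```python
from typing import NamedTuple


class QualifiedName(NamedTuple):
    org: str
    name: str
    version: str


def parse_qualified_name(ref: str) -> QualifiedName:
    """Split an org/name@version reference into its parts (hypothetical helper)."""
    path, sep, version = ref.rpartition("@")
    if not sep or not version:
        raise ValueError(f"missing @version in {ref!r}")
    org, slash, name = path.partition("/")
    if not slash or not org or not name or "/" in name:
        raise ValueError(f"expected org/name before @ in {ref!r}")
    return QualifiedName(org, name, version)


print(parse_qualified_name("acme/support-prompts@0.1.0"))
# QualifiedName(org='acme', name='support-prompts', version='0.1.0')
```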

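Load the pushed version in Python:

import dreadnode as dn
from dreadnode.datasets import Dataset
# Download the version you just pushed.
dn.pull_package(["dataset://acme/support-prompts:0.1.0"])
# Open it by name and version, then load it into pandas.
dataset = Dataset("acme/support-prompts", version="0.1.0")
df = dataset.to_pandas()
print(df.head())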

pull_package downloads the version you just pushed; Dataset(...) opens it by name. See Using in code for every entry point and the difference between pull_package and load_package.

To publish a change, edit the data, bump version in dataset.yaml, and push again:

dataset.yaml
version: 0.2.0

dn dataset push ./support-prompts

Older versions stay in the registry. Point downstream configs at @0.2.0 when you’re ready to adopt the change.
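Bumping the version is a one-line edit, but a script can do it too. This is a minimal sketch, assuming the manifest keeps version on its own top-level line in X.Y.Z form, as in the examples here (whether the registry requires semantic versions is an assumption). bump_minor is a hypothetical helper that edits that line with a regular expression rather than a YAML parser, so comments and field order in dataset.yaml survive. It does not push.

```python
import re
from pathlib import Path

# Assumed shape: a top-level "version: X.Y.Z" line in dataset.yaml.
VERSION_LINE = re.compile(r"^version:[ \t]*(\d+)\.(\d+)\.(\d+)[ \t]*$", re.MULTILINE)


def bump_minor(manifest_path: Path) -> str:
    """Rewrite X.Y.Z as X.(Y+1).0 in the manifest and return it (hypothetical helper)."""
    text = manifest_path.read_text()
    match = VERSION_LINE.search(text)
    if match is None:
        raise ValueError(f"no X.Y.Z version line in {manifest_path}")
    major, minor, _patch = (int(part) for part in match.groups())
    new_version = f"{major}.{minor + 1}.0"
    # Replace only the matched line; every other field is left as-is.
    updated = text[: match.start()] + f"version: {new_version}" + text[match.end():]
    manifest_path.write_text(updated)
    return new_version
```

After bump_minor(Path("support-prompts/dataset.yaml")) returns "0.2.0", run dn dataset push ./support-prompts to publish the new version.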

Where to go next:

  • Use Hugging Face data or add splits → Authoring
  • Make the dataset public, retire a version, or restrict visibility → Publishing
  • Feed the dataset into evaluations, training, or AIRT → Using in code
  • Browse what’s already in the registry → Catalog
  • Every CLI verb → dn dataset