Datasets

Browse, publish, and manage shared dataset artifacts in the Dreadnode platform registry.

Datasets are versioned artifacts published into an organization registry so teams and agents can reuse the same inputs for evaluations, training, and repeatable experiments.

In the App's information architecture (IA), this page lives under Hub.

This page is about published datasets in the platform registry. If you need to load dataset rows in code, see SDK Data.

What the page is for

The platform dataset page is the place to:

browse dataset versions from your organization and the public catalog
search, sort, and filter by tags, license, task category, format, and size
inspect metadata such as row count, file format, and visibility
download, publish, unpublish, or delete versions you own
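The search, sort, and filter controls can be thought of as predicates over version metadata. The sketch below is a hypothetical client-side model, not the platform's API: `DatasetVersion` and `filter_versions` are illustrative names, their fields only mirror the metadata listed above, and the sample values (versions, sizes, row counts) are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetVersion:
    # Hypothetical record: field names mirror the metadata this page lists,
    # not a confirmed Dreadnode SDK type.
    owner: str
    name: str
    version: str
    license: str
    format: str
    size_category: str
    row_count: int
    tags: frozenset = frozenset()
    task_categories: frozenset = frozenset()


def filter_versions(versions, *, query="", tags=(), license_id=None,
                    task_category=None, fmt=None, size_category=None,
                    sort_by="row_count", descending=True):
    """Keep versions that pass every supplied filter, then sort by one field."""
    needle = query.lower()
    matches = [
        v for v in versions
        if needle in v.name.lower()               # free-text search on the name
        and set(tags) <= v.tags                   # every requested tag is present
        and license_id in (None, v.license)       # None means "any license"
        and (task_category is None or task_category in v.task_categories)
        and fmt in (None, v.format)
        and size_category in (None, v.size_category)
    ]
    return sorted(matches, key=lambda v: getattr(v, sort_by), reverse=descending)


# Illustrative catalog entries (invented values).
catalog = [
    DatasetVersion("acme", "support-prompts", "1.0.0", "mit", "jsonl", "small", 1200,
                   frozenset({"support"}), frozenset({"text-generation"})),
    DatasetVersion("acme", "support-prompts", "1.1.0", "mit", "parquet", "medium", 48000,
                   frozenset({"support"}), frozenset({"text-generation"})),
]
jsonl_support = filter_versions(catalog, tags=["support"], fmt="jsonl")
```

Treating an unset filter as "match anything" keeps the function composable, which is roughly how stacked UI filters behave.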

Each card groups versions under one dataset name so you can move between releases without losing context.
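Grouping releases into cards amounts to bucketing versions by dataset name and ordering each bucket. A minimal sketch, assuming dotted numeric version strings such as `1.2.0` and keying cards by `owner/name` so same-named datasets from different orgs stay separate; `group_into_cards` is a hypothetical helper, not part of the platform.

```python
from collections import defaultdict


def parse_version(version: str) -> tuple[int, ...]:
    # Assumes dotted numeric versions ("1.2.0"); other schemes need another parser.
    return tuple(int(part) for part in version.split("."))


def group_into_cards(versions):
    """Group (owner, name, version) tuples into one card per owner/name, newest first."""
    cards = defaultdict(list)
    for owner, name, version in versions:
        cards[f"{owner}/{name}"].append(version)
    # Sort cards alphabetically and each card's versions numerically, newest first.
    return {key: sorted(vs, key=parse_version, reverse=True)
            for key, vs in sorted(cards.items())}


cards = group_into_cards([
    ("acme", "support-prompts", "1.0.0"),
    ("acme", "support-prompts", "1.10.0"),
    ("acme", "support-prompts", "1.2.0"),
    ("acme", "evals", "0.1.0"),
])
```

Sorting numerically rather than as strings matters: `1.10.0` is newer than `1.2.0`, even though it sorts earlier lexically.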

Workflow

Datasets are the durable input side of the platform. A typical lifecycle has five steps:

1. curate the local dataset source
2. inspect it before publishing
3. publish a version to the Hub
4. pin that exact version in evaluations, training, or optimization
5. pull or download it later when you need the bytes locally again

The App page is primarily the decision surface for steps 3 and 4: which dataset exists, which version is current, and which version another workflow should consume.

Visibility and references

Concept	What it means
org-scoped	visible only inside the owning organization
public	visible in the combined catalog across orgs
canonical name	shown as `<owner>/<name>` when the dataset comes from another org
pinned reference	use `org/name@version` for reproducible evaluations, training jobs, and automation
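The reference forms in the table can be parsed and displayed with a few lines of code. This is a sketch under stated assumptions: the regular expression is a hypothetical approximation of `org/name@version` (the registry's real naming rules may allow or forbid other characters), and `DatasetRef`, `parse_ref`, and `display_name` are illustrative names.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for `org/name` with an optional `@version` suffix.
_REF = re.compile(
    r"^(?P<org>[A-Za-z0-9_.-]+)/(?P<name>[A-Za-z0-9_.-]+)(?:@(?P<version>[A-Za-z0-9_.+-]+))?$"
)


class DatasetRef(NamedTuple):
    org: str
    name: str
    version: Optional[str]

    @property
    def pinned(self) -> bool:
        # Only references with an explicit version are reproducible.
        return self.version is not None


def parse_ref(ref: str) -> DatasetRef:
    match = _REF.match(ref)
    if match is None:
        raise ValueError(f"not a dataset reference: {ref!r}")
    return DatasetRef(match["org"], match["name"], match["version"])


def display_name(ref: DatasetRef, current_org: str) -> str:
    """Show the canonical `<owner>/<name>` only for datasets from another org."""
    return ref.name if ref.org == current_org else f"{ref.org}/{ref.name}"
```

A job runner could call `parse_ref(...).pinned` and refuse to start when it is `False`.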

The All view mixes public and org-local datasets. The org-only view limits the page to artifacts owned by the current organization.
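The difference between the two views can be expressed as two filters over the same listing. This is a hypothetical model of the behavior described above, not the platform's query logic; `Listing`, `visible_listings`, and the `"org"`/`"public"` visibility values are assumptions chosen to match the table.

```python
from typing import Literal, NamedTuple


class Listing(NamedTuple):
    owner: str
    name: str
    visibility: Literal["org", "public"]


def visible_listings(listings, current_org, view="all"):
    """Model the All and org-only views (hypothetical logic)."""
    if view == "org":
        # Org-only view: artifacts owned by the current organization.
        return [item for item in listings if item.owner == current_org]
    if view == "all":
        # All view: the org's own datasets plus public datasets from any org.
        return [item for item in listings
                if item.owner == current_org or item.visibility == "public"]
    raise ValueError(f"unknown view: {view!r}")


listings = [
    Listing("acme", "a", "org"),
    Listing("acme", "b", "public"),
    Listing("other", "c", "public"),
    Listing("other", "d", "org"),
]
```

An org-scoped dataset from another organization (`other/d` here) appears in neither view, consistent with "visible only inside the owning organization".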

What agents should assume

Prefer explicit versions. A dataset card may show many releases, but automation should pin one.
Use metadata like row_count, format, task_categories, and size_category to choose the right artifact before download or job submission.
Treat a published dataset as durable registry state. Inline evaluation rows or local ad hoc files are separate workflows.
Change visibility or delete a version from the owning org; those operations apply only to versions you own.
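The guidance above can be turned into a pre-flight check that an agent runs before downloading or submitting a job. A minimal sketch, assuming the version metadata arrives as a plain dict with the keys named on this page (`row_count`, `format`, `task_categories`); `preflight` and its thresholds are hypothetical.

```python
def preflight(ref, metadata, *, expected_format, required_task=None, min_rows=1):
    """Return a list of problems; an empty list means the version looks usable."""
    problems = []
    # Only the part after "org/" can carry "@version".
    if "@" not in ref.partition("/")[2]:
        problems.append(f"{ref!r} is not pinned; use org/name@version")
    if metadata.get("format") != expected_format:
        problems.append(f"format {metadata.get('format')!r} != {expected_format!r}")
    if required_task is not None and required_task not in metadata.get("task_categories", []):
        problems.append(f"task category {required_task!r} is not listed")
    if metadata.get("row_count", 0) < min_rows:
        problems.append(f"row_count {metadata.get('row_count', 0)} < {min_rows}")
    return problems
```

Collecting every problem instead of failing on the first one lets an agent report all reasons a candidate version was rejected in a single pass.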

Common workflows

dreadnode dataset inspect ./datasets/support-prompts
dreadnode dataset push ./datasets/support-prompts --public
dreadnode dataset download acme/<name>@<version> --split train --output ./train.jsonl
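When automation shells out to the CLI, building the argument list in code makes it easy to enforce a pinned reference first. This sketch uses only the `download` subcommand and the `--split` and `--output` flags shown above; `download_command` is a hypothetical helper, and the reference `acme/example-dataset@1.0.0` is invented for illustration.

```python
import shlex


def download_command(ref: str, split: str, output: str) -> list[str]:
    """Build the download invocation shown above, refusing unpinned references."""
    if "@" not in ref.partition("/")[2]:
        raise ValueError(f"{ref!r} is not pinned; expected org/name@version")
    # Only the flags that appear on this page are used: --split and --output.
    return ["dreadnode", "dataset", "download", ref,
            "--split", split, "--output", output]


# Hypothetical pinned reference, for illustration only.
cmd = download_command("acme/example-dataset@1.0.0", "train", "./train.jsonl")
print(shlex.join(cmd))
# To execute it, pass the list to subprocess.run(cmd, check=True).
```

Passing a list (rather than a single formatted string) to `subprocess.run` avoids shell-quoting problems with unusual paths.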

Use Packages and Registry for publish and download operations, SDK Data when you need rows inside evaluation code, and the SDK API Client when you need registry lookups or dataset inspection from Python.