
Datasets

Browse, publish, and manage shared dataset artifacts in the Dreadnode platform registry.

Datasets are versioned artifacts published into an organization registry so teams and agents can reuse the same inputs for evaluations, training, and repeatable experiments.

In the app's information architecture (IA), this page lives under Hub.

This page is about published datasets in the platform registry. If you need to load dataset rows in code, see SDK Data.

The platform dataset page is the place to:

  • browse dataset versions from your organization and the public catalog
  • search, sort, and filter by tags, license, task category, format, and size
  • inspect metadata such as row count, file format, and visibility
  • download, publish, unpublish, or delete versions you own

Each card groups versions under one dataset name so you can move between releases without losing context.

Datasets are the durable input side of the platform. A typical lifecycle runs through five steps:

  1. curate the local dataset source
  2. inspect it before publishing
  3. publish a version to the Hub
  4. pin that exact version in evaluations, training, or optimization
  5. pull or download it later when you need the bytes locally again

The App page is primarily the decision surface in steps 3 and 4: which dataset exists, which version is current, and which version should another workflow consume.

| Concept | What it means |
| --- | --- |
| org-scoped | visible only inside the owning organization |
| public | visible in the combined catalog across orgs |
| canonical name | shown as <owner>/<name> when the dataset comes from another org |
| pinned reference | use org/name@version for reproducible evaluations, training jobs, and automation |
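
The pinned-reference form can be checked before a job is submitted. The sketch below is a minimal, standalone parser for `org/name@version` strings. `DatasetRef`, `parse_dataset_ref`, and `require_pinned` are hypothetical helper names, not part of the Dreadnode SDK, and the rule that names contain no `/` or `@` is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetRef:
    """Hypothetical container for a parsed dataset reference."""

    org: Optional[str]
    name: str
    version: Optional[str]

    @property
    def is_pinned(self) -> bool:
        # A reference is pinned only when it names an explicit version.
        return self.version is not None


def parse_dataset_ref(ref: str) -> DatasetRef:
    """Split 'org/name@version'; the org and version parts may be absent."""
    body, at, version = ref.partition("@")
    if at and not version:
        raise ValueError(f"empty version in reference: {ref!r}")
    org, slash, name = body.rpartition("/")
    # Reject 'org/', '/name', and references with more than one slash.
    if not name or (slash and not org) or "/" in org:
        raise ValueError(f"malformed dataset reference: {ref!r}")
    return DatasetRef(org=org or None, name=name, version=version or None)


def require_pinned(ref: str) -> DatasetRef:
    """Fail early when automation is handed an unpinned reference."""
    parsed = parse_dataset_ref(ref)
    if not parsed.is_pinned:
        raise ValueError(f"reference is not pinned to a version: {ref!r}")
    return parsed
```

Calling `require_pinned` at the start of an evaluation or training script turns a floating reference into an immediate error instead of a silently changing input.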

The All view mixes public and org-local datasets. The org-only view limits the page to artifacts owned by the current organization.
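
The two views and the canonical-name rule can be expressed as simple filters. This is an illustrative sketch only: the `DatasetCard` shape and the function names are hypothetical and do not reflect the platform's actual data model.

```python
from typing import Iterable, List, TypedDict


class DatasetCard(TypedDict):
    """Hypothetical record for one dataset card."""

    owner: str
    name: str
    public: bool


def all_view(cards: Iterable[DatasetCard], current_org: str) -> List[DatasetCard]:
    # Public datasets from any org plus everything the current org owns.
    return [c for c in cards if c["public"] or c["owner"] == current_org]


def org_only_view(cards: Iterable[DatasetCard], current_org: str) -> List[DatasetCard]:
    # Only artifacts owned by the current organization.
    return [c for c in cards if c["owner"] == current_org]


def display_name(card: DatasetCard, current_org: str) -> str:
    # Datasets from another org are shown as <owner>/<name>.
    if card["owner"] != current_org:
        return f"{card['owner']}/{card['name']}"
    return card["name"]
```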

  • Prefer explicit versions. A dataset card may show many releases, but automation should pin one.
  • Use metadata like row_count, format, task_categories, and size_category to choose the right artifact before download or job submission.
  • Treat a published dataset as durable registry state. Inline evaluation rows or local ad hoc files are separate workflows.
  • Change visibility or delete a version from within the organization that owns the dataset.
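
Choosing an artifact from its metadata might look like the filter below. It is a sketch under stated assumptions: the records are plain dictionaries keyed by the field names mentioned above (`row_count`, `format`, `task_categories`), the example values are invented, and `select_candidates` is a hypothetical helper, not an SDK function.

```python
from typing import Any, Dict, Iterable, List, Optional


def select_candidates(
    records: Iterable[Dict[str, Any]],
    *,
    fmt: Optional[str] = None,
    task: Optional[str] = None,
    max_rows: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Keep the records that satisfy every filter that was supplied."""
    selected = []
    for rec in records:
        if fmt is not None and rec.get("format") != fmt:
            continue
        if task is not None and task not in rec.get("task_categories", []):
            continue
        if max_rows is not None:
            rows = rec.get("row_count")
            # Records with an unknown row count are excluded, not guessed at.
            if rows is None or rows > max_rows:
                continue
        selected.append(rec)
    return selected
```

Filtering first and downloading second keeps large or mismatched artifacts out of a job before any bytes move.
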

The inspect, publish, and download steps from the command line:
dreadnode dataset inspect ./datasets/support-prompts
dreadnode dataset push ./datasets/support-prompts --public
dreadnode dataset download acme/[email protected] --split train --output ./train.jsonl
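
When these commands are driven from a script, building the argument lists in one place keeps the pinned reference explicit. The sketch below uses only the subcommands and flags shown above. The helper names are hypothetical, treating `--public` as optional is an assumption, and any reference passed to `download_cmd` is a placeholder you supply yourself.

```python
import subprocess
from typing import List

CLI = "dreadnode"


def inspect_cmd(path: str) -> List[str]:
    return [CLI, "dataset", "inspect", path]


def push_cmd(path: str, public: bool = False) -> List[str]:
    cmd = [CLI, "dataset", "push", path]
    if public:
        # Assumption: omitting --public leaves the dataset org-scoped.
        cmd.append("--public")
    return cmd


def download_cmd(ref: str, split: str, output: str) -> List[str]:
    # Refuse unpinned references so automation always names one version.
    if "@" not in ref:
        raise ValueError("pin an explicit version: org/name@version")
    return [CLI, "dataset", "download", ref, "--split", split, "--output", output]


def run(cmd: List[str]) -> None:
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues with paths and references.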

Use Packages and Registry for publish and download operations, SDK Data when you need rows inside evaluation code, and the SDK API Client when you need registry lookups or dataset inspection from Python.