# Catalog

Find datasets in the registry, filter by facets, pin references, and pull versions locally.

Once a dataset is in the registry, anyone in the organization (and every org, for public datasets) can find it, pin a version, and pull it. The Hub and the CLI are two views of the same data.
## List datasets in your organization

```sh
dn dataset list
```

```
acme/[email protected]   private - Labeled support tickets for intent classification.
acme/[email protected]  private - Prompt-injection canaries for regression checks.
acme/[email protected]      public  - Multilingual question answering.
```

Add `--include-public` to see every organization's public datasets alongside yours:
```sh
dn dataset list --include-public
```

`--search <text>` filters on name or description; `--limit N` caps the result count; `--json` emits the raw response for scripting.
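`--json` makes listings easy to post-process. As an illustration, here is a client-side filter that mimics `--search`; note that the response shape (field names `ref`, `visibility`, `description`) is an assumption for the example, not the CLI's documented schema:

```python
import json

# Assumed shape for `dn dataset list --json` output; check the real
# response before relying on these field names.
raw = """[
  {"ref": "acme/[email protected]", "visibility": "private",
   "description": "Labeled support tickets for intent classification."},
  {"ref": "acme/[email protected]", "visibility": "public",
   "description": "Multilingual question answering."}
]"""

def search(datasets, text):
    """Case-insensitive match on name or description, like --search."""
    text = text.lower()
    return [d for d in datasets
            if text in d["ref"].lower() or text in d["description"].lower()]

datasets = json.loads(raw)
print([d["ref"] for d in search(datasets, "tickets")])
# → ['acme/[email protected]']
```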
## Inspect a dataset

```sh
dn dataset info acme/support-prompts
```

```
acme/[email protected] private - Labeled support tickets for intent classification.
versions: 1.2.0, 1.1.0, 1.0.0, 0.1.0
```

`info` shows the latest version's summary and the full version history. Pass a specific version to fetch that record (`dn dataset info acme/[email protected]`).
## Pinned references

`org/name@version` is the canonical way to refer to a dataset. Every downstream consumer resolves this same shape:
| Where | Example |
|---|---|
| Training job config | `DatasetRef(name="support-prompts", version="1.2.0")` |
| SDK pull | `dn.pull_package(["dataset://acme/support-prompts:1.2.0"])` |
| SDK load | `dn.load_package("dataset://acme/[email protected]")` |
| CLI pull | `dn dataset pull acme/[email protected]` |
Evaluation manifests don’t resolve dataset refs directly — they take inline rows (see Evaluations → Inputs). Pull the dataset and shape the rows into the manifest when you need a registry dataset as eval input.
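As an illustration of that pull-then-shape step, with row fields (`prompt`, `expected`) and a manifest shape that are purely hypothetical rather than the product's documented schema:

```python
# Hypothetical rows as they might come back from a pulled dataset.
pulled_rows = [
    {"prompt": "Where is my order?", "expected": "order_status"},
    {"prompt": "Cancel my plan.", "expected": "cancellation"},
]

# Shape them into the inline-rows form an evaluation manifest expects.
# The "inputs" / "input" / "reference" keys are illustrative only.
manifest = {
    "name": "support-intents-eval",
    "inputs": [
        {"input": row["prompt"], "reference": row["expected"]}
        for row in pulled_rows
    ],
}

print(len(manifest["inputs"]))  # → 2
```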
Omit `@version` for "latest visible" — handy for interactive inspection, but avoid it in automation, where a moving latest makes reruns non-reproducible.
When the dataset lives in your own organization, the `org/` prefix is optional. The CLI, SDK, and evaluation manifests resolve bare names against your active org.
## Pull a dataset locally

```sh
dn dataset pull acme/[email protected]
```

Without `--output`, the CLI prints a pre-signed URL you can use with curl, a browser, or a restore script:

```
# Download URL (expires 2026-04-21T18:23:00Z):
# https://...
```

Pass `--split` to pull one split instead of the whole artifact. Splits must exist in the manifest — `dn dataset info` lists them. When the dataset has no splits, `--split` is not needed.
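As a sketch, with an assumed split name (`train`) and an illustrative output path; substitute the split names that `dn dataset info` reports:

```sh
dn dataset pull acme/[email protected] --split train --output ./data
```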
## Browse in the Hub

The Hub shows the same listings with facet filters (tags, license, task categories, format, size category), a per-version detail panel with schema and file list, and an activity feed of recent downloads across the org. The Hub and `dn dataset list` reflect the same registry — authoring happens through the CLI or SDK, discovery happens through either.
## What to reach for next

- Cut a new version or change visibility → Publishing
- Consume the pulled dataset in Python → Using in code
- Every CLI verb → `dn dataset`