Catalog

Find datasets in the registry, filter by facets, pin references, and pull versions locally.

Once a dataset is in the registry, anyone in the organization (and every org, for public datasets) can find it, pin a version, and pull it. The Hub and the CLI are two views of the same data.

List datasets in your organization

dn dataset list

acme/[email protected] private - Labeled support tickets for intent classification.
acme/[email protected] private - Prompt-injection canaries for regression checks.
acme/[email protected] public - Multilingual question answering.

Add --include-public to see every organization’s public datasets alongside yours:

dn dataset list --include-public

--search <text> filters on name or description; --limit N caps the result count; --json emits the raw response for scripting.

Inspect a dataset

dn dataset info acme/support-prompts

acme/[email protected] private - Labeled support tickets for intent classification.
  versions: 1.2.0, 1.1.0, 1.0.0, 0.1.0

info shows the latest version’s summary and the full version history. Pass a specific version to fetch that record (dn dataset info acme/[email protected]).

Pinned references

org/name@version is the canonical way to refer to a dataset. Every downstream consumer resolves this same shape:

Where	Example
Training job config	`DatasetRef(name="support-prompts", version="1.2.0")`
SDK pull	`dn.pull_package(["dataset://acme/support-prompts:1.2.0"])`
SDK load	`dn.load_package("dataset://acme/[email protected]")`
CLI pull	`dn dataset pull acme/[email protected]`

Evaluation manifests don’t resolve dataset refs directly — they take inline rows (see Evaluations → Inputs). Pull the dataset and shape the rows into the manifest when you need a registry dataset as eval input.

Omit @version for “latest visible” — handy for interactive inspection, but avoid it in automation. A moving latest turns reruns into moving targets.

When the dataset lives in your own organization, the org/ prefix is optional. The CLI, SDK, and evaluation manifests resolve bare names against your active org.

Pull a dataset locally

dn dataset pull acme/[email protected] --output ./data.parquet

Without --output, the CLI prints a pre-signed URL you can use with curl, a browser, or a restore script:

dn dataset pull acme/[email protected]
# Download URL (expires 2026-04-21T18:23:00Z):
# https://...

Pull one split instead of the whole artifact:

dn dataset pull acme/[email protected] --split test --output ./test.parquet

Splits must exist in the manifest — dn dataset info lists them. When the dataset has no splits, --split is not needed.

Browse in the Hub

The Hub shows the same listings with facet filters (tags, license, task categories, format, size category), a per-version detail panel with schema and file list, and an activity feed of recent downloads across the org. The Hub and dn dataset list reflect the same registry — authoring happens through the CLI or SDK, discovery happens through either.

What to reach for next

Cut a new version or change visibility → Publishing
Consume the pulled dataset in Python → Using in code
Every CLI verb → dn dataset