Skip to content

Catalog

Find datasets in the registry, filter by facets, pin references, and pull versions locally.

Once a dataset is in the registry, anyone in the organization (and every org, for public datasets) can find it, pin a version, and pull it. The Hub and the CLI are two views of the same data.

Terminal window
dn dataset list
acme/[email protected] private - Labeled support tickets for intent classification.
acme/[email protected] private - Prompt-injection canaries for regression checks.
acme/[email protected] public - Multilingual question answering.

Add --include-public to see every organization’s public datasets alongside yours:

Terminal window
dn dataset list --include-public

--search <text> filters on name or description; --limit N caps the result count; --json emits the raw response for scripting.

Terminal window
dn dataset info acme/support-prompts
acme/[email protected] private - Labeled support tickets for intent classification.
versions: 1.2.0, 1.1.0, 1.0.0, 0.1.0

info shows the latest version’s summary and the full version history. Pass a specific version to fetch that record (dn dataset info acme/[email protected]).

org/name@version is the canonical way to refer to a dataset. Every downstream consumer resolves this same shape:

WhereExample
Training job configDatasetRef(name="support-prompts", version="1.2.0")
SDK pulldn.pull_package(["dataset://acme/support-prompts:1.2.0"])
SDK loaddn.load_package("dataset://acme/[email protected]")
CLI pulldn dataset pull acme/[email protected]

Evaluation manifests don’t resolve dataset refs directly — they take inline rows (see Evaluations → Inputs). Pull the dataset and shape the rows into the manifest when you need a registry dataset as eval input.

Omit @version for “latest visible” — handy for interactive inspection, but avoid it in automation. A moving latest turns reruns into moving targets.

When the dataset lives in your own organization, the org/ prefix is optional. The CLI, SDK, and evaluation manifests resolve bare names against your active org.

Terminal window
dn dataset pull acme/[email protected] --output ./data.parquet

Without --output, the CLI prints a pre-signed URL you can use with curl, a browser, or a restore script:

Terminal window
dn dataset pull acme/[email protected]
# Download URL (expires 2026-04-21T18:23:00Z):
# https://...

Pull one split instead of the whole artifact:

Terminal window
dn dataset pull acme/[email protected] --split test --output ./test.parquet

Splits must exist in the manifest — dn dataset info lists them. When the dataset has no splits, --split is not needed.

The Hub shows the same listings with facet filters (tags, license, task categories, format, size category), a per-version detail panel with schema and file list, and an activity feed of recent downloads across the org. The Hub and dn dataset list reflect the same registry — authoring happens through the CLI or SDK, discovery happens through either.