Versions & metrics

Compare model releases side-by-side, attach evaluation metrics, promote with aliases, and retire versions.

Once a model name has two or more versions, the registry stops being a filing cabinet and starts being a release-management surface. Compare, annotate, promote, delete — those are the mechanics this page covers.

Terminal window
dn model compare support-assistant 1.0.0 1.1.0 1.2.0
support-assistant version comparison
                 │ 1.0.0                   │ 1.1.0                   │ 1.2.0
─────────────────┼─────────────────────────┼─────────────────────────┼────────────────────────
framework        │ safetensors             │ safetensors             │ safetensors
task             │ text-generation         │ text-generation         │ text-generation
architecture     │ LlamaForCausalLM        │ LlamaForCausalLM        │ LlamaForCausalLM
base model       │ meta-llama/Llama-3.1-8B │ meta-llama/Llama-3.1-8B │ meta-llama/Llama-3.1-8B
size             │ 14850.3 MB              │ 14850.3 MB              │ 14892.1 MB
aliases          │ -                       │ staging                 │ champion
intent_accuracy  │ 0.812                   │ 0.847                   │ 0.873
f1               │ 0.79                    │ 0.83                    │ 0.86

compare takes 2–5 versions. Every attached metric gets its own row, so the tradeoffs across releases fit on one screen.

Add --json for machine-readable output. The Hub renders the same comparison visually with metric charts over version history.
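The JSON output is convenient for scripting promotion decisions. A minimal sketch of consuming it in Python — note the payload shape here is an assumption for illustration, not the CLI's documented schema:

```python
import json

# Hypothetical shape of `dn model compare ... --json` output.
# The real schema may differ; treat this as an illustration only.
payload = json.loads("""
{
  "model": "support-assistant",
  "versions": {
    "1.1.0": {"aliases": ["staging"],  "metrics": {"intent_accuracy": 0.847, "f1": 0.83}},
    "1.2.0": {"aliases": ["champion"], "metrics": {"intent_accuracy": 0.873, "f1": 0.86}}
  }
}
""")

# Pick the version with the best intent_accuracy.
best = max(payload["versions"].items(),
           key=lambda kv: kv[1]["metrics"]["intent_accuracy"])
print(best[0])  # 1.2.0
```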

Metrics are version-level key/value pairs you attach after a model is published — typically the output of an evaluation run you want to record alongside the weights.

Terminal window
dn model metrics [email protected] \
intent_accuracy=0.873 \
f1=0.86 \
pass_at_1=0.71
Updated acme/[email protected]: intent_accuracy=0.873, f1=0.86, pass_at_1=0.71

Values that parse as integers or floats are stored as numbers; anything else is stored as a string. Updates merge — metrics you don’t mention are preserved.
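The coercion and merge rules can be modeled in a few lines of Python — a sketch of the documented behavior, not the CLI's actual implementation:

```python
def coerce(value: str):
    """Documented rule: try int, then float; anything else stays a string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

# Updates merge: keys you don't mention are preserved.
stored = {"intent_accuracy": 0.847, "f1": 0.83}
update = {k: coerce(v) for k, v in
          [("intent_accuracy", "0.873"), ("pass_at_1", "0.71"), ("suite", "v2")]}
stored.update(update)
print(stored)
```

So `"0.71"` lands as the float `0.71`, `"v2"` stays a string, and `f1` survives untouched because the update never mentioned it.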

A common pattern: run an evaluation, then record the top-line scores back onto the model version so the registry entry reflects how it did:

Terminal window
# Score the model against your evaluation suite (locally or hosted), then:
dn model metrics [email protected] \
intent_accuracy=0.873 f1=0.86

The dn model compare table then shows the eval scores beside framework, architecture, and aliases. Hosted evaluations reach the model through its inference endpoint — see Using in code for loading a registry artifact into a generator or serving it externally.

Aliases are human-friendly labels that float across versions. Use them when a release has a role — champion, staging, latest-stable — and you want to promote without rewriting downstream configs.

Terminal window
dn model alias [email protected] champion
champion → acme/[email protected]

Setting an alias that already exists on another version moves it — there is exactly one champion per model name. Remove an alias:

Terminal window
dn model alias [email protected] champion --remove
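Because each alias points at exactly one version, the semantics are those of a plain key/value map — setting an existing key moves the label wholesale. A toy model:

```python
# Toy model of alias semantics: one owner per alias name.
aliases = {"staging": "1.1.0", "champion": "1.1.0"}

def set_alias(aliases, name, version):
    aliases[name] = version      # reassignment moves the label off its old version

def remove_alias(aliases, name):
    aliases.pop(name, None)      # removing a missing alias is a no-op

set_alias(aliases, "champion", "1.2.0")  # champion leaves 1.1.0 automatically
print(aliases)  # {'staging': '1.1.0', 'champion': '1.2.0'}
```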

Aliases + metrics + comparison give you the full promotion loop:

  1. Train a new version (@1.2.0) and push it.
  2. Run your evaluation suite against the new version.
  3. dn model metrics [email protected] ... with the scores.
  4. dn model compare support-assistant 1.1.0 1.2.0 — confirm it’s actually better on the metrics you care about.
  5. dn model alias [email protected] champion — move the alias; downstream consumers that follow champion start loading the new version.

If something regresses in production, move the alias back: dn model alias [email protected] champion.
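The gate in step 4 — promote only if the candidate actually wins — is easy to automate once you have both versions' metrics (e.g. parsed from `dn model compare --json`). A minimal sketch; the metric names and threshold are assumptions for illustration:

```python
def should_promote(champion_metrics, candidate_metrics, keys, min_delta=0.0):
    """Promote only if the candidate beats the champion on every listed metric."""
    return all(
        candidate_metrics[k] >= champion_metrics[k] + min_delta
        for k in keys
    )

champion  = {"intent_accuracy": 0.847, "f1": 0.83}
candidate = {"intent_accuracy": 0.873, "f1": 0.86}

if should_promote(champion, candidate, ["intent_accuracy", "f1"]):
    print("promote")  # here you would run: dn model alias <name>@<version> champion
```

A nonzero `min_delta` guards against promoting on noise-level improvements.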

Terminal window
dn model delete acme/[email protected]

delete requires a version — there’s no “delete the whole family” verb. The CLI confirms before deleting; pass --yes for automation:

Terminal window
dn model delete acme/[email protected] --yes

Deletion is permanent. Inference and training configs that pin the deleted version will fail to resolve. Run dn model compare <name> <versions...> first — the aliases row shows which version a champion or staging label is currently attached to, so you can reassign before deleting. Aliases on a deleted version disappear with it.
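The reassign-before-delete check above can be wired into automation as a simple guard. A sketch, with aliases modeled as a plain map as in the comparison table:

```python
def safe_to_delete(version, aliases):
    """Refuse deletion while any alias still points at the version."""
    holders = [name for name, v in aliases.items() if v == version]
    return (len(holders) == 0, holders)

aliases = {"staging": "1.1.0", "champion": "1.2.0"}

ok, holders = safe_to_delete("1.1.0", aliases)
print(ok, holders)  # False ['staging'] — move staging before deleting 1.1.0
```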