# Versions & metrics
Compare model releases side-by-side, attach evaluation metrics, promote with aliases, and retire versions.
Once a model name has two or more versions, the registry stops being a filing cabinet and starts being a release-management surface. Compare, annotate, promote, delete — the mechanics on this page.
## Compare versions

```
dn model compare support-assistant 1.0.0 1.1.0 1.2.0

support-assistant version comparison

                   1.0.0                     1.1.0                     1.2.0
  framework        safetensors               safetensors               safetensors
  task             text-generation           text-generation           text-generation
  architecture     LlamaForCausalLM          LlamaForCausalLM          LlamaForCausalLM
  base model       meta-llama/Llama-3.1-8B   meta-llama/Llama-3.1-8B   meta-llama/Llama-3.1-8B
  size             14850.3 MB                14850.3 MB                14892.1 MB
  aliases          -                         staging                   champion
  intent_accuracy  0.812                     0.847                     0.873
  f1               0.79                      0.83                      0.86
```

`compare` takes 2–5 versions. Every attached metric gets its own row, so the tradeoffs across releases fit on one screen.
Add `--json` for machine-readable output. The Hub renders the same comparison visually, with metric charts over version history.
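The `--json` form is what you would script against. A quick sketch of picking the strongest release by one metric, assuming (purely for illustration; the real schema may differ) the output maps version strings to their attribute and metric rows:

```python
import json

# Hypothetical shape for `dn model compare ... --json`; the real schema may differ.
raw = """{
  "1.1.0": {"intent_accuracy": 0.847, "f1": 0.83},
  "1.2.0": {"intent_accuracy": 0.873, "f1": 0.86}
}"""

versions = json.loads(raw)

# Pick the version with the highest intent_accuracy.
best = max(versions, key=lambda v: versions[v]["intent_accuracy"])
print(best)  # 1.2.0
```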
## Attaching metrics

Metrics are version-level key/value pairs you attach after a model is published — typically the output of an evaluation run you want to record alongside the weights.

```
dn model metrics acme/[email protected] \
  intent_accuracy=0.873 \
  f1=0.86 \
  pass_at_1=0.71

Updated acme/[email protected]: intent_accuracy=0.873, f1=0.86, pass_at_1=0.71
```

Values that parse as integers or floats are stored as numbers; anything else is stored as a string. Updates merge — metrics you don’t mention are preserved.
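The coercion and merge rules can be sketched as follows (a minimal Python illustration of the behavior described above, not the CLI's actual implementation; the function names are hypothetical):

```python
def coerce(value: str):
    """Values that parse as integers or floats become numbers; everything else stays a string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

def merge_metrics(existing: dict, updates: dict) -> dict:
    """Updates merge: metrics not mentioned in `updates` are preserved."""
    merged = dict(existing)
    merged.update({k: coerce(v) for k, v in updates.items()})
    return merged

current = {"intent_accuracy": 0.812, "f1": 0.79}
merged = merge_metrics(current, {"intent_accuracy": "0.873", "notes": "rerun"})
# f1 is preserved, intent_accuracy becomes the float 0.873, notes stays a string
print(merged)
```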
## Metrics in downstream workflows

A common pattern: run an evaluation, then record the top-line scores back onto the model version so the registry entry reflects how it did:

```
# Score the model against your evaluation suite (locally or hosted), then:
dn model metrics acme/[email protected] intent_accuracy=0.873 f1=0.86
```

The `dn model compare` table then shows the eval scores beside framework, architecture, and aliases. Hosted evaluations reach the model through its inference endpoint — see Using in code for loading a registry artifact into a generator or serving it externally.
## Aliases

Aliases are human-friendly labels that float across versions. Use them when a release has a role — `champion`, `staging`, `latest-stable` — and you want to promote without rewriting downstream configs.

```
dn model alias acme/[email protected] champion

champion → acme/[email protected]
```

Setting an alias that already exists on another version moves it — there is exactly one `champion` per model name. An alias can also be removed from a version when the label no longer applies.
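The move-on-set semantics fall out of treating the alias table as a mapping keyed by label (an illustrative model of the behavior, not the registry's implementation):

```python
def set_alias(aliases: dict, alias: str, version: str) -> dict:
    """Assign `alias` to `version`. Because the mapping is keyed by alias,
    assigning a label that already points at another version moves it:
    there is exactly one version per alias name."""
    updated = dict(aliases)
    updated[alias] = version
    return updated

aliases = {"staging": "1.1.0", "champion": "1.1.0"}
aliases = set_alias(aliases, "champion", "1.2.0")  # promote 1.2.0
print(aliases)  # {'staging': '1.1.0', 'champion': '1.2.0'}
```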
## Promote a release

Aliases + metrics + comparison give you the full promotion loop:

- Train a new version (`@1.2.0`) and push it.
- Run your evaluation suite against the new version.
- `dn model metrics [email protected] ...` with the scores.
- `dn model compare support-assistant 1.1.0 1.2.0` — confirm it’s actually better on the metrics you care about.
- `dn model alias [email protected] champion` — move the alias; downstream consumers that follow `champion` start loading the new version.

If something regresses in production, move the alias back: `dn model alias [email protected] champion`.
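In automation, the compare-then-promote step usually becomes a gate: move the alias only if the candidate beats the current champion on the metrics you care about. A minimal sketch (the metric dictionaries are hypothetical stand-ins for scores read from the registry):

```python
def should_promote(champion: dict, candidate: dict, metrics: list) -> bool:
    """Promote only if the candidate is at least as good on every tracked
    metric and strictly better on at least one."""
    at_least_as_good = all(candidate[m] >= champion[m] for m in metrics)
    strictly_better = any(candidate[m] > champion[m] for m in metrics)
    return at_least_as_good and strictly_better

champion = {"intent_accuracy": 0.847, "f1": 0.83}   # scores for 1.1.0
candidate = {"intent_accuracy": 0.873, "f1": 0.86}  # scores for 1.2.0
print(should_promote(champion, candidate, ["intent_accuracy", "f1"]))  # True
```

A gate like this keeps a regression on any tracked metric from silently riding along with an improvement on another.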
## Retire a version

`delete` requires a version — there’s no “delete the whole family” verb. The CLI confirms before deleting; pass `--yes` for automation:

```
dn model delete acme/[email protected] --yes
```
Deletion is permanent. Inference and training configs that pin the deleted version will fail to resolve. Run `dn model compare <name> <versions...>` first — the aliases row shows which version a `champion` or `staging` label is currently attached to, so you can reassign before deleting. Aliases on a deleted version disappear with it.
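That pre-delete check is easy to script: list the labels still attached to the version you are about to remove, and reassign them first. A sketch of the check (an illustrative helper, not part of the CLI):

```python
def attached_aliases(version: str, aliases: dict) -> list:
    """Return the alias labels still pointing at `version`. Reassign these
    before deleting, since aliases on a deleted version disappear with it."""
    return [label for label, v in aliases.items() if v == version]

aliases = {"staging": "1.1.0", "champion": "1.2.0"}
print(attached_aliases("1.2.0", aliases))  # ['champion']  -> reassign first
print(attached_aliases("1.0.0", aliases))  # []            -> safe to delete
```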
## What to reach for next

- Push a new version → Publishing
- Browse the registry and pin references → Catalog
- Load a promoted version in code → Using in code
- Every CLI verb → `dn model`