Troubleshooting
Diagnose common issues with self-hosted Dreadnode installations.
Start here when something isn’t working. Sections are organized by what you see, not what’s broken — pick the symptom that matches.
Diagnostic commands
These are useful regardless of the problem. Assume dreadnode as the release name
throughout — substitute yours if different.
```shell
# All pods for the release
kubectl -n <namespace> get pods -l app.kubernetes.io/instance=dreadnode

# Events (scheduling failures, image pull errors, probe failures)
kubectl -n <namespace> get events --sort-by='.lastTimestamp'

# API logs
kubectl -n <namespace> logs deploy/dreadnode-api

# API init container logs (migrations run here)
kubectl -n <namespace> logs deploy/dreadnode-api -c migrations

# Health check
curl http(s)://<your-domain>/api/v1/health
```

Pods stuck in Pending
The pod can’t be scheduled. Check events:
```shell
kubectl -n <namespace> describe pod <pod-name>
```

“no nodes available to schedule pods” or “Insufficient cpu/memory” — Your cluster
doesn’t have enough allocatable resources. The small preset totals roughly 4 vCPU and
8 Gi across all components. Free up resources or add nodes.
“pod has unbound immediate PersistentVolumeClaims” — No StorageClass can provision the requested PVC. Check that a StorageClass exists:
```shell
kubectl get storageclass
```

If empty, install a storage provisioner (local-path, EBS CSI, Rook, etc.) before deploying Dreadnode. The preflight checks catch this, but only if you ran them.
Pods in CrashLoopBackOff
The container starts and immediately exits. Check logs for the crashing container.
API pod: init container crash
The migrations init container runs alembic upgrade head before the API starts.
If it fails, the pod shows Init:CrashLoopBackOff and the API never boots.
```shell
kubectl -n <namespace> logs deploy/dreadnode-api -c migrations
```

connection refused or could not translate host name — The API can’t reach
PostgreSQL. If using in-cluster Postgres, check that the dreadnode-postgresql
StatefulSet has a Ready pod. If using an external database, verify the host, port, and
network connectivity from inside the cluster.
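A quick way to check connectivity is a raw TCP probe, which needs no database client at all. The sketch below is plain bash; the debug-pod command and the host/port are placeholders to substitute with your own values:

```shell
# Minimal TCP reachability probe using bash's /dev/tcp (no extra tools needed).
# For the in-cluster view, run the same function from a throwaway pod, e.g.:
#   kubectl -n <namespace> run dbg -it --rm --restart=Never --image=debian:stable-slim -- bash
tcp_probe() {
  host="$1"; port="$2"
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}

# tcp_probe <postgres-host> 5432   # substitute your database host and port
```

"reachable" rules out DNS and network problems; authentication errors are a separate failure mode, covered next.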
password authentication failed or FATAL: role "..." does not exist — Wrong
credentials. For in-cluster Postgres, the password lives in the dreadnode-postgresql
Secret. If you deleted and recreated the Secret without deleting the PVC, the password
on disk no longer matches. Delete the PVC and let both regenerate together.
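To confirm a mismatch, decode the password currently stored in the Secret and compare it with what the database accepts. Secret values are base64-encoded; the decode step looks like this (the encoded value below is a stand-in, and the `password` key name is an assumption — check your Secret’s actual keys):

```shell
# With a live cluster, the encoded value comes from:
#   kubectl -n <namespace> get secret dreadnode-postgresql -o jsonpath='{.data.password}'
encoded='cyRjcmV0'                  # stand-in for the jsonpath output
printf '%s' "$encoded" | base64 -d  # prints the plaintext password
```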
ValidationError or missing required env — A required environment variable is
missing or malformed. The API validates its config with Pydantic on startup. The error
message names the exact field. Check the ConfigMap and Secrets for the API pod.
API pod: main container crash
If the init container succeeds but the main container crashes:
```shell
kubectl -n <namespace> logs deploy/dreadnode-api
```

Look for Python tracebacks. The most common cause is a config value that passes validation but fails at runtime — a ClickHouse host that resolves but rejects connections, an S3 endpoint that times out, etc.
StatefulSet pods (PostgreSQL, ClickHouse, MinIO)
```shell
kubectl -n <namespace> logs sts/dreadnode-postgresql
kubectl -n <namespace> logs sts/dreadnode-clickhouse
kubectl -n <namespace> logs sts/dreadnode-minio
```

If a stateful pod crashes after a reinstall, the most likely cause is a password mismatch: the Secret was regenerated but the PVC still holds data initialized with the old password. Delete both the PVC and the Secret, then let the chart recreate them:
```shell
kubectl -n <namespace> delete pvc data-dreadnode-postgresql-0
kubectl -n <namespace> delete secret dreadnode-postgresql
# Then: helm upgrade (or redeploy via Admin Console)
```

Pods in ImagePullBackOff
The container runtime can’t pull the image.
```shell
kubectl -n <namespace> describe pod <pod-name>
```

“unauthorized” or “authentication required” — The Replicated pull secret is missing
or invalid. Check that the enterprise-pull-secret Secret exists in the namespace:

```shell
kubectl -n <namespace> get secret enterprise-pull-secret
```

If missing, the license may not have been applied correctly. For Helm CLI installs,
verify you logged in to the registry (helm registry login registry.replicated.com).
For Embedded Cluster / KOTS, the license is injected automatically — check the Admin
Console for license status.
“manifest unknown” or “not found” — The image tag doesn’t exist in the registry. This usually means the chart version and the published images are out of sync. Verify you’re installing a version that was promoted to your channel.
UI loads but API calls fail
You can see the Dreadnode login page, but interactions fail (login doesn’t work, pages
show errors, network tab shows 404 or 502 on /api/* requests).
Check ingress routing. The frontend and API share a single hostname
(<your-domain>). The ingress must route /api/* to the API service and / to the
frontend service. If you see 404s on /api/*, the ingress isn’t routing correctly.
```shell
kubectl -n <namespace> get ingress
```

Verify the API ingress has the correct host and paths configured.
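The expected shape is a single host with two path rules, roughly like the sketch below. The service names and ports here are illustrative assumptions, not the chart’s actual values — match them to the services in your namespace:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dreadnode
spec:
  rules:
    - host: <your-domain>
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: dreadnode-api       # assumed service name
                port:
                  number: 8000            # assumed port
          - path: /
            pathType: Prefix
            backend:
              service:
                name: dreadnode-frontend  # assumed service name
                port:
                  number: 80              # assumed port
```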
Check the API pod is Ready. If the API pod isn’t passing health checks, the ingress controller won’t route traffic to it:
```shell
kubectl -n <namespace> get pods -l app.kubernetes.io/name=dreadnode-api
```

Login fails silently
You enter credentials, the page reloads, but you’re not logged in. No error message.
Scheme mismatch. This is almost always caused by global.scheme being set to
https while you’re connecting over plain HTTP. The API sets Secure on authentication
cookies when scheme is https. Browsers silently refuse to store Secure cookies over
HTTP connections.
Fix: either connect over HTTPS, or set global.scheme: http and redeploy.
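For a Helm install, that redeploy is a small values change. The overlay filename and upgrade invocation below are illustrative; the global.scheme key path is the one this page already references:

```yaml
# overrides.yaml
global:
  scheme: http   # must match how browsers actually reach the platform
```

Apply it with something like `helm upgrade dreadnode <chart> -f overrides.yaml`, substituting your release name and chart reference.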
CORS mismatch. If you’re accessing the platform on a URL that doesn’t match
global.domain (e.g., via IP address or a different hostname), the browser blocks
cross-origin cookie writes. Access the platform on the exact domain you configured.
Signup says “invite required” on a fresh install
A previous install left PostgreSQL data behind. The platform sees existing users and enforces invite-only signups. If this is supposed to be a fresh install, delete the PostgreSQL PVC and redeploy:
```shell
kubectl -n <namespace> delete pvc data-dreadnode-postgresql-0
kubectl -n <namespace> delete secret dreadnode-postgresql
```

TLS issues
Browser shows certificate warning
The TLS Secret exists but the certificate doesn’t cover the hostname you’re visiting.
The cert must cover both <your-domain> and storage.<your-domain>. Check the
certificate’s SANs:
```shell
kubectl -n <namespace> get secret dreadnode-tls -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"
```

Ingress not terminating TLS
Verify the TLS Secret is in the correct namespace and the ingress references it:
```shell
kubectl -n <namespace> get ingress -o yaml | grep -A3 tls
```

If the ingress shows no TLS block, check that global.tls.secretName is set in your
values overlay and you redeployed after setting it.
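If you have no certificate to put in that Secret yet, a self-signed one covering both hostnames works for testing. The domains below are placeholders for your own; -addext requires OpenSSL 1.1.1 or newer:

```shell
# Self-signed cert with SANs for both hostnames (testing only)
openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
  -keyout tls.key -out tls.crt \
  -subj "/CN=example.internal" \
  -addext "subjectAltName=DNS:example.internal,DNS:storage.example.internal"

# Confirm the SANs, then create the Secret the ingress references
openssl x509 -in tls.crt -noout -ext subjectAltName
# kubectl -n <namespace> create secret tls dreadnode-tls --cert=tls.crt --key=tls.key
```

Browsers will still warn on a self-signed cert; this only gets TLS termination working end to end.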
TLS terminates upstream (load balancer, service mesh)
If a cloud load balancer or service mesh handles TLS before traffic reaches the cluster,
set global.scheme: https and global.tls.skipCheck: true. This tells the chart to
emit https:// URLs without requiring a TLS Secret in the namespace.
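In values form, using the same key paths this page references:

```yaml
global:
  scheme: https
  tls:
    skipCheck: true   # trust the upstream terminator; no in-namespace TLS Secret required
```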
S3 / MinIO issues
Presigned URL errors
The platform generates presigned S3 URLs for file downloads. If these fail, check that
storage.<your-domain> resolves and is reachable from the user’s browser — presigned
URLs point at the external S3 endpoint, not the internal one.
For in-cluster MinIO, verify the MinIO ingress exists and routes correctly:
```shell
kubectl -n <namespace> get ingress dreadnode-minio
```

“Access Denied” or “NoSuchBucket”
The API creates buckets (python-packages, org-data, user-data-logs) on startup.
If the MinIO pod was unhealthy when the API started, the buckets may not exist. Restart
the API pod after MinIO is Ready:
```shell
kubectl -n <namespace> rollout restart deploy/dreadnode-api
```

Support bundles
Support bundles collect logs, cluster state, and diagnostic information into a single archive you can share with us for debugging.
From the Admin Console (Embedded Cluster / KOTS): Go to Troubleshoot and click Generate a support bundle.
From the CLI (Helm installs):
```shell
kubectl support-bundle --load-cluster-specs -n <namespace>
```

This requires the troubleshoot kubectl plugin.
The bundle spec is baked into the chart as a Secret with the
troubleshoot.sh/kind: support-bundle label — the plugin discovers it automatically.
The bundle includes pod logs (up to 720 hours, 10,000 lines per pod), Helm release history, cluster resource state, and reachability probes for in-cluster data stores. Credentials are automatically redacted.
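In the chart, that spec ships wrapped in a Secret carrying the troubleshoot.sh/kind: support-bundle label. Unwrapped, a minimal spec of this kind looks roughly like the sketch below; the collectors shown are illustrative, not the chart’s actual contents:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: dreadnode-example   # illustrative name
spec:
  collectors:
    - logs:
        selector:
          - app.kubernetes.io/instance=dreadnode
        limits:
          maxAge: 720h      # matches the 720-hour window described above
          maxLines: 10000
```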