# Troubleshooting

Diagnose common issues with self-hosted Dreadnode installations.

Start here when something isn’t working. Sections are organized by what you see, not what’s broken — pick the symptom that matches.

The commands below are useful regardless of the problem. They assume dreadnode as the release name throughout; substitute yours if it differs.

```shell
# All pods for the release
kubectl -n <namespace> get pods -l app.kubernetes.io/instance=dreadnode

# Events (scheduling failures, image pull errors, probe failures)
kubectl -n <namespace> get events --sort-by='.lastTimestamp'

# API logs
kubectl -n <namespace> logs deploy/dreadnode-api

# API init container logs (migrations run here)
kubectl -n <namespace> logs deploy/dreadnode-api -c migrations

# Health check
curl http(s)://<your-domain>/api/v1/health
```

A pod stuck in Pending can’t be scheduled. Check its events:

```shell
kubectl -n <namespace> describe pod <pod-name>
```

“no nodes available to schedule pods” or “Insufficient cpu/memory” — Your cluster doesn’t have enough allocatable resources. The small preset totals roughly 4 vCPU and 8 Gi across all components. Free up resources or add nodes.

“pod has unbound immediate PersistentVolumeClaims” — No StorageClass can provision the requested PVC. Check that a StorageClass exists:

```shell
kubectl get storageclass
```

If empty, install a storage provisioner (local-path, EBS CSI, Rook, etc.) before deploying Dreadnode. The preflight checks catch this, but only if you ran them.
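
If you just need a provisioner for a single-node or test cluster, Rancher’s local-path provisioner is a common choice. A sketch — the manifest URL is the project’s documented install path as of writing, so verify it against the local-path-provisioner releases before applying:

```shell
# Install the local-path provisioner (single-node / test clusters only)
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml

# Mark it as the default StorageClass so PVCs without an explicit
# storageClassName can still bind
kubectl annotate storageclass local-path \
  storageclass.kubernetes.io/is-default-class=true
```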

A pod in CrashLoopBackOff means the container starts and immediately exits. Check the logs for the crashing container.

The migrations init container runs alembic upgrade head before the API starts. If it fails, the pod shows Init:CrashLoopBackOff and the API never boots.

```shell
kubectl -n <namespace> logs deploy/dreadnode-api -c migrations
```

connection refused or could not translate host name — The API can’t reach PostgreSQL. If using in-cluster Postgres, check that the dreadnode-postgresql StatefulSet has a Ready pod. If using an external database, verify the host, port, and network connectivity from inside the cluster.

password authentication failed or FATAL: role "..." does not exist — Wrong credentials. For in-cluster Postgres, the password lives in the dreadnode-postgresql Secret. If you deleted and recreated the Secret without deleting the PVC, the password on disk no longer matches. Delete both the PVC and the Secret and let the chart regenerate them together.
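
To see what the Secret currently holds, decode it directly. A sketch — the key name `password` is an assumption, so list the actual keys first and substitute one of them:

```shell
# List the keys the Secret actually contains (values are base64-encoded)
kubectl -n <namespace> get secret dreadnode-postgresql -o jsonpath='{.data}'

# Decode one key ("password" is an assumption; use a key from the output above)
kubectl -n <namespace> get secret dreadnode-postgresql \
  -o jsonpath='{.data.password}' | base64 -d
```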

ValidationError or missing required env — A required environment variable is missing or malformed. The API validates its config with Pydantic on startup. The error message names the exact field. Check the ConfigMap and Secrets for the API pod.
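
To see the environment the API is actually configured with, list it from the Deployment. Since the pod is crash-looping at this point, prefer the declarative view over exec:

```shell
# Show env vars declared on the Deployment, including valueFrom references
# to ConfigMaps and Secrets
kubectl -n <namespace> set env deploy/dreadnode-api --list

# If the main container is running, dump its resolved environment instead
kubectl -n <namespace> exec deploy/dreadnode-api -- env | sort
```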

If the init container succeeds but the main container crashes:

```shell
kubectl -n <namespace> logs deploy/dreadnode-api
```

Look for Python tracebacks. The most common cause is a config value that passes validation but fails at runtime — a ClickHouse host that resolves but rejects connections, an S3 endpoint that times out, etc.
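
To test reachability of a backing service from inside the cluster, a throwaway pod is often quicker than reading tracebacks. A sketch — the service name `dreadnode-clickhouse` and port 8123 (ClickHouse’s default HTTP port, which answers `Ok.` on `/ping`) are assumptions; substitute your actual host and port:

```shell
# One-off pod probing ClickHouse's HTTP interface from inside the cluster
kubectl -n <namespace> run net-test --rm -it --restart=Never \
  --image=busybox -- wget -qO- http://dreadnode-clickhouse:8123/ping
```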

## StatefulSet pods (PostgreSQL, ClickHouse, MinIO)
```shell
kubectl -n <namespace> logs sts/dreadnode-postgresql
kubectl -n <namespace> logs sts/dreadnode-clickhouse
kubectl -n <namespace> logs sts/dreadnode-minio
```

If a stateful pod crashes after a reinstall, the most likely cause is a password mismatch: the Secret was regenerated but the PVC still holds data encrypted with the old password. Delete both the PVC and the Secret, then let the chart recreate them:

```shell
kubectl -n <namespace> delete pvc data-dreadnode-postgresql-0
kubectl -n <namespace> delete secret dreadnode-postgresql
# Then: helm upgrade (or redeploy via Admin Console)
```

A pod in ImagePullBackOff or ErrImagePull means the container runtime can’t pull the image.

```shell
kubectl -n <namespace> describe pod <pod-name>
```

“unauthorized” or “authentication required” — The Replicated pull secret is missing or invalid. Check that the enterprise-pull-secret Secret exists in the namespace:

```shell
kubectl -n <namespace> get secret enterprise-pull-secret
```

If missing, the license may not have been applied correctly. For Helm CLI installs, verify you logged in to the registry (helm registry login registry.replicated.com). For Embedded Cluster / KOTS, the license is injected automatically — check the Admin Console for license status.
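
To confirm the pull secret actually contains credentials for the expected registry, decode its dockerconfig. A sketch, assuming `jq` is available:

```shell
# List the registries the pull secret has credentials for (requires jq)
kubectl -n <namespace> get secret enterprise-pull-secret \
  -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq '.auths | keys'
```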

“manifest unknown” or “not found” — The image tag doesn’t exist in the registry. This usually means the chart version and the published images are out of sync. Verify you’re installing a version that was promoted to your channel.

You can see the Dreadnode login page, but interactions fail (login doesn’t work, pages show errors, network tab shows 404 or 502 on /api/* requests).

Check ingress routing. The frontend and API share a single hostname (<your-domain>). The ingress must route /api/* to the API service and / to the frontend service. If you see 404s on /api/*, the ingress isn’t routing correctly.

```shell
kubectl -n <namespace> get ingress
```

Verify the API ingress has the correct host and paths configured.
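
Two direct requests can separate routing problems from backend problems: a 404 on `/api/*` points at ingress routing, while a 502 or 503 points at an unready API pod. A sketch, using the health endpoint shown earlier:

```shell
# Should return 200 if routing and the API pod are both healthy
curl -si http(s)://<your-domain>/api/v1/health | head -1

# Should return the frontend's response
curl -si http(s)://<your-domain>/ | head -1
```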

Check the API pod is Ready. If the API pod isn’t passing health checks, the ingress controller won’t route traffic to it:

```shell
kubectl -n <namespace> get pods -l app.kubernetes.io/name=dreadnode-api
```

You enter credentials, the page reloads, but you’re not logged in. No error message.

Scheme mismatch. This is almost always caused by global.scheme being set to https while you’re connecting over plain HTTP. The API sets Secure on authentication cookies when scheme is https. Browsers silently refuse to store Secure cookies over HTTP connections.

Fix: either connect over HTTPS, or set global.scheme: http and redeploy.

CORS mismatch. If you’re accessing the platform on a URL that doesn’t match global.domain (e.g., via IP address or a different hostname), the browser blocks cross-origin cookie writes. Access the platform on the exact domain you configured.
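
For Helm installs, you can confirm what scheme and domain the release was actually deployed with by checking the values Helm recorded (for Embedded Cluster / KOTS, check the Admin Console config instead):

```shell
# Show user-supplied values for the release; add -a to include chart defaults
helm -n <namespace> get values dreadnode | grep -iE 'scheme|domain'
```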

## Signup says “invite required” on a fresh install

A previous install left PostgreSQL data behind. The platform sees existing users and enforces invite-only signups. If this is supposed to be a fresh install, delete the PostgreSQL PVC and redeploy:

```shell
kubectl -n <namespace> delete pvc data-dreadnode-postgresql-0
kubectl -n <namespace> delete secret dreadnode-postgresql
```

Certificate warnings in the browser usually mean the TLS Secret exists but the certificate doesn’t cover the hostname you’re visiting. The cert must cover both <your-domain> and storage.<your-domain>. Check the certificate’s SANs:

```shell
kubectl -n <namespace> get secret dreadnode-tls -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"
```

Verify the TLS Secret is in the correct namespace and the ingress references it:

```shell
kubectl -n <namespace> get ingress -o yaml | grep -A3 tls
```

If the ingress shows no TLS block, check that global.tls.secretName is set in your values overlay and you redeployed after setting it.

## TLS terminates upstream (load balancer, service mesh)

If a cloud load balancer or service mesh handles TLS before traffic reaches the cluster, set global.scheme: https and global.tls.skipCheck: true. This tells the chart to emit https:// URLs without requiring a TLS Secret in the namespace.

The platform generates presigned S3 URLs for file downloads. If these fail, check that storage.<your-domain> resolves and is reachable from the user’s browser — presigned URLs point at the external S3 endpoint, not the internal one.

For in-cluster MinIO, verify the MinIO ingress exists and routes correctly:

```shell
kubectl -n <namespace> get ingress dreadnode-minio
```
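
You can also probe the external path a browser would take. A sketch, assuming in-cluster MinIO is exposed at storage.<your-domain>; `/minio/health/live` is MinIO’s standard liveness endpoint:

```shell
# Probe the external S3 endpoint that presigned URLs point at
curl -si https://storage.<your-domain>/minio/health/live | head -1
```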

The API creates buckets (python-packages, org-data, user-data-logs) on startup. If the MinIO pod was unhealthy when the API started, the buckets may not exist. Restart the API pod after MinIO is Ready:

```shell
kubectl -n <namespace> rollout restart deploy/dreadnode-api
```
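
One quick way to check whether the buckets were created, assuming the chart’s MinIO runs in filesystem mode with its data volume mounted at /data (adjust the path for your deployment):

```shell
# In filesystem mode, buckets appear as top-level directories on the data volume
kubectl -n <namespace> exec sts/dreadnode-minio -- ls /data
```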

Support bundles collect logs, cluster state, and diagnostic information into a single archive you can share with us for debugging.

From the Admin Console (Embedded Cluster / KOTS): Go to Troubleshoot and click Generate a support bundle.

From the CLI (Helm installs):

```shell
kubectl support-bundle --load-cluster-specs -n <namespace>
```

This requires the troubleshoot kubectl plugin. The bundle spec is baked into the chart as a Secret with the troubleshoot.sh/kind: support-bundle label — the plugin discovers it automatically.
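
If the plugin isn’t installed, it’s distributed through krew (the Troubleshoot project also publishes standalone binaries):

```shell
# Install the support-bundle plugin via krew, then generate the bundle
kubectl krew install support-bundle
kubectl support-bundle --load-cluster-specs -n <namespace>
```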

The bundle includes pod logs (up to 720 hours, 10,000 lines per pod), Helm release history, cluster resource state, and reachability probes for in-cluster data stores. Credentials are automatically redacted.