Meaning#
A MySQL instance’s mysqld_exporter sidecar reports the server unreachable, so the mysqld process is down or not accepting connections.
Fires when:
max by (pod) (mysql_up{namespace="safetywing-<env>-infra"}) == 0
for: 5m, severity page, tier component.
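The expression can also be evaluated by hand against the Prometheus HTTP API to see which pods currently match. The following is a minimal sketch, not part of the alert definition: `PROM_URL` is an assumed, site-specific Prometheus base URL that this runbook does not specify, `mysql_up_expr` and `mysql_down_pods` are hypothetical helper names, and `curl` and `jq` are assumed to be installed.

```shell
# Sketch only: PROM_URL is an assumed Prometheus base URL, not given in this runbook.

# Print the alert's PromQL expression for one environment name.
mysql_up_expr() {
  local env="$1"
  printf 'max by (pod) (mysql_up{namespace="safetywing-%s-infra"}) == 0' "$env"
}

# List the pods that currently match the expression (needs curl and jq).
mysql_down_pods() {
  local env="$1"
  curl -sG "${PROM_URL:?set PROM_URL}/api/v1/query" \
    --data-urlencode "query=$(mysql_up_expr "$env")" |
    jq -r '.data.result[].metric.pod'
}
```

An instant query shows the state right now, so a pod can appear here before the 5m `for:` window has elapsed and the alert actually fires.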
Impact#
- The affected instance serves no queries.
- If the primary is down, MOCO must fail over before writes can resume; expect a short write outage.
- If a replica is down, read capacity and replication redundancy are reduced.
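Which of these impacts applies depends on the role of the affected pod. As a rough sketch, the role can be read from the pod's labels, assuming MOCO sets a `moco.cybozu.com/role` label on instance pods (check this against the MOCO version in use); `mysql_pod_role` is a hypothetical helper name.

```shell
# Print the MOCO role label of a pod (assumed values: primary or replica).
# Usage: mysql_pod_role <env> <pod>
mysql_pod_role() {
  local env="$1" pod="$2"
  kubectl get pod "$pod" -n "safetywing-${env}-infra" \
    -o jsonpath='{.metadata.labels.moco\.cybozu\.com/role}'
}
```

If the output is empty, the label is missing or named differently, and the cluster-level view in the Diagnosis section is the more reliable source.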
Diagnosis#
kubectl config use-context hetzner
# Cluster + member roles (which pod is primary vs replica)
kubectl get mysqlcluster -n safetywing-<env>-infra
kubectl moco status -n safetywing-<env>-infra <cluster>
# Pod state and recent events (OOMKill, evictions, probe failures)
kubectl get pods -n safetywing-<env>-infra -l app.kubernetes.io/name=mysql -o wide
kubectl describe pod -n safetywing-<env>-infra <pod>
# Container logs — mysqld, the MOCO agent, and the exporter
kubectl logs -n safetywing-<env>-infra <pod> -c mysqld --tail=200
kubectl logs -n safetywing-<env>-infra <pod> -c agent --tail=200
kubectl logs -n safetywing-<env>-infra <pod> -c mysqld-exporter --tail=100
# Confirm which pod(s) are down
max by (pod) (mysql_up{namespace="safetywing-<env>-infra"})
Mitigation#
- Check pod events for the root cause: OOMKilled (raise memory limits in the MySQLCluster.spec.podTemplate), node pressure/eviction, or failed PVC mount.
- If disk is full, mysqld will refuse to start; see MysqlDiskFillingUp and expand the PVC first.
- If the process crashed but the pod is healthy, restart it:
  kubectl delete pod -n safetywing-<env>-infra <pod>
  MOCO recreates the pod; verify it rejoins via kubectl moco status.
- If the primary is down and not recovering, let MOCO fail over to a healthy replica; confirm a new primary was elected in kubectl moco status. Investigate the old primary before reintroducing it.
- If MOCO cannot reconcile, inspect the operator:
  kubectl logs -n moco-system deploy/moco-controller-manager --tail=200
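The root-cause check and the restart step above can be sketched as two small shell helpers. This is an illustration under stated assumptions, not an official procedure: the function names are hypothetical, and the 300s readiness timeout and the 30 x 2s poll for the replacement pod are arbitrary values to adjust.

```shell
# Print why the mysqld container last terminated (for example OOMKilled).
# Empty output means no previous termination is recorded.
mysqld_last_exit_reason() {
  local env="$1" pod="$2"
  kubectl get pod "$pod" -n "safetywing-${env}-infra" \
    -o jsonpath='{.status.containerStatuses[?(@.name=="mysqld")].lastState.terminated.reason}'
}

# Delete the pod, wait for a replacement with the same name, then wait for Ready.
restart_mysql_pod() {
  local env="$1" pod="$2" ns tries=0
  ns="safetywing-${env}-infra"
  kubectl delete pod "$pod" -n "$ns" || return 1
  # The replacement may not exist yet; poll up to 30 times (assumed limit).
  until kubectl get pod "$pod" -n "$ns" >/dev/null 2>&1; do
    tries=$((tries + 1))
    [ "$tries" -ge 30 ] && return 1
    sleep 2
  done
  kubectl wait pod "$pod" -n "$ns" --for=condition=Ready --timeout=300s
}
```

For example, `mysqld_last_exit_reason prod <pod>` followed by `restart_mysql_pod prod <pod>`. A Ready pod only shows that the container is up; confirming that the instance has rejoined the cluster still requires the status check described above.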