Meaning

A MySQL instance’s mysqld_exporter sidecar reports the server as unreachable (mysql_up is 0), which means the mysqld process is down or not accepting connections.

Fires when:

max by (pod) (mysql_up{namespace="safetywing-<env>-infra"}) == 0

The condition must hold for 5 minutes (for: 5m). Severity: page. Tier: component.
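
To check the expression by hand, the query can be built per environment and sent to Prometheus. The sketch below only builds the query string. The function name mysql_down_query, the example environment prod, and the PROM_URL variable in the usage comment are assumptions for illustration, not part of this runbook.

```shell
# Build the alert expression for one environment.
# Argument: <env>, the environment segment of the namespace name.
mysql_down_query() {
  printf 'max by (pod) (mysql_up{namespace="safetywing-%s-infra"}) == 0' "$1"
}

# Example use (PROM_URL is a hypothetical Prometheus base URL):
#   curl -sG "$PROM_URL/api/v1/query" \
#     --data-urlencode "query=$(mysql_down_query prod)"
```

Because of the == 0 filter, an empty result means no pod currently matches the alert condition.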

Impact

  • The affected instance serves no queries.
  • If the primary is down, MOCO must fail over before writes can resume; expect a short write outage.
  • If a replica is down, read capacity and replication redundancy are reduced.

Diagnosis

kubectl config use-context hetzner

# Cluster + member roles (which pod is primary vs replica)
kubectl get mysqlcluster -n safetywing-<env>-infra
kubectl moco status -n safetywing-<env>-infra <cluster>

# Pod state and recent events (OOMKill, evictions, probe failures)
kubectl get pods -n safetywing-<env>-infra -l app.kubernetes.io/name=mysql -o wide
kubectl describe pod -n safetywing-<env>-infra <pod>

# Container logs — mysqld, the MOCO agent, and the exporter
kubectl logs -n safetywing-<env>-infra <pod> -c mysqld --tail=200
kubectl logs -n safetywing-<env>-infra <pod> -c agent --tail=200
kubectl logs -n safetywing-<env>-infra <pod> -c mysqld-exporter --tail=100

# Confirm which pod(s) are down (PromQL query, run against Prometheus rather than in the shell)
max by (pod) (mysql_up{namespace="safetywing-<env>-infra"})
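
The three log commands above differ only in the container name, so they can be generated for a given environment and pod. This is a minimal sketch: the function name mysql_log_cmds is hypothetical, and it uses one --tail value for all containers. It prints the commands instead of running them so they can be reviewed first.

```shell
# Print (not run) the log commands for each container of one MySQL pod.
# Arguments: <env> <pod>. Output is one kubectl command per line.
mysql_log_cmds() {
  ns="safetywing-$1-infra"
  for c in mysqld agent mysqld-exporter; do
    printf 'kubectl logs -n %s %s -c %s --tail=200\n' "$ns" "$2" "$c"
  done
}
```

After checking the printed commands, they can be executed with mysql_log_cmds <env> <pod> | sh.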

Mitigation

  1. Check pod events for the root cause: OOMKilled (raise memory limits in the MySQLCluster .spec.podTemplate), node pressure/eviction, or failed PVC mount.
  2. If disk is full, mysqld will refuse to start — see MysqlDiskFillingUp and expand the PVC first.
  3. If mysqld crashed but the pod is otherwise healthy, delete the pod to force a restart:
    kubectl delete pod -n safetywing-<env>-infra <pod>
    MOCO recreates the pod; verify it rejoins via kubectl moco status.
  4. If the primary is down and not recovering, let MOCO fail over to a healthy replica; confirm a new primary was elected in kubectl moco status. Investigate the old primary before reintroducing it.
  5. If MOCO cannot reconcile, inspect the operator:
    kubectl logs -n moco-system deploy/moco-controller-manager --tail=200
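
For step 1, kubectl describe shows the previous termination reason under Last State. The sketch below prints a kubectl command that extracts only that field with a JSONPath filter. The function name last_exit_reason_cmd is hypothetical, and the container name mysqld is assumed to match the one used in the Diagnosis log commands.

```shell
# Print a command that shows why the mysqld container last terminated
# (for example OOMKilled or Error). Arguments: <env> <pod>.
last_exit_reason_cmd() {
  ns="safetywing-$1-infra"
  # JSONPath filter: pick the mysqld container status, then its last exit reason.
  jp='{.status.containerStatuses[?(@.name=="mysqld")].lastState.terminated.reason}'
  printf "kubectl get pod -n %s %s -o jsonpath='%s'\n" "$ns" "$2" "$jp"
}
```

When the printed command is run, empty output means the container has no recorded previous termination.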

References