Meaning

A mysql-data-* PersistentVolumeClaim is running low on free space. If it fills completely, mysqld will fail writes and may refuse to start.

Fires when:

min by (persistentvolumeclaim) (
  kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"mysql-data-.*"}
  / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim=~"mysql-data-.*"}
) < (1 - <ratio>)

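The left-hand side is the free fraction, so the alert fires for each PVC whose used fraction exceeds <ratio>. The condition must hold for 15 minutes (for: 15m). Severity: ticket; tier: component.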
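
As a rough worked example of the condition above, the following shell sketch recomputes the free fraction from the two kubelet metrics and compares it with 1 - <ratio>. The function names and the 0.85 ratio are illustrative assumptions, not values taken from the alert rule.

```shell
# free_fraction AVAILABLE_BYTES CAPACITY_BYTES
# Prints available/capacity to four decimals (the left-hand side of the rule).
free_fraction() {
  awk -v a="$1" -v c="$2" 'BEGIN { printf "%.4f\n", a / c }'
}

# would_fire AVAILABLE_BYTES CAPACITY_BYTES RATIO
# Exit status 0 when free fraction < (1 - RATIO), mirroring the alert condition.
# RATIO stands in for <ratio>; 0.85 below is a hypothetical value.
would_fire() {
  awk -v a="$1" -v c="$2" -v r="$3" 'BEGIN { exit !((a / c) < (1 - r)) }'
}

# 10 GiB free on a 100 GiB volume
free_fraction 10737418240 107374182400   # prints 0.1000
if would_fire 10737418240 107374182400 0.85; then
  echo "condition met: free fraction is below 1 - ratio"
fi
```

The rule still requires the condition to hold for 15 minutes before the alert is raised; this sketch only evaluates a single sample.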

Impact

  • A full data volume causes write errors and can crash the instance (MysqlInstanceDown).
  • Common culprits: accumulated binary logs, an oversized dataset, or relay logs/temp files on a lagging replica.

Diagnosis

kubectl config use-context hetzner
kubectl get mysqlcluster -n safetywing-<env>-infra
kubectl get pvc -n safetywing-<env>-infra

# Inspect on-disk usage from inside the mysqld container
kubectl exec -n safetywing-<env>-infra <pod> -c mysqld -- df -h /var/lib/mysql
kubectl exec -n safetywing-<env>-infra <pod> -c mysqld -- \
  sh -c 'du -sh /var/lib/mysql/* | sort -h | tail -20'

# Binary log inventory
kubectl moco mysql -n safetywing-<env>-infra -u moco-admin <cluster> -- \
  -e "SHOW BINARY LOGS;"

# Largest tables
kubectl moco mysql -n safetywing-<env>-infra -u moco-admin <cluster> -- \
  -e "SELECT table_schema, table_name, ROUND((data_length+index_length)/1024/1024) AS mb FROM information_schema.tables ORDER BY mb DESC LIMIT 20;"

# Fraction free per PVC (PromQL; run against Prometheus, not in the shell)
min by (persistentvolumeclaim) (
  kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"mysql-data-.*"}
  / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim=~"mysql-data-.*"}
)
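
To judge quickly whether binary logs dominate usage, the SHOW BINARY LOGS output can be totalled. The helper below is a minimal sketch: binlog_total is a hypothetical name, and it assumes tab-separated batch output from the mysql client with the file size in the second column.

```shell
# binlog_total: reads SHOW BINARY LOGS rows on stdin
# (tab-separated: Log_name, File_size, ...) and prints the number of
# files and their combined size in MiB. Lines whose second column is
# not a plain integer (for example a header row) are skipped.
binlog_total() {
  awk -F'\t' '$2 ~ /^[0-9]+$/ { n++; bytes += $2 }
       END { printf "%d files, %.1f MiB\n", n, bytes / 1048576 }'
}

# Possible usage, assuming extra mysql client flags (-N -B) pass through
# after "--" in the same way as -e:
#   kubectl moco mysql -n safetywing-<env>-infra -u moco-admin <cluster> -- \
#     -N -B -e "SHOW BINARY LOGS;" | binlog_total
```

Comparing this total with the df output for /var/lib/mysql gives a sense of how much space purging binary logs could recover.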

Mitigation

  1. Expand the PVC (preferred, since the StorageClass supports volume expansion). Increase the storage request under volumeClaimTemplates in the MySQLCluster spec; MOCO and Kubernetes then resize the PVC online:
    spec:
      volumeClaimTemplates:
        - metadata:
            name: mysql-data
          spec:
            resources:
              requests:
                storage: <larger-size>
    Apply via GitOps, then confirm with kubectl get pvc -n safetywing-<env>-infra.
  2. Prune binary logs if they dominate usage (only purge logs already applied by all replicas):
    PURGE BINARY LOGS BEFORE NOW() - INTERVAL 1 DAY;
    For a durable fix, tune binlog_expire_logs_seconds in the cluster MySQL config.
  3. Lagging replica: a replica that has fallen behind accumulates relay logs; resolving the lag (MysqlReplicationLagHigh) lets them be purged.
  4. Reclaim table space: drop unused data or run OPTIMIZE TABLE on bloated tables. OPTIMIZE TABLE rebuilds the table and needs temporary extra space, so expand the volume first if it is nearly full.
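
For step 1, one way to choose <larger-size> is to work backwards from current usage and the free fraction you want after the resize: capacity >= used / (1 - target_free). The sketch below does that arithmetic in whole GiB. The function name required_gib and the figures in the example are hypothetical.

```shell
# required_gib USED_GIB TARGET_FREE_FRACTION
# Prints the smallest whole-GiB capacity that leaves at least
# TARGET_FREE_FRACTION of the volume free at the current usage.
required_gib() {
  awk -v u="$1" -v f="$2" 'BEGIN {
    need = u / (1 - f)         # capacity >= used / (1 - target_free)
    cap = int(need)
    if (cap < need) cap++      # round up to a whole GiB
    print cap
  }'
}

# 85 GiB used, aiming for 30% free after the resize
required_gib 85 0.3   # prints 122
```

With that hypothetical result the request would be storage: 122Gi. Allow extra headroom for expected growth, since a volume that has been expanded generally cannot be shrunk again.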

References