Meaning
Free disk space on at least one Elasticsearch data node has dropped below 15%, approaching the flood-stage watermark (default 95% used). At flood stage Elasticsearch makes indices on the affected node read-only to protect the disk. This is a cluster-wide platform alert and carries no environment label, only cluster.
Fires when:
min(elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes) < 0.15
for: 15m, severity ticket, tier platform.
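As a worked instance of the rule above, the sketch below recomputes the free-space ratio for a single node and compares it with the 0.15 threshold. The function name `alert_would_fire` and the sample numbers are made up for illustration; the real alert is evaluated by Prometheus, and the `for: 15m` hold time is not modeled here.

```shell
#!/bin/sh
# Illustrative only: mirrors the alert expression for one node.
# alert_would_fire AVAILABLE_BYTES SIZE_BYTES [THRESHOLD]
# Prints the free ratio followed by "firing" or "ok".
# THRESHOLD defaults to 0.15, the value used in the alert rule.
alert_would_fire() {
  awk -v avail="$1" -v size="$2" -v thr="${3:-0.15}" 'BEGIN {
    if (size <= 0) { print "error: size must be > 0"; exit 1 }
    ratio = avail / size
    printf "%.4f %s\n", ratio, (ratio < thr ? "firing" : "ok")
  }'
}
```

For example, a 500 GiB volume with 60 GiB free gives `alert_would_fire 60 500`, which prints `0.1200 firing`; with 100 GiB free it prints `0.2000 ok`.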
Impact
As watermarks are crossed, ES first stops allocating new shards to the node (low watermark, default 85% used; unassigned replicas can turn the cluster YELLOW), then tries to relocate shards off it (high watermark, default 90%), and at flood stage (default 95%) applies the index.blocks.read_only_allow_delete block to every index with a shard on that node: writes to affected indices fail while reads continue. Logging/observability ingestion and application indices stop accepting new data until disk is freed and the block is cleared.
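To make the thresholds concrete, here is a minimal sketch that maps a node's used-disk percentage to the watermark it has crossed. It assumes the stock Elasticsearch defaults (low 85%, high 90%, flood stage 95%); this cluster may override them, and `watermark_stage` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# watermark_stage USED_PERCENT
# Classifies a node by the disk watermark it has exceeded.
# Thresholds are the assumed Elasticsearch defaults; a cluster can change them
# via the cluster.routing.allocation.disk.watermark.* settings.
watermark_stage() {
  awk -v used="$1" 'BEGIN {
    if (used > 95)      print "flood_stage: writes blocked on affected indices"
    else if (used > 90) print "high: shards are relocated away from the node"
    else if (used > 85) print "low: no new shards allocated to the node"
    else                print "ok"
  }'
}
```

Feeding it the `disk.used_percent` value from `_cat/nodes` (for instance `watermark_stage 92.5`) shows at a glance how far along a node is.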
Diagnosis
ES REST API (key in Vault kv/global/elasticsearch, exposed as ES_URL/ES_API_KEY):
# Per-node disk usage
curl -s -H "Authorization: ApiKey $ES_API_KEY" "$ES_URL/_cat/nodes?v&h=name,disk.used_percent,disk.used,disk.avail,disk.total"
curl -s -H "Authorization: ApiKey $ES_API_KEY" "$ES_URL/_cat/allocation?v"
# Largest indices
curl -s -H "Authorization: ApiKey $ES_API_KEY" "$ES_URL/_cat/indices?v&s=store.size:desc"
# Health + any read-only blocks already applied
curl -s -H "Authorization: ApiKey $ES_API_KEY" "$ES_URL/_cluster/health?pretty"
curl -s -H "Authorization: ApiKey $ES_API_KEY" "$ES_URL/_all/_settings/index.blocks.read_only_allow_delete?pretty"
Cluster / operator side (kubectl config use-context hetzner):
kubectl get elasticsearch -n elastic
kubectl get pods -n elastic -l common.k8s.elastic.co/type=elasticsearch
kubectl get pvc -n elastic
kubectl exec -n elastic <es-pod> -- df -h /usr/share/elasticsearch/data
PromQL:
min(elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes)
elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes
Mitigation
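Before working through the steps in this section, it can help to estimate how much space has to come back for the alert to clear. The sketch below computes that from the two metric values for a node; `bytes_to_free` is a hypothetical helper, and the result is only a rough lower bound because shard relocation and ongoing ingestion keep changing the numbers.

```shell
#!/bin/sh
# bytes_to_free AVAILABLE_BYTES SIZE_BYTES [TARGET_FREE_RATIO]
# Prints how many bytes must be freed (or added) so that
# available / size reaches TARGET_FREE_RATIO. Defaults to 0.15, the alert
# threshold. Prints 0 when the node is already at or above the target.
bytes_to_free() {
  awk -v avail="$1" -v size="$2" -v target="${3:-0.15}" 'BEGIN {
    need = target * size - avail
    if (need < 0) need = 0
    printf "%.0f\n", need
  }'
}
```

With a 500 GiB volume and 60 GiB available, `bytes_to_free 64424509440 536870912000` prints `16106127360`, which is 15 GiB.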
Free space first. Delete or roll over old indices/data streams via ILM, or delete obsolete indices:
curl -s -X DELETE -H "Authorization: ApiKey $ES_API_KEY" "$ES_URL/<index>"
Confirm space recovered with _cat/allocation.
If the data is still needed but the node is full, expand the PVC: edit the volumeClaimTemplates storage request in the Elasticsearch CR nodeSets (the StorageClass must allow volume expansion); ECK and the CSI driver grow the PVC. Verify with kubectl get pvc -n elastic and df -h in the pod.
Add a data node to the nodeSets if the cluster as a whole is undersized; shards rebalance off the full node.
Once disk is back below the flood-stage watermark, clear the read-only block so writes resume (Elasticsearch 7.4 and later release it automatically once usage falls below the high watermark, so this is only needed if the block persists):
curl -s -X PUT -H "Authorization: ApiKey $ES_API_KEY" -H 'Content-Type: application/json' \
  "$ES_URL/_all/_settings" \
  -d '{"index.blocks.read_only_allow_delete": null}'
Confirm writes work again and health returns to GREEN.
Fix the root cause: tune ILM rollover/delete policies so indices do not grow unbounded and refill the disk.
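Clearing the read-only block while a node is still past flood stage only gets it reapplied, so a small pre-check can save a round trip. The following is a minimal sketch, assuming the default 95% flood-stage threshold; `safe_to_clear_block` is a hypothetical helper that reads `_cat/nodes` output on stdin, so it never contacts the cluster itself.

```shell
#!/bin/sh
# safe_to_clear_block [FLOOD_STAGE_PERCENT]
# Reads "name disk.used_percent" rows on stdin, as printed by
#   _cat/nodes?h=name,disk.used_percent
# Exits 0 when every node is below the flood-stage threshold; otherwise lists
# the offending nodes and exits 1. The default of 95 is the stock
# Elasticsearch value and is an assumption about this cluster.
safe_to_clear_block() {
  awk -v flood="${1:-95}" '
    NF >= 2 && $2 + 0 >= flood {
      printf "still full: %s (%s%%)\n", $1, $2
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage sketch (not executed here), reusing the variables from Diagnosis:
#   curl -s -H "Authorization: ApiKey $ES_API_KEY" \
#     "$ES_URL/_cat/nodes?h=name,disk.used_percent" | safe_to_clear_block \
#     && echo "all nodes below flood stage; safe to clear the block"
```

Keeping the check as a pure filter over text makes it easy to run against saved output from an incident as well as against the live API.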
References
- ECK operator docs / volume expansion: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-volume-claim-templates.html
- Disk-based shard allocation (watermarks): https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html#disk-based-shard-allocation
- Fix watermark errors: https://www.elastic.co/guide/en/elasticsearch/reference/current/fix-watermark-errors.html
- ILM overview: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-lifecycle-management.html