Meaning#
The Elasticsearch cluster health is YELLOW: all primary shards are assigned, but one or more replica shards are unassigned. Data is fully available, but redundancy is reduced. This is a cluster-wide platform alert and carries no environment label, only cluster.
Fires when:
elasticsearch_cluster_health_status{color="yellow"} == 1
for: 30m, severity ticket, tier platform.
Impact#
No outage. Reads and writes continue to work. The risk is reduced fault tolerance: if a node holding a primary now fails, the cluster could go RED because there is no replica to promote. Performance for read-heavy indices may also drop while replicas are missing.
Diagnosis#
ES REST API (key in Vault kv/global/elasticsearch, exposed as ES_URL/ES_API_KEY):
curl -s -H "Authorization: ApiKey $ES_API_KEY" "$ES_URL/_cluster/health?pretty"
curl -s -H "Authorization: ApiKey $ES_API_KEY" "$ES_URL/_cluster/health?level=indices&pretty"
# Unassigned replicas (prirep column = r)
curl -s -H "Authorization: ApiKey $ES_API_KEY" "$ES_URL/_cat/shards?v" | grep -i UNASSIGNED
# Why a shard is unassigned (with no request body, ES explains an arbitrary unassigned shard, primary or replica)
curl -s -H "Authorization: ApiKey $ES_API_KEY" "$ES_URL/_cluster/allocation/explain?pretty"
curl -s -H "Authorization: ApiKey $ES_API_KEY" "$ES_URL/_cat/nodes?v&h=name,heap.percent,disk.used_percent"
curl -s -H "Authorization: ApiKey $ES_API_KEY" "$ES_URL/_cat/indices?v"
Cluster / operator side (kubectl config use-context hetzner):
kubectl get elasticsearch -n elastic
kubectl get pods -n elastic -l common.k8s.elastic.co/type=elasticsearch
kubectl get pvc -n elastic
PromQL:
elasticsearch_cluster_health_status{color="yellow"} == 1
elasticsearch_cluster_health_unassigned_shards
elasticsearch_cluster_health_number_of_nodes
Mitigation#
- YELLOW is usually transient and self-heals as ES re-allocates replicas after a node restart or rolling update. If a node recently restarted, give it a few minutes and re-check _cluster/health.
- Confirm the expected number of data nodes is present (_cat/nodes, elasticsearch_cluster_health_number_of_nodes). A single-node nodeSet cannot allocate replicas at all; in that case YELLOW is expected and the replica count should be 0 for those indices.
- If replicas stay unassigned, run _cluster/allocation/explain. Common causes: disk past the high watermark (ES will not place new shards on a full node; see ElasticsearchDiskWatermark), or index.routing.allocation rules preventing placement.
- Free disk or expand PVCs if a node is over the high watermark; replicas allocate automatically once it drops below.
- If an index requests more replicas than there are data nodes, lower the replica count or scale the nodeSets in the Elasticsearch CR.
- Review ILM if oversized/old indices are consuming the disk that blocks replica placement.
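The allocation-explain and replica-count steps above can be sketched as a few shell helpers. This is a minimal sketch, not existing tooling: the function names are made up for illustration, and it assumes ES_URL and ES_API_KEY are exported as described under Diagnosis. It lists unassigned replicas from _cat/shards, points _cluster/allocation/explain at one specific replica by sending a request body, and changes number_of_replicas through the index settings API.

```shell
# Sketch only. Function names are illustrative, not part of any existing tooling.
# Assumes ES_URL and ES_API_KEY are exported as described under Diagnosis.

# Shared curl wrapper: API key auth plus JSON content type for request bodies.
es_curl() {
  curl -s -H "Authorization: ApiKey $ES_API_KEY" \
       -H "Content-Type: application/json" "$@"
}

# Keep only unassigned replica rows from _cat/shards output read on stdin.
# Expects columns in the order: index shard prirep state [unassigned.reason].
filter_unassigned_replicas() {
  awk '$3 == "r" && $4 == "UNASSIGNED" { print $1, $2 }'
}

# Print "<index> <shard>" for every unassigned replica in the cluster.
list_unassigned_replicas() {
  es_curl "$ES_URL/_cat/shards?h=index,shard,prirep,state,unassigned.reason" \
    | filter_unassigned_replicas
}

# Request body that points allocation explain at one replica shard.
explain_replica_body() {
  printf '{"index":"%s","shard":%d,"primary":false}' "$1" "$2"
}

# Explain why replica $2 of index $1 is not allocated.
explain_replica() {
  es_curl -X POST "$ES_URL/_cluster/allocation/explain?pretty" \
    -d "$(explain_replica_body "$1" "$2")"
}

# Request body for the update index settings API.
replicas_body() {
  printf '{"index":{"number_of_replicas":%d}}' "$1"
}

# Set the replica count of index $1 to $2.
set_replicas() {
  es_curl -X PUT "$ES_URL/$1/_settings" -d "$(replicas_body "$2")"
}
```

A possible session, using a placeholder index name: run list_unassigned_replicas, then explain_replica logs-example 0 for one of the reported shards, and only if the explanation shows there are not enough data nodes for the requested copies, set_replicas logs-example 0. Lowering the replica count removes redundancy for that index, so it fits the single-node nodeSet case rather than a disk watermark problem.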
References#
- ECK operator docs: https://www.elastic.co/guide/en/cloud-on-k8s/current/index.html
- Cluster health API: https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html
- Fix a red or yellow cluster: https://www.elastic.co/guide/en/elasticsearch/reference/current/red-yellow-cluster-status.html
- Allocation explain API: https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-allocation-explain.html