Fix: Elasticsearch Cluster Health Red Status
Quick Answer
Fix Elasticsearch cluster health red status by resolving unassigned shards, disk watermark issues, node failures, and shard allocation problems.
The Error
You check your Elasticsearch cluster health and see:
curl -X GET "localhost:9200/_cluster/health?pretty"
{
  "cluster_name": "my-cluster",
  "status": "red",
  "number_of_nodes": 3,
  "active_primary_shards": 45,
  "unassigned_shards": 10
}
A red status means one or more primary shards are not allocated. Data in those shards is unavailable for search and indexing. This is the most critical cluster health state.
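For cron jobs or monitoring hooks, the same check can be reduced to a single word and an exit code. This is a minimal sketch that assumes a POSIX shell with curl. ES_URL and the helper names fetch_status and status_exit_code are hypothetical. The 0/1/2 exit-code mapping is an arbitrary convention chosen for the example; Elasticsearch does not define it.

```shell
# ES_URL is a hypothetical override; it defaults to the local node used in this article.
ES_URL="${ES_URL:-http://localhost:9200}"

# Ask _cat/health for only the status column and strip whitespace,
# leaving "green", "yellow" or "red".
fetch_status() {
  curl -s "$ES_URL/_cat/health?h=status" | tr -d '[:space:]'
}

# Map a status word to an exit code: 0 green, 1 yellow, 2 red, 3 anything else
# (for example an empty string when the node is unreachable).
status_exit_code() {
  case "$1" in
    green)  echo 0 ;;
    yellow) echo 1 ;;
    red)    echo 2 ;;
    *)      echo 3 ;;
  esac
}

# Example (needs a reachable cluster): exit "$(status_exit_code "$(fetch_status)")"
```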
Why This Happens
Elasticsearch is a distributed system that stores each index as one or more primary shards plus zero or more replica shards, then assigns each shard to a node in the cluster. Cluster status is computed from shard state: green means every primary and every replica is assigned, yellow means every primary is assigned but at least one replica isn’t, and red means at least one primary is unassigned. Because a missing primary means data is unreadable and unwriteable, red is the only state that actually loses functionality — yellow is degraded but fully operational.
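The colour rule above can be written out as a small function. This is an illustrative sketch, not Elasticsearch's own implementation. It reads the prirep and state columns from _cat/shards, counts the unassigned copies, and ignores finer points such as shards that are still initializing. The function names are hypothetical.

```shell
# Count unassigned shard copies of one kind ("p" = primary, "r" = replica)
# from `_cat/shards?h=prirep,state` output on stdin.
count_unassigned() {
  awk -v kind="$1" '$1 == kind && $2 == "UNASSIGNED" { n++ } END { print n + 0 }'
}

# Derive the colour from the two counts, following the rule described above.
# $1 = unassigned primaries, $2 = unassigned replicas.
derive_status() {
  if [ "$1" -gt 0 ]; then
    echo red      # a primary has no home: its data is unavailable
  elif [ "$2" -gt 0 ]; then
    echo yellow   # every primary assigned, some replicas missing
  else
    echo green    # every primary and replica assigned
  fi
}

# Example (needs a reachable cluster):
#   shards=$(curl -s 'localhost:9200/_cat/shards?h=prirep,state')
#   derive_status "$(echo "$shards" | count_unassigned p)" "$(echo "$shards" | count_unassigned r)"
```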
A primary shard becomes unassigned for a small set of reasons. The node that hosted it left the cluster (graceful shutdown, crash, network partition). The shard’s data on disk was corrupted or deleted out from under Elasticsearch. The cluster’s allocation rules (awareness, filtering, total_shards_per_node) refuse to place the shard anywhere. Or the disk watermark thresholds blocked allocation because every eligible node is too full. Each cause has a different fix, and applying the wrong fix — restarting a healthy cluster, force-allocating a stale primary, or deleting an index that could have been restored — can permanently lose data that was actually still recoverable.
The most common trigger by far is disk watermark exhaustion. Elasticsearch refuses to allocate shards to any node above the low watermark (default 85%). It actively moves shards off any node above the high watermark (90%). It also puts a read-only block on every index that has a shard on a node above the flood-stage watermark (95%). Nodes can cross these thresholds together, usually after a sustained ingest burst or a forgotten log index. When that happens, unassigned primaries have nowhere to go, and the cluster turns red even though no node has actually failed. Restarting Elasticsearch in that state changes nothing, because the constraint is on disk, not on process state.
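To see which side of each watermark a node sits on, you can compare disk.percent from _cat/allocation against the default thresholds quoted above. This is a minimal sketch. It assumes the defaults (85/90/95) have not been overridden and that percentages are whole numbers. watermark_level and nodes_above are hypothetical helper names.

```shell
# Classify one disk-usage percentage against the default watermarks.
watermark_level() {
  if [ "$1" -gt 95 ]; then echo flood_stage
  elif [ "$1" -gt 90 ]; then echo high
  elif [ "$1" -gt 85 ]; then echo low
  else echo ok
  fi
}

# Print the names of nodes above a threshold (default 85, the low watermark),
# reading `_cat/allocation?h=node,disk.percent` output on stdin. The
# UNASSIGNED summary row has no disk.percent value and is skipped.
nodes_above() {
  awk -v limit="${1:-85}" '$2 != "" && $2 + 0 > limit + 0 { print $1 }'
}

# Example (needs a reachable cluster):
#   curl -s 'localhost:9200/_cat/allocation?h=node,disk.percent' | nodes_above 85
```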
Diagnostic Timeline
Run these steps in order. Most red statuses are diagnosed by minute 2, once the allocation explain API has told you the reason.
- Minute 0: Confirm the scope.
curl 'localhost:9200/_cluster/health?level=indices&pretty'
This breaks status down per index. If only one or two indices are red, you have a localized shard problem. If every index is red, you have a cluster-wide issue (disk, master election, network partition).
- Minute 1: List unassigned shards with their reasons.
curl 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state'
Look at the unassigned.reason column: NODE_LEFT, ALLOCATION_FAILED, CLUSTER_RECOVERED, DANGLING_INDEX_IMPORTED, INDEX_CREATED. Each maps to a different fix.
- Minute 2: Ask Elasticsearch why.
curl 'localhost:9200/_cluster/allocation/explain?pretty'
This is the most important diagnostic command in the entire system. It picks one unassigned shard and explains in plain text which nodes were considered, which constraints rejected each one, and what would have to change for allocation to succeed.
- Minute 3: Check disk usage per node.
curl 'localhost:9200/_cat/allocation?v'
If disk.percent is above 85 on every node, the watermark is your problem. No other fix will work until you free space or raise the threshold.
- Minute 4: Check master election.
curl 'localhost:9200/_cat/master?v'
If the master is unstable (changing between calls) or absent, cluster state can’t update and shard allocation stalls. Check pending tasks for a long queue: curl 'localhost:9200/_cluster/pending_tasks?pretty'
- Minute 5: Look at recent node leaves.
curl 'localhost:9200/_cat/nodes?v' shows current members. Compare against your expected count. A node that left during a deploy and didn’t rejoin is the most common trigger for NODE_LEFT reasons.
- Minute 6: If a primary is truly lost, decide on data loss.
allocate_stale_primary with accept_data_loss: true brings the shard back from an older copy. This is irreversible: any writes the stale copy never received are gone. Take a snapshot of what remains before running it.
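If you run this timeline often, the read-only steps (minutes 0 to 5) can be bundled into one function. The reason column can also be tallied, so you see at a glance which fix applies. This is a minimal sketch assuming curl and a POSIX shell; ES_URL, triage and reasons_summary are hypothetical names. Minute 6 is deliberately left out because it is destructive. The explain call returns an error when there is no unassigned shard to explain.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Run the read-only diagnostics from minutes 0-5 in order.
triage() {
  echo "== minute 0: per-index health"
  curl -s "$ES_URL/_cluster/health?level=indices&pretty"
  echo "== minute 1: unassigned shards"
  curl -s "$ES_URL/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state"
  echo "== minute 2: allocation explain"
  curl -s "$ES_URL/_cluster/allocation/explain?pretty"
  echo "== minute 3: disk per node"
  curl -s "$ES_URL/_cat/allocation?v"
  echo "== minute 4: master and pending tasks"
  curl -s "$ES_URL/_cat/master?v"
  curl -s "$ES_URL/_cluster/pending_tasks?pretty"
  echo "== minute 5: current nodes"
  curl -s "$ES_URL/_cat/nodes?v"
}

# Tally unassigned shards by reason from `_cat/shards?h=state,unassigned.reason`
# on stdin, printing lines such as "NODE_LEFT 3" in alphabetical order.
reasons_summary() {
  awk '$1 == "UNASSIGNED" { n[$2]++ } END { for (r in n) print r, n[r] }' | sort
}
```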
Fix 1: Identify Unassigned Shards
First, find out which shards are unassigned and why:
curl -X GET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state"
This lists every shard sorted by state, with a reason code on each unassigned one. Common reasons:
- NODE_LEFT: The node hosting the shard left the cluster
- ALLOCATION_FAILED: Elasticsearch tried to allocate the shard but failed
- CLUSTER_RECOVERED: Unassigned as a result of a full cluster recovery (for example after a full cluster restart)
- INDEX_CREATED: New index, shards not yet assigned
For detailed allocation explanation:
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"
This tells you exactly why Elasticsearch can’t allocate a specific shard and what you need to fix.
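Called without a body, the explain API picks an unassigned shard for you. You may already know which shard you care about from the _cat/shards output. In that case, you can pass index, shard and primary in the request body to explain that copy specifically. This is a minimal sketch; ES_URL, explain_body and explain_shard are hypothetical names, and my-index is the placeholder index used throughout this article.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the JSON body that targets one shard copy: index name, shard number,
# and whether to explain the primary (true) or a replica (false).
explain_body() {
  printf '{"index":"%s","shard":%d,"primary":%s}\n' "$1" "$2" "$3"
}

# Send it to the allocation explain API.
explain_shard() {
  curl -s -X GET "$ES_URL/_cluster/allocation/explain?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(explain_body "$1" "$2" "$3")"
}

# Example (needs a reachable cluster): explain_shard my-index 0 true
```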
Fix 2: Reroute Unassigned Shards Manually
If shards are stuck as unassigned, you can force allocation:
curl -X POST "localhost:9200/_cluster/reroute?pretty" -H 'Content-Type: application/json' -d'
{
"commands": [
{
"allocate_stale_primary": {
"index": "my-index",
"shard": 0,
"node": "node-1",
"accept_data_loss": true
}
}
]
}'
Warning: allocate_stale_primary with accept_data_loss: true discards any writes that the copy on that node never received. The target node must also hold an on-disk copy of the shard. Use this only when the node with the up-to-date copy is permanently gone.
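Before reaching for a stale primary, check whether the reason is ALLOCATION_FAILED. Elasticsearch stops retrying a shard after index.allocation.max_retries failed attempts (5 by default). Once the underlying cause, such as a full disk, is fixed, the reroute API's retry_failed flag asks it to try again without losing anything. The sketch below pairs that safe retry with a guard around the destructive command. ES_URL, CONFIRM_DATA_LOSS and the function names are hypothetical.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Safe first step: retry shards that exhausted index.allocation.max_retries.
retry_failed() {
  curl -s -X POST "$ES_URL/_cluster/reroute?retry_failed=true&pretty"
}

# Destructive step: print the allocate_stale_primary command body only when the
# hypothetical CONFIRM_DATA_LOSS variable is "yes". $1 index, $2 shard, $3 node.
stale_primary_body() {
  if [ "${CONFIRM_DATA_LOSS:-}" != "yes" ]; then
    echo "refusing: take a snapshot, then set CONFIRM_DATA_LOSS=yes" >&2
    return 1
  fi
  printf '{"commands":[{"allocate_stale_primary":{"index":"%s","shard":%d,"node":"%s","accept_data_loss":true}}]}\n' \
    "$1" "$2" "$3"
}

# Example: curl reads the body from stdin with -d @-
#   export CONFIRM_DATA_LOSS=yes
#   stale_primary_body my-index 0 node-1 | curl -s -X POST "$ES_URL/_cluster/reroute?pretty" \
#     -H 'Content-Type: application/json' -d @-
```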
For replica shards, use allocate_replica instead:
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
"commands": [
{
"allocate_replica": {
"index": "my-index",
"shard": 0,
"node": "node-2"
}
}
]
}'
Fix 3: Fix Disk Watermark Issues
Elasticsearch stops allocating shards when disk usage exceeds thresholds:
- Low watermark (85%): No new shards allocated to this node
- High watermark (90%): Elasticsearch starts moving shards off this node
- Flood stage (95%): Indices become read-only
Check disk usage:
curl -X GET "localhost:9200/_cat/nodes?v&h=name,disk.used_percent,disk.avail"
Free up disk space:
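# Delete old indices (Elasticsearch 8 rejects wildcard deletes while action.destructive_requires_name is true, its default; list exact names instead)
curl -X DELETE "localhost:9200/logs-2024-01-*"
# Force merge indices that no longer receive writes to purge deleted documents; the merge temporarily needs extra disk
curl -X POST "localhost:9200/my-index/_forcemerge?max_num_segments=1"
# Clear the fielddata, query and request caches: this frees heap memory, not disk space
curl -X POST "localhost:9200/_cache/clear"
If indices are stuck in read-only mode after the flood stage, unlock them: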
curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d'
{
"index.blocks.read_only_allow_delete": null
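}'
On Elasticsearch 7.4 and later, this block is also removed automatically once the node’s disk usage drops back below the high watermark.
Pro Tip: Set up disk monitoring alerts before you hit watermarks. The default thresholds are conservative. Adjust them if your nodes have large disks where 85% still leaves hundreds of GB free.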
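If you decide to move the thresholds, they live in three cluster settings that can be changed at runtime through the cluster settings API. This is a minimal sketch; ES_URL, same_unit, watermark_body and set_watermarks are hypothetical names. The values can be percentages of disk used, or absolute byte values that mean free space left. With byte values, the low watermark is therefore the largest number. Elasticsearch rejects a mix of the two styles, and the guard checks for that. It does not handle ratio values such as 0.9.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Succeed only if all arguments are percentages or none of them are.
same_unit() {
  pct=0
  for v in "$@"; do
    case "$v" in *%) pct=$((pct + 1)) ;; esac
  done
  [ "$pct" -eq 0 ] || [ "$pct" -eq "$#" ]
}

# Persistent settings body for the low, high and flood-stage watermarks.
watermark_body() {
  printf '{"persistent":{"cluster.routing.allocation.disk.watermark.low":"%s","cluster.routing.allocation.disk.watermark.high":"%s","cluster.routing.allocation.disk.watermark.flood_stage":"%s"}}\n' \
    "$1" "$2" "$3"
}

set_watermarks() {
  same_unit "$1" "$2" "$3" || { echo "do not mix percentages and byte values" >&2; return 1; }
  curl -s -X PUT "$ES_URL/_cluster/settings?pretty" \
    -H 'Content-Type: application/json' -d "$(watermark_body "$1" "$2" "$3")"
}

# Examples: set_watermarks 90% 93% 97%    or    set_watermarks 200gb 100gb 50gb
```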
Fix 4: Recover from Node Failures
If a node crashed or was shut down, restart it:
sudo systemctl start elasticsearch
Check the node’s logs for the crash reason:
tail -100 /var/log/elasticsearch/my-cluster.log
If the node can’t rejoin, verify:
- Cluster name matches in elasticsearch.yml
- Discovery settings point to the correct seed nodes
- Network binding allows communication between nodes
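Before editing configuration, it helps to know exactly which nodes are missing rather than just how many. This is a minimal sketch: it reads the name column from _cat/nodes and prints every expected name that is absent. The node names node-1 to node-3 match the example elasticsearch.yml and are placeholders. missing_nodes is a hypothetical helper.

```shell
# Print each expected node name (given as arguments) that does not appear in
# `_cat/nodes?h=name` output on stdin.
missing_nodes() {
  awk -v want="$*" '
    BEGIN { k = split(want, w, " ") }
    { seen[$1] = 1 }
    END { for (i = 1; i <= k; i++) if (!(w[i] in seen)) print w[i] }'
}

# Example (needs a reachable cluster):
#   curl -s 'localhost:9200/_cat/nodes?h=name' | missing_nodes node-1 node-2 node-3
```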
# elasticsearch.yml
cluster.name: my-cluster
node.name: node-1
network.host: 0.0.0.0
discovery.seed_hosts: ["node-1:9300", "node-2:9300", "node-3:9300"]
After the node rejoins, shard recovery starts automatically. Monitor progress:
curl -X GET "localhost:9200/_cat/recovery?v&active_only=true"
Fix 5: Prevent Split-Brain
Split-brain occurs when nodes can’t communicate and form separate clusters, each electing its own master. Both halves accept writes independently. This causes data inconsistency, and red status when the clusters reconnect.
In Elasticsearch 7+, the cluster manages its voting configuration automatically, so there is no minimum-master-nodes number to tune. The related setting is cluster.initial_master_nodes. It is used once, to bootstrap a brand-new cluster, and should be removed from elasticsearch.yml after the cluster has formed:
# elasticsearch.yml (first start of a new cluster only)
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
For a 3-node cluster, Elasticsearch requires a quorum of 2 master-eligible nodes to elect a master. Never run a production cluster with only 2 master-eligible nodes: a single node failure can lose quorum.
Common Mistake: Setting discovery.zen.minimum_master_nodes in Elasticsearch 7+ has no effect. It is ignored in 7.x and removed in 8.x. Quorum comes from the automatically maintained voting configuration, not from cluster.initial_master_nodes.
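The quorum arithmetic behind that advice is a strict majority of master-eligible nodes. The sketch below only reproduces that arithmetic. Elasticsearch 7+ maintains the actual voting configuration itself, and it may leave one node out to keep the count odd, so treat this as a planning aid. quorum and tolerated_failures are hypothetical names.

```shell
# Smallest strict majority of n master-eligible nodes.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# How many master-eligible nodes can fail while the rest still form a quorum.
tolerated_failures() {
  echo $(( $1 - $1 / 2 - 1 ))
}

# Examples: quorum 3 -> 2, tolerated_failures 3 -> 1, tolerated_failures 2 -> 0
```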
Fix 6: Tune JVM Heap Settings
Insufficient JVM heap causes garbage collection pauses that make nodes appear to leave the cluster:
# Check current heap usage
curl -X GET "localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max"
Set heap size in jvm.options:
-Xms4g
-Xmx4g
Rules for heap sizing:
- Set -Xms and -Xmx to the same value to avoid resizing pauses
- Never exceed 50% of available RAM: the other 50% is for the filesystem cache
- Stay below the compressed-oops cutoff, which is just under 32GB. Beyond it, the JVM can’t use compressed object pointers. The startup log line "compressed ordinary object pointers [true]" confirms they are on
- For nodes with 64GB RAM, use -Xms31g -Xmx31g
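The two limits above (half of RAM, and the compressed-oops ceiling) combine into a simple rule of thumb. This is a minimal sketch that takes total RAM in whole gigabytes and prints matching jvm.options lines. suggest_heap_gb and heap_options are hypothetical names. The 31GB cap follows this article's 64GB example rather than a value read from your JVM.

```shell
# Half of total RAM (GB, integer), capped at 31GB for compressed object pointers.
suggest_heap_gb() {
  heap=$(( $1 / 2 ))
  if [ "$heap" -gt 31 ]; then heap=31; fi
  echo "$heap"
}

# Emit equal -Xms/-Xmx lines for jvm.options.
heap_options() {
  size=$(suggest_heap_gb "$1")
  printf '%s\n%s\n' "-Xms${size}g" "-Xmx${size}g"
}

# Example: heap_options 64 prints -Xms31g and -Xmx31g on separate lines.
```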
Check for GC issues in the logs:
grep -iE "gc.*overhead" /var/log/elasticsearch/my-cluster.log
grep "breaker" /var/log/elasticsearch/my-cluster.log
Fix 7: Adjust Replica Configuration
If you have a single-node cluster with replicas configured, the cluster stays yellow because a replica can’t be assigned to the same node as its primary:
# Check index settings
curl -X GET "localhost:9200/my-index/_settings?pretty" | grep number_of_replicas
For single-node clusters, set replicas to 0:
curl -X PUT "localhost:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{
"index": {
"number_of_replicas": 0
}
}'
For all future indices, set a default template:
curl -X PUT "localhost:9200/_template/default" -H 'Content-Type: application/json' -d'
{
"index_patterns": ["*"],
"settings": {
"number_of_replicas": 0
}
}'
For multi-node clusters, ensure you have enough nodes to host all replicas. The formula: you need at least number_of_replicas + 1 nodes.
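That formula follows from the rule that Elasticsearch never places two copies of the same shard on one node. This is a minimal sketch of the check. It ignores allocation filters, awareness and disk watermarks, any of which can lower the real limit. replica_outcome is a hypothetical name.

```shell
# Predict whether all replicas can be placed: with N data nodes, at most N - 1
# replicas per shard fit, because each copy needs its own node.
# $1 = data nodes, $2 = number_of_replicas.
replica_outcome() {
  if [ "$2" -le $(( $1 - 1 )) ]; then
    echo green
  else
    echo yellow   # primaries still assigned, some replicas left unassigned
  fi
}
```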
Fix 8: Restore from Snapshot
If shard data is corrupted and can’t be recovered, restore from a snapshot:
# List available snapshots
curl -X GET "localhost:9200/_snapshot/my-backup/_all?pretty"
# Close the index before restoring
curl -X POST "localhost:9200/my-index/_close"
# Restore specific index
curl -X POST "localhost:9200/_snapshot/my-backup/snapshot-2024-03-10/_restore" -H 'Content-Type: application/json' -d'
{
"indices": "my-index",
"ignore_unavailable": true
}'
If you don’t have snapshots, you may need to delete the corrupted index and re-index the data from your primary data source:
# Last resort: delete and recreate
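curl -X DELETE "localhost:9200/corrupted-index"
To make the next incident recoverable, register a snapshot repository and take snapshots on a schedule, for example with snapshot lifecycle management. For an fs repository, the location must also be listed under path.repo in elasticsearch.yml on every master and data node: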
# Register a snapshot repository
curl -X PUT "localhost:9200/_snapshot/my-backup" -H 'Content-Type: application/json' -d'
{
"type": "fs",
"settings": {
"location": "/mnt/backups/elasticsearch"
}
}'
Still Not Working?
Check cluster settings overrides. Transient and persistent cluster settings override elasticsearch.yml. Run curl 'localhost:9200/_cluster/settings?pretty' to see active overrides. A forgotten cluster.routing.allocation.enable: none, set during a maintenance window, is a frequent culprit: it disables all shard allocation cluster-wide.
Look for shard allocation filters. Settings like index.routing.allocation.exclude._name can prevent shards from being assigned. Check with curl 'localhost:9200/my-index/_settings?pretty'. Awareness attributes (cluster.routing.allocation.awareness.attributes) can also exclude every node if your nodes don’t have the required attribute set.
Verify network connectivity between nodes. Elasticsearch uses port 9200 for HTTP and 9300 for inter-node communication, so test both from each node. curl node-2:9200 checks HTTP, and a TCP check such as nc -vz node-2 9300 checks the transport port. A firewall rule that blocks 9300 lets nodes appear to start but prevents cluster formation.
Check for pending cluster tasks. Run curl 'localhost:9200/_cluster/pending_tasks?pretty'. A large queue indicates the master node is overwhelmed, usually from index creation storms or mapping updates.
Monitor with _cat APIs. Use _cat/nodes, _cat/indices, _cat/shards, and _cat/allocation for a quick cluster overview without parsing JSON.
Consider increasing cluster.routing.allocation.node_concurrent_recoveries from the default of 2 if recovery is too slow on a large cluster with fast disks.
Inspect for Java heap exhaustion. A node that’s hitting circuit breakers or running constant full GCs will be marked as left by the master even though the process is still running. Look for OutOfMemoryError, gc overhead, or breaker tripped in /var/log/elasticsearch/<cluster>.log. Raising -Xmx past 31GB makes things worse, not better, because compressed oops stop working.
Check index.number_of_shards against your node count. An index created with 50 primary shards on a 3-node cluster puts roughly 17 primaries on each node, plus replicas. cluster.max_shards_per_node (default 1000) caps the total number of open shards at 1000 per data node. Once the cluster reaches that cap, requests that would create new shards fail, even though the cluster is otherwise healthy. Lower the shard count for new indices or raise the limit explicitly.
Look at _cat/recovery for stuck recoveries. A recovery that has sat at the same percentage for hours usually means the source node is rate-limited or a peer recovery is starved for bandwidth. indices.recovery.max_bytes_per_sec defaults to 40MB/s and can be raised temporarily during incident response.
Verify the data directory hasn’t moved. A node that comes back with a different path.data finds an empty directory and registers as a fresh node with no shards. The master then treats the original shards as permanently lost. Confirm path.data matches across restarts before declaring data loss.
Watch for stale dangling index imports. When a node rejoins with index metadata the rest of the cluster doesn’t know about, Elasticsearch may import it as a dangling index. If auto-import is disabled (the default in newer versions), those indices stay out of the cluster state until you import or delete them via the dangling indices API (GET _dangling lists them).
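The forgotten allocation override from the first item above is easy to check from a script. This is a minimal sketch. It scans GET _cluster/settings?flat_settings=true output for cluster.routing.allocation.enable and prints the value when it is anything other than all. A second function clears the override in both scopes. ES_URL, allocation_override and reset_allocation are hypothetical names. The sed-based parsing assumes flat settings output; it is not a general JSON parser.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Print cluster.routing.allocation.enable if it is overridden to something other
# than "all" (none, primaries, new_primaries). Reads flat-settings JSON on stdin.
allocation_override() {
  tr ',' '\n' \
    | sed -n 's/.*"cluster\.routing\.allocation\.enable" *: *"\([a-z_]*\)".*/\1/p' \
    | grep -v '^all$'
}

# Clear the override in both persistent and transient scopes so the default applies.
reset_allocation() {
  curl -s -X PUT "$ES_URL/_cluster/settings?pretty" \
    -H 'Content-Type: application/json' \
    -d '{"persistent":{"cluster.routing.allocation.enable":null},"transient":{"cluster.routing.allocation.enable":null}}'
}

# Example: curl -s "$ES_URL/_cluster/settings?flat_settings=true" | allocation_override
```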
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.