Fix: Elasticsearch Cluster Health Red Status
Quick Answer
Fix Elasticsearch cluster health red status by resolving unassigned shards, disk watermark issues, node failures, and shard allocation problems.
The Error
You check your Elasticsearch cluster health and see:
curl -X GET "localhost:9200/_cluster/health?pretty"{
"cluster_name": "my-cluster",
"status": "red",
"number_of_nodes": 3,
"active_primary_shards": 45,
"unassigned_shards": 10
}

A red status means one or more primary shards are not allocated. Data in those shards is unavailable for search and indexing. This is the most critical cluster health state.
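If you poll cluster health from a script, the JSON above is easy to act on. A minimal Python sketch, using a hard-coded sample of the response rather than a live call:

```python
# Classify a parsed _cluster/health response. The dict below mirrors the
# truncated sample response above; in practice you would json.loads() the
# body returned by curl or an HTTP client.
health = {
    "cluster_name": "my-cluster",
    "status": "red",
    "number_of_nodes": 3,
    "active_primary_shards": 45,
    "unassigned_shards": 10,
}

def describe(health: dict) -> str:
    status = health["status"]
    if status == "green":
        return "all primary and replica shards allocated"
    if status == "yellow":
        return "all primaries allocated, some replicas unassigned"
    # red: at least one primary shard is unassigned, so data is unavailable
    return (f"{health['unassigned_shards']} unassigned shard(s); "
            "at least one primary is missing")

print(describe(health))
```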
Why This Happens
Elasticsearch distributes data across shards, which live on nodes in the cluster. When a node goes down, crashes, or runs out of disk space, the shards it hosted become unassigned. If those are primary shards (not replicas), the cluster turns red because data is genuinely missing from the cluster.
Common triggers include disk usage crossing a watermark threshold, node crashes caused by JVM heap problems, network partitions causing split-brain scenarios, or corrupted shard data that Elasticsearch can’t recover.
Fix 1: Identify Unassigned Shards
First, find out which shards are unassigned and why:
curl -X GET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state"This shows each unassigned shard with its reason code. Common reasons:
- NODE_LEFT — The node hosting the shard left the cluster
- ALLOCATION_FAILED — Elasticsearch tried to allocate but failed
- CLUSTER_RECOVERED — Shard from a previous cluster state
- INDEX_CREATED — New index, shards not yet assigned
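The `_cat` output is plain whitespace-separated text, so grouping unassigned shards by reason takes only a few lines. A sketch, using a hard-coded sample of the command's output:

```python
# Count unassigned shards per unassigned.reason. cat_output is a hand-written
# sample of the _cat/shards response requested above
# (columns: index shard prirep state unassigned.reason).
from collections import Counter

cat_output = """\
logs-2024-03   0 p UNASSIGNED NODE_LEFT
logs-2024-03   1 p UNASSIGNED NODE_LEFT
my-index       0 r UNASSIGNED ALLOCATION_FAILED
my-index       1 p STARTED
"""

reasons = Counter()
for line in cat_output.splitlines():
    cols = line.split()
    # Assigned shards have no unassigned.reason column, so they have 4 fields
    if len(cols) == 5 and cols[3] == "UNASSIGNED":
        reasons[cols[4]] += 1

print(dict(reasons))  # -> {'NODE_LEFT': 2, 'ALLOCATION_FAILED': 1}
```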
For detailed allocation explanation:
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"This tells you exactly why Elasticsearch can’t allocate a specific shard and what you need to fix.
Fix 2: Reroute Unassigned Shards Manually
If shards are stuck as unassigned, you can force allocation:
curl -X POST "localhost:9200/_cluster/reroute?pretty" -H 'Content-Type: application/json' -d'
{
"commands": [
{
"allocate_stale_primary": {
"index": "my-index",
"shard": 0,
"node": "node-1",
"accept_data_loss": true
}
}
]
}'

Warning: allocate_stale_primary with accept_data_loss: true may result in data loss if the shard data on that node is outdated. Use this only when the original node is permanently gone.
For replica shards, use allocate_replica instead:
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
"commands": [
{
"allocate_replica": {
"index": "my-index",
"shard": 0,
"node": "node-2"
}
}
]
}'

Fix 3: Resolve Disk Watermark Issues
Elasticsearch stops allocating shards when disk usage exceeds thresholds:
- Low watermark (85%): No new shards allocated to this node
- High watermark (90%): Elasticsearch starts moving shards off this node
- Flood stage (95%): Indices become read-only
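Those percentages matter more in absolute terms than they sound. A quick calculation of how much space is still free when each default threshold trips, for a given disk size (pure arithmetic, assuming the 85/90/95% defaults quoted above):

```python
# Free space remaining at each default disk watermark.
# Thresholds assume the 85/90/95% defaults quoted above.
def free_at_watermarks(disk_gb: float) -> dict[str, float]:
    return {name: round(disk_gb * (1 - pct), 1)
            for name, pct in (("low", 0.85), ("high", 0.90), ("flood", 0.95))}

print(free_at_watermarks(4000))  # -> {'low': 600.0, 'high': 400.0, 'flood': 200.0}
```

A 4 TB node hits the low watermark with 600 GB still free, which is why large-disk nodes often warrant raised thresholds.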
Check disk usage:
curl -X GET "localhost:9200/_cat/nodes?v&h=name,disk.used_percent,disk.avail"Free up disk space:
# Delete old indices
curl -X DELETE "localhost:9200/logs-2024-01-*"
# Force merge to reduce segment count
curl -X POST "localhost:9200/my-index/_forcemerge?max_num_segments=1"
# Clear the fielddata cache
curl -X POST "localhost:9200/_cache/clear?fielddata=true"

If indices are stuck in read-only mode after the flood stage, unlock them:
curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d'
{
"index.blocks.read_only_allow_delete": null
}'

Pro Tip: Set up disk monitoring alerts before you hit watermarks. The default thresholds are conservative — adjust them if your nodes have large disks where 85% still leaves hundreds of GB free.
Fix 4: Recover from Node Failures
If a node crashed or was shut down, restart it:
sudo systemctl start elasticsearch

Check the node’s logs for the crash reason:
tail -100 /var/log/elasticsearch/my-cluster.log

If the node can’t rejoin, verify:
- Cluster name matches in elasticsearch.yml
- Discovery settings point to the correct seed nodes
- Network binding allows communication between nodes
# elasticsearch.yml
cluster.name: my-cluster
node.name: node-1
network.host: 0.0.0.0
discovery.seed_hosts: ["node-1:9300", "node-2:9300", "node-3:9300"]

After the node rejoins, shard recovery starts automatically. Monitor progress:
curl -X GET "localhost:9200/_cat/recovery?v&active_only=true"Fix 5: Prevent Split-Brain
Split-brain occurs when nodes can’t communicate and form separate clusters, each believing it’s the primary. This causes data inconsistency and red status when the clusters reconnect.
Configure master elections properly. In Elasticsearch 7+, quorum is managed automatically; you only list the master-eligible nodes once, when bootstrapping a brand-new cluster:
# elasticsearch.yml
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

For a 3-node cluster, Elasticsearch requires a quorum of 2 master-eligible nodes to elect a master. Never run a production cluster with only 2 master-eligible nodes — a single node failure loses quorum.
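The quorum rule is simple majority arithmetic, which is why two master-eligible nodes are no safer than one. A sketch:

```python
# Majority quorum for N master-eligible nodes, and how many node
# failures the cluster can survive while still electing a master.
def quorum(n: int) -> int:
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    return n - quorum(n)

for n in (1, 2, 3, 5):
    print(f"{n} nodes: quorum {quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Two nodes need both alive to hold quorum (they tolerate zero failures), so three master-eligible nodes is the minimum for any fault tolerance.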
Common Mistake: Setting discovery.zen.minimum_master_nodes in Elasticsearch 7+ has no effect. The setting was removed; the cluster bootstraps from cluster.initial_master_nodes and manages the voting quorum automatically from then on.
Fix 6: Tune JVM Heap Settings
Insufficient JVM heap causes garbage collection pauses that make nodes appear to leave the cluster:
# Check current heap usage
curl -X GET "localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max"Set heap size in jvm.options:
-Xms4g
-Xmx4g

Rules for heap sizing:
- Set -Xms and -Xmx to the same value to avoid resizing pauses
- Never exceed 50% of available RAM — the other 50% is for the filesystem cache
- Never exceed ~30GB — beyond this, the JVM can’t use compressed object pointers
- For nodes with 64GB RAM, use -Xms31g -Xmx31g
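The rules above collapse into one formula. A sketch (the 31 GB cap is the commonly cited safe value for keeping compressed object pointers; the exact cutoff varies by JVM):

```python
# Recommended heap: half of RAM, capped at 31 GB so the JVM keeps
# compressed object pointers. 31 GB is the commonly cited safe cap;
# the exact cutoff depends on the JVM.
def heap_gb(ram_gb: int) -> int:
    return min(ram_gb // 2, 31)

for ram in (8, 16, 64, 128):
    print(f"{ram} GB RAM -> -Xms{heap_gb(ram)}g -Xmx{heap_gb(ram)}g")
```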
Check for GC issues in the logs:
grep "GC overhead" /var/log/elasticsearch/my-cluster.log
grep "breaker" /var/log/elasticsearch/my-cluster.logFix 7: Adjust Replica Configuration
If you have a single-node cluster with replicas configured, the cluster stays yellow, because a replica can never be assigned to the same node as its primary:
# Check index settings
curl -X GET "localhost:9200/my-index/_settings?pretty" | grep number_of_replicasFor single-node clusters, set replicas to 0:
curl -X PUT "localhost:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{
"index": {
"number_of_replicas": 0
}
}'

For all future indices, set a default template:
curl -X PUT "localhost:9200/_template/default" -H 'Content-Type: application/json' -d'
{
"index_patterns": ["*"],
"settings": {
"number_of_replicas": 0
}
}'

For multi-node clusters, ensure you have enough nodes to host all replicas. The formula: you need at least number_of_replicas + 1 nodes.
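That nodes >= replicas + 1 rule is worth checking before you raise a replica count. A sketch:

```python
# An index needs number_of_replicas + 1 data nodes to place every shard
# copy, because a replica can never share a node with its primary.
def replicas_assignable(data_nodes: int, replicas: int) -> bool:
    return data_nodes >= replicas + 1

print(replicas_assignable(1, 1))  # False: single node with 1 replica stays yellow
print(replicas_assignable(3, 2))  # True: 3 nodes can host primary + 2 replicas
```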
Fix 8: Restore from Snapshot
If shard data is corrupted and can’t be recovered, restore from a snapshot:
# List available snapshots
curl -X GET "localhost:9200/_snapshot/my-backup/_all?pretty"
# Close the index before restoring
curl -X POST "localhost:9200/my-index/_close"
# Restore specific index
curl -X POST "localhost:9200/_snapshot/my-backup/snapshot-2024-03-10/_restore" -H 'Content-Type: application/json' -d'
{
"indices": "my-index",
"ignore_unavailable": true
}'

If you don’t have snapshots, you may need to delete the corrupted index and re-index the data from your primary data source:
# Last resort: delete and recreate
curl -X DELETE "localhost:9200/corrupted-index"Set up automated snapshots to prevent this scenario:
# Register a snapshot repository
curl -X PUT "localhost:9200/_snapshot/my-backup" -H 'Content-Type: application/json' -d'
{
"type": "fs",
"settings": {
"location": "/mnt/backups/elasticsearch"
}
}'

Still Not Working?
- Check cluster settings overrides. Transient and persistent cluster settings override elasticsearch.yml. Run curl localhost:9200/_cluster/settings?pretty to see active overrides.
- Look for shard allocation filters. Settings like index.routing.allocation.exclude._name can prevent shards from being assigned. Check with curl localhost:9200/my-index/_settings?pretty.
- Verify network connectivity between nodes. Test with curl node-2:9200 from each node. Elasticsearch uses port 9200 for HTTP and 9300 for inter-node communication.
- Check for pending cluster tasks. Run curl localhost:9200/_cluster/pending_tasks?pretty. A large queue indicates the master node is overwhelmed.
- Monitor with _cat APIs. Use _cat/nodes, _cat/indices, _cat/shards, and _cat/allocation for a quick cluster overview without parsing JSON.
- Consider increasing cluster.routing.allocation.node_concurrent_recoveries from the default of 2 if recovery is too slow on a large cluster with fast disks.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.