
Fix: Elasticsearch Cluster Health Red Status

FixDevs


Quick Answer

Fix Elasticsearch cluster health red status by resolving unassigned shards, disk watermark issues, node failures, and shard allocation problems.

The Error

You check your Elasticsearch cluster health and see:

curl -X GET "localhost:9200/_cluster/health?pretty"
{
  "cluster_name": "my-cluster",
  "status": "red",
  "number_of_nodes": 3,
  "active_primary_shards": 45,
  "unassigned_shards": 10
}

A red status means one or more primary shards are not allocated. Data in those shards is unavailable for search and indexing. This is the most critical cluster health state.

Why This Happens

Elasticsearch is a distributed system that stores each index as one or more primary shards plus zero or more replica shards, then assigns each shard to a node in the cluster. Cluster status is computed from shard state: green means every primary and every replica is assigned, yellow means every primary is assigned but at least one replica isn’t, and red means at least one primary is unassigned. Because the data in a missing primary can be neither read nor written, red is the only state that actually loses functionality. Yellow means reduced redundancy, but the cluster is fully operational.
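The status rule fits in a few lines. Below is a minimal Python sketch that assumes a hypothetical ShardCopy record for each shard copy. It mirrors the green/yellow/red definitions above and is not part of any Elasticsearch client API. The real computation happens inside the cluster.

```python
from dataclasses import dataclass


@dataclass
class ShardCopy:
    """Hypothetical record for one copy (primary or replica) of a shard."""
    index: str
    shard: int
    primary: bool
    assigned: bool


def cluster_status(copies: list[ShardCopy]) -> str:
    """Return 'green', 'yellow' or 'red' from the assignment state of every copy."""
    if any(c.primary and not c.assigned for c in copies):
        return "red"     # at least one primary is unassigned
    if any(not c.assigned for c in copies):
        return "yellow"  # every primary is assigned, but some replica is not
    return "green"       # every primary and every replica is assigned


shards = [
    ShardCopy("logs", 0, primary=True, assigned=True),
    ShardCopy("logs", 0, primary=False, assigned=False),
]
print(cluster_status(shards))  # yellow
```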

A primary shard becomes unassigned for a small set of reasons. The node that hosted it left the cluster (graceful shutdown, crash, network partition). The shard’s data on disk was corrupted or deleted out from under Elasticsearch. The cluster’s allocation rules (awareness, filtering, total_shards_per_node) refuse to place the shard anywhere. Or the disk watermark thresholds blocked allocation because every eligible node is too full. Each cause has a different fix, and applying the wrong fix — restarting a healthy cluster, force-allocating a stale primary, or deleting an index that could have been restored — can permanently lose data that was actually still recoverable.

The most common trigger by far is disk watermark exhaustion. Above the low watermark (default 85%), Elasticsearch stops allocating shards to a node, except for the primaries of brand-new indices. Above the high watermark (default 90%), it allocates no shards at all to the node and actively moves shards off it. Once a node exceeds the flood-stage watermark (default 95%), every index with a shard on that node is made read-only. When all your nodes cross the high watermark at the same time, the next index that gets created (a new daily log index, for example) has nowhere to put its primaries. This usually follows a sustained ingest burst or a forgotten log index, and the cluster turns red even though no node has actually failed. Restarting Elasticsearch in that state changes nothing because the constraint is on disk, not on process state.

Diagnostic Timeline

Work through these steps in order. Most red statuses are diagnosed by minute 2, once the allocation explain API has told you the reason.

  • Minute 0 — Confirm the scope. curl 'localhost:9200/_cluster/health?level=indices&pretty'. Quote the URL, because an unquoted & sends curl to the background in most shells. This breaks status down per index. If only one or two indices are red, you have a localized shard problem. If every index is red, you have a cluster-wide issue (disk, master election, network partition).
  • Minute 1 — List unassigned shards with their reasons. curl 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state'. Look at the unassigned.reason column: NODE_LEFT, ALLOCATION_FAILED, CLUSTER_RECOVERED, DANGLING_INDEX_IMPORTED, INDEX_CREATED. Each maps to a different fix.
  • Minute 2 — Ask Elasticsearch why. curl 'localhost:9200/_cluster/allocation/explain?pretty'. This is the most important diagnostic command in the entire system. Without a request body, it picks an unassigned shard and explains in human-readable terms which nodes were considered, which allocation deciders rejected each one, and what would have to change for allocation to succeed. To ask about a specific shard, send a body such as {"index": "my-index", "shard": 0, "primary": true}.
  • Minute 3 — Check disk usage per node. curl 'localhost:9200/_cat/allocation?v'. If disk.percent is above the low watermark (85% by default) on every node, replicas have nowhere to go. Above the high watermark (90%), the primaries of new indices have nowhere to go either. In both cases no other fix will work until you free space or raise the thresholds.
  • Minute 4 — Check master election. curl 'localhost:9200/_cat/master?v'. If the master is unstable (changing rapidly) or absent, cluster state can’t update and shard allocation stalls. Check pending_tasks for a long queue: curl 'localhost:9200/_cluster/pending_tasks?pretty'.
  • Minute 5 — Look at recent node leaves. curl 'localhost:9200/_cat/nodes?v' shows current members. Compare against your expected count. A node that left during a deploy and didn’t rejoin is the most common trigger for NODE_LEFT reasons.
  • Minute 6 — If a primary is truly lost, decide on data loss. allocate_stale_primary with accept_data_loss: true brings the shard back from an older copy. This is irreversible, and every write acknowledged after that copy went stale is gone. Take a snapshot of what remains before running it. A snapshot that includes red indices needs "partial": true.
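Minutes 0 and 1 can be partly scripted. The sketch below assumes you have already fetched the shard list with curl 'localhost:9200/_cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason'. The format=json parameter makes _cat return JSON keyed by the column names. The SAMPLE data and the triage function are illustrative, not real cluster output.

```python
import json
from collections import Counter

# Illustrative output of:
# curl 'localhost:9200/_cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason'
SAMPLE = """[
  {"index": "logs-a", "shard": "0", "prirep": "p", "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
  {"index": "logs-a", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
  {"index": "users", "shard": "1", "prirep": "p", "state": "STARTED", "unassigned.reason": null}
]"""


def triage(rows: list[dict]) -> tuple[Counter, list[str]]:
    """Count unassigned primaries per reason and list the indices that are red."""
    reasons: Counter = Counter()
    red_indices = set()
    for row in rows:
        # Only unassigned primaries ("p") make an index red; replicas ("r") make it yellow.
        if row["state"] == "UNASSIGNED" and row["prirep"] == "p":
            reasons[row["unassigned.reason"]] += 1
            red_indices.add(row["index"])
    return reasons, sorted(red_indices)


reasons, red = triage(json.loads(SAMPLE))
print(dict(reasons), red)  # {'NODE_LEFT': 1} ['logs-a']
```

Each reason in the count points at a different fix below, so group by reason before acting on individual shards.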

Fix 1: Identify Unassigned Shards

First, find out which shards are unassigned and why:

curl -X GET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state"

This shows each unassigned shard with its reason code. Common reasons:

  • NODE_LEFT — The node hosting the shard left the cluster
  • ALLOCATION_FAILED — Elasticsearch tried to allocate but failed
  • CLUSTER_RECOVERED — Unassigned after a full cluster restart, while the cluster recovers its shards
  • INDEX_CREATED — New index, shards not yet assigned

For detailed allocation explanation:

curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"

This tells you exactly why Elasticsearch can’t allocate a specific shard and what you need to fix.
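On large clusters the explain response can be long. The sketch below uses a hypothetical helper to pull out which deciders rejected each node. It assumes the response shape shown, which is a trimmed example with made-up values, so check the field names against your version's actual output.

```python
import json

# Trimmed, illustrative _cluster/allocation/explain response (values are made up).
SAMPLE = """{
  "index": "logs-new", "shard": 0, "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {"reason": "INDEX_CREATED"},
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {"node_name": "node-1", "node_decision": "no",
     "deciders": [{"decider": "disk_threshold", "decision": "NO",
                   "explanation": "the node is above the high watermark"}]},
    {"node_name": "node-2", "node_decision": "no",
     "deciders": [{"decider": "filter", "decision": "NO",
                   "explanation": "node matches index exclude filter"}]}
  ]
}"""


def blocking_deciders(explain: dict) -> dict[str, list[str]]:
    """Map each node to the names of the deciders that returned NO for it."""
    blocked = {}
    for node in explain.get("node_allocation_decisions", []):
        names = [d["decider"] for d in node.get("deciders", []) if d.get("decision") == "NO"]
        if names:
            blocked[node["node_name"]] = names
    return blocked


print(blocking_deciders(json.loads(SAMPLE)))
# {'node-1': ['disk_threshold'], 'node-2': ['filter']}
```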

Fix 2: Reroute Unassigned Shards Manually

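If shards are stuck as unassigned because allocation failed repeatedly (reason ALLOCATION_FAILED), retry them first with curl -X POST 'localhost:9200/_cluster/reroute?retry_failed=true&pretty'. Elasticsearch stops retrying a shard after index.allocation.max_retries failed attempts (5 by default), so it won't try again on its own even after you fix the cause. If a primary still can't be allocated because only an out-of-date copy survives, you can force allocation of that copy: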

curl -X POST "localhost:9200/_cluster/reroute?pretty" -H 'Content-Type: application/json' -d'
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "my-index",
        "shard": 0,
        "node": "node-1",
        "accept_data_loss": true
      }
    }
  ]
}'

Warning: allocate_stale_primary with accept_data_loss: true may result in data loss if the shard data on that node is outdated. Use this only when the original node is permanently gone.

For replica shards, use allocate_replica instead:

curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
  "commands": [
    {
      "allocate_replica": {
        "index": "my-index",
        "shard": 0,
        "node": "node-2"
      }
    }
  ]
}'
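If you script these commands, make the dangerous one hard to send by accident. The sketch below uses a hypothetical reroute_body helper. It builds the same JSON bodies as the two curl examples above and refuses to emit allocate_stale_primary unless data loss is explicitly accepted.

```python
import json


def reroute_body(index: str, shard: int, node: str, *, primary: bool,
                 accept_data_loss: bool = False) -> str:
    """Build a _cluster/reroute request body for a single shard copy."""
    if primary:
        if not accept_data_loss:
            # Refuse to build a stale-primary command unless data loss is acknowledged.
            raise ValueError("allocate_stale_primary requires accept_data_loss=True")
        command = {"allocate_stale_primary": {
            "index": index, "shard": shard, "node": node, "accept_data_loss": True}}
    else:
        command = {"allocate_replica": {"index": index, "shard": shard, "node": node}}
    return json.dumps({"commands": [command]}, indent=2)


print(reroute_body("my-index", 0, "node-2", primary=False))
```

Pass the printed JSON as the -d body of the POST to localhost:9200/_cluster/reroute, exactly as in the curl examples above.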

Fix 3: Fix Disk Watermark Issues

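Elasticsearch restricts shard allocation as a node's disk usage crosses these thresholds (defaults shown):

  • Low watermark (85%): No replicas or relocated shards are allocated to this node; primaries of newly created indices still are
  • High watermark (90%): No shards at all are allocated to this node, and Elasticsearch starts moving shards off it
  • Flood stage (95%): Every index with a shard on this node gets an index.blocks.read_only_allow_delete block, so writes to it fail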
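To see which rule applies to each node, compare the disk.percent values from _cat/allocation against the thresholds. The sketch assumes percentage watermarks at their default values. Clusters configured with absolute free-space values, or with customized thresholds, need different inputs.

```python
def watermark_stage(disk_percent: float, low: float = 85.0, high: float = 90.0,
                    flood: float = 95.0) -> str:
    """Return the highest disk watermark a node's usage exceeds (percentage thresholds only)."""
    if disk_percent > flood:
        return "flood_stage"  # indices with a shard on this node get a read-only-allow-delete block
    if disk_percent > high:
        return "high"         # no shards allocated here; existing shards are moved away
    if disk_percent > low:
        return "low"          # no replicas or relocations placed here; new-index primaries allowed
    return "ok"


# disk.percent values as _cat/allocation might report them (sample numbers)
sample = {"node-1": 82.0, "node-2": 91.5, "node-3": 96.2}
for name, pct in sample.items():
    print(name, watermark_stage(pct))
```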

Check disk usage:

curl -X GET "localhost:9200/_cat/nodes?v&h=name,disk.used_percent,disk.avail"

Free up disk space:

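# Delete old indices you no longer need (wildcard deletes are rejected while
# action.destructive_requires_name is true, the default since 8.0; list names explicitly then)
curl -X DELETE "localhost:9200/logs-2024-01-*"

# Force merge only indices that are no longer written to; this mainly reclaims space
# held by deleted documents, and the merge needs extra disk space while it runs
curl -X POST "localhost:9200/my-index/_forcemerge?max_num_segments=1"

# Clear caches (this frees heap memory, not disk, so it won't bring a node under a watermark)
curl -X POST "localhost:9200/_cache/clear"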

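If indices are stuck in read-only mode after the flood stage, free space first and then unlock them. Elasticsearch 7.4 and later also remove this block automatically once the node drops back below the high watermark.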

curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d'
{
  "index.blocks.read_only_allow_delete": null
}'

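Pro Tip: Set up disk monitoring alerts before you hit watermarks. The default percentages are conservative on large disks, where 85% used can still leave hundreds of GB free. You can raise them, or express all three watermarks as absolute free-space values instead (for example, cluster.routing.allocation.disk.watermark.low: 100gb).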

Fix 4: Recover from Node Failures

If a node crashed or was shut down, restart it:

sudo systemctl start elasticsearch

Check the node’s logs for the crash reason:

tail -100 /var/log/elasticsearch/my-cluster.log

If the node can’t rejoin, verify:

  • Cluster name matches in elasticsearch.yml
  • Discovery settings point to the correct seed nodes
  • Network binding allows communication between nodes

# elasticsearch.yml
cluster.name: my-cluster
node.name: node-1
# 0.0.0.0 binds every interface; prefer a specific private address in production
network.host: 0.0.0.0
discovery.seed_hosts: ["node-1:9300", "node-2:9300", "node-3:9300"]

After the node rejoins, shard recovery starts automatically. Monitor progress:

curl -X GET "localhost:9200/_cat/recovery?v&active_only=true"

Fix 5: Prevent Split-Brain

Split-brain occurs when nodes can’t communicate and form separate clusters, each electing its own master. This causes data inconsistency and red status when the clusters reconnect.

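In Elasticsearch 7+, there is no minimum master nodes setting to configure. The cluster maintains its voting configuration, and therefore its quorum, automatically. The only related setting is cluster.initial_master_nodes. It lists the master-eligible nodes for the very first election when a brand-new cluster bootstraps, and you should remove it from elasticsearch.yml once the cluster has formed: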

# elasticsearch.yml
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

With 3 master-eligible nodes, electing a master takes the votes of 2 of them. Never run a production cluster with only 2 master-eligible nodes, because losing a single node can leave the cluster without a master.
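The quorum arithmetic is plain majority voting. The sketch below uses hypothetical helper names to show how many master-eligible nodes a cluster can lose.

```python
def quorum(master_eligible: int) -> int:
    """Votes needed to elect a master: a strict majority of the voting nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)


for n in (1, 2, 3, 5):
    print(f"{n} master-eligible: quorum {quorum(n)}, survives {tolerated_failures(n)} failure(s)")
```

Two master-eligible nodes tolerate zero failures, the same as one. Elasticsearch 7+ also keeps its voting configuration odd-sized, so with two master-eligible nodes only one of them actually votes.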

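Common Mistake: Setting discovery.zen.minimum_master_nodes has no effect on an Elasticsearch 7+ cluster, and the setting no longer exists in 8.x. Quorum comes from the voting configuration that the cluster maintains automatically. cluster.initial_master_nodes only seeds the first election of a new cluster.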

Fix 6: Tune JVM Heap Settings

Insufficient JVM heap causes garbage collection pauses that make nodes appear to leave the cluster:

# Check current heap usage
curl -X GET "localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max"

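Set the heap size explicitly. On recent versions, put these lines in a custom file under config/jvm.options.d/ rather than editing jvm.options itself: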

-Xms4g
-Xmx4g

Rules for heap sizing:

  • Set -Xms and -Xmx to the same value to avoid resizing pauses
  • Never exceed 50% of available RAM — the other 50% is for the filesystem cache
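  • Never exceed ~30GB: above roughly 32GB the JVM can no longer use compressed object pointers, and the exact cutoff varies by JVM
  • For nodes with 64GB RAM, use -Xms30g -Xmx30g (go to 31g only after confirming the startup log reports compressed ordinary object pointers [true])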
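The sizing rules reduce to half the RAM, capped at about 30GB. The sketch below does that arithmetic. The 30GB cap follows the rule above and is deliberately conservative, so confirm compressed oops in the startup log if you go higher.

```python
def recommended_heap_gb(ram_gb: int, cap_gb: int = 30) -> int:
    """Half of physical RAM, capped at cap_gb to stay under the compressed-oops limit."""
    return min(ram_gb // 2, cap_gb)


for ram in (8, 16, 64, 128):
    heap = recommended_heap_gb(ram)
    # -Xms and -Xmx get the same value, as the rules above require
    print(f"{ram}GB RAM: -Xms{heap}g -Xmx{heap}g")
```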

Check for GC issues in the logs:

grep "overhead, spent" /var/log/elasticsearch/my-cluster.log
grep -i "breaker" /var/log/elasticsearch/my-cluster.log

Fix 7: Adjust Replica Configuration

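If you have a single-node cluster with replicas configured, the cluster stays yellow, because a replica can never be assigned to the same node as its primary. Unassigned replicas alone never make a cluster red, so a red single-node cluster has a missing primary and needs one of the other fixes. Check the replica count: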

# Check index settings
curl -X GET "localhost:9200/my-index/_settings?pretty" | grep number_of_replicas

For single-node clusters, set replicas to 0:

curl -X PUT "localhost:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "number_of_replicas": 0
  }
}'

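For all future indices, set a default template. This uses the legacy _template API, which still works but has been deprecated since 7.8 in favor of composable templates (_index_template):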

curl -X PUT "localhost:9200/_template/default" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["*"],
  "settings": {
    "number_of_replicas": 0
  }
}'

For multi-node clusters, make sure you have enough data nodes to host every copy. No two copies of the same shard can share a node, so you need at least number_of_replicas + 1 data nodes.
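The node-count rule sets the best health an index can reach. The sketch below assumes every data node is up and has disk room. Watermarks and allocation filters can still keep copies unassigned.

```python
def best_case_status(data_nodes: int, replicas: int) -> str:
    """Best health an index can reach if every data node is up and has disk room."""
    if data_nodes < 1:
        return "red"     # no node can hold the primary
    if data_nodes < replicas + 1:
        return "yellow"  # copies of one shard must sit on different nodes
    return "green"


print(best_case_status(1, 1))  # yellow: a single node cannot host a replica
print(best_case_status(3, 1))  # green
```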

Fix 8: Restore from Snapshot

If shard data is corrupted and can’t be recovered, restore from a snapshot:

# List available snapshots
curl -X GET "localhost:9200/_snapshot/my-backup/_all?pretty"

# Close the index before restoring
curl -X POST "localhost:9200/my-index/_close"

# Restore specific index
curl -X POST "localhost:9200/_snapshot/my-backup/snapshot-2024-03-10/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "my-index",
  "ignore_unavailable": true
}'

If you don’t have snapshots, you may need to delete the corrupted index and re-index the data from your primary data source:

# Last resort: delete and recreate
curl -X DELETE "localhost:9200/corrupted-index"

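To prevent this scenario, set up snapshots before you need them. For an fs repository, the location must be listed under path.repo in elasticsearch.yml on every master and data node. Registering a repository does not schedule anything by itself. To take snapshots automatically, add a snapshot lifecycle management (SLM) policy on top of it. First, register the repository: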

# Register a snapshot repository
curl -X PUT "localhost:9200/_snapshot/my-backup" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/elasticsearch"
  }
}'

Still Not Working?

  • Check cluster settings overrides. Transient and persistent cluster settings override elasticsearch.yml. Run curl localhost:9200/_cluster/settings?pretty to see active overrides. A forgotten cluster.routing.allocation.enable: none set during a maintenance window is a frequent culprit — it disables all shard allocation cluster-wide.

  • Look for shard allocation filters. Settings like index.routing.allocation.exclude._name can prevent shards from being assigned. Check with curl localhost:9200/my-index/_settings?pretty. Awareness attributes (cluster.routing.allocation.awareness.attributes) can also exclude every node if your nodes don’t have the required attribute set.

  • Verify network connectivity between nodes. Elasticsearch uses port 9200 for HTTP and 9300 for inter-node (transport) communication, so test both from each node. curl node-2:9200 checks HTTP, and a plain TCP check such as nc -zv node-2 9300 checks transport. A firewall rule that blocks 9300 lets nodes appear to start but prevents cluster formation.

  • Check for pending cluster tasks. Run curl localhost:9200/_cluster/pending_tasks?pretty. A large queue indicates the master node is overwhelmed — usually from index creation storms or mapping updates.

  • Monitor with _cat APIs. Use _cat/nodes, _cat/indices, _cat/shards, and _cat/allocation for quick cluster overview without parsing JSON.

  • Consider increasing cluster.routing.allocation.node_concurrent_recoveries from the default of 2 if recovery is too slow on a large cluster with fast disks.

  • Inspect for Java heap exhaustion. A node that’s hitting circuit breakers or running constant full GCs will be marked as left by the master even though the process is still running. Look for OutOfMemoryError, gc overhead, or breaker tripped in /var/log/elasticsearch/<cluster>.log. Raising -Xmx past 31GB makes things worse, not better, because compressed oops stop working.

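  • Check index.number_of_shards against your node count. An index created with 50 primary shards on a 3-node cluster puts at least 17 primaries on one node, plus replicas. If a new index would push the cluster past cluster.max_shards_per_node (default 1000 per data node, enforced cluster-wide), Elasticsearch rejects the index creation outright. A restrictive index.routing.allocation.total_shards_per_node or cluster.routing.allocation.total_shards_per_node, by contrast, leaves shards unassigned on an otherwise healthy cluster. Lower the shard count for new indices or raise the limit explicitly.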

  • Look at _cat/recovery for stuck recoveries. A primary that’s been recovering for hours at the same percentage usually means the source node is rate-limited or a peer recovery is starved for bandwidth. indices.recovery.max_bytes_per_sec defaults to 40MB/s and can be raised temporarily during incident response.

  • Verify the data directory hasn’t moved. A node that comes back with a different path.data finds an empty directory, registers as a fresh node with no shards, and the master treats the original shards as permanently lost. Confirm path.data matches across restarts before declaring data loss.

  • Watch for dangling indices. When a node rejoins with index metadata the rest of the cluster doesn't know about, that index is considered dangling. Automatic import is disabled by default in 7.9 and later, so such an index stays outside the cluster until you list it with GET _dangling and then import or delete it through the dangling indices API.
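The shard-count and recovery checks above come down to arithmetic. The sketch below uses hypothetical helpers. The first estimates how many shard copies the busiest node carries when copies are spread evenly. The second gives a lower bound on how long one shard takes to copy at a throttled recovery rate, using the 40MB/s default cited above and treating 1GB as 1024MB.

```python
import math


def max_shards_on_a_node(primaries: int, replicas: int, data_nodes: int) -> int:
    """Shard copies on the busiest node when copies are spread as evenly as possible."""
    total_copies = primaries * (replicas + 1)
    return math.ceil(total_copies / data_nodes)


def recovery_minutes(shard_size_gb: float, rate_mb_per_s: float = 40.0) -> float:
    """Lower bound on the time to copy one shard at a throttled transfer rate."""
    return shard_size_gb * 1024 / rate_mb_per_s / 60


print(max_shards_on_a_node(50, 0, 3))       # 17
print(max_shards_on_a_node(50, 1, 3))       # 34
print(round(recovery_minutes(50), 1))       # 21.3
print(round(recovery_minutes(50, 200), 1))  # 4.3
```

The recovery figure ignores concurrent recoveries, disk speed, and network contention, so treat it as a floor rather than a prediction.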


FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
