Exceptional Cluster Health Status (Red and Yellow)


Last updated: 2024-11-29 22:01:51


Why is the cluster in an exceptional status?
The cluster status becomes red or yellow under the following conditions:
Red: at least one primary shard is unassigned. This affects reads and writes on the affected index and requires immediate attention.
Yellow: all primary shards are assigned, but at least one replica shard is unassigned. Index reads and writes are not affected, and the status generally recovers automatically.
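As a minimal sketch, the two rules above can be written as a decision function over shard copies. ShardCopy and derive_status are hypothetical names used only for illustration; they are not part of Elasticsearch or any client library. Elasticsearch itself evaluates health per index and reports the worst index status as the cluster status, which gives the same result.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ShardCopy:
    """One copy of a shard (hypothetical model for illustration)."""
    index: str
    shard: int
    primary: bool   # True for the primary copy, False for a replica
    assigned: bool  # True if the copy is allocated to a node


def derive_status(copies: Iterable[ShardCopy]) -> str:
    """Return 'red', 'yellow' or 'green' using the rules described above."""
    copies = list(copies)
    # Any unassigned primary makes the status red.
    if any(c.primary and not c.assigned for c in copies):
        return "red"
    # All primaries are assigned, but at least one replica is not: yellow.
    if any(not c.primary and not c.assigned for c in copies):
        return "yellow"
    # Every primary and replica copy is assigned.
    return "green"


if __name__ == "__main__":
    example = [
        ShardCopy("logs-1", 0, primary=True, assigned=True),
        ShardCopy("logs-1", 0, primary=False, assigned=False),
    ]
    print(derive_status(example))  # yellow
```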
Viewing Cluster Status
You can use Kibana Dev Tools to view the cluster status:
GET /_cluster/health
In this example, the response shows that the cluster status is red and that there are nine unassigned shards.
The fields in the health API response are described as follows (based on the official Elasticsearch documentation):
Metric	Description
cluster_name	Cluster name
status	Health status of the cluster, based on the state of its primary and replica shards. Possible values: green (all shards are assigned); yellow (all primary shards are assigned, but one or more replica shards are unassigned; if a node in the cluster fails, some data may be unavailable until that node is repaired); red (one or more primary shards are unassigned, so some data is unavailable; this can occur briefly during cluster startup as primary shards are assigned)
timed_out	If false, the response was returned within the period specified by the timeout parameter (30s by default)
number_of_nodes	Number of nodes in the cluster
number_of_data_nodes	Number of nodes that are dedicated data nodes
active_primary_shards	Number of active primary shards
active_shards	Total number of active primary and replica shards
relocating_shards	Number of shards that are being relocated
initializing_shards	Number of shards that are being initialized
unassigned_shards	Number of unassigned shards
delayed_unassigned_shards	Number of shards whose allocation has been delayed by the timeout settings
number_of_pending_tasks	Number of cluster-level changes that have not yet been executed
number_of_in_flight_fetch	Number of unfinished fetches
task_max_waiting_in_queue_millis	Time in milliseconds that the earliest initiated task has been waiting to be executed
active_shards_percent_as_number	Ratio of active shards in the cluster, expressed as a percentage
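The sketch below reads a GET /_cluster/health response body and pulls out the fields most relevant to triage. It assumes you have already retrieved the JSON (for example from Kibana Dev Tools or any HTTP client); summarize_health is a hypothetical helper, and the sample response uses made-up values that match the example above (red status, nine unassigned shards).

```python
import json
from typing import Any, Dict

# Made-up sample response from GET /_cluster/health (values are illustrative only).
SAMPLE = """{
  "cluster_name": "es-example",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 3,
  "number_of_data_nodes": 3,
  "active_primary_shards": 12,
  "active_shards": 24,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 9,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 72.72727272727273
}"""


def summarize_health(raw: str) -> Dict[str, Any]:
    """Extract the triage-relevant fields from a cluster health response body."""
    health = json.loads(raw)
    status = health["status"]
    return {
        "cluster_name": health["cluster_name"],
        "status": status,
        "unassigned_shards": health["unassigned_shards"],
        "delayed_unassigned_shards": health.get("delayed_unassigned_shards", 0),
        "active_shards_percent": health["active_shards_percent_as_number"],
        # Red or yellow: some shards are unassigned and worth investigating.
        "needs_attention": status != "green",
        # Red only: at least one primary is unassigned, so some data is unavailable.
        "data_unavailable": status == "red",
    }


if __name__ == "__main__":
    summary = summarize_health(SAMPLE)
    print(f'{summary["cluster_name"]}: {summary["status"]}, '
          f'{summary["unassigned_shards"]} unassigned shards')
```

Adding level=indices to the request (GET /_cluster/health?level=indices) also returns the health of each index, which helps narrow down where the unassigned shards are.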
Troubleshooting
If the cluster status is red or yellow, focus on the shards counted in unassigned_shards. The following example shows how to locate them:
Finding the exceptional index
Check the health of each index and identify the red or yellow index in the response.
GET /_cat/indices
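To find the exceptional index programmatically, you can request the same information as JSON with GET /_cat/indices?format=json and filter it. The sketch below assumes that response format; unhealthy_indices is a hypothetical helper and the sample rows are made up.

```python
import json
from typing import Dict, List

# Made-up sample response from GET /_cat/indices?format=json (cat APIs return values as strings).
SAMPLE = json.dumps([
    {"health": "green", "status": "open", "index": "orders", "pri": "3", "rep": "1"},
    {"health": "yellow", "status": "open", "index": "logs-2024.11", "pri": "5", "rep": "1"},
    {"health": "red", "status": "open", "index": "metrics", "pri": "1", "rep": "0"},
])


def unhealthy_indices(cat_indices_json: str) -> List[Dict[str, str]]:
    """Return red and yellow indices, red first, from a _cat/indices JSON response."""
    rows = json.loads(cat_indices_json)
    # Use .get() in case a row has no health value.
    bad = [r for r in rows if r.get("health") in ("red", "yellow")]
    # False sorts before True, so red indices come first, then by index name.
    bad.sort(key=lambda r: (r["health"] != "red", r["index"]))
    return [{k: r[k] for k in ("index", "health", "pri", "rep")} for r in bad]


if __name__ == "__main__":
    for row in unhealthy_indices(SAMPLE):
        print(row["health"], row["index"], "primaries:", row["pri"], "replicas:", row["rep"])
```

Alternatively, the cat API can filter on the server side: GET /_cat/indices?v&health=red lists only red indices.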
Viewing exception details
GET /_cluster/allocation/explain
The response shows that:
1. The primary shard is unassigned (current_state) because the node that held it left the cluster (unassigned_info.reason).
2. The shard cannot be allocated automatically because no valid copy of its data remains in the cluster (can_allocate).
3. A more detailed explanation is given in allocate_explanation.
In this example, the node holding the shard data is offline, so there is no copy from which the primary shard can be recovered. The only option is to wait for that node to recover and rejoin the cluster.
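Called without a request body, the allocation explain API explains an arbitrary unassigned shard; passing index, shard and primary in the body targets a specific one. The sketch below builds such a body and extracts the three fields discussed above from a response. The helper names explain_request_body and read_explanation are hypothetical, and the sample response is abridged with made-up index data.

```python
import json
from typing import Any, Dict

# Abridged, made-up sample response from GET /_cluster/allocation/explain.
SAMPLE = json.dumps({
    "index": "metrics",
    "shard": 0,
    "primary": True,
    "current_state": "unassigned",
    "unassigned_info": {"reason": "NODE_LEFT", "last_allocation_status": "no_valid_shard_copy"},
    "can_allocate": "no_valid_shard_copy",
    "allocate_explanation": "cannot allocate because a previous copy of the primary shard "
                            "existed but can no longer be found on the nodes in the cluster",
})


def explain_request_body(index: str, shard: int, primary: bool) -> Dict[str, Any]:
    """Request body that targets one specific shard instead of an arbitrary unassigned one."""
    return {"index": index, "shard": shard, "primary": primary}


def read_explanation(raw: str) -> Dict[str, Any]:
    """Pick out the fields discussed above from an allocation explain response."""
    resp = json.loads(raw)
    info = resp.get("unassigned_info", {})
    return {
        "shard": f'{resp["index"]}[{resp["shard"]}]',
        "primary": resp["primary"],
        "current_state": resp["current_state"],
        "reason": info.get("reason"),              # why the shard became unassigned
        "can_allocate": resp.get("can_allocate"),  # whether it can be allocated now
        "explanation": resp.get("allocate_explanation"),
    }


if __name__ == "__main__":
    print(json.dumps(explain_request_body("metrics", 0, True)))
    for key, value in read_explanation(SAMPLE).items():
        print(f"{key}: {value}")
```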
Note: 
In some extreme cases, for example when a shard of an index with no replicas is corrupted, or when a file system failure causes a node to be removed permanently, you have to accept the data loss and use the reroute command to allocate an empty primary shard. To avoid such cases, design index shards appropriately and do not leave an index with only one copy of its data (referred to as a single replica, or zero replicas: each shard has a primary but no replica shards). An appropriate sharding design keeps the total number of shards in the cluster at a healthy level, makes better use of the cluster's distributed architecture while ensuring high availability, and improves overall cluster performance.
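The sketch below builds the request bodies for the two operations mentioned in the note: allocating an empty primary shard through the reroute API (POST /_cluster/reroute with the allocate_empty_primary command) and adding a replica to an index (PUT /<index>/_settings). The function names and the index and node names (metrics, node-1) are hypothetical placeholders. Use allocate_empty_primary only as a last resort: it discards all data in that shard, which is why Elasticsearch requires accept_data_loss to be set to true.

```python
import json
from typing import Any, Dict


def empty_primary_reroute(index: str, shard: int, node: str) -> Dict[str, Any]:
    """Body for POST /_cluster/reroute that allocates an EMPTY primary for a lost shard.

    WARNING: all documents previously stored in this shard are permanently lost.
    Elasticsearch rejects the command unless accept_data_loss is true.
    """
    return {
        "commands": [
            {
                "allocate_empty_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,  # target node name (placeholder in the example below)
                    "accept_data_loss": True,
                }
            }
        ]
    }


def replica_settings(replicas: int = 1) -> Dict[str, Any]:
    """Body for PUT /<index>/_settings so that every primary shard has a replica."""
    if replicas < 1:
        # Zero replicas leaves a single copy of the data, which the note advises against.
        raise ValueError("number_of_replicas should be at least 1")
    return {"index": {"number_of_replicas": replicas}}


if __name__ == "__main__":
    # "metrics" and "node-1" are hypothetical index and node names.
    print(json.dumps(empty_primary_reroute("metrics", 0, "node-1"), indent=2))
    print(json.dumps(replica_settings(), indent=2))
```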
Reasons for unassigned shards (unassigned_info.reason)
The unassigned_info.reason field returned by the allocation explain API indicates why a shard is unassigned and helps you make a preliminary diagnosis. The following table describes common values.
Note: 
If the cluster status has not recovered automatically after a long time, or you cannot fix the problem yourself, submit a ticket for assistance.
Reason	Description
INDEX_CREATED	Unassigned as a result of an API creation of an index
CLUSTER_RECOVERED	Unassigned as a result of a full cluster recovery
INDEX_REOPENED	Unassigned as a result of opening a closed index
DANGLING_INDEX_IMPORTED	Unassigned as a result of importing a dangling index
NEW_INDEX_RESTORED	Unassigned as a result of restoring into a new index
EXISTING_INDEX_RESTORED	Unassigned as a result of restoring into a closed index
REPLICA_ADDED	Unassigned as a result of explicit addition of a replica
ALLOCATION_FAILED	Unassigned as a result of a failed allocation of the shard
NODE_LEFT	Unassigned as a result of the node hosting it leaving the cluster
REROUTE_CANCELLED	Unassigned as a result of explicit cancel reroute command
REINITIALIZED	When a shard moves from started back to initializing
REALLOCATED_REPLICA	A better replica location is identified and causes the existing replica allocation to be canceled
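To see which of these reasons apply across the whole cluster rather than one shard at a time, you can list shards with GET /_cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason and group the unassigned ones by reason. The sketch below assumes that response format; tally_unassigned, report and the sample rows are hypothetical, and the REASONS mapping only covers the values in the table above.

```python
import json
from collections import Counter
from typing import Dict, List

# Short descriptions of the reasons in the table above (not an exhaustive list).
REASONS: Dict[str, str] = {
    "INDEX_CREATED": "index was just created via the API",
    "CLUSTER_RECOVERED": "full cluster recovery",
    "INDEX_REOPENED": "a closed index was opened",
    "DANGLING_INDEX_IMPORTED": "a dangling index was imported",
    "NEW_INDEX_RESTORED": "restoring into a new index",
    "EXISTING_INDEX_RESTORED": "restoring into a closed index",
    "REPLICA_ADDED": "a replica was explicitly added",
    "ALLOCATION_FAILED": "allocation of the shard failed",
    "NODE_LEFT": "the node hosting the shard left the cluster",
    "REROUTE_CANCELLED": "an explicit cancel reroute command",
    "REINITIALIZED": "the shard moved from started back to initializing",
    "REALLOCATED_REPLICA": "a better replica location was found",
}

# Made-up sample from GET /_cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason
SAMPLE = json.dumps([
    {"index": "metrics", "shard": "0", "prirep": "p", "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
    {"index": "metrics", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
    {"index": "logs", "shard": "1", "prirep": "r", "state": "UNASSIGNED", "unassigned.reason": "REPLICA_ADDED"},
    {"index": "orders", "shard": "0", "prirep": "p", "state": "STARTED", "unassigned.reason": None},
])


def tally_unassigned(cat_shards_json: str) -> Counter:
    """Count unassigned shards grouped by (prirep, reason)."""
    rows = json.loads(cat_shards_json)
    return Counter(
        (r["prirep"], r.get("unassigned.reason") or "UNKNOWN")
        for r in rows
        if r["state"] == "UNASSIGNED"
    )


def report(cat_shards_json: str) -> List[str]:
    """One line per (shard type, reason) pair, primaries first."""
    lines = []
    for (prirep, reason), count in sorted(tally_unassigned(cat_shards_json).items()):
        kind = "primary" if prirep == "p" else "replica"
        description = REASONS.get(reason, "check the allocation explain API")
        lines.append(f"{count} {kind} shard(s), {reason}: {description}")
    return lines


if __name__ == "__main__":
    for line in report(SAMPLE):
        print(line)
```

If the reason is ALLOCATION_FAILED and the shard has exhausted its automatic retries, POST /_cluster/reroute?retry_failed=true asks Elasticsearch to try the allocation again once the underlying cause has been fixed.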


