Product Name | Alarm Type | Metric/Event Name | Alarm Rule |
CVM | Metric alarm | CPU utilization | The statistical period is 1 minute, the threshold is >95%, and the continuous monitoring duration is 5 monitoring data points |
| | Memory utilization | The statistical period is 1 minute, the threshold is >95%, and the continuous monitoring duration is 5 monitoring data points |
| | Disk utilization | The statistical period is 1 minute, the threshold is >95%, and the continuous monitoring duration is 5 monitoring data points |
| | Public network bandwidth utilization | The statistical period is 1 minute, the threshold is >95%, and the continuous monitoring duration is 5 monitoring data points |
| Event alarm | Read-only disk | - |
TencentDB for MySQL (server monitoring) | Metric alarm | Disk utilization | The statistical period is 1 minute, the threshold is >80%, and the continuous monitoring duration is 5 monitoring data points |
| | CPU utilization | The statistical period is 1 minute, the threshold is >80%, and the continuous monitoring duration is 5 monitoring data points |
| Event alarm | OOM | - |
TencentDB for MongoDB | Metric alarm | Disk utilization | The statistical period is 1 minute, the threshold is >80%, and the continuous monitoring duration is 5 monitoring data points |
| | Connection utilization | The statistical period is 1 minute, the threshold is >80%, and the continuous monitoring duration is 5 monitoring data points |
TencentDB for Redis - CKV version/community version | Metric alarm | Capacity utilization | The statistical period is 1 minute, the threshold is >80%, and the continuous monitoring duration is 5 monitoring data points |
TDSQL for MySQL | Event alarm | OOM | - |
| | Instance read-only status (disk overrun) | - |
TDSQL for PostgreSQL | Event alarm | Insufficient memory | - |
| | OOM | - |
CKafka - instance | Metric alarm | Disk utilization | The statistical period is 1 minute, the threshold is >85%, and the continuous monitoring duration is 5 monitoring data points |
ES | Metric alarm | Average disk utilization | The statistical period is 1 minute, the threshold is >80%, and the continuous monitoring duration is 5 monitoring data points |
| | Average CPU utilization | The statistical period is 1 minute, the threshold is >90%, and the continuous monitoring duration is 5 monitoring data points |
| | Average JVM memory utilization | The statistical period is 1 minute, the threshold is >85%, and the continuous monitoring duration is 5 monitoring data points |
| | Cluster health | The statistical period is 1 minute, the threshold is >=1, and the continuous monitoring duration is 5 monitoring data points |
DTS | Event alarm | Data migration task interruption | - |
| | Data sync task interruption | - |
| | Data subscription task interruption | - |
EMR (server monitoring - disk) | Metric alarm | Disk utilization (used_all) | The statistical period is 1 minute, the threshold is >80%, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| | inode utilization | The statistical period is 1 minute, the threshold is >50%, and an alarm will be triggered once every 5 consecutive times the conditions are met |
EMR (server monitoring - CPU) | Metric alarm | CPU utilization (idle) | The statistical period is 1 minute, the threshold is <2%, and an alarm will be triggered once every 5 consecutive times the conditions are met |
EMR (server monitoring - memory) | Metric alarm | Memory utilization (used_percent) | The statistical period is 1 minute, the threshold is >95%, and an alarm will be triggered once every 5 consecutive times the conditions are met |
EMR (server monitoring - network) | Event alarm | Metadatabase ping failure | - |
EMR (cluster monitoring) | Event alarm | Elastic scaling failure | - |
EMR (HBase - overview) | Metric alarm | Number of dead cluster RSs (numDeadRegionServers) | The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| | Number of cluster regions in RIT state (ritCountOverThreshold) | The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met |
EMR (HBase - HMaster) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
EMR (HBase - RegionServer) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| | Number of regions (regionCount) | The statistical period is 1 minute, the threshold is >600, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| | Number of requests in operation queue (compactionQueueLength) | The statistical period is 1 minute, the threshold is >500, and an alarm will be triggered once every 5 consecutive times the conditions are met |
EMR (HDFS - NameNode) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| | Number of missing blocks (NumberOfMissingBlocks) | The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| Event alarm | NameNode master/slave switch | - |
EMR (HDFS - DataNode) | Metric alarm | Number of XCeivers (XceiverCount) | The statistical period is 1 minute, the threshold is >1,000, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
EMR (HDFS - overview) | Metric alarm | Disk failure | The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| | Number of dead cluster DataNodes (NumDeadDataNodes) | The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| | Number of stale cluster DataNodes (NumStaleDataNodes) | The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| | HDFS storage space utilization (capacityusedrate) | The statistical period is 1 minute, the threshold is >90%, and an alarm will be triggered once every 5 consecutive times the conditions are met |
EMR (Presto - Presto_Coordinator) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
EMR (Presto - Presto_Worker) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
EMR (Presto - overview) | Metric alarm | Number of nodes (Failed) | The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met |
EMR (ClickHouse - server) | Metric alarm | Number of largest active data blocks in partition | The statistical period is 1 minute, the threshold is >250, and an alarm will be triggered once every 5 consecutive times the conditions are met |
EMR (Hive - HiveMetaStore) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| | DaemonThreadCount | The statistical period is 1 minute, the threshold is >2,000, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| | ThreadCount | The statistical period is 1 minute, the threshold is >2,000, and an alarm will be triggered once every 5 consecutive times the conditions are met |
EMR (Hive - HiveServer2) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| | DaemonThreadCount | The statistical period is 1 minute, the threshold is >2,000, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| | ThreadCount | The statistical period is 1 minute, the threshold is >2,000, and an alarm will be triggered once every 5 consecutive times the conditions are met |
EMR (YARN - overview) | Metric alarm | Number of nodes (NumUnhealthyNMs) | The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| | Number of nodes (NumLostNMs) | The statistical period is 1 minute, the threshold is >0, and an alarm will be triggered once every 5 consecutive times the conditions are met |
EMR (YARN - NodeManager) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
EMR (YARN - ResourceManager) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| Event alarm | ResourceManager master/slave switch | - |
EMR (ZooKeeper - ZooKeeper) | Metric alarm | GC time (FGCT) | The statistical period is 1 minute, the threshold is >5s, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| | Number of Znodes (zk_znode_count) | The statistical period is 1 minute, the threshold is >100,000, and an alarm will be triggered once every 5 consecutive times the conditions are met |
| | Number of queuing requests (zk_outstanding_requests) | The statistical period is 1 minute, the threshold is >50, and an alarm will be triggered once every 5 consecutive times the conditions are met |
CLB (public network CLB instance) | Metric alarm | Discarded connections | The statistical period is 1 minute, the threshold is >10, and an alarm will be triggered once every 3 consecutive times the conditions are met |
| | Discarded inbound data packets | The statistical period is 1 minute, the threshold is >10, and an alarm will be triggered once every 3 consecutive times the conditions are met |
| | Discarded inbound bandwidth | The statistical period is 1 minute, the threshold is >10 MB, and an alarm will be triggered once every 3 consecutive times the conditions are met |
| | Discarded outbound bandwidth | The statistical period is 1 minute, the threshold is >10 MB, and an alarm will be triggered once every 3 consecutive times the conditions are met |
| | Inbound bandwidth utilization | The statistical period is 1 minute, the threshold is >80%, and an alarm will be triggered once every 3 consecutive times the conditions are met |
| | Outbound bandwidth utilization | The statistical period is 1 minute, the threshold is >80%, and an alarm will be triggered once every 3 consecutive times the conditions are met |
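
The metric alarm rules above all share one shape: a 1-minute statistical period, a comparison against a threshold, and a required number of consecutive data points. The following Python sketch shows one way such a rule could be evaluated over a series of per-minute data points. It is an illustration only, not the platform's implementation: the `AlarmRule` class, the `breach_indices` function, and the choice to restart the count after each alarm are all assumptions made for this example.

```python
import operator
from dataclasses import dataclass
from typing import Iterable, List

# Comparison operators that appear in the table's thresholds.
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt}


@dataclass(frozen=True)
class AlarmRule:
    """Hypothetical representation of one metric alarm row."""
    metric: str
    op: str            # one of the keys in OPERATORS
    threshold: float
    consecutive: int   # data points that must meet the condition in a row


def breach_indices(rule: AlarmRule, points: Iterable[float]) -> List[int]:
    """Return the indices of the data points at which the rule would fire.

    Assumed semantics: the rule fires each time `consecutive` data points
    in a row meet the condition, and the count then starts over.
    """
    compare = OPERATORS[rule.op]
    streak = 0
    fired = []
    for i, value in enumerate(points):
        if compare(value, rule.threshold):
            streak += 1
        else:
            streak = 0  # any point that misses the condition resets the run
        if streak == rule.consecutive:
            fired.append(i)
            streak = 0  # assumption: count restarts after firing
    return fired


# CVM CPU utilization row: threshold >95%, 5 consecutive data points.
cvm_cpu = AlarmRule("CPU utilization", ">", 95.0, 5)
samples = [96, 97, 99, 98, 96, 50, 96, 97]  # one value per minute
print(breach_indices(cvm_cpu, samples))  # prints [4]
```

In this run the first five samples all exceed 95%, so the rule fires at the fifth data point (index 4). The drop to 50% resets the run, and the last two samples are not enough to fire again.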