Overview
This document describes how to configure an alarm policy based on logs so that alarms can be sent when certain conditions are met, such as when there are too many error logs or the API response time is too long.
Prerequisites
The log topic does not use STANDARD_IA storage, because log topics in STANDARD_IA storage do not support alarm policies.
An alarm policy relies on SQL statements, so we recommend that you structure logs as instructed in Collection Overview.
You have logged in to the CLS console and opened the Alarm Policy page.
Directions
On the Alarm Policy page, click Create and configure the following items.
Configuring the monitoring object and monitoring task
Monitoring Object: Select the target log topic(s). You can select up to 20 log topics in the same region, and the trigger condition is evaluated separately for each log topic. If multiple log topics meet the trigger condition at the same time, a separate alarm is generated for each of them.
Monitoring Task
Query Statement: The statement run against the log topic. It must contain an analysis statement (a SQL statement as described in Overview and Syntax Rules).
Example 1: To count logs with errors, enter `status:error | select count(*) as ErrCount`
Example 2: To calculate the average response time of the domain "aaa.com", enter `domain:"aaa.com" | select avg(request_time) as Latency`
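To show what the two example statements produce, the Python sketch below mimics them over a few in-memory records. It is only an illustration, assuming hypothetical log records with `status`, `domain`, and `request_time` fields; CLS runs the real queries on indexed logs. The point is that the alias after `as` (`ErrCount`, `Latency`) becomes the field name of the result.

```python
from statistics import mean

# Hypothetical log records standing in for indexed CLS logs.
logs = [
    {"status": "error", "domain": "aaa.com", "request_time": 6.0},
    {"status": "ok", "domain": "aaa.com", "request_time": 4.5},
    {"status": "error", "domain": "bbb.com", "request_time": 1.2},
]


def err_count(records):
    """Mimics: status:error | select count(*) as ErrCount"""
    hits = [r for r in records if r["status"] == "error"]  # search condition
    return {"ErrCount": len(hits)}  # analysis result, keyed by the alias


def latency(records, domain):
    """Mimics: domain:"aaa.com" | select avg(request_time) as Latency"""
    # mean() raises StatisticsError if no record matches the domain.
    times = [r["request_time"] for r in records if r["domain"] == domain]
    return {"Latency": mean(times)}


print(err_count(logs))           # {'ErrCount': 2}
print(latency(logs, "aaa.com"))  # {'Latency': 5.25}
```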
Query Time Range: The time range of data that the query statement covers. It can be at most the last 24 hours.
Trigger Condition: An alarm is triggered when the trigger condition is met. In the condition expression, `$N.keyname` references a query statement result: `$N` indicates the Nth query statement in the current alarm policy, and `keyname` indicates the corresponding field name. For more information on the expression syntax, see Trigger Condition Expression.
Example 1: To trigger an alarm when the number of logs with errors exceeds 10, enter `$1.ErrCount > 10`. Here, `$1` indicates the first query statement, and `ErrCount` indicates the ErrCount field in its result.
Example 2: To trigger an alarm when the domain "aaa.com" takes more than 5 seconds on average to respond, enter `$2.Latency > 5`. Here, `$2` indicates the second query statement, and `Latency` indicates the Latency field in its result.
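The `$N.keyname` convention can be illustrated with a tiny evaluator. This is a hypothetical sketch that handles only a single comparison such as `$1.ErrCount > 10`; the real Trigger Condition Expression syntax supports more than this, and CLS's own evaluator is not shown here. It assumes each query statement yields one result row, represented as a dict.

```python
import operator
import re

# Comparison operators supported by this sketch only.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}

# One comparison: $N.keyname <op> number (two-char operators tried first).
PATTERN = re.compile(r"^\$(\d+)\.(\w+)\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)$")


def evaluate(condition, results):
    """results[i] is the result row of query statement i + 1."""
    m = PATTERN.match(condition.strip())
    if m is None:
        raise ValueError(f"unsupported expression: {condition!r}")
    n, key, op, value = m.groups()
    row = results[int(n) - 1]  # $N is 1-based
    return OPS[op](row[key], float(value))


results = [{"ErrCount": 12}, {"Latency": 4.2}]
print(evaluate("$1.ErrCount > 10", results))  # True
print(evaluate("$2.Latency > 5", results))    # False
```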
Trigger by Group: It specifies whether the trigger condition expression should trigger alarms by group. When it is enabled, if multiple results of the query statement meet the trigger condition, the results will be grouped based on the group field, and an alarm will be triggered for each group.
For example, suppose query statement 2 is `* | select avg(request_time) as Latency,domain group by domain order by Latency desc limit 5` and it returns the following results:
Latency | domain |
12.56 | aaa.com |
9.45 | bbb.com |
7.23 | ccc.com |
5.21 | ddd.com |
4.78 | eee.com |
If the trigger condition is `$2.Latency > 5`, four of these results meet it.
If triggering by group is not enabled, only one alarm is triggered, even though more than one result meets the trigger condition.
If it is enabled and the results are grouped by the `domain` field, four alarms will be triggered separately for the above execution results.
Note:
When triggering by group is enabled, many results may meet the trigger condition at once, which can trigger a large number of alarms (an alarm storm). Therefore, choose the group field and trigger condition carefully.
When specifying the group field, you can divide execution results into up to 1,000 groups. Groups beyond this limit do not trigger alarms.
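The difference between triggering with and without grouping can be sketched with the example results from the table. This is an illustrative model, not CLS code: the function name `alarms_for` is hypothetical, and because the documentation does not say which groups count as "beyond the limit", the sketch simply keeps the first 1,000 groups in result order.

```python
# Result rows of the example query statement 2.
rows = [
    {"Latency": 12.56, "domain": "aaa.com"},
    {"Latency": 9.45, "domain": "bbb.com"},
    {"Latency": 7.23, "domain": "ccc.com"},
    {"Latency": 5.21, "domain": "ddd.com"},
    {"Latency": 4.78, "domain": "eee.com"},
]


def alarms_for(rows, condition, group_field=None, max_groups=1000):
    """Return one list of matching rows per alarm that would fire."""
    matched = [r for r in rows if condition(r)]
    if not matched:
        return []
    if group_field is None:
        # Without grouping, any number of matches yields a single alarm.
        return [matched]
    groups = {}
    for r in matched:
        groups.setdefault(r[group_field], []).append(r)
    # Assumption: groups past the limit are dropped in result order.
    return list(groups.values())[:max_groups]


def cond(r):
    return r["Latency"] > 5  # $2.Latency > 5


print(len(alarms_for(rows, cond)))            # 1
print(len(alarms_for(rows, cond, "domain")))  # 4
```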
Execution Cycle: The execution frequency of the monitoring task. It can be configured in either of two ways:
Method | Description | Example |
Fixed frequency | Monitoring tasks are performed at fixed intervals. Interval: 1–1,440 minutes. Granularity: minute. | Monitoring tasks are performed once every 5 minutes. |
Fixed time | Monitoring tasks are performed once at a fixed point in time. Time point range: 00:00–23:59. Granularity: minute. | Monitoring tasks are performed once at 02:00 every day. |
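The two execution cycle modes can be modeled as "next run time" calculations. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and it does not model whether CLS aligns fixed-frequency runs to clock boundaries.

```python
from datetime import datetime, time, timedelta


def next_fixed_frequency(last_run, minutes):
    """Fixed frequency: run again a fixed number of minutes later."""
    if not 1 <= minutes <= 1440:
        raise ValueError("interval must be 1-1,440 minutes")
    return last_run + timedelta(minutes=minutes)


def next_fixed_time(now, at):
    """Fixed time: run once a day at the given HH:MM."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed
    return candidate


now = datetime(2024, 1, 1, 3, 0)
print(next_fixed_frequency(now, 5))      # 2024-01-01 03:05:00
print(next_fixed_time(now, time(2, 0)))  # 2024-01-02 02:00:00
```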
Configuring multi-dimensional analysis
When an alarm is triggered, raw logs can be further analyzed through multi-dimensional analysis, and the analysis result can be added to the alarm notification to facilitate root cause discovery. The multi-dimensional analysis doesn't affect the alarm trigger condition.
Type | Description |
Related raw logs | Gets the raw logs that meet the search condition of the query statement. The log fields, quantity, and display format can be configured. For example, when an alarm is triggered by too many error logs, you can view the detailed logs in the alarm. |
Top 5 field values by occurrence and their percentages | Groups all logs within the time range of the triggered alarm by the specified field and gets the top 5 field values and their percentages. For example, when an alarm is triggered by too many error logs, you can get the top 5 URLs and the top 5 response status codes. |
Custom search and analysis | Executes a custom search and analysis statement on all logs within the time range of the triggered alarm. Example 1: `* |
Note:
The "related raw logs" and "top 5 field values by occurrence and their percentages" options can be automatically associated with the search condition of a specified query statement (its filter condition only, excluding the SQL analysis statement). This restricts the multi-dimensional analysis to raw logs that meet that search condition.
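The "top 5 field values by occurrence and their percentages" option can be approximated with a counter. This sketch assumes hypothetical log records with a `url` field and computes each percentage over the logs that contain the field; CLS may use a different denominator.

```python
from collections import Counter


def top5_with_percentages(records, field):
    """Top 5 values of `field` with their share of occurrences, in percent."""
    counts = Counter(r[field] for r in records if field in r)
    total = sum(counts.values())
    return [(value, round(100 * n / total, 1))
            for value, n in counts.most_common(5)]


# Hypothetical error logs within the alarm's time range.
logs = [{"url": "/a"}] * 5 + [{"url": "/b"}] * 3 + [{"url": "/c"}] * 2
print(top5_with_percentages(logs, "url"))
# [('/a', 50.0), ('/b', 30.0), ('/c', 20.0)]
```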
Configuring an alarm notification
Alarm Frequency:
Duration: A notification is sent only after the trigger condition has been met a certain number of consecutive times (1–10; 1 by default).
Interval: No notifications are sent within the specified interval after the last notification. For example, the "An alarm will be triggered every 15 minutes" option means that at most one alarm notification is sent within 15 minutes.
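Duration and interval act as two filters in front of each notification. The class below is a hypothetical model of that behavior, assuming execution times expressed in whole minutes, that the consecutive-match counter is not reset after a notification, and that a new notification is allowed once exactly the interval has elapsed. CLS may differ on these edge cases.

```python
class AlarmThrottle:
    """Hypothetical model of the Duration and Interval settings."""

    def __init__(self, duration=1, interval_minutes=15):
        if not 1 <= duration <= 10:
            raise ValueError("duration must be 1-10")
        self.duration = duration
        self.interval = interval_minutes
        self.streak = 0        # consecutive executions that met the condition
        self.last_sent = None  # minute of the last notification

    def on_execution(self, minute, condition_met):
        """Return True if this execution should send a notification."""
        self.streak = self.streak + 1 if condition_met else 0
        if self.streak < self.duration:
            return False  # not met enough consecutive times yet
        if self.last_sent is not None and minute - self.last_sent < self.interval:
            return False  # still inside the suppression interval
        self.last_sent = minute
        return True


# Executions every 5 minutes, condition always met, duration 2.
t = AlarmThrottle(duration=2, interval_minutes=15)
print([t.on_execution(m, True) for m in range(0, 25, 5)])
# [False, True, False, False, True]
```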
Notification Group:
The notification channels and recipients are set by associating a notification group. Notifications can be sent by SMS, email, phone call, Weixin, WeCom, and custom callback API (webhook). For more information, see Managing Notification Groups.
Notification Content:
By adding preset variables to the notification content, you can include specific information in the alarm notification. For more information on variables, see Alarm Notification Variable.
Custom Webhook Configuration:
If the selected notification group contains a custom webhook, a custom webhook input box is displayed. There you can customize the request header and request body that CLS uses to call the specified API when an alarm is triggered. In the request header and body, you can use notification content variables to send relevant data to the API.
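On the receiving side, the custom webhook is simply an HTTP request to an API you operate. As a hedged sketch, the Python standard-library server below assumes you configured a JSON request body with two hypothetical fields, `policy` and `condition`, filled from notification content variables; the field names, port 8080, and the `serve` helper are illustrative and not defined by CLS.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize(body: bytes) -> str:
    """Turn a (hypothetical) custom webhook body into a one-line summary."""
    payload = json.loads(body)
    # "policy" and "condition" are field names you would define yourself
    # in the custom request body; CLS does not mandate them.
    return f"[{payload.get('policy', '?')}] {payload.get('condition', '')}"


class AlarmHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        print(summarize(self.rfile.read(length)))
        # Acknowledge the request with HTTP 200.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Block and handle alarm callbacks until interrupted."""
    HTTPServer(("0.0.0.0", port), AlarmHandler).serve_forever()
```

Calling `serve()` starts the receiver; the webhook URL in the notification group would then point at this host and port.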