Overview
When running Flink, you need to monitor task status to confirm that tasks are running normally and to troubleshoot faults. TMP integrates Pushgateway so that Flink can push metrics to it, and provides an out-of-the-box Grafana monitoring dashboard for Flink.
Prerequisites
1. The EMR product you purchased includes the Flink component, and a Flink task is running in your instance.
2. You have created a TKE cluster in the region and VPC of your TMP instance.
Directions
Integrating product
Getting Pushgateway access configuration
1. Log in to the EMR console, select the corresponding instance, and select Basic Info > Instance Info to get the Pushgateway address and token.
Modifying Flink configuration
1. Log in to the EMR console, select the corresponding instance, and select Cluster Service.
2. Find the Flink configuration item and select Configuration Management in the Operation column on the right to enter the configuration management page.
3. On the right of the page, click Add Configuration Item and add the following configuration items one by one:
Configuration Item | Default Value | Type | Description | Recommendation |
metrics.reporter.promgateway.class | None | String | Name of the Java class for exporting metrics to Pushgateway | - |
metrics.reporter.promgateway.jobName | None | String | Push task name | Specify an easily understandable string |
metrics.reporter.promgateway.randomJobNameSuffix | true | Boolean | Whether to add a random string after the task name | Set it to `true`. If no random string is added, metrics of different Flink tasks will overwrite each other |
metrics.reporter.promgateway.groupingKey | None | String | Global label added to each metric in the format of `k1=v1;k2=v2` | Add the EMR instance ID to distinguish between the data of different instances, such as `instance_id=emr-xxx` |
metrics.reporter.promgateway.interval | None | Time | Time interval for pushing metrics, such as 30s | We recommend you set the value to about 1 minute |
metrics.reporter.promgateway.host | None | String | Pushgateway service address | It is the service address of the TMP instance in the console |
metrics.reporter.promgateway.port | -1 | Integer | Pushgateway service port | It is the port of the TMP instance in the console |
metrics.reporter.promgateway.needBasicAuth | false | Boolean | Whether the Pushgateway service requires authentication | Set it to `true`, as the Pushgateway of TMP requires authentication |
metrics.reporter.promgateway.user | None | String | Username for authentication | |
metrics.reporter.promgateway.password | None | String | Password for authentication | It is the access token of the TMP instance in the console |
metrics.reporter.promgateway.deleteOnShutdown | true | Boolean | Whether to delete the corresponding metrics on the Pushgateway after the Flink task is completed | Set it to `true` |
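The `k1=v1;k2=v2` format expected by `groupingKey` can be checked before you save the configuration. The following is a minimal sketch: `valid_grouping_key` is a hypothetical helper name, and the check is deliberately conservative (it rejects `=` and `;` inside keys and values, which may be stricter than what Flink itself accepts).

```shell
# Hypothetical helper: succeed only if the argument looks like k1=v1;k2=v2.
# Assumption: keys and values contain neither "=" nor ";".
valid_grouping_key() {
    printf '%s\n' "$1" | grep -Eq '^[^=;]+=[^=;]+(;[^=;]+=[^=;]+)*$'
}
```

For example, `valid_grouping_key 'instance_id=emr-xxx' && echo ok` prints `ok`, while a value without `=` makes the function return a non-zero status.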
Below is a sample configuration:
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.jobName: climatePredict
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.interval: 60 SECONDS
metrics.reporter.promgateway.groupingKey: instance_id=emr-xxxx
metrics.reporter.promgateway.host: 172.xx.xx.xx
metrics.reporter.promgateway.port: 9090
metrics.reporter.promgateway.needBasicAuth: true
metrics.reporter.promgateway.user: appid
metrics.reporter.promgateway.password: token
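If you have shell access to a node, a quick way to confirm that none of the items from the sample configuration was skipped is to scan the Flink configuration file for each key. The following is a sketch under stated assumptions: `check_promgateway_conf` is a hypothetical helper name, and the location of the configuration file on your cluster is not given in this document, so pass the path explicitly.

```shell
# Hypothetical helper: report which Pushgateway reporter keys from the sample
# configuration are missing from a Flink configuration file.
# Usage: check_promgateway_conf <path-to-flink-conf-file>
check_promgateway_conf() {
    conf_file="$1"
    missing=0
    for key in class jobName randomJobNameSuffix interval groupingKey \
               host port needBasicAuth user password; do
        # A key counts as present when a line starts with "<full key>:".
        if ! grep -q "^metrics\.reporter\.promgateway\.${key}:" "$conf_file"; then
            echo "missing: metrics.reporter.promgateway.${key}"
            missing=1
        fi
    done
    # Exit status 0 means every key was found.
    return "$missing"
}
```

The function prints one line per missing key and returns a non-zero status if any key is absent, so it can be used in a script before submitting a task.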
Installing Flink Pushgateway plugin
The Pushgateway plugin in the official package currently does not support configuring the authentication information, but TMP requires authentication before data can be written. Therefore, we recommend you use the JAR package we provide. We have also submitted a pull request for supporting authentication to the Flink team.
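For orientation, the sketch below shows how a username and password are combined under standard HTTP Basic authentication. It assumes the provided plugin uses this standard scheme, as the `needBasicAuth`, `user`, and `password` option names suggest; the source does not confirm the exact mechanism, and `basic_auth_header` is a hypothetical helper name.

```shell
# Hypothetical helper: build a standard HTTP Basic authentication header.
# Assumption: the plugin sends credentials this way; not confirmed by the source.
# Usage: basic_auth_header <user> <password>
basic_auth_header() {
    # Base64-encode "user:password"; strip newlines that some base64 tools add.
    credentials=$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')
    printf 'Authorization: Basic %s\n' "$credentials"
}
```

Calling `basic_auth_header "$APPID" "$TOKEN"` with your own values (both variable names are placeholders) yields the header a client would send if the scheme is indeed standard Basic authentication.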
1. To prevent class conflicts, if you have already used the official Flink plugin, run the following command to delete it first:
cd /usr/local/service/flink/lib
rm flink-metrics-prometheus*jar
2. In the EMR console, select the corresponding instance, and select Cluster Resource > Resource Management > Master to view the master node.
3. Click the instance ID to go to the CVM console, log in to the CVM instance, and run the following command to install the plugin:
cd /usr/local/service/flink/lib
wget https://rig-1258344699.cos.ap-guangzhou.myqcloud.com/flink/flink-metrics-prometheus_2.11-auth.jar -O flink-metrics-prometheus_2.11-auth.jar
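Because a leftover official JAR next to the downloaded one can cause class conflicts, it may help to confirm that exactly one `flink-metrics-prometheus` JAR remains in the lib directory. The following is a minimal sketch; `check_single_prometheus_jar` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if exactly one flink-metrics-prometheus
# JAR is present in the given lib directory.
# Usage: check_single_prometheus_jar /usr/local/service/flink/lib
check_single_prometheus_jar() {
    lib_dir="$1"
    count=0
    for jar in "$lib_dir"/flink-metrics-prometheus*jar; do
        # When nothing matches, the loop sees the unexpanded pattern; skip it.
        [ -e "$jar" ] || continue
        echo "found: ${jar##*/}"
        count=$((count + 1))
    done
    [ "$count" -eq 1 ]
}
```

The function lists every matching JAR it finds and returns a non-zero status when there are none or more than one.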
Verifying
1. Run the `flink run` command on the master node to submit a new task and view the task log:
grep metrics /usr/local/service/flink/log/flink-hadoop-client-*.log
2. If the log contains the following content, the configuration is successfully loaded:
Note:
As tasks previously submitted in the cluster use the old configuration file, their metrics are not reported.
Viewing monitoring information
1. In Integration Center of the target TMP instance, find Flink monitoring and install the corresponding Grafana dashboard to enable the Flink monitoring dashboard.
2. Enter Grafana and click to expand the Flink monitoring panel.
3. Click Flink Job List to view the monitoring information.
4. Click a job name or job ID in the table to view the job monitoring details.
5. Click Flink Cluster in the top-right corner to view the Flink cluster monitoring information.
6. Click a task name in the table to view the task monitoring details.
Integrating with alert feature
1. Log in to the TMP console and select the target TMP instance to enter the management page.
2. Click Alerting Rule and add the corresponding alerting rules. For more information, see Creating Alerting Rule.