Checkpointing Failure

Last updated: 2023-11-07 16:40:32

Overview
A checkpoint failure event in Stream Compute Service indicates that a checkpoint of a job with checkpointing enabled has failed, due to a timeout or another cause.
For a long-running job, an occasional checkpoint failure does not necessarily indicate a severe exception in the job; you need to act only when checkpoints fail frequently. For example, the Checkpoints page of the Flink UI may show that a checkpoint (ID: 6717) of a job has failed.
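One way to decide whether failures are "frequent" is to look at the share of recently finished checkpoints that failed. The sketch below assumes you have already collected the final status strings of recent checkpoints (for example, by reading them off the Checkpoints page). The function names and the 20% threshold are hypothetical and should be tuned to the job.

```python
from typing import Iterable

FINAL_STATUSES = ("COMPLETED", "FAILED")


def failure_ratio(statuses: Iterable[str]) -> float:
    """Return the share of finished checkpoints that ended as FAILED.

    Checkpoints that are still running (any status other than
    COMPLETED or FAILED) are ignored.
    """
    finished = [s for s in statuses if s in FINAL_STATUSES]
    if not finished:
        return 0.0
    failed = sum(1 for s in finished if s == "FAILED")
    return failed / len(finished)


def needs_attention(statuses: Iterable[str], threshold: float = 0.2) -> bool:
    """Hypothetical rule: flag the job when the failure share reaches the threshold."""
    return failure_ratio(statuses) >= threshold


recent = ["COMPLETED", "FAILED", "COMPLETED", "COMPLETED", "IN_PROGRESS"]
print(failure_ratio(recent), needs_attention(recent))
```

A single failure in a long history keeps the ratio low, which matches the guidance that occasional failures can usually be ignored.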
Conditions
Trigger
A checkpoint of a job fails, with FAILED as its final status.
Clearing
A subsequent checkpoint of the job succeeds, with COMPLETED as its final status.
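The trigger and clearing conditions can be read as a two-state rule over the sequence of checkpoint results. The following is a minimal sketch of that rule, not the service's actual implementation. It assumes that consecutive failures produce a single trigger until a checkpoint completes, which the conditions above do not specify.

```python
from typing import Iterable, List, Tuple


def event_transitions(checkpoints: Iterable[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Map (checkpoint_id, final_status) pairs to TRIGGER/CLEAR transitions.

    A FAILED checkpoint triggers the event if it is not already active.
    A later COMPLETED checkpoint clears an active event.
    """
    active = False
    transitions = []
    for checkpoint_id, status in checkpoints:
        if status == "FAILED" and not active:
            active = True
            transitions.append((checkpoint_id, "TRIGGER"))
        elif status == "COMPLETED" and active:
            active = False
            transitions.append((checkpoint_id, "CLEAR"))
    return transitions


history = [(6716, "COMPLETED"), (6717, "FAILED"), (6718, "FAILED"), (6719, "COMPLETED")]
print(event_transitions(history))
```

With this history, the event is triggered by checkpoint 6717 and cleared by checkpoint 6719.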
Alarms
You can configure an alarm policy for this event to receive trigger and clearing notifications in real time.
Suggestions
The causes of a checkpoint failure event are available on the events page. Depending on the Flink execution path involved, the page may show the direct cause of the failure or only a common, generic error, so further analysis of the specific issue is usually required.
Based on the time of the checkpoint failure, you can also view the JobManager and TaskManager error logs around that time on the logs page and the Flink UI page of the job, as instructed in Viewing the Logs of a Job and Viewing the Flink UI of a Job, respectively.
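If you export the logs as text, narrowing them to a window around the failure time can be scripted. The sketch below is illustrative only: it assumes each log line starts with a timestamp in the form `YYYY-MM-DD HH:MM:SS` followed by the log level, which may not match your log layout. The function name, the 60-second window, and the sample lines are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Sequence


def lines_near(
    lines: Iterable[str],
    failure_time: datetime,
    window_seconds: int = 60,
    levels: Sequence[str] = ("ERROR", "WARN"),
) -> List[str]:
    """Keep log lines at the given levels within the window around failure_time.

    Lines that do not start with a parsable timestamp (for example,
    stack trace continuation lines) are skipped.
    """
    earliest = failure_time - timedelta(seconds=window_seconds)
    latest = failure_time + timedelta(seconds=window_seconds)
    selected = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue
        if earliest <= stamp <= latest and any(f" {lvl} " in line for lvl in levels):
            selected.append(line)
    return selected


sample = [
    "2023-11-07 16:30:00,001 ERROR unrelated earlier error",
    "2023-11-07 16:40:10,120 INFO routine message",
    "2023-11-07 16:40:30,450 ERROR sample error near the failure",
    "    at some.stack.Frame(Frame.java:1)",
]
print(lines_near(sample, datetime(2023, 11, 7, 16, 40, 32)))
```

Only the third sample line is kept: it is the only line at a matching level inside the window.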
If there are too many TaskManagers or logs to locate the error this way, search for exception logs under the instance ID of the checkpoint failure event, as instructed in Diagnosis with Logs.
If the cause is still not found, check for resource overuse as instructed in Viewing Monitoring Information. Focus on critical metrics such as TaskManager CPU usage, heap memory usage, full GC count, and full GC time.
