Overview
In data subscription (Kafka Edition; the current Kafka server version is 2.6.0), subscribed data in Protobuf format can be consumed with a Flink client (DataStream API type only). This document provides the consumer-demo-tdsql-pb-flink demo so that you can quickly test the data consumption process and understand how to parse the data.
Prerequisites
You have installed Flink, and it can execute tasks normally.
Note
The demo only prints out the consumed data and contains no usage instructions. You need to write your own data processing logic based on the demo.
Currently, data subscribed to Kafka can be consumed only over the Tencent Cloud private network; consumption over the public network is not supported. In addition, the subscribed database instance and the data consumer must be in the same region.
In scenarios where only specified databases and tables (a subset of the source instance objects) are subscribed and single-partition Kafka topics are used, only the data of the subscribed objects is written to the Kafka topics after DTS parses the incremental data. The data of non-subscribed objects is converted into empty transactions and then written to the Kafka topics, so empty transactions appear during message consumption. The BEGIN and COMMIT messages of these empty transactions contain the GTID information, which ensures GTID continuity and integrity.
To ensure that data can be rewritten from where a task is paused, DTS adopts a checkpoint mechanism for data subscription. Specifically, as messages are written to Kafka topics, a checkpoint message is inserted every 10 seconds to mark the data sync offset. When the task is resumed after an interruption, data can be rewritten from the checkpointed offset. The consumer commits a consumption offset every time it encounters a checkpoint message so that the consumption offset is updated promptly (see the sketch after these notes).
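For illustration only, checkpoint-driven offset committing with the plain kafka-clients API might look like the following minimal sketch. It is not the demo's actual source code: isCheckpointMessage is a hypothetical placeholder (the real demo determines the message type by parsing the Protobuf envelope), and the SASL authentication properties are omitted for brevity.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CheckpointCommitSketch {
    // HYPOTHETICAL helper: the real demo determines the message type by
    // parsing the Protobuf envelope of each record.
    static boolean isCheckpointMessage(ConsumerRecord<byte[], byte[]> record) {
        return false; // placeholder: replace with a real type check
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "<brokers>"); // private network access address
        props.put("group.id", "<group>");
        props.put("enable.auto.commit", "false");    // commit manually, at checkpoints only
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        // SASL authentication properties omitted for brevity.
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("<topic>"));
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    // ... process data messages here ...
                    if (isCheckpointMessage(record)) {
                        consumer.commitSync(); // keeps the consumption offset up to date
                    }
                }
            }
        }
    }
}

Disabling auto commit and committing only at checkpoint messages keeps the committed offset aligned with the sync offsets that DTS marks in the topic.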
Downloading Consumption Demo
When configuring the subscription task, you can select only Protobuf as the format of the subscribed data for TDSQL for MySQL. As Protobuf is a binary format, consumption is highly efficient. The following demo already contains the Protobuf protocol file, so you don't need to download it separately. If you download it on your own, use Protobuf 3.x for code generation to ensure that the data structures are compatible. Parsing then follows the standard Protobuf generated-code pattern, as in the sketch below.
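The following sketch shows the standard protobuf-java parsing pattern for one Kafka record value. SubscribeDataProto.Envelope is a hypothetical class name: substitute the message type generated from the .proto file bundled with the demo.

import com.google.protobuf.InvalidProtocolBufferException;

public class ParseSketch {
    // SubscribeDataProto.Envelope is a HYPOTHETICAL class name: substitute
    // the message type generated (with protoc 3.x) from the demo's .proto file.
    public static void parse(byte[] value) {
        try {
            SubscribeDataProto.Envelope envelope = SubscribeDataProto.Envelope.parseFrom(value);
            System.out.println(envelope); // the demo simply prints the consumed data
        } catch (InvalidProtocolBufferException e) {
            e.printStackTrace(); // malformed payload, e.g., incompatible proto version
        }
    }
}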
Directions for the Java Flink Demo
Compiling environment: Maven and JDK 8. You can choose a desired package management tool. The following takes Maven as an example.
Runtime environment: Tencent Cloud CVM (which can access the private network address of the Kafka server only if it is in the same region as the subscribed instance) with JRE 8 installed.
Directions:
1. Download the consumer-demo-tdsql-pb-flink.zip file and unzip it.
2. Access the directory of the unzipped file. A pom.xml file has been placed under the directory for your convenience. Modify the Flink version in the pom.xml dependencies so that it matches the version of your Flink cluster.
3. The value of ${flink.version} in the code below must be the same as the Flink cluster version; it is typically defined in the <properties> section of the pom.xml file.
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
</dependency>
4. Go to the directory where the pom.xml file is located and package the project with Maven or IDEA.
Package with Maven by running mvn clean package.
5. If the Flink client type is DataStream API, use Flink client commands to submit the job to the Flink cluster and start consumption.
./bin/flink run consumer-demo-tdsql-pb-flink.jar --brokers xxx --topic xxx --group xxx --user xxx --password xxx --trans2sql
brokers is the private network access address for data subscription to Kafka.
topic is the subscription topic, which can be viewed on the Subscription details page as instructed in Viewing Subscription Details.
group, user, and password are the name, account, and password of the consumer group, which can be viewed on the Consumption Management page as instructed in Managing Consumer Group.
trans2sql indicates whether to enable conversion to SQL statements. In the Java code, if this parameter is carried, the conversion is enabled. A sketch of how these parameters fit together follows this list.
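For orientation only, the sketch below shows roughly how these parameters might be wired into a Flink DataStream job. It is not the demo's actual source code: the class and job names are illustrative, the --trans2sql handling is omitted, and the SASL/SCRAM authentication settings are an assumption you should verify against the downloaded demo.

import java.util.Properties;
import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class ConsumptionSketch {
    public static void main(String[] args) throws Exception {
        // Values normally come from --brokers/--topic/--group/--user/--password.
        String brokers = "<brokers>", topic = "<topic>", group = "<group>";
        String user = "<user>", password = "<password>";

        Properties props = new Properties();
        props.put("bootstrap.servers", brokers);
        props.put("group.id", group);
        // ASSUMPTION: SASL/SCRAM authentication; verify the exact mechanism
        // against the downloaded demo before use.
        props.put("security.protocol", "SASL_PLAINTEXT");
        props.put("sasl.mechanism", "SCRAM-SHA-512");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                        + "username=\"" + user + "\" password=\"" + password + "\";");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Subscribed data is Protobuf-encoded, so consume raw bytes.
        FlinkKafkaConsumer<byte[]> source = new FlinkKafkaConsumer<>(
                topic,
                new AbstractDeserializationSchema<byte[]>() {
                    @Override
                    public byte[] deserialize(byte[] message) {
                        return message;
                    }
                },
                props);
        env.addSource(source)
                .map(bytes -> "received " + bytes.length + " bytes") // replace with real parsing
                .print(); // the demo only prints the consumed data
        env.execute("consumer-demo-tdsql-pb-flink-sketch");
    }
}

A raw-bytes deserialization schema is used because the payload is Protobuf-encoded; actual parsing happens downstream (see the parsing sketch earlier in this document).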
6. Execute SQL statements in the source database to generate incremental data. For example, create the following table and then run DML statements (such as INSERT or UPDATE) against it.
CREATE TABLE `flink_test` (
`id` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
`parent_id` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL,
`user_id` bigint NOT NULL,
`type` int NOT NULL COMMENT '1: Expenditure 2: Income',
`create_time` timestamp(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
PRIMARY KEY (`id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci ROW_FORMAT=DYNAMIC
7. Observe the consumption performed by the previously submitted job.
View the task logs on the task managers.
View the Stdout information on the task managers.