Data Source Type | Edition | Driver |
--- | --- | --- |
MySQL | 5.6, 5.7, 8.0.x | JDBC Driver: 8.0.21 |
RDS MySQL | 5.6, 5.7, 8.0.x | JDBC Driver: 8.0.21 |
PolarDB MySQL | 5.6, 5.7, 8.0.x | JDBC Driver: 8.0.21 |
Aurora MySQL | 5.6, 5.7, 8.0.x | JDBC Driver: 8.0.21 |
MariaDB | 10.x | JDBC Driver: 8.0.21 |
PolarDB-X | 2.0.1 | JDBC Driver: 8.0.21 |
select version();
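The returned version should fall within the supported editions listed above; illustrative output:

```
+-----------+
| version() |
+-----------+
| 8.0.21    |
+-----------+
```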
mysql> CREATE USER 'user'@'localhost' IDENTIFIED BY 'password';
mysql> GRANT SELECT, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'user'@'localhost';
mysql> FLUSH PRIVILEGES;
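To confirm that the account holds the required privileges, a standard check:

```sql
-- List the privileges now held by the synchronization account:
SHOW GRANTS FOR 'user'@'localhost';
```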
show variables like "log_bin"
show variables like "log_slave_updates";
show variables like "binlog_format";
show variables like "binlog_row_image";
show global variables like '%GTID%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| enforce_gtid_consistency | ON    |
| gtid_mode                | ON    |
+--------------------------+-------+
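If gtid_mode is OFF, MySQL 5.7.6+ can enable it online by stepping through the intermediate modes; a sketch of the standard procedure (verify the constraints for your replication topology first):

```sql
-- gtid_mode only accepts one-step transitions, so raise it gradually:
SET GLOBAL enforce_gtid_consistency = WARN;  -- surface violating statements first
SET GLOBAL enforce_gtid_consistency = ON;
SET GLOBAL gtid_mode = OFF_PERMISSIVE;
SET GLOBAL gtid_mode = ON_PERMISSIVE;
-- wait until SHOW STATUS LIKE 'Ongoing_anonymous_transaction_count' reports 0
SET GLOBAL gtid_mode = ON;
```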
Parameter | Description |
--- | --- |
Data Source | Select the MySQL data source to be synchronized. |
Source Table | All databases and tables: monitors every database under the data source; databases and tables added while the task is running are synchronized to the target by default. Specific tables: you must enter exact table names, and the task synchronizes only those tables; to add another table to the synchronization, stop and restart the task. Specific databases: you must enter exact database names plus a regular expression for table names; new tables that match the expression while the task is running are synchronized to the target by default. |
Read Mode | Full + Increment: synchronization runs in a full phase followed by an incremental phase. The full phase synchronizes the historical data in the database; once it completes, the task enters the incremental phase, which reads binlog changes from the CDC position captured when the task started. Increment Only: synchronizes only the binlog changes from the CDC position captured when the task started. |
Consistency Semantics | Applies to the read side only. Supports At-least-once and Exactly-once. At-least-once: data may be read more than once, relying on deduplication at the destination to ensure consistency; suited to scenarios with large full-phase data volumes, non-numeric primary keys, and high synchronization-performance requirements. Exactly-once: data is read strictly once, with some performance cost; tables without a primary key or unique index column are not supported (a query for locating such tables appears after this table), so it suits the common case where the source table has a numeric primary key or unique index column. The two modes are state-incompatible in the current version: if the mode is changed after task submission, a stateful restart is not supported. |
Filter Operation | Supports three operations: insert, update, and delete. Data of the specified operation type will not be synchronized when set. |
Time Zone | Sets the time zone used for log timestamps; defaults to Shanghai. |
Sync gh-ost temporary table | gh-ost performs online schema changes (Online DDL) in MySQL without affecting normal business operations, avoiding the table locks, performance degradation, and synchronization delays that traditional ALTER TABLE or CREATE INDEX commands can cause. It suits scenarios that require table changes, such as adding columns, adding indexes, or modifying field types. Prerequisites for using gh-ost: gh-ost must be able to access MySQL, and if MySQL is Tencent Cloud's CDB, you need to add the --aliyun-rds parameter to the gh-ost command. While gh-ost runs, it generates temporary tables matching ^_(.*)_(gho|ghc|del)$, where (.*) is the name of the table being changed; custom temporary-table names are not supported. For other gh-ost limitations, see gh-ost/requirements-and-limitations.md at master · github/gh-ost · GitHub. To sync gh-ost temporary tables, turn on this switch; once enabled, the task monitors '_tablename_gho', '_tablename_ghc', and '_tablename_del', and syncs changes to the '_tablename_gho' table. |
Advanced Settings (optional) | You can configure parameters according to business needs. |
Supported primary key range types | The primary key column supports the following range types for sharded reading: TINYINT, TINYINT_UNSIGNED, SMALLINT, SMALLINT_UNSIGNED, INT, MEDIUMINT, INT_UNSIGNED, MEDIUMINT_UNSIGNED, BIGINT, BIGINT_UNSIGNED, FLOAT, DOUBLE, DECIMAL, TIME, DATE, DATETIME, TIMESTAMP, CHAR, VARCHAR, TEXT, BINARY, VARBINARY, BLOB. |
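As referenced in the Consistency Semantics row above, a standard information_schema query can locate tables that lack a primary key or unique constraint and therefore cannot use Exactly-once; a sketch, where 'mydb' is a placeholder database name:

```sql
-- Find base tables with no PRIMARY KEY or UNIQUE constraint; such tables
-- cannot be read with Exactly-once semantics.
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type IN ('PRIMARY KEY', 'UNIQUE')
WHERE t.table_schema = 'mydb'      -- placeholder: your database name
  AND t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL;
```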
Parameter | Description |
--- | --- |
Data Source | Select MySQL data source. |
Database | Supports selecting or manually entering the name of the required database. By default, the database bound to the data source is used; other databases must be entered manually. If the data source network is not connected and database information cannot be fetched directly, you can enter the database name manually; data synchronization can still run as long as the Data Integration network is connected. |
Table | Supports selecting or manually entering the table name to read. For sharded (partitioned) tables, you can select or enter multiple table names on the MySQL source side; the tables must share a consistent structure. You can also configure a table index range: for example, 'table_[0-99]' reads 'table_0', 'table_1', and so on up to 'table_99'. If the numeric suffix has a fixed width, such as 'table_000' through 'table_999', configure it as '"table": ["table_00[0-9]", "table_0[10-99]", "table_[100-999]"]' (a query for previewing which tables a pattern matches appears after this table). If the data source network is not connected and table information cannot be fetched directly, you can enter the table name manually; data synchronization can still run as long as the Data Integration network is connected. |
Add Sharded Database/Table | Applicable to sharding scenarios. Click to configure multiple data sources, databases, and tables. In sharding scenarios, ensure all table structures are consistent; by default, the task configuration displays and uses the structure of the first table. |
Shard Column | The shard column divides the table into multiple shards for parallel synchronization. For tables with a primary key, prefer the primary key as the shard column; for tables without one, choose an indexed column. Ensure the shard column receives no data updates, otherwise only At-least-once semantics can be guaranteed. |
Read Mode | Full + Increment: synchronization runs in a full phase followed by an incremental phase. The full phase synchronizes the historical data in the database; once it completes, the task enters the incremental phase, which reads binlog changes from the CDC position captured when the task started. Increment Only: synchronizes only the binlog changes from the CDC position captured when the task started. |
Consistency Semantics | Applies to the read side only. Supports At-least-once and Exactly-once. At-least-once: data may be read more than once, relying on deduplication at the destination to ensure consistency; suited to scenarios with large full-phase data volumes, non-numeric primary keys, and high synchronization-performance requirements. Exactly-once: data is read strictly once, with some performance cost; tables without a primary key or unique index column are not supported, so it suits the common case where the source table has a numeric primary key or unique index column. The two modes are state-incompatible in the current version: if the mode is changed after task submission, a stateful restart is not supported. |
Filter Operation | Once set, data for the specified operation type will not be synchronized; supports filtering insert, update, and delete operations. |
Time Zone | Sets the time zone used for log timestamps; defaults to Shanghai. |
Special Type Conversion | Maps TINYINT(1) to either BOOLEAN or TINYINT; defaults to BOOLEAN. |
Parameter (optional) | You can configure parameters according to business needs. |
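As referenced in the Table row above, before configuring an index range you can preview which physical tables a pattern covers with a standard information_schema query; a sketch, where 'mydb' and the pattern are placeholders:

```sql
-- List the physical tables that a suffix pattern such as 'table_[0-99]'
-- would cover (the range is written out as an explicit REGEXP here):
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'mydb'                  -- placeholder: your database name
  AND table_name REGEXP '^table_[0-9]{1,2}$';
```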
Parameter | Description |
--- | --- |
Data Destination | Select MySQL data source. |
Database | Supports selecting or manually entering the name of the required database. By default, the database bound to the data source is used; other databases must be entered manually. If the data source network is not connected and database information cannot be fetched directly, you can enter the database name manually; data synchronization can still run as long as the Data Integration network is connected. |
Table | Supports selecting or manually entering the table name to write to. If the data source network is not connected and table information cannot be fetched directly, you can enter the table name manually; data synchronization can still run as long as the Data Integration network is connected. |
Parameter (optional) | You can configure parameters according to business needs. |
Parameter | Description |
--- | --- |
Data Destination | Select the available MySQL data source in the current project. |
Database | Select the corresponding database in the data source. |
Table | Select the corresponding table in the data source. |
Advanced Settings (optional) | You can configure parameters according to business needs. |
Field Type | Supported | Internal Mapped Type | Remarks |
--- | --- | --- | --- |
TINYINT | Yes | TINYINT | TINYINT(1) maps to BOOLEAN by default; an option allows mapping TINYINT(1) to either BOOLEAN or TINYINT (see Special Type Conversion).
SMALLINT | Yes | SMALLINT | - |
TINYINT_UNSIGNED | Yes | SMALLINT | - |
TINYINT_UNSIGNED_ZEROFILL | Yes | SMALLINT | - |
INT | Yes | INT | - |
INTEGER | Yes | INT | - |
YEAR | Yes | INT | - |
MEDIUMINT | Yes | INT | - |
SMALLINT_UNSIGNED | Yes | INT | - |
SMALLINT_UNSIGNED_ZEROFILL | Yes | INT | - |
BIGINT | Yes | LONG | - |
INT_UNSIGNED | Yes | LONG | - |
MEDIUMINT_UNSIGNED | Yes | LONG | - |
MEDIUMINT_UNSIGNED_ZEROFILL | Yes | LONG | - |
INT_UNSIGNED_ZEROFILL | Yes | LONG | - |
BIGINT_UNSIGNED | Yes | DECIMAL | DECIMAL(20,0) |
BIGINT_UNSIGNED_ZEROFILL | Yes | DECIMAL | DECIMAL(20,0) |
SERIAL | Yes | DECIMAL | DECIMAL(20,0) |
FLOAT | Yes | FLOAT | - |
FLOAT_UNSIGNED | Yes | FLOAT | - |
FLOAT_UNSIGNED_ZEROFILL | Yes | FLOAT | - |
DOUBLE | Yes | DOUBLE | - |
DOUBLE_UNSIGNED | Yes | DOUBLE | - |
DOUBLE_UNSIGNED_ZEROFILL | Yes | DOUBLE | - |
DOUBLE_PRECISION | Yes | DOUBLE | - |
DOUBLE_PRECISION_UNSIGNED | Yes | DOUBLE | - |
ZEROFILL | Yes | DOUBLE | - |
REAL | Yes | DOUBLE | - |
REAL_UNSIGNED | Yes | DOUBLE | - |
REAL_UNSIGNED_ZEROFILL | Yes | DOUBLE | - |
NUMERIC | Yes | DECIMAL | Uses the actual precision p of the source column: p ≤ 38 maps to DECIMAL; 38 < p ≤ 65 maps to STRING.
NUMERIC_UNSIGNED | Yes | DECIMAL | Uses the actual precision p of the source column: p ≤ 38 maps to DECIMAL; 38 < p ≤ 65 maps to STRING.
NUMERIC_UNSIGNED_ZEROFILL | Yes | DECIMAL | Uses the actual precision p of the source column: p ≤ 38 maps to DECIMAL; 38 < p ≤ 65 maps to STRING.
DECIMAL | Yes | DECIMAL | Uses the actual precision p of the source column: p ≤ 38 maps to DECIMAL; 38 < p ≤ 65 maps to STRING.
DECIMAL_UNSIGNED | Yes | DECIMAL | Uses the actual precision p of the source column: p ≤ 38 maps to DECIMAL; 38 < p ≤ 65 maps to STRING.
DECIMAL_UNSIGNED_ZEROFILL | Yes | DECIMAL | Uses the actual precision p of the source column: p ≤ 38 maps to DECIMAL; 38 < p ≤ 65 maps to STRING.
FIXED | Yes | DECIMAL | Uses the actual precision p of the source column: p ≤ 38 maps to DECIMAL; 38 < p ≤ 65 maps to STRING.
FIXED_UNSIGNED | Yes | DECIMAL | Uses the actual precision p of the source column: p ≤ 38 maps to DECIMAL; 38 < p ≤ 65 maps to STRING.
FIXED_UNSIGNED_ZEROFILL | Yes | DECIMAL | Uses the actual precision p of the source column: p ≤ 38 maps to DECIMAL; 38 < p ≤ 65 maps to STRING.
BOOLEAN | Yes | BOOLEAN | - |
DATE | Yes | DATE | - |
TIME | Yes | TIME | - |
DATETIME | Yes | TIMESTAMP | - |
TIMESTAMP | Yes | TIMESTAMP | - |
CHAR | Yes | STRING | - |
JSON | Yes | STRING | - |
BIT | Yes | STRING | BIT(1) mapped to BOOLEAN |
VARCHAR | Yes | STRING | - |
TEXT | Yes | STRING | - |
BLOB | Yes | STRING | - |
TINYBLOB | Yes | STRING | - |
TINYTEXT | Yes | STRING | - |
MEDIUMBLOB | Yes | STRING | - |
MEDIUMTEXT | Yes | STRING | - |
LONGBLOB | Yes | STRING | - |
LONGTEXT | Yes | STRING | - |
VARBINARY | Yes | STRING | - |
GEOMETRY | Yes | STRING | - |
POINT | Yes | STRING | - |
LINESTRING | Yes | STRING | - |
POLYGON | Yes | STRING | - |
MULTIPOINT | Yes | STRING | - |
MULTILINESTRING | Yes | STRING | - |
MULTIPOLYGON | Yes | STRING | - |
GEOMETRYCOLLECTION | Yes | STRING | - |
ENUM | Yes | STRING | - |
BINARY | Yes | BINARY | BINARY(1) |
SET | No | | - |
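A hypothetical source table (names are illustrative) that exercises several of the less obvious mappings above:

```sql
-- Illustrative table; comments note the internal type each column maps to.
CREATE TABLE type_demo (
  id      BIGINT UNSIGNED PRIMARY KEY,  -- maps to DECIMAL(20,0)
  flag    TINYINT(1),                   -- maps to BOOLEAN by default (configurable)
  big_num DECIMAL(45, 2),               -- precision > 38, so maps to STRING
  payload JSON,                         -- maps to STRING
  tags    SET('a', 'b')                 -- not supported
);
```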
Internal Type | MySQL Type |
--- | --- |
TINYINT | TINYINT |
SMALLINT | SMALLINT,TINYINT UNSIGNED |
INT | INT,MEDIUMINT,SMALLINT UNSIGNED |
BIGINT | BIGINT,INT UNSIGNED |
DECIMAL(20, 0) | BIGINT UNSIGNED |
FLOAT | FLOAT |
DOUBLE | DOUBLE,DOUBLE PRECISION |
DECIMAL(p, s) | NUMERIC(p, s),DECIMAL(p, s) |
BOOLEAN | BOOLEAN,TINYINT(1) |
DATE | DATE |
TIME [(p)] [WITHOUT TIMEZONE] | TIME [(p)]
TIMESTAMP [(p)] [WITHOUT TIMEZONE] | DATETIME [(p)]
STRING | CHAR(n),VARCHAR(n),TEXT |
BYTES | BINARY,VARBINARY,BLOB |
ARRAY | - |
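A hypothetical sink table (illustrative names) following the write-side mappings above:

```sql
-- Illustrative sink table; each column type follows the mapping table above.
CREATE TABLE sink_demo (
  id       BIGINT,          -- internal BIGINT
  amount   DECIMAL(10, 2),  -- internal DECIMAL(10, 2)
  event_ts DATETIME(3),     -- internal TIMESTAMP(3) [WITHOUT TIMEZONE]
  name     VARCHAR(255)     -- internal STRING
);
```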
com.github.shyiko.mysql.binlog.network.ServerException: A slave with the same server_uuid/server_id as this slave has connected to the master.
This typically means another binlog consumer is connected to the same MySQL instance with the same server id; give each synchronization task a server id that is unique among all replication clients.
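To see which replication clients are currently connected (standard statements; SHOW REPLICAS replaces SHOW SLAVE HOSTS on MySQL 8.0.22+):

```sql
SHOW SLAVE HOSTS;   -- lists connected replicas and their server ids
SHOW PROCESSLIST;   -- 'Binlog Dump' threads correspond to binlog consumers
```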
Caused by: org.apache.kafka.connect.errors.ConnectException: The connector is trying to read binlog starting at GTIDs xxx and binlog file 'binlog.xxx', pos=xxx, skipping 4 events plus 1 rows, but this is no longer available on the server. Reconfigure the connector to use a snapshot when needed.
This occurs when the binlog position the task tries to resume from has already been purged on the server, usually because binlog retention is too short or the task was stopped for too long; extend binlog retention or restart the task without state so it takes a fresh snapshot.
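To check how long binlogs are retained and which files are still available on the server:

```sql
SHOW VARIABLES LIKE 'expire_logs_days';           -- MySQL 5.6/5.7 retention
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds'; -- MySQL 8.0 retention
SHOW BINARY LOGS;                                 -- binlog files still on the server
```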
EventDataDeserializationException: Failed to deserialize data of EventHeaderV4 .... Caused by: java.net.SocketException: Connection reset.
This usually means the binlog connection was closed by the server or the network, often because a server-side timeout expired while the connection was idle; check the MySQL timeout settings and the network path between the task and the database.
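Server-side timeouts that can drop an idle binlog connection can be inspected with:

```sql
SHOW VARIABLES LIKE 'slave_net_timeout';
SHOW VARIABLES LIKE 'wait_timeout';
SHOW VARIABLES LIKE 'net_write_timeout';
```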
ERROR io.debezium.connector.mysql.MySqlStreamingChangeEventSource [] - Error during binlog processing. Last offset stored = null, binlog reader near position = mysql-bin.000044/211839464.
2023-02-20 21:37:28.480 [blc-172.17.48.3:3306] ERROR io.debezium.pipeline.ErrorHandler [] - Producer failure
io.debezium.DebeziumException: Error processing binlog event.