tencent cloud

Feedback

MySQL Data Source

Last updated: 2024-11-01 17:42:33

    MySQL Environment Preparation and Database Configuration

    If you want to perform offline data synchronization using MySQL, you need to first confirm the version support situation of the MySQL data source and the supported read data type conversions.

    Supported Versions

    Follow these version limitations:
    Data Source Type
    Edition
    Driver
    MySQL
    5.6,5.7,8.0.x
    JDBC Driver:8.0.21

    Confirm MySQL Version

    Data Integration has version requirements for MySQL. Check if the MySQL version to be synchronized meets the version requirements. You can check the current MySQL database version using the following statement in the MySQL database.
    select version();

    Set MySQL Server Permissions

    You can define a MySQL user with appropriate permissions.
    1. Create MySQL User (Optional):
    mysql> CREATE USER 'user'@'localhost' IDENTIFIED BY 'password';
    2. Grant Required Permissions to User:
    For offline data synchronization, the account must have SELECT permission on the database. Refer to the following example to execute the command:
    mysql> GRANT SELECT ON *.* TO 'user' IDENTIFIED BY 'password';
    3. Refresh the user's permissions:
    mysql>FLUSH PRIVILEGES;

    Data source configuration

    Enter the data source configuration interface, MySQL data sources support two connection methods: cloud instance and connection string.
    click Project Management > Data Source Management > Create Data Source > Select MySQL Data Source.
    Create a data source via connection string.
    
    
    
    Parameter
    Description
    Data Source Name
    The name of the newly created data source, defined by the user and cannot be empty. It should start with a letter and can include letters, numbers, and underscores. The length should be within 20 characters.
    Description
    Optional, description of this data source.
    Data Source Permission
    Project sharing means all members of the current data source project can use it; Individual Only and Administrator indicates that the data source is only available to the creator and project administrators.
    Deployment Method
    Supports two deployment modes: Self-Built Instance and Public Network Instance.Self-Built Instance is deployed on Tencent CVM, while Public Network Instanceis is in the client’s local IDC or other cloud resources, accessible via public network.
    Region and Network
    When selecting a self-built instance, you need to select the region and vpcID where the data source instance is located.
    JDBC URL
    Connection string information used to connect to the MySQL data source instance, including host IP, port, database name, and other information.
    Username
    Username for connecting to the database.
    Password
    Password for connecting to the database.
    Data Connectivity
    Test whether the configured database can be connected.
    Note:
    If the connectivity test fails, the data source can still be saved. However, if saved, the data source cannot be used until the connectivity test passes.
    If the connectivity test fails, it might be because WeData is blocked by the network firewall where the database is located.

    MySQL Offline Single Table Read Node Configuration

    
    
    
    Parameter
    Description
    Data Source
    Available MySQL Data Source.
    Database
    Supports selection or manual input of the library name to read from.
    By default, the database bound to the data source is used as the default database. Other databases need to be manually entered.
    If the data source network is not connected and the database information cannot be fetched directly, you can manually enter the database name. Data synchronization can still be performed when the Data Integration network is connected.
    Table
    Supports selecting or manually entering the table name to be read.
    In the case of table partitioning, you can select or input multiple table names at the MySQL source end, and ensure that the structure of multiple tables is consistent.
    In the case of table partitioning, the table index range can be configured. For instance, 'table_[0-99]' indicates reading 'table_0', 'table_1', 'table_2' up to 'table_99'; if the numeric suffix of your tables has a consistent length, like 'table_000','table_001','table_002' up to 'table_999', you can configure it as '"table": ["table_00[0-9]", "table_0[10-99]", "table_[100-999]"]'.
    If the data source network is not connected and the table information cannot be fetched directly, you can manually enter the table name. Data synchronization can still be performed when the Data Integration network is connected.
    Add Shared Database/Table
    Applicable to sharding scenarios. Click to configure multiple data sources, databases, and table information. In sharding scenarios, ensure all table structures are consistent. The task configuration will default to displaying and using the structure of the first table to obtain data.
    Split Key
    Specify the field used for data sharding. After specifying, concurrent tasks will be initiated for data synchronization, improving data synchronization efficiency. You can use a column in the source data table as the partition key. It is recommended to use the primary key or indexed columns as the partition key. Only integer-type fields are supported.
    Filter Conditions (Optional)
    In actual business scenarios, data from the current day is usually chosen for synchronization. Specify the where condition as gmt_create>$bizdate. The where condition can effectively carry out business incremental synchronization.
    If the WHERE clause is not provided, including missing the key or value, the data synchronization is treated as full data synchronization.
    You cannot set the WHERE condition to LIMIT 10, as this does not comply with MySQL WHERE clause constraints.
    Advanced Settings (Optional)
    You can configure parameters according to business needs.

    MySQL Offline Single Table Write Node Configuration

    
    
    
    Parameter
    Description
    Data Destination
    The MySQL data source to write to.
    Write Type
    Single table write: Supports single table writing
    Write to Sharded Databases/Tables: Supports one-time writing to sharded MySQL tables within the same data source. Ensure all table schema structures are consistent and all target tables exist.
    Database
    Supports selection or manual input of the database name to write to
    By default, the database bound to the data source is used as the default database. Other databases need to be manually entered.
    If the data source network is not connected and the database information cannot be fetched directly, you can manually enter the database name. Data synchronization can still be performed when the Data Integration network is connected.
    Table
    Supports selection or manual input of the table name to write to
    If the data source network is not connected and the table information cannot be fetched directly, you can manually enter the table name.
    Data synchronization can still be performed when the Data Integration network is connected.
    Whether to Clear Table
    Before writing to the MySQL data table, you can manually choose whether to clear the data table.
    Write Mode
    MySQL Writing supports three modes:
    Append: When a primary key/unique index conflict occurs, the conflicting rows cannot be written.
    Overwrite: In case of primary key/unique index conflict, the original row will be deleted first, and then the new row will be inserted.
    On duplicate key: When a primary key/uniqueness index conflict occurs, the new row will update the specified fields. Specified fields refer to the fields added in the field mapping configuration of the synchronization task.
    Batch Submission Size
    The number of records submitted in a single batch can greatly reduce the number of network interactions between the data synchronization system and MySQL, and improve overall throughput. If this value is set too high, it may lead to OOM exceptions in the data synchronization running process.
    Pre-Executed SQL (Optional)
    The SQL statement executed before the synchronization task. Fill in the correct SQL syntax according to the data source type, such as clearing the old data in the table before execution (truncate table tablename).
    Post-Executed SQL (Optional)
    The SQL statement executed after the synchronization task. Fill in the correct SQL syntax according to the data source type, such as adding a timestamp (alter table tablename add colname timestamp DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP).

    Data type conversion support

    Read

    The supported field types and type conversion mappings for MySQL reading are as follows (when processing MySQL, the data types of the MySQL data source and the data processing engine will be mapped):
    MySQL Data Types
    Internal Types
    tinyint, smallint, mediumint, int, bigint,year
    Long
    float, double, decimal
    Double
    varchar, char, tinytext, text, mediumtext, longtext,set,json
    String
    date, datetime, timestamp, time
    Date
    bit, bool
    Boolean
    tinyblob, mediumblob, blob, longblob, varbinary
    Bytes

    Write

    The supported field types and type conversion mappings for MySQL writing are as follows:
    Internal Types
    MySQL Data Types
    Long
    tinyint, smallint, mediumint, int, bigint,year
    Double
    float, double, decimal
    String
    varchar, char, tinytext, text, mediumtext, longtext,set,json
    Date
    date, datetime, timestamp, time
    Boolean
    bit, bool
    Bytes
    tinyblob, mediumblob, blob, longblob, varbinary

    FAQs

    1. Using COLLATE to specify the collation may cause some data synchronization duplication

    Cause:
    COLLATE is used to specify the collation of a field. It affects the order of the ORDER BY statement, the results of greater than and less than filters in WHERE conditions, as well as the results of DISTINCT, GROUP BY, and HAVING queries. All character type comparisons or sorting are related to COLLATE.
    Solution:
    Choose non-COLLATE columns when setting split keys for integration tasks.
    
    
    Contact Us

    Contact our sales team or business advisors to help your business.

    Technical Support

    Open a ticket if you're looking for further assistance. Our Ticket is 7x24 avaliable.

    7x24 Phone Support