Database Task Configuration Overview

Last updated: 2024-11-01 17:05:07

    Background

    Whole database migration monitors data and structural changes at the source, synchronizing full or incremental data from all source databases and tables to the destination in real time. It also supports automatic table creation at the destination, synchronization of field changes, and other features. Supported data sources include MySQL, Doris, DLC, Kafka, etc.

    Prerequisites

    1. The source and target data sources have been configured for subsequent tasks.
    2. The DataInLong Resource Group has been purchased.
    3. Network connectivity between the DataInLong Resource Group and the data sources has been established.
    4. The data source environment has been prepared: based on the sync configuration you need, grant the account used in the data source configuration the necessary operation permissions in the database before executing the sync task.

    Operation Steps

    Step 1: Creating a Whole Database Sync Task

    After entering the Configuration Center > Real-time Sync tasks page, click New to create a whole database migration task.
    
    
    

    Step 2: Link Selection

    Based on your actual business needs, select the source and destination data source types you need to sync. Subsequent steps will then display the corresponding source and destination configuration parameters. Ensure the selected data source type matches the type of the actually configured data source.
    
    
    

    Step 3: Data Source Settings

    In this step, you can select the libraries and tables from the source data source that need to be synchronized, configure the source end reading method, consistency semantics, filtering operations, timezone, etc.
    Note:
    The configuration items here vary depending on the type of source data source. Please refer to the actual configuration interface of the data source.
    
    
    

    Step 4: Data Destination Settings

    In this step, you can define the relevant attributes of the target data source and databases/tables, such as the write mode, database and table name matching rules, etc., as well as the name mapping rules between target and source objects. The current strategies for matching whole database target objects are Same Name as Source and Custom Definition.
    Same Name as Source Database/Source Table
    By default, the source databases and tables in a whole database synchronization task are written into the target data source under the same names. Under this strategy, the system matches the objects with the same names in the target data source when the task runs.
    
    Example: The task synchronizes TableA and TableB from the source database DB1 to a Doris data source. With the same-name-as-source-database and same-name-as-source-table strategies configured, the task will by default match DB1.TableA and DB1.TableB within the Doris connection.
    Custom Definition
    Custom Definition rules support special relationships between source and target, such as uniformly adding a fixed prefix or suffix to the source database or table name when writing into the target database or table. Under this strategy, the system matches the target objects according to the configured naming rules when the task runs.
    
    
    
    Under the Custom Definition method, the system provides built-in system parameters for the whole database scenario. The built-in parameters mainly cover the source data source name, source database name, source table name, etc. For Kafka types, dynamic matching on the value of a data field within the message is also supported. The built-in parameters under Custom Definition are as follows:
    Source Data Source Name: ${datasource_name_di_src}
    Source Database Name: ${db_name_di_src}
    Source Table Name: ${table_name_di_src}
    Source Schema Name: ${schema_name_di_src} (applicable to data source types with a Schema attribute, such as PostgreSQL and Oracle)
    Source Topic Name: ${topic_di_src} (applicable only to the Kafka type)
    Data Field: ${key} (applicable only to the Kafka type; replace key with the specific field name)
    Note:
    Example 1: If the source table name is table1 and the mapping rule is ${table_name_di_src}_inlong, the data from table1 will be mapped and written into table1_inlong.
    Example 2: If the source Kafka message is {"name":"inlong","age":12} and the mapping rule is ${name}_di, the message will be mapped and written into the inlong_di table.
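    As a minimal sketch of how such a rule resolves (illustrative only; the actual resolution is performed by the sync engine, and resolve_target_name is a hypothetical helper):

        # Minimal sketch (hypothetical, not the product's implementation) of how a
        # Custom Definition mapping rule resolves to a target table name.
        import json
        import re

        def resolve_target_name(rule: str, context: dict) -> str:
            """Replace each ${param} in the rule with its value from the context."""
            return re.sub(r"\$\{(\w+)\}", lambda m: str(context[m.group(1)]), rule)

        # Example 1: built-in parameter plus a fixed suffix.
        print(resolve_target_name("${table_name_di_src}_inlong",
                                  {"table_name_di_src": "table1"}))   # table1_inlong

        # Example 2: a Kafka message field used as a dynamic parameter.
        message = json.loads('{"name": "inlong", "age": 12}')
        print(resolve_target_name("${name}_di", message))             # inlong_di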
    The configuration items here vary depending on the type of target data source. Please refer to the actual configuration interface of each target.

    Step 5 (Optional): Manually Creating Target Tables in Batches

    Based on the name matching strategy between source and target tables, the system automatically converts DDL structures between heterogeneous data sources, allowing you to quickly modify and create target tables in batches.
    Note:
    1. Perform batch table creation after completing the configuration of the target database and table name matching strategy.
    2. Batch table creation supports only certain target data sources and synchronization links.
    1. Check the database and table name matching strategy, and click Batch Table Creation.
    
    
    
    2. Determine Table Creation Rules
    In this step, the system scans target objects based on the database and table matching strategy and automatically generates target table DDL statements. You can check whether the target tables match correctly, configure the table creation method, and edit the table creation statements. Key features and their descriptions are as follows:
    
    
    
    1. Target Database/Table Name
    The system generates target database/table names based on the configured matching rules and verifies whether each database/table exists in the target data source:
    Match failed: no database/table object matching the rules exists in the target data source. The list highlights failed matches by default.
    Match succeeded: a database/table object matching the rules exists in the target data source.
    2. Target Table Creation Method
    For database/table objects in the target data source, the system offers several table creation strategies:
    Match failed databases/tables support the Create New Table and Do Not Create New methods.
    Create New Table: in this batch creation, the target table is automatically created with the DDL generated from the conversion.
    Do Not Create New: the table is ignored in this operation.
    Match succeeded databases/tables support the Use Existing Table and Delete Existing Table and Create a New One methods.
    Use Existing Table: the table is ignored in this operation.
    Delete Existing Table and Create a New One: in this operation, the existing table is deleted and a table of the same name is recreated with the DDL generated from the conversion.
    3. Preview/Edit Table Creation Statement
    For the chosen table creation method, the system automatically generates a DDL example. You can view and edit the create table statement (see the sketch below):
    1. The target table name is generated from the source table name and the target table matching strategy.
    2. The target table fields default to being consistent with the source table fields.
    3. For target data sources that have multiple data models (e.g., Doris), the system selects a model according to the default policy. You can manually modify the DDL statements to fit business characteristics.
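    For intuition, the following sketch shows the kind of conversion involved, deriving a Doris-style CREATE TABLE statement from a simple source schema. The type mapping, default model choice, and function name are illustrative assumptions, not the product's actual conversion rules:

        # Hypothetical sketch of heterogeneous DDL conversion. TYPE_MAP and the
        # default UNIQUE KEY model choice are illustrative, not the real rules.
        TYPE_MAP = {"int": "INT", "bigint": "BIGINT",
                    "varchar(64)": "VARCHAR(64)", "datetime": "DATETIME"}

        def to_doris_ddl(db, table, columns, key):
            """columns is a list of (name, source_type) pairs."""
            cols = ",\n  ".join(f"`{n}` {TYPE_MAP[t]}" for n, t in columns)
            return (f"CREATE TABLE `{db}`.`{table}` (\n  {cols}\n)\n"
                    f"UNIQUE KEY(`{key}`)\n"
                    f"DISTRIBUTED BY HASH(`{key}`) BUCKETS 10;")

        print(to_doris_ddl("DB1", "TableA",
                           [("id", "int"), ("name", "varchar(64)")], "id"))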
    
    
    
    3. Creating target tables in batches
    After confirming the creation rules for the target table, you can create the target tables all at once in this step.
    After successful creation, you can click Finish to close the pop-up window and continue configuring subsequent tasks.
    If there are tables that failed to be created, you can check the reason for the failure or click Retry to recreate.

    Step 6: Configure Runtime Resources and Policies

    This step mainly involves configuring resources for the task, DDL and anomaly data handling policies, as well as advanced running parameters, etc.
    1. Integrated Resource Configuration
    Integrated resources support multiple allocation methods:
    Fixed Allocation: In this mode, tasks do not differentiate between synchronization phases. A fixed amount of resources is allocated to the current task throughout the full and incremental synchronization process. This method prevents resource preemption between tasks and is suitable for scenarios where data may undergo significant changes during task execution.
    Allocate by Synchronization Phase: Resources are allocated according to the planned usage for full and incremental synchronization phases to save overall resource usage.
    Associate the current task with the corresponding integrated resource group, while setting the runtime JM, TM specifications, and task parallelism. Here, the actual number of CUs occupied during task runtime = JobManager specification + TaskManager specification × parallelism.
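    For example, a quick check of the formula above (the specification values here are hypothetical):

        # CU usage = JobManager specification + TaskManager specification * parallelism.
        # The specification values below are hypothetical examples.
        jobmanager_cu = 1    # JobManager specification, in CUs
        taskmanager_cu = 2   # TaskManager specification, in CUs
        parallelism = 4      # task parallelism

        total_cu = jobmanager_cu + taskmanager_cu * parallelism
        print(total_cu)      # 9 CUs occupied while the task runs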
    
    
    
    2. Message Processing Strategy:
    Configure the task's DDL message response policies, metadata write policies, exception-data write policies, dirty data archiving policies, etc.
    Note:
    The configuration items here vary depending on the type of target data source. Please refer to the actual configuration interface of each link.
    
    
    
    DDL Message
    Sets how DDL change messages captured from the source during synchronization are passed downstream and how the target responds to them. The target provides the following response strategies for DDL messages:
    1. Automatic Change: for messages captured from the source, the target automatically follows the source's structural changes, including automatic table creation, automatic column addition, etc.
    2. Ignore Changes: the target ignores DDL change messages and neither responds to them nor reports them.
    3. Log Alerts: the target ignores DDL change messages, but records their details in the logs.
    4. Task Error: if a DDL change occurs at the source, the entire task fails, restarts continuously, and reports errors.
    Note:
    Different sources and targets support different DDL types and message handling. Refer to the configuration strategies supported by each link.
    Metadata Writing
    When the new-table behavior in the DDL strategy is set to Automatic Table Creation, you can choose whether to write metadata. Selecting metadata fields creates the corresponding fields in the target table and writes the corresponding system metadata during synchronization.
    Write Exceptions
    Sets how the task handles exception data when writes fail for reasons such as mismatched table structure or field types, and whether to interrupt the data flow. The write exception strategies are:
    1. Partial Stop: if some tables have write exceptions, only those tables stop writing; other tables synchronize normally. Stopped tables cannot resume writing in the current task run.
    2. Abnormal Restart: if some tables have write exceptions, all tables stop writing. Under this strategy, the task restarts continuously until all tables synchronize normally, which may cause duplicate writes for some tables during the restart period.
    3. Ignore Exception: exception data that cannot be written is ignored and marked as dirty data. Other data in that table, and other tables in the task, synchronize normally. Dirty data offers two schemes: COS Archiving and Do Not Archive.
    Dirty Data
    When Write Exceptions is set to Ignore Exception, you can choose whether to archive the ignored data:
    1. COS Archiving: archives the exception data into a COS file. This prevents loss of exception data and facilitates later analysis of write failures and data recovery.
    2. Do Not Archive: the task completely ignores and discards the exception data.
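    A generic sketch of the Ignore Exception strategy with optional dirty-data archiving (function names are illustrative, not the product's API):

        # Rows that fail to write are diverted as dirty data instead of
        # stopping the table. write_row and archive_dirty are illustrative stubs.
        def write_with_ignore_exception(rows, write_row, archive_dirty=None):
            for row in rows:
                try:
                    write_row(row)                 # normal write path
                except Exception:
                    if archive_dirty is not None:  # COS Archiving scheme
                        archive_dirty(row)
                    # else: Do Not Archive -- the row is discarded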
    3. Task Execution Strategy:
    Set the checkpoint interval, maximum restart attempts, and task-level execution parameters for the current task.
    
    Checkpoint Interval
    The maximum checkpoint interval for the current task.
    Maximum Restart Attempts
    The maximum number of restarts allowed if a fault occurs during execution. If the number of restarts exceeds this threshold, the task status is set to Failed. The valid range is [-1, 100], where:
    a threshold of 0 means no restarts;
    -1 means no limit on the number of restarts.
    Parameter
    Task-level running parameters. The supported task-level parameters differ across sources and destinations.
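    A small sketch of the threshold semantics (illustrative only):

        # Maximum restart attempts: 0 = never restart, -1 = unlimited restarts,
        # otherwise the task is marked Failed once the threshold is exceeded.
        def should_restart(restart_count: int, max_attempts: int) -> bool:
            if max_attempts == -1:                # unlimited restarts
                return True
            return restart_count < max_attempts  # 0 allows no restarts at all

        print(should_restart(0, 0))    # False: threshold 0 never restarts
        print(should_restart(99, -1))  # True: -1 removes the limit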

    Step 7: Configuration Preview and Task Submission

    1. Configuration Preview
    
    
    
    1. Submit
    Submit the current task to the production environment. When submitting, different running strategies can be chosen depending on whether the current task already has a production version:
    If the current task has no effective online version, either because this is the first submission or the online task is in a Failed state, it can be submitted directly.
    If there is an online task in a Running or Paused state, a strategy must be chosen: stopping the online job discards the previous runtime position and starts consuming data from the beginning, while keeping the job state continues running from the last consumed position after the restart.
    Note:
    Click Start Now to run the task immediately after submission; otherwise it must be manually triggered to run formally.
    2. Lock/Unlock
    By default, the creator is the first lock holder, and only the lock holder can edit the task configuration and run the task. If the lock holder performs no edit operation for 5 minutes, others can click the icon to grab the lock; once the lock is acquired, they can edit (see the sketch below).
    3. Go to Operations
    Quickly jump to the Task Operation and Maintenance page for the current task.
    4. Save
    After completing the preview, click Save to save the whole database task configuration. A task that is only saved is not submitted to the Operations Center.
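    A minimal sketch of the lock takeover rule (hypothetical helper, not the console's implementation):

        import time

        IDLE_LIMIT = 5 * 60  # the lock can be grabbed after 5 idle minutes

        def can_grab_lock(last_edit_at, now=None):
            """last_edit_at/now are UNIX timestamps in seconds."""
            now = time.time() if now is None else now
            return now - last_edit_at >= IDLE_LIMIT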
    2. Task Configuration Detection and Submission
    Task Configuration Detection
    This step detects the read end, write end, and resources of the task:
    Detection passed: the configuration is correct.
    Detection failed: configuration issues exist and must be fixed before continuing.
    Detection with alert: the detection offers system-recommended modifications. After modifying, click Retry to re-detect; or click Ignore Exception to proceed to the next step without blocking subsequent configuration.
    The currently supported detection items are listed in the table below.
    Submission Strategy Selection
    In this step, you can choose the submission strategy for this task:
    First Submission: the initial submission supports synchronizing data from a default or specified position.
    Start immediately, syncing from the default position: if the source is configured for "full + incremental" reading, it first syncs the existing data (full phase) by default and then consumes the binlog to obtain changed data (incremental phase); if the source is set to "incremental only" reading, it starts reading from the latest binlog position by default.
    Start immediately, syncing from a specified point in time: the task syncs data according to the configured time and timezone. If the specified time point is not found, the task defaults to syncing from the earliest binlog position; if the source's read method is "full + incremental", the task skips the full phase and starts syncing from the specified time point in the incremental phase (see the sketch below).
    Do not start now: the task will not start immediately after submission and can be started manually later from the Operations and Maintenance list.
    Not the First Submission: supports continuing or restarting a task that already has an online version.
    Continue Running: after the new version of the task is submitted, it continues running from the last synchronization position.
    Restart from a specified position: you can specify the read start position. The task ignores the old version's state and starts reading from the specified position. If the specified time position is not found, the task defaults to synchronizing from the earliest binlog position.
    Restart from the default position: the system starts reading from the default position according to the source configuration. If the source is configured for "full + incremental" reading, it first syncs the existing data (full phase) by default and then consumes the binlog to obtain changed data (incremental phase); if the source is set to "incremental only" reading, it starts reading from the latest binlog position by default.
    The submission and execution strategies supported by different task statuses vary; see the table below for details.
    Additionally, each submission generates a new real-time task version, and you can enter a version description in the dialog.
    Submitting the Job
    After a successful submission, you can click Go to Operations and Maintenance to check the task execution status.
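    A generic sketch of the specified-time-point fallback described above (illustrative, not the engine's implementation):

        from bisect import bisect_left

        def resolve_start_position(positions, timestamps, start_time):
            """positions/timestamps are parallel lists of binlog offsets and
            their commit times, sorted ascending by time."""
            i = bisect_left(timestamps, start_time)
            # No position at or after the requested time: fall back to the
            # earliest binlog position.
            return positions[i] if i < len(timestamps) else positions[0]

        print(resolve_start_position(["pos-1", "pos-2"], [100, 200], 150))  # pos-2
        print(resolve_start_position(["pos-1", "pos-2"], [100, 200], 999))  # pos-1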
    
    
    
    Detection items supported by task configuration detection:
    Task Configuration Detection
    Source Configuration: checks whether mandatory items in the source configuration are missing.
    Destination Configuration: checks whether mandatory items in the destination configuration are missing.
    Mapping Relationship Configuration: checks whether field mapping has been configured.
    Resource Group Configuration: checks whether the resource group is configured.
    Data Source Detection
    Source Connectivity Detection: checks whether the source data source and the task's resource group have network connectivity. If the detection fails, you can view the diagnostic information and recheck after resolving the network issue; otherwise the task is likely to fail.
    Destination Connectivity Detection: checks whether the destination data source and the task's resource group have network connectivity. If the detection fails, you can view the diagnostic information and recheck after resolving the network issue; otherwise the task is likely to fail.
    Resource Detection
    Resource Status Detection: checks whether the resource group is in an available status. If not, replace the task's resource group; otherwise the task is likely to fail.
    Resource Margin Detection: checks whether the remaining resources in the resource group meet the task's configured resource requirements. If the detection fails, reduce the task's resource configuration appropriately or expand the resource group.
    Different task running statuses and supported submission strategies:
    First Submission, or Stopped/Abnormal/Initializing (not the first submission)
    Start Now, Synchronize from the Default Position: reading starts from the default position based on the source configuration. If the source is configured for "full + incremental" reading, it first synchronizes the existing data (full phase) by default and, after completion, consumes the binlog to obtain changed data (incremental phase); if the source is set to "incremental only", it starts reading from the latest binlog position by default.
    Start Now, Synchronize from a Designated Time Point: a specific start time must be selected to match the position by time. 1. Data is read from the designated time point; if no matching position is found, the task defaults to synchronizing from the earliest binlog position. 2. If the source reading method is "full + incremental", this strategy skips the full phase and starts synchronizing from the designated time point in the incremental phase.
    Do Not Start Yet: the task is only submitted to real-time operation and maintenance without being started. It can be started later, including in batches, from the real-time operation and maintenance page.
    Running (not the first submission)
    Continue Running, retaining job state and continuing from the last synchronization position: the new version of the task, once submitted, continues running from the last synchronization position.
    Restart from a specified time point and continue running: you can specify the read start position. The task ignores the old version's state and starts reading from the specified position. If the designated time position is not found, the task defaults to synchronizing from the earliest binlog position.
    Restart, discarding state, and run from the default position: this strategy stops the currently running task, discards the task state, and starts reading from the default position according to the source configuration. If the source is configured for "full + incremental" reading, it first synchronizes the existing data (full phase) and, after completion, consumes the binlog to obtain changed data (incremental phase); if the source is configured for "incremental only" reading, it starts reading from the latest binlog position by default.
    Paused (not the first submission)
    Continue Running, retaining job state and continuing from the last synchronization position: the new version of the task, once submitted, continues running from the last synchronization position.
    Note:
    A pause operation creates a snapshot, and the task can be resubmitted to continue running from the last checkpoint.
    A forced pause does not create a snapshot; when the task is resubmitted, it continues from the last snapshot taken during operation. This may cause partial data replay: if the target write mode is Append, duplicate data will appear; if it is Upsert, there will be no duplication (see the sketch after this table).
    Restart from a specified time point and continue running: you can specify the read start position. The task ignores the old version's state and starts reading from the specified position. If the designated time position is not found, the task defaults to synchronizing from the earliest binlog position.
    Restart, discarding state, and run from the default position: this strategy stops the currently running task, discards the task state, and starts reading from the default position according to the source configuration, as described above.
    Failed (not the first submission)
    Resume from the last failed checkpoint: this strategy continues running from the position where the task last failed.
    Restart and run from the default position according to the task's reading configuration: this strategy reads from the default position according to the source configuration. If the source is configured for "full + incremental" reading, it first synchronizes the existing data (full phase) and, after completion, consumes the binlog to obtain changed data (incremental phase); if the source is configured for "incremental only" reading, it starts reading from the latest binlog position by default.
    In Progress (not the first submission)
    Not supported: when an online task with the same name is in progress, resubmitting the task is not supported.
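    The following minimal sketch illustrates why replayed records duplicate under Append but not under Upsert (generic illustration, not tied to any particular sink):

        # A record replayed after a forced pause is written twice.
        replayed = [{"id": 1, "v": "a"}, {"id": 1, "v": "a"}]

        append_sink = []                     # Append: every write is kept
        upsert_sink = {}                     # Upsert: keyed overwrite

        for row in replayed:
            append_sink.append(row)
            upsert_sink[row["id"]] = row

        print(len(append_sink))  # 2 -- duplicate data under Append
        print(len(upsert_sink))  # 1 -- deduplicated under Upsert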

    Subsequent Steps

    After completing the task configuration, you can operate and monitor the created tasks, such as configuring task monitoring and alerts and viewing key metrics of task execution.
    