
Creating Single Task

Last updated: 2024-11-01 17:35:28

    Notes

    1. This page displays all offline synchronization tasks created from the Data Integration page.
    2. If you only need to run an offline synchronization task once or periodically, you can configure scheduling in Data Integration and submit the task to Data Integration's Operations Center.
    3. If you need to orchestrate the task with other tasks or configure task dependencies, it is recommended to create the offline synchronization task in the orchestration space of Data Development, or, after creating the task in Data Integration, import it into the orchestration space using Data Development's Import Task (Data Integration) feature.
    Note:
    Tasks already submitted in Data Integration cannot be imported into the Orchestration Space of Data Development; only unsubmitted tasks can be imported.

    Background

    Single-table synchronization uses fixed-field synchronization: only the source fields specified in the task's mapping relationships are synchronized to the target. Single-table tasks support canvas, form, and script configuration modes, and cover data sources such as MySQL, Hive, DLC, and Doris.

    Conditions and Restrictions

    1. The source and target data sources have been configured.
    2. A Data Integration Resource Group has been purchased.
    3. Network connectivity between the Data Integration Resource Group and the data sources has been established.
    4. The data source environment has been prepared. Based on the sync configuration you need, grant the data source account the necessary operation permissions in the database before executing the sync task.
    5. If the database account configured for the data source lacks read or write permissions, the task will fail. Configure an account with appropriate permissions according to the actual read and write scenarios.

    Operation Steps

    Step One: Create an Offline Synchronization Task and Select the Configuration Mode

    On the Data Integration page, click Configuration Center > Offline Synchronization to enter the synchronization task list. In the creation pop-up, configure the basic task information; after clicking Confirm, you enter the task configuration page.
    
    
    
    Task Name: Required field.
    Task Mode:
    Form Mode: Provides only read and write nodes. Suitable for fixed-field synchronization from a single table to a single table, such as ODS-layer data synchronization that requires no data cleaning.
    Canvas Mode: Provides read, write, and transformation nodes. Suitable for data links that involve cleaning and for many-to-many data links.
    Script Mode: Provides a script-based configuration page. Users select a data source and a data destination, and the corresponding script template is displayed:
    Users must select the data source and destination first; editing is not allowed before selection.
    After selection, the corresponding script template is displayed.
    In the script, users can manually write parameters such as the data source and connection information.
    SQL statements can be written in the script, with the query SQL placed in the connection.
    Description: Optional field.
    Note:
    Script mode currently supports the following data sources:
    Read: MySQL, TDSQL-C MySQL, TDSQL MySQL, TDSQL PostgreSQL, PostgreSQL, TCHouse-P, SQL Server, Oracle, IBM DB2, Dameng DM, SAP HANA, Sybase, Doris, Hive, HBase, ClickHouse, DLC, Kudu, HDFS, Greenplum, GaussDB, Impala, GBase, TBase, MongoDB, COS, FTP, SFTP, REST API, Elasticsearch, Kafka, Iceberg, StarRocks, Graph Database.
    Write: MySQL, TDSQL-C MySQL, TDSQL MySQL, TDSQL PostgreSQL, PostgreSQL, TCHouse-P, SQL Server, Oracle, IBM DB2, Dameng DM, SAP HANA, Hive, HBase, ClickHouse, DLC, Kudu, HDFS, Greenplum, GaussDB, GBase, TBase, Impala, COS, FTP, SFTP, Elasticsearch, Redis, MongoDB, Kafka, Iceberg, Doris, StarRocks, Graph Database.
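    In script mode, the generated template pairs a reader with a writer. The sketch below is a rough, hypothetical illustration of that shape in Python; the key names (reader, writer, setting, etc.) are assumptions for illustration, not the exact template WeData generates:

```python
import json

# Hypothetical sketch of a script-mode configuration: a reader block,
# a writer block, and channel settings. All key names here are
# illustrative assumptions, not WeData's actual template.
script_config = {
    "reader": {
        "datasource": "mysql_source",  # data source selected in the UI
        "connection": {
            "jdbcUrl": "jdbc:mysql://host:3306/demo",
            # query SQL placed in the connection, as described above
            "querySql": "SELECT id, name FROM user_info",
        },
    },
    "writer": {
        "datasource": "hive_target",   # data destination selected in the UI
        "table": "ods_user_info",
        "columns": ["id", "name"],
    },
    "setting": {
        "speed": {"concurrent": 2},    # concurrency
        "errorLimit": {"record": 0},   # dirty data threshold
    },
}

print(json.dumps(script_config, indent=2))
```

    The reader and writer blocks correspond to the data source and destination chosen before editing is enabled, and the setting block mirrors the channel settings configured later in the task attributes.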

    Step Two: Data Node Configuration

    Configure Read Node

    Read node configuration includes basic information, data source, and data fields.
    Basic Information: The node name cannot be empty, and data nodes within a single task cannot share the same name.
    Data Source: Configure the database table object to be read, the synchronization method, and so on.
    Data Fields: Based on the configured table object, the system supports two methods: pulling field metadata by default and configuring fields manually.
    Default Pulling: For types such as MySQL, Hive, and PostgreSQL, the system automatically pulls metadata fields and types from the database table information; no manual editing is required.
    Manual Configuration: File data sources (e.g., HDFS, COS) and columnar storage data sources (e.g., HBase, MongoDB) do not support automatic metadata pulling. You can click Add Field/Batch Add to add field names and types manually. The read node additionally supports configuring time parameters and constants.
    Batch Adding

    Data Type: The current node's data source type.
    Adding Method:
    Additional Fields: Appends the newly parsed fields after the table's original fields.
    Overwrite Existing Fields: The newly parsed fields overwrite the current source table's original field information.
    Field Acquisition:
    Text parsing: Parses based on the text content.
    JSON parsing: Input JSON content and quickly parse it by key/value, e.g., {"age":10,"name":"demo"}.
    Fetch from Homogeneous Table: Specify a table object from a data source and parse its fields.
    Text to be Parsed: The text content to be parsed.
    Separator: Separates field names and types; supports tab, |, and space, e.g., "age|int".
    Quick Filling of Field Type: Common field types; supports constants, functions, variables, string, boolean, date, datetime, timestamp, time, double, float, tinyint, smallint, tinyint unsigned, int, mediumint, smallint unsigned, bigint, int unsigned, bigint unsigned, double precision, tinyint(1), char, varchar, text, varbinary, blob.
    Parse Data: Parses the input content.
    Preview: Previews the parsing results.
    Batch Delete: After selecting items in the preview list, deletes the parsing results in batch.
    Field Name: The field name.
    Type: The field type.
    Note:
    Time Parameter Fields: Only the read node of offline tasks supports configuring time parameter fields, commonly used to write the instance run time into a table's primary or multi-level partition.
    Constant Fields: Only read nodes support configuring constant fields. Constant fields can be used to write a constant value into the target table when the numbers of fields in the source and target tables do not match.
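    To make the two batch-add parsing methods concrete, here is a minimal sketch (an illustration, not WeData's implementation) of how text parsing with a separator and JSON key/value parsing could turn input into field name/type pairs:

```python
import json

def parse_fields_text(text, sep="|"):
    """Text parsing: each line holds 'name<sep>type', e.g. 'age|int'."""
    fields = []
    for line in text.strip().splitlines():
        name, ftype = line.split(sep, 1)
        fields.append({"name": name.strip(), "type": ftype.strip()})
    return fields

def parse_fields_json(payload):
    """JSON parsing: infer a rough field type for each key from its value.
    The type mapping here is an illustrative assumption."""
    type_map = {int: "int", float: "double", bool: "boolean", str: "varchar"}
    return [{"name": k, "type": type_map.get(type(v), "varchar")}
            for k, v in json.loads(payload).items()]

# [{'name': 'age', 'type': 'int'}, {'name': 'name', 'type': 'varchar'}]
print(parse_fields_text("age|int\nname|varchar"))
# [{'name': 'age', 'type': 'int'}, {'name': 'name', 'type': 'varchar'}]
print(parse_fields_json('{"age":10,"name":"demo"}'))
```

    Both methods yield the same name/type list that populates the preview table for batch editing or deletion.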

    Configure Transformation Node

    Transformation node configuration includes basic information, conversion rules, and data fields. The transformation node must be downstream of a read node. After the node is created and connected to the read node, the system automatically retrieves the field information from the upstream node and performs the data conversion according to the conversion rules.
    Basic Information: Configure the node name. The node name cannot be empty, and data nodes within a single task cannot share the same name.
    Conversion Rules: Configure field-level or data-level conversion rules; field information is inherited from the upstream node. After connecting to the upstream node, the system automatically retrieves its field information.
    Data Fields: By default, all data fields are pulled from the upstream node for subsequent mapping in the write node.

    Configure Write Node

    Write node configuration includes basic information, data source, data fields, and field mapping. The write node writes the upstream data into the target object according to the connection relationships.
    Basic Information: The node name cannot be empty, and data nodes within a single task cannot share the same name.
    Data Source: Configure the database table object to be written, the synchronization method, and so on.
    Data Fields: Based on the configured table object, the system supports two methods: pulling field metadata by default and configuring fields manually.
    Default Pulling: For types such as MySQL, Hive, and PostgreSQL, the system automatically pulls metadata fields and types from the database table information; no manual editing is required.
    Manual Configuration: File data sources (e.g., HDFS, COS) and columnar storage data sources (e.g., HBase, MongoDB) do not support automatic metadata pulling. You can click Field Configuration to add field names and types manually.
    Field Mapping: Compared with the read node, the write node additionally requires a field mapping configuration. Field mapping specifies the source of each target field's content through connections; three methods are supported for configuring the relationship between the source and the target node: same-name mapping, same-row mapping, and manual connection.
    
    
    
    Same-Name Mapping: Establishes mappings between source table fields and target table fields that share the same field name.
    Same-Row Mapping: Establishes mappings between source table fields and target table fields in the same row number.
    Clear Mappings: Clears the established mappings between source table fields and target table fields.
    Pin Mapped Fields: Pins established mappings to the top and formats their display; this does not affect the table's actual field storage order and is used only to optimize the frontend display.
    Manual Connection Mapping: Manually establish mappings between source table fields and target table fields by drawing connections.
    Source Table:
    Field Name: The name of the source table field.
    Type: The type of the source table field.
    Mapping: Quickly create a mapping.
    Target Table:
    Field Name: The name of the target table field.
    Type: The type of the target table field.
    Note:
    The prerequisite for configuring field mapping is that the current write node has a connected upstream source (either a read node or a transformation node).
    Target fields without a configured mapping will be empty or remain unchanged.
    If a source field type cannot be converted to the target field type, the task may fail.
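    The three mapping methods can be illustrated with a small sketch (illustrative only; WeData performs this in the UI): same-name mapping pairs fields by name, same-row mapping pairs them by position, and manual connection corresponds to adding an explicit pair by hand.

```python
def map_by_name(source_fields, target_fields):
    """Same-name mapping: pair source and target fields that share a name."""
    targets = set(target_fields)
    return [(f, f) for f in source_fields if f in targets]

def map_by_row(source_fields, target_fields):
    """Same-row mapping: pair fields by row position; leftovers stay unmapped."""
    return list(zip(source_fields, target_fields))

source = ["id", "name", "created_at"]
target = ["id", "user_name", "created_at"]

print(map_by_name(source, target))  # [('id', 'id'), ('created_at', 'created_at')]
print(map_by_row(source, target))   # pairs all three fields by position

# Manual connection: an explicit pair added by hand, e.g. ("name", "user_name"),
# covering fields that neither name nor row position would match correctly.
```

    Note how "name" is left unmapped by same-name mapping but paired with "user_name" by same-row mapping; which method is appropriate depends on how closely the two schemas align.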

    Step Three: Offline Task Attribute Configuration

    Offline task attribute configuration includes Task Attributes, Task Scheduling, and Resource Configuration:

    Task Attributes

    
    
    
    Set basic task attributes, the resources used, and data link channel information.
    Basic Attributes:
    Task Name/Type: Displays the current task's name and type.
    Owner: One or more space members responsible for this task; defaults to the task creator.
    Description: Displays the current task's remarks.
    Scheduling Parameters: Scheduling parameters are used during task scheduling. They are automatically replaced according to the business time of the task run and the parameter's value format, enabling dynamic values within the task scheduling time.
    Resource Configuration:
    Integrated Resource Group: The name of the Integration Resource Group used by the current task. A task can be bound to only one resource group.
    Task Running Policy:
    Associated Alarms: Supports associating alarm rules with the current task.
    Channel Settings:
    Dirty Data Threshold: Dirty data refers to data that fails to be written during synchronization. The threshold is the maximum number of dirty data records, or the maximum byte count, tolerated during synchronization; if it is exceeded, the task ends automatically. The default is 0, meaning no dirty data is tolerated.
    Concurrency: The maximum number of concurrent operations expected during execution. Due to resources, data source types, and task optimization, the actual concurrency may be less than or equal to this value. The larger the value, the more execution machine resources are pre-allocated.
    Note:
    When concurrency > 1, if the source supports setting a split key, the split key is mandatory; otherwise, the configured concurrency will not take effect.
    Sync Rate Limit: Limits the synchronization rate by traffic or record count to protect the source or destination from read and write pressure. This value is the maximum rate; the default -1 means no rate limit.
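    The dirty-data threshold semantics above can be sketched as follows (a simplified model, not WeData's actual engine; here a ValueError raised by the write function stands in for a failed write):

```python
def run_sync(rows, write, dirty_threshold=0):
    """Write rows one by one; rows that fail to write count as dirty data.
    The task ends as soon as the dirty count exceeds the threshold
    (the default threshold 0 tolerates no dirty data)."""
    written, dirty = 0, 0
    for row in rows:
        try:
            write(row)
            written += 1
        except ValueError:
            dirty += 1
            if dirty > dirty_threshold:
                return {"status": "failed", "written": written, "dirty": dirty}
    return {"status": "succeeded", "written": written, "dirty": dirty}

def strict_write(row):
    """Hypothetical writer that rejects rows missing an 'id'."""
    if row.get("id") is None:
        raise ValueError("dirty row: missing id")

result = run_sync([{"id": 1}, {"id": None}, {"id": 3}], strict_write)
print(result)  # {'status': 'failed', 'written': 1, 'dirty': 1}
```

    With the default threshold of 0, the first dirty row ends the task; raising the threshold lets the sync tolerate that many failures before aborting.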

    Task Scheduling

    
    
    
    Set the current task's periodic run plan, including scheduling time and dependency attributes.
    Scheduling Time:
    Scheduling Method:
    Periodic Scheduling: The task runs cyclically according to the scheduled plan.
    One-time Execution: The task runs only once, at the specified time.
    Effective Date: The valid period for the scheduling configuration. The system schedules the task automatically within this range according to the time configuration and stops scheduling once the period ends.
    Scheduling Cycle: The interval unit for the scheduling plan; supports Year, Month, Week, Day, Hour, and Minute:
    Minute: Requires a specific start time and interval. The task starts from the configured minute of each hour and runs cyclically at that interval. For example, with an execution window of 02:00 to 23:59 and a 5-minute interval, one instance runs every 5 minutes starting at 02:00.
    Hour: Requires specific start and end times and an interval. For example, with an execution window of 02:20 to 05:00 and a 1-hour interval, the task runs at 02:20, 03:20, and 04:20.
    Day: Requires a specific execution time each day; the task runs only at that time every day.
    Week: Requires the day(s) of the week (multiple selections supported) and the time; the task runs only at the specified time on the designated days.
    Month: Requires the fixed day(s) of the month and the time. If end of month is selected, the task runs on the last day of each month.
    Year: Requires the fixed annual run date and time.
    Dependency Attributes:
    Self-Dependency: The dependency relationship between different instances of the same task:
    Ordered Serial: The current instance depends on the status of the previous cycle's instance.
    Unordered Serial: The current instance has no dependency on the previous cycle's instance. If a task has multiple pending instances at the same time, the system randomly selects one to run; only one instance runs at any time.
    Parallel: There is no dependency between cycle instances. If a task has multiple instances at the same time, they run simultaneously.
    Retry Wait Time: The maximum waiting interval for each retry after an instance fails. If the instance has not been retried within this time, it is marked as failed.
    Number of Retries: The maximum number of retries after an instance fails. If this value is exceeded, the task is marked as failed.
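    The hourly example above (an execution window of 02:20 to 05:00 at a 1-hour interval) can be checked with a short sketch; this is an illustration of the instance-generation rule, not WeData's scheduler:

```python
from datetime import datetime, timedelta

def hourly_instances(start, end, interval_hours):
    """Hourly scheduling sketch: instances run from the start time,
    stepping by the interval, up to (but not past) the end time."""
    t, out = start, []
    while t <= end:
        out.append(t)
        t += timedelta(hours=interval_hours)
    return out

runs = hourly_instances(datetime(2024, 1, 1, 2, 20),
                        datetime(2024, 1, 1, 5, 0), 1)
print([t.strftime("%H:%M") for t in runs])  # ['02:20', '03:20', '04:20']
```

    The next step after 04:20 would be 05:20, which falls outside the window, so only three instances are generated, matching the example in the table.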

    Step Four: Task Test Run and Submission

    After configuration, an offline synchronization task can be test-run online or submitted to the production scheduling environment. The task configuration page supports saving, submitting, test running, stopping debugging, locking/unlocking, and jumping to operations.
    
    
    
    1. Save: Saves the current task configuration, including data node configuration, node connections, task attributes, and task scheduling.
    2. Submit: Submits the current task to the production environment. After submission, the task runs periodically according to its scheduling attributes, and task and instance records are generated under Task Operation and Maintenance > Offline Maintenance.
    Note:
    The latest configuration is saved by default before submission.
    Before submission, the task undergoes a prerequisite check covering node configuration, node connections, resource groups, and more. If the check fails, the submission fails and a prompt appears.
    3. Test Run: Runs the current task in debug mode.
    4. Debug Stop: Terminates a task that is currently in a test run.
    5. Lock/Unlock: The task creator is the first lock holder by default, and only the lock holder can edit the task configuration and run the task. If the lock holder performs no edits for 5 minutes, other users can click the icon to grab the lock and, if successful, edit the task.
    6. Go to Operations: Jumps to the Task Operation and Maintenance page for the current task.
    7. Canvas/Form Conversion: Converts between Canvas and Form modes.
    Note:
    Canvas and Form modes can be converted to each other, but a canvas containing transformation nodes cannot be converted to Form mode.
    Script mode cannot be converted to Canvas/Form mode.
    8. Script Conversion: Converts Canvas/Form mode to script mode. After conversion, reverting to Canvas/Form mode is not supported.

    Task Submission Detection

    
    
    
    Detection found exceptions: You can skip the exceptions and submit directly, or terminate the submission.
    Detection found only warnings or below: The task can be submitted directly.

    Submission Results

    
    Task submitting:
    Displays the submission progress percentage.
    Reminds users not to refresh or close the page.
    Task submission succeeded:
    Displays the successful submission result.
    Indicates success and the upcoming redirect: "Submission successful, will redirect to the current task's Operation and Maintenance detail page in 10 seconds. The current task has been successfully submitted; you can go to Operation and Maintenance to check the task status and manage data."
    Task submission failed:
    Returns the failure reason.

    Subsequent Steps

    After completing the task configuration, you can configure operations and monitoring for the created tasks, such as setting up task monitoring and alerting, and view key metrics of task execution.
    