Creating Single Task
Last updated: 2024-11-01 17:35:28

Notes




1. This page displays all offline synchronization tasks created from the Data Integration page.
2. If you only need to run an offline sync task once or periodically, you can configure scheduling in Data Integration and submit the task to Data Integration's Operations Center.
3. If you need to orchestrate the task with other tasks or configure task dependencies, we recommend creating the offline synchronization task in the orchestration space of Data Development. Alternatively, create the task in Data Integration and then import it into the orchestration space using Data Development's Import Task (Data Integration) feature.
Note:
Tasks already submitted in Data Integration cannot be imported into the orchestration space of Data Development; only unsubmitted tasks can be imported.

Background

Single Table Synchronization uses fixed field synchronization: only the source fields specified in the task's mapping relationships are synchronized to the target. Single table tasks support canvas, form, and script configuration modes, and cover data sources such as MySQL, Hive, DLC, and Doris.

Conditions and Restrictions

1. The source and target data sources have been configured for subsequent tasks.
2. The Data Integration Resource Group has been purchased.
3. Network connectivity between the Data Integration Resource Group and the data source has been completed.
4. The data source environment preparation has been completed. Based on the sync configuration you need, grant the data source configuration account the necessary operation permissions in the database before executing the sync task.
5. If the database account configured for the data source lacks the required read or write permissions, the task will fail. Configure an account with permissions appropriate to your actual read and write scenarios.

Operation Steps

Step One: Create a new offline synchronization task and select the configuration mode

On the Data Integration page, click Configuration Center > Offline Synchronization to enter the synchronization task list, and create a new task. In the pop-up, configure the basic task information, then click Confirm to enter the task configuration page.



Parameter: Description
Task Name: Required.
Task Mode:
Form Mode: Provides only read and write nodes. Suitable for fixed field synchronization from a single table to a single table, such as ODS layer data synchronization that does not require data cleaning.
Canvas Mode: Provides read, write, and transformation nodes. Suitable for data links that involve cleaning and for many-to-many data links.
Script Mode: Opens a script configuration page where users select a data source and a data destination, and then displays the corresponding script template:
Users must select the data source and destination first; the script cannot be edited until both are selected.
After selection, the corresponding script template is displayed.
In the script, users can manually write parameters such as the data source and connection information.
SQL statements can be written in the script by placing the query SQL in the connection configuration.
Description: Optional.
Note:
Script mode currently supports the following data sources:
Read: MySQL, TDSQL-C MySQL, TDSQL MySQL, TDSQL PostgreSQL, PostgreSQL, TCHouse-P, SQL Server, Oracle, IBM DB2, Dameng DM, SAP HANA, SyBase, Doris, Hive, HBase, Clickhouse, DLC, Kudu, HDFS, Greenplum, GaussDB, Impala, Gbase, TBase, MongoDB, COS, FTP, SFTP, REST API, Elasticsearch, Kafka, Iceberg, StarRocks, Graph Database.
Write: MySQL, TDSQL-C MySQL, TDSQL MySQL, TDSQL PostgreSQL, PostgreSQL, TCHouse-P, SQL Server, Oracle, IBM DB2, Dameng DM, SAP HANA, Hive, HBase, Clickhouse, DLC, Kudu, HDFS, Greenplum, GaussDB, Gbase, TBase, Impala, COS, FTP, SFTP, Elasticsearch, Redis, MongoDB, Kafka, Iceberg, Doris, StarRocks, Graph Database.

Step Two: Data node configuration

Configure read node

Read node configuration includes basic information, data source, and data fields.
Basic Information: The node name cannot be empty, and no two data nodes within a single task can share the same name.
Data Source: Configure the database table object to be read, the synchronization method, and so on.
Data Fields: Based on the configured data table object, the system supports two methods: default pulling of field metadata and manual field configuration.
Default Pulling: For types such as MySQL, Hive, and PostgreSQL, the system automatically pulls metadata fields and types from the database table information, with no manual editing required.
Manual Configuration: File data sources (e.g., HDFS, COS) and columnar storage data sources (e.g., HBase, MongoDB) do not support automatic metadata pulling. You can click Add Field/Batch Add to manually add field names and types. The read node additionally supports configuring time parameters and constants.
Batch adding

Parameter: Description
Data Type: Current node's data source type.
Adding Method:
Additional Fields: Append the newly parsed fields after the original fields of the table.
Overwrite Existing Fields: The newly parsed fields overwrite the original field information of the current source table.
Field Acquisition:
Text parsing: Parse based on the text content.
JSON parsing: Input JSON content and quickly parse it based on key/value, such as {"age":10,"name":"demo"}.
Fetch from Homogeneous Table: Specify a table object from a data source and parse its fields.
Text to be Parsed:
Separator: Separates field names from types. Supports tab, |, and space, for example "age|int".
Quick Filling of Field Type: Common field types. Supports constants, functions, variables, string, boolean, date, datetime, timestamp, time, double, float, tinyint, smallint, tinyint unsigned, int, mediumint, smallint unsigned, bigint, int unsigned, bigint unsigned, double precision, tinyint(1), char, varchar, text, varbinary, blob.
Parse Data: Parses the input content.
Preview:
Batch Delete: After selecting rows in the preview list, deletes the selected parsing results in a batch.
Field name: The field name.
Type: The field type.
Note
Time Parameter Field: Only the read node of offline tasks supports configuring a time parameter field. It is commonly used to write the instance run time into the primary or multi-level partition of a table.
Constant Fields: Only read nodes support configuring constant fields. A constant field can be used to write a fixed value into the target table when the number of fields in the source and target tables does not match.
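
As an illustration of the batch-add options described above, the following is a minimal Python sketch, not WeData's own implementation. The function names (parse_text, parse_json, add_fields) are hypothetical, and the mapping from JSON value types to field types is an assumption; this page only states that JSON content is parsed by key/value.

```python
import json


def parse_text(text: str, separator: str = "|") -> list[tuple[str, str]]:
    """Split each non-empty line into a (field_name, field_type) pair."""
    fields = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Everything before the first separator is the name, the rest is the type.
        name, _, field_type = line.partition(separator)
        fields.append((name.strip(), field_type.strip()))
    return fields


def parse_json(text: str) -> list[tuple[str, str]]:
    """Derive fields from a flat JSON object (type inference is an assumption)."""
    fields = []
    for key, value in json.loads(text).items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            field_type = "boolean"
        elif isinstance(value, int):
            field_type = "int"
        elif isinstance(value, float):
            field_type = "double"
        else:
            field_type = "string"
        fields.append((key, field_type))
    return fields


def add_fields(existing, parsed, overwrite=False):
    """Append parsed fields after the existing ones, or overwrite them."""
    return list(parsed) if overwrite else list(existing) + list(parsed)
```

For example, parse_text("age|int") yields one field named age of type int, and add_fields with overwrite=True discards the fields that were previously configured.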

Configure Transformation Node

Transformation node configuration includes basic information, conversion rules, and data fields. The transformation node must be downstream of the read node. After the transformation node is created and connected to the read node, the system automatically retrieves the field information from the upstream node and converts the data according to the conversion rules.
Basic Information: Configure the node name. The node name cannot be empty, and no two data nodes within a single task can share the same name.
Conversion Rules: Configure field-level or data-level conversion rules. Field information is inherited from the upstream node once the connection is made.
Data Fields: By default, all data fields are pulled from the upstream node for subsequent mapping in the write node.

Configure Writing Node

Write node configuration includes basic information, data source, data fields, and field mapping. The write node writes the upstream data into the target object based on the connection relationship.
Basic Information: The node name cannot be empty, and no two data nodes within a single task can share the same name.
Data Source: Configure the database table object to be written, the synchronization method, and so on.
Data Fields: Based on the configured data table object, the system supports two methods: default pulling of field metadata and manual field configuration.
Default Pulling: For types such as MySQL, Hive, and PostgreSQL, the system automatically pulls metadata fields and types from the database table information, with no manual editing required.
Manual Configuration: File data sources (e.g., HDFS, COS) and columnar storage data sources (e.g., HBase, MongoDB) do not support automatic metadata pulling. You can click Field Configuration to add field names and types manually.
Field Mapping: Compared with the read node, the write node additionally requires a field mapping relationship, which specifies the source of each target field's content through connections. Three ways of relating source and target fields are supported: same-name mapping, same-row mapping, and manual connection.



Parameter: Description
Mapping by the same name: Establish a mapping relationship between source table fields and target table fields that have the same field name.
Peer Mapping: Establish a mapping relationship between source table fields and target table fields that are in the same row.
Clear Mappings: Clear the established mapping relationships between source table fields and target table fields.
Pinning Mapped: Pin established mapping relationships to the top and format their display. This formatting does not affect the actual stored field order of the table; it only optimizes the frontend display.
Manual connection mapping: Manually establish mapping relationships between source table fields and target table fields by drawing connections.
Source Table:
Source Table Field Name: The name of the source table field.
Type: The type of the source table field.
Mapping: Quickly create a mapping.
Target Table:
Field Name in Target Table: The name of the target table field.
Type: The type of the target table field.
Note
The prerequisite for configuring field mapping is that the current write node has a connected source (either a read node or a transformation node).
The content of target fields without a configured mapping relationship will be empty or remain unchanged.
If the source field type cannot be converted to the target field type, it may lead to task failure.
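
The same-name and same-row mapping options can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, field names are compared as exact, case-sensitive strings, and the product's actual matching rules may differ.

```python
def map_by_name(source: list[str], target: list[str]) -> dict[str, str]:
    """Map each source field to the target field with the same name."""
    target_names = set(target)
    return {name: name for name in source if name in target_names}


def map_by_row(source: list[str], target: list[str]) -> dict[str, str]:
    """Map the i-th source field to the i-th target field."""
    # zip stops at the shorter list, so surplus fields stay unmapped.
    return dict(zip(source, target))


def unmapped_targets(target: list[str], mapping: dict[str, str]) -> list[str]:
    """Target fields with no mapping; these end up empty or unchanged."""
    mapped = set(mapping.values())
    return [name for name in target if name not in mapped]
```

With source fields id, name, age and target fields id, full_name, age, same-name mapping connects id and age only, while same-row mapping also connects name to full_name.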

Step Three: Offline Task Attribute Configuration

Offline task attribute configuration includes Task Attributes, Task Scheduling, and Resource Configuration:

Task Attribute




Set the task's basic attributes, resources, and data link channel information.
Category / Parameter: Description
Basic Attributes
Task Name/Type: Displays the name and type of the current task.
Owner: One or more space members responsible for this task. Defaults to the task creator.
Description: Displays the remarks for the current task.
Scheduling Parameters: Used during task scheduling. They are automatically replaced according to the business time of the scheduled run and the value format of each parameter, so that values are resolved dynamically at scheduling time.
Resource Configuration
Integrated Resource Group: The name of the Integrated Resource Group used by the current task. A task can be bound to only one resource group.
Task Running Policy
Associated Alarms: Supports associating alert rules with the current task.
Channel Settings
Dirty Data Threshold: Dirty data is data that fails to write during synchronization. The dirty data threshold is the maximum number of dirty data entries, or the maximum byte count, that can be tolerated during synchronization. If this threshold is exceeded, the task ends automatically. The default threshold is 0, meaning dirty data is not tolerated.
Concurrent Number: The maximum number of concurrent operations expected during execution. Depending on resources, data source types, and task optimization results, the actual concurrency may be less than or equal to this value. The larger the value, the more execution machine resources are pre-allocated.
Note:
When concurrency is greater than 1 and the source data supports a splitting key, the splitting key is mandatory; otherwise, the configured concurrency does not take effect.
Sync Rate Limit: Limits the synchronization rate by traffic or by number of records to reduce read and write pressure on the data source or data destination. This value is the maximum operating rate; the default of -1 means no rate limit.
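
The dirty data threshold behavior can be illustrated with a short sketch. This is a hedged Python example, not the product's implementation: sync, write, and DirtyDataExceeded are hypothetical names, and only the record-count form of the threshold is modeled (not the byte-count form).

```python
class DirtyDataExceeded(RuntimeError):
    """Raised when more dirty records are seen than the threshold allows."""


def sync(records, write, dirty_threshold=0):
    """Write records one by one, tolerating up to dirty_threshold failures.

    Returns (written_count, dirty_count). With the default threshold of 0,
    the first record that fails to write ends the run.
    """
    written = 0
    dirty = 0
    for record in records:
        try:
            write(record)
            written += 1
        except Exception:
            # A record that fails to write counts as dirty data.
            dirty += 1
            if dirty > dirty_threshold:
                raise DirtyDataExceeded(
                    f"{dirty} dirty records exceed threshold {dirty_threshold}"
                )
    return written, dirty
```

With dirty_threshold=1, a run containing one bad record completes and reports it; with the default of 0, the same run stops at the bad record.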

Task Scheduling




Set the recurring run plan for the current task, including scheduling time and dependency attributes.
Category / Parameter: Description
Scheduling Time
Scheduling Method:
Periodic Scheduling: The task runs repeatedly according to the scheduled plan.
One-time Execution: The task runs only once at the specified time.
Effective Date: The period during which the scheduling time configuration is valid. The system schedules the task automatically within this period according to the time configuration, and stops scheduling automatically after the period ends.
Scheduling Cycle: The interval unit for the scheduling plan. Supports Year, Month, Weekly, Days, Hour, and Minutes:
Minutes: Requires an execution time window and an interval. The task runs repeatedly at the given interval within that window. For example, if the execution time is from 02:00 to 23:59 with an interval of 5 minutes, the task runs one instance every 5 minutes starting from 02:00.
Hour: Requires execution start and end times and an interval. For example, if the execution time is from 02:20 to 05:00 with an interval of 1 hour, the task runs at 02:20, 03:20, and 04:20.
Days: Requires the execution time of day. The task runs only at that time every day.
Weekly: Requires the day(s) of the week (multiple selections supported) and the time. The task runs only at the specified time on the designated days.
Month: Requires the day(s) of the month and the time. If the end of the month is selected, the task runs on the last day of each month.
Year: Requires the date and time of the annual run.
Dependency Attributes
Self-Dependency: The dependency relationship between different instances of the same task:
Ordered Serial: The current instance depends on the status of the previous cycle's instance.
Unordered Serial: The current instance has no dependency on the previous cycle's instance. If a task has multiple instances pending at the same time, the system randomly selects one to run. Only one instance is running at any time.
Parallel: There is no dependency between instances of consecutive cycles. If a task has multiple instances pending at the same time, they run simultaneously.
Retry Wait Time: The maximum wait interval before each retry after an instance fails. If the instance has not been retried once this time is exceeded, it is marked as failed.
Number of Retries: The maximum number of retries after an instance fails. If this value is exceeded, the task is marked as failed.
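
The Hour and Minutes scheduling examples above can be reproduced with a short calculation. The following Python sketch is illustrative only: instance_times is a hypothetical helper, and it assumes that an instance landing exactly on the end time would still run.

```python
from datetime import datetime, timedelta


def instance_times(start: str, end: str, interval: timedelta) -> list[str]:
    """List the HH:MM run times from start to end at a fixed interval."""
    fmt = "%H:%M"
    current = datetime.strptime(start, fmt)
    last = datetime.strptime(end, fmt)
    times = []
    while current <= last:
        times.append(current.strftime(fmt))
        current += interval
    return times


# Hour cycle: 02:20 to 05:00 with a 1-hour interval.
print(instance_times("02:20", "05:00", timedelta(hours=1)))
# prints ['02:20', '03:20', '04:20']
```

The next candidate time, 05:20, falls after the 05:00 end time, which is why the Hour example produces three instances.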

Step Four: Task test run and submission

After configuration, an offline synchronization task can be test run online or submitted to the production scheduling environment. The task configuration page supports saving, submitting, test running, stopping a debug run, locking/unlocking, and jumping to Operations.



Number. Parameter: Description
1. Save: Saves the current task configuration, including data node configuration, node connections, task attributes, and task scheduling configuration.
2. Submit: Submits the current task to the production environment. After submission, the task runs periodically according to its scheduling attributes, and task and instance records are generated under Task Operation and Maintenance > Offline Maintenance.
Note:
By default, the latest configuration is saved before the task is submitted.
Before submission, the task undergoes a necessity check covering task node configuration, task connections, resource groups, and so on. If the check fails, the submission fails and a prompt appears.
3. Test Run: Runs the current task in debug mode.
4. Debug Stop: Terminates the test run currently in progress.
5. Lock/Unlock: The creator is the first lock holder by default, and only the lock holder can edit the task configuration and run the task. If the lock holder performs no editing actions within 5 minutes, others can click the icon to grab the lock and, if successful, edit the task.
6. Go to Operations: Jumps to the Task Operation and Maintenance page, filtered by the current task name.
7. Canvas/Form Conversion: Supports conversion to Canvas/Form mode.
Note:
Canvas and Form modes can be converted to each other. When Canvas mode contains transformation nodes, conversion to Form mode is not supported.
Converting from Script mode to Canvas/Form mode is not supported.
8. Script Conversion: Supports conversion from Canvas/Form mode to Script mode. After conversion, reverting to Canvas/Form mode is not supported.
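
The mode conversion rules above can be summarized in a small sketch. This Python function is a hypothetical illustration of the stated rules, not a product API; the mode names are written as lowercase strings for convenience.

```python
def can_convert(current: str, target: str,
                has_transformation_nodes: bool = False) -> bool:
    """Return True if a task in `current` mode may be converted to `target`."""
    if current == target:
        return False  # nothing to convert
    if current == "script":
        return False  # Script mode cannot be converted back
    if target == "script":
        return current in {"canvas", "form"}  # one-way conversion
    if current == "canvas" and target == "form":
        # Form mode has only read and write nodes.
        return not has_transformation_nodes
    return current == "form" and target == "canvas"
```

Because conversion to Script mode is one-way, it may be worth confirming that Canvas or Form editing is no longer needed before converting.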

Task Submission Detection




Parameter: Description
Detection found exceptions: You can skip the exceptions and submit directly, or terminate the submission.
Detection found only warnings or lower: The task can be submitted directly.

Submit results


Task submitting:
Displays the submission progress as a percentage.
Reminds users not to refresh or close the page, with the message: The current task has been successfully submitted, you can go to OPS to check the task status and manage data.
Task submission result - Successful:
Displays the successful submission result.
Indicates success and the subsequent redirect: "Submission successful, will redirect to the current task Operation and Maintenance detail page in 10 seconds" "The current task has been successfully submitted, you can go to Operation and Maintenance to check the task status and manage data".
Task submission result - Failed:
Displays the reason for the failure.

Subsequent Steps

After completing the task configuration, you can operate and monitor the created tasks, for example by configuring task monitoring and alerts and viewing key task execution metrics.
