Tencent Cloud WeData
Creating Single Task
Last updated: 2024-11-01 17:32:23
Note:
If offline synchronization tasks need to be orchestrated with other tasks or configured with task dependencies, it is recommended to create them in the data development orchestration space, or to create them in DataInLong first and then import them into the orchestration space through the Import DataInLong Task feature.
Only tasks that have not yet been submitted in DataInLong can be imported into the data development orchestration space; tasks that have already been submitted cannot be imported.

Background

Single table synchronization uses fixed-field synchronization: only the source fields specified in the task's mapping configuration are synchronized to the target. Single table tasks support canvas, form, and script configuration modes and cover data sources such as MySQL, Hive, DLC, and Doris.

Conditions and Restrictions

1. The source and target data sources required by the task have been configured.
2. The DataInLong Resource Group has been purchased.
3. Network connectivity between the DataInLong Resource Group and the data sources has been established.
4. The data source environment has been prepared. Based on the required sync configuration, grant the data source account the necessary operation permissions in the database before executing the sync task.
5. If the database account configured for the data source lacks the required read and write permissions, the task will fail. Configure an account with appropriate permissions for the actual read and write scenarios.

Operation Steps

Step One: Create a new offline synchronization task and select the configuration mode

1. On the DataInLong page, click Offline Development > Folder > Workflow > DataInLong > New Task > Offline Synchronization to access the synchronization task list.



2. Configure the basic task information in the pop-up window, click Confirm, and then you can enter the task configuration page.



Parameter
Description
Task Name
Required field.
Task Type
Select Offline Synchronization.
Development Mode
Form Mode: Provides only read and write nodes; suitable for fixed-field, single-table-to-single-table synchronization, such as ODS-layer data synchronization that requires no data cleaning.
Canvas Mode: Provides read, write, and transformation nodes; suitable for data links that involve cleaning and for many-to-many data links.
Script Mode: Initializes a script configuration page on which users select a data source and target, and the corresponding script template is displayed:
Users must select the data source and target first; editing is not allowed before selection.
After selection, the corresponding script module is displayed.
In the script, users can manually write parameters such as the data source and connection information.
SQL statements can be written in the script, with the query SQL placed in the connection configuration.



Workflow Directory
Select an existing workflow.
Note:
Script mode currently supports the following data sources:
Read: MySQL, TDSQL-C MySQL, TDSQL MySQL, TDSQL PostgreSQL, PostgreSQL, TCHouse-P, SQL Server, Oracle, IBM DB2, DM DB, SAP HANA, SyBase, Doris, Hive, HBase, Clickhouse, DLC, Kudu, HDFS, Greenplum, GaussDB, Impala, Gbase, TBase, MongoDB, COS, FTP, SFTP, REST API, Elasticsearch, Kafka, Iceberg, StarRocks, Graph Database.
Write: MySQL, TDSQL-C MySQL, TDSQL MySQL, TDSQL PostgreSQL, PostgreSQL, TCHouse-P, SQL Server, Oracle, IBM DB2, DM DB, SAP HANA, Hive, HBase, Clickhouse, DLC, Kudu, HDFS, Greenplum, GaussDB, Gbase, TBase, Impala, COS, FTP, SFTP, Elasticsearch, Redis, MongoDB, Kafka, Iceberg, Doris, StarRocks, Graph Database.
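As a quick sanity check before choosing script mode, the reader and writer support lists above can be encoded and queried programmatically. The sketch below is illustrative only: the sets mirror the lists in the note, and `supports_script_mode` is a hypothetical helper, not a WeData API.

```python
# Illustrative sketch: the sets mirror the script-mode support lists above.
SCRIPT_MODE_READERS = {
    "MySQL", "TDSQL-C MySQL", "TDSQL MySQL", "TDSQL PostgreSQL",
    "PostgreSQL", "TCHouse-P", "SQL Server", "Oracle", "IBM DB2",
    "DM DB", "SAP HANA", "SyBase", "Doris", "Hive", "HBase",
    "Clickhouse", "DLC", "Kudu", "HDFS", "Greenplum", "GaussDB",
    "Impala", "Gbase", "TBase", "MongoDB", "COS", "FTP", "SFTP",
    "REST API", "Elasticsearch", "Kafka", "Iceberg", "StarRocks",
    "Graph Database",
}
# Per the lists above, the write side drops SyBase and REST API but adds Redis.
SCRIPT_MODE_WRITERS = (SCRIPT_MODE_READERS - {"SyBase", "REST API"}) | {"Redis"}

def supports_script_mode(reader: str, writer: str) -> bool:
    """Return True if both ends of the link are supported in script mode."""
    return reader in SCRIPT_MODE_READERS and writer in SCRIPT_MODE_WRITERS
```

For example, a REST API source can be read but not written in script mode, while Redis is write-only.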

Step Two: Data Node Configuration

Configure Read Node




Read node configuration includes basic information, data source, and data fields.
Basic Information: The node name cannot be empty, and the same data node name cannot exist twice within a single task.
Data Source: Configure the database table objects to be read, the synchronization method, and other information.
Data Fields: Based on the configured database table objects, the system supports both default pulling of metadata and manual field configuration.
Default Pulling: For types such as MySQL, Hive, and PostgreSQL, the system automatically pulls metadata fields and types from the database table information; no manual editing is needed.
Manual Configuration: File data sources (e.g., HDFS, COS) and columnar storage data sources (e.g., HBase, MongoDB) do not support automatic metadata pulling. You can click Add Field/Batch Add to manually add field names and types. The read node additionally supports configuring time parameters and constants.
Batch addition

Parameter
Description
Data Type
Current node's data source type.
Adding Method
Additional Fields: Append the newly parsed fields after the original fields of the table.
Overwrite existing fields: The newly parsed fields overwrite the original field information of the current source table.
Field Acquisition
Text parsing: Parse based on the text content.
JSON parsing: Input JSON content and quickly parse the content based on key/value, such as {"age":10,"name":"demo"}.
Fetch from Homogeneous Table: Specify a table object from a data source and parse its fields.
Text to Be Parsed
The text content to be parsed into fields.
Separator
Used to separate field names and types; supports tab, |, and space, e.g., "age|int".
Quick Filling of Field Type
Common field types, supports constants, functions, variables, string, boolean, date, datetime, timestamp, time, double, float, tinyint, smallint, tinyint unsigned, int, mediumint, smallint unsigned, bigint, int unsigned, bigint unsigned, double precision, tinyint(1), char, varchar, text, varbinary, blob.
Parse Data
Parse input content.
Preview
Preview the parsed results.
Batch Delete
After selecting the preview list, batch delete the parsing results.
Field Name
Field name.
Type
Field Type.
Note
Time Parameter Fields: Only the read node of offline tasks supports configuring time parameter fields, commonly used to write the instance run time into a table's primary or multi-level partition.
Constant Fields: Only read nodes support configuring constant fields. Constant fields can be used to write a constant value into the target table when the numbers of fields in the source and target tables do not match.
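The two parsing modes described above can be sketched as follows. The function names are hypothetical helpers for illustration, not WeData APIs: text parsing splits each line on the chosen separator into a field name and type, and JSON parsing infers a type from each key's value.

```python
import json

def parse_text(text: str, sep: str = "|") -> list[tuple[str, str]]:
    """Text parsing: each line holds a field name and type, e.g. "age|int"."""
    fields = []
    for line in text.splitlines():
        if line.strip():
            name, type_ = line.split(sep, 1)
            fields.append((name.strip(), type_.strip()))
    return fields

def parse_json(content: str) -> list[tuple[str, str]]:
    """JSON parsing: infer a field type from each key's value."""
    # The type names are illustrative; pick types matching your target source.
    type_map = {int: "int", float: "double", bool: "boolean", str: "varchar"}
    return [(key, type_map.get(type(value), "varchar"))
            for key, value in json.loads(content).items()]
```

For the documented example {"age":10,"name":"demo"}, JSON parsing yields the fields age (int) and name (varchar).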

Configure Transformation Node

Transformation node configuration includes basic information, transformation rules, and data fields. The transformation node must be downstream of a read node. After it is created and connected to the read node, the system automatically retrieves the field information from the upstream node and transforms the data according to the transformation rules.
Basic Information: Configure the node name. Node names cannot be empty, and data nodes within a single task cannot share the same name.
Transformation Rules: Configure field- or data-level transformation rules. Field information is inherited from the upstream node; after connecting to the upstream node, the system automatically retrieves its field information.
Data Fields: By default, all data fields are pulled from the upstream node for subsequent write node mapping.

Configure Write Node




Write node configuration includes basic information, data source, data fields, and field mapping. The write node writes the upstream data into the target object based on the connection relationship.
Basic Information: The node name cannot be empty, and the same data node name cannot exist twice within a single task.
Data Source: Configure the database table objects to be written and the synchronization method, among other information.
Data Fields: Based on the configured database table objects, the system supports both default pulling of metadata and manual field configuration.
Default Pulling: For types such as MySQL, Hive, and PostgreSQL, the system automatically pulls metadata fields and types from the database table information; no manual editing is needed.
Manual Configuration: For file data sources (e.g., HDFS, COS) and columnar storage data sources (e.g., HBase, MongoDB), automatic metadata pulling is not supported. You can click Field Configuration to add field names and types manually.
Field Mapping: Compared with the read node, the write node additionally requires field mapping configuration. Field mapping designates the source of each target field's content through links, supporting three configurations: same-name mapping, same-row mapping, and manual linking between source and target nodes.



Parameter
Description
Same-Name Mapping
Establishes a mapping between source table fields and target table fields that share the same field name.
Same-Row Mapping
Establishes a mapping between source table fields and target table fields in the same row.
Clear Mappings
Clears the established mappings between source table fields and target table fields.
Pin Mapped Fields
Pins established mappings to the top for display. This does not affect the actual field order stored in the table; it is only a front-end display optimization.
Manual Connection Mapping
Supports manually establishing mappings between source table fields and target table fields by drawing connections.
Source Table
Source Table Field Name
The name of the source table field.
Type
The type of the source table field.
Mapping
Quickly create mapping.



Target Table
Target Table Field Name
The name of the target table field.
Type
The type of the target table field.
Note
The prerequisite for configuring field mapping is that the current write node has a connected source (either a read node or a transformation node).
The content of target fields without a configured mapping relationship will be empty or remain unchanged.
If the source field type cannot be converted to the target field type, it may lead to task failure.
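The same-name and same-row mapping strategies can be sketched as follows; these are illustrative helpers, not WeData APIs.

```python
def map_by_name(source: list[str], target: list[str]) -> list[tuple[str, str]]:
    """Same-name mapping: link fields whose names match in source and target."""
    target_set = set(target)
    return [(field, field) for field in source if field in target_set]

def map_by_row(source: list[str], target: list[str]) -> list[tuple[str, str]]:
    """Same-row mapping: link fields by position; extra fields stay unmapped."""
    return list(zip(source, target))
```

As the note above states, any target field left without a mapping is written as empty or remains unchanged.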

Step Three: Offline Task Attribute Configuration

Offline task attribute configuration includes Task Attributes, Task Scheduling, and Resource Configuration:

Task Attributes




Set the task's basic attributes, the resources it uses, and the data link channel information.
Category
Parameter
Description
Basic information
Task Name/Type
Displays the current task's name and type.
Task Owner
One or more space members responsible for this task; defaults to the task creator.
Description (Optional)
Displays the remark information of the current task.
Scheduling Parameters (Optional)
Scheduling parameters are used during task scheduling. They are automatically replaced according to the business time of the scheduled run and the parameter's value format, enabling dynamic values within the task scheduling time.
Channel Settings
Dirty Data Threshold
Dirty data refers to data that fails to write during synchronization. The dirty data threshold refers to the maximum number of dirty data entries or byte count that can be tolerated during synchronization. If this threshold is exceeded, the task will automatically end. The default threshold is 0, meaning dirty data is not tolerated.
Concurrent Number
The maximum number of concurrent operations expected during execution. Depending on resources, data source types, and task optimization, the actual concurrency may be less than or equal to this value. The larger the value, the more execution machine resources are pre-allocated.
Note:
When concurrency > 1, if the source data source supports setting a splitting key, the splitting key is mandatory; otherwise, the configured concurrency will not take effect.
Sync Rate Limit
Limit the synchronization rate by traffic or number of records to protect the read and write pressure on the data source endpoint or data destination endpoint. This value is the maximum operating rate, with the default -1 indicating no rate limit.
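To illustrate the scheduling-parameter replacement described above: at run time, placeholders in the task are substituted with values derived from the instance's business time. The placeholder syntax below (${yyyyMMdd}-style) is an assumption made purely for illustration; consult the scheduling parameters documentation for the actual syntax.

```python
import re
from datetime import datetime

# Hypothetical placeholder formats, for illustration only.
FORMATS = {"yyyyMMdd": "%Y%m%d", "yyyy-MM-dd": "%Y-%m-%d", "HH": "%H"}

def substitute(template: str, business_time: datetime) -> str:
    """Replace ${...} placeholders with the formatted business time."""
    def repl(match: re.Match) -> str:
        fmt = FORMATS.get(match.group(1))
        # Unknown placeholders are left untouched.
        return business_time.strftime(fmt) if fmt else match.group(0)
    return re.sub(r"\$\{([^}]+)\}", repl, template)
```

For example, a partition expression like pt=${yyyyMMdd} would resolve to pt=20241101 for a business time of November 1, 2024.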

Scheduling Settings




Feature
Description
Scheduling Cycle
The execution cycle unit for task scheduling supports minute, hour, day, week, month, year, and one-time.
Effective Time
The valid time period for scheduling time configuration. The system will automatically schedule within this time range according to the time configuration, and will no longer automatically schedule after the validity period.
Execution Time
Set the interval between executions and the specific start time of task execution.
For example, if the interval is 10 minutes and the effective period is March 27, 2022 to April 27, 2022, the task runs once every 10 minutes from 00:00 to 23:59 each day within that period.
Scheduling Plan
It will be automatically generated based on the setting of the periodic time.
Self-Dependency
Configure the self-dependency attribute uniformly for computation tasks in the current workflow.
Workflow Self-Dependency
When enabled, it indicates that the calculation tasks in the current workflow depend on all calculation tasks from the previous period of the current workflow. The workflow self-dependency feature only takes effect when the tasks in the current workflow have the same scheduling cycle and are on a daily cycle.
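The 10-minute-interval example above yields 144 instances per day. A rough sketch of how such run times could be enumerated (illustrative only, not the scheduler's actual algorithm):

```python
from datetime import datetime, timedelta

def daily_run_times(day: datetime, interval_minutes: int) -> list[datetime]:
    """Enumerate run times from 00:00 to 23:59 at a fixed minute interval."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    end = start + timedelta(days=1)
    times, current = [], start
    while current < end:
        times.append(current)
        current += timedelta(minutes=interval_minutes)
    return times
```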

Resource Configuration




Before submitting the task, you need to select the integrated resource group as the running resource for the task.

Step Four: Task test run and submission

After configuration, an offline synchronization task can be test run online or submitted to the production scheduling environment. The task configuration page supports saving, submitting, test running, stopping debugging, locking/unlocking, and going to operation and maintenance.



No.
Parameter
Description
1
Save
Click the icon to save the current task node.
2
Submit
Click the icon to submit the task node to the scheduling system (node basic content, scheduling configuration attributes), and generate a new version record.
Feature limitation: Tasks can be submitted normally only after their data sources and scheduling conditions are fully set.
3
Lock/Unlock
Click the icon to lock/unlock the editing of the current file. If the task has been locked by someone else, it cannot be edited.
4
Running
Click the icon to debug and run the current task node.
5
Advanced Running
Click the icon to run the current task node with variables. The system will automatically pop up the time parameters and custom parameters used in the code.



6
Stop Running
Click the icon to stop debugging and running the current task node.
7
Refresh
Click the icon to refresh the content of the current task node.
8
Project Parameter
Click the icon to display the parameter settings of the project. If you need to modify them, please go to Project Management.



9
Task Ops
Click the icon to go to the Task Ops page.
10
Instance Ops
Click the icon to go to the Instance Operation and Maintenance page.
11
Form Conversion
Click the icon to convert the task configuration to Form Mode.
12
Canvas Conversion
Click the icon to convert the task configuration to Canvas Mode.
13
Script Conversion
Click the icon to convert the task configuration to Script Mode. Once converted, it cannot be reverted to Form/Canvas Mode.

Task Submission

Each time a data workflow is edited and submitted for operation and maintenance, a corresponding workflow version is generated. After confirming there are no errors, choose Submit Version and Start Scheduling and fill in the Change Description field to publish the offline synchronization task to the operation and maintenance phase.



