Creating a Single Task

Last updated: 2024-11-01 17:32:23
    Note:
    If offline synchronization tasks need to be orchestrated with other tasks or configured with task dependencies, it is recommended to create them in the data development orchestration space, or to create them in DataInLong first and then import them into the orchestration space through the import task (DataInLong) feature.
    Only tasks that have not yet been submitted in DataInLong can be imported into the data development orchestration space; submitted tasks cannot be imported.

    Background

    Single table synchronization uses fixed-field synchronization: only the source fields specified in the task's mapping relationships are synchronized to the target. Single table tasks support canvas, form, and script configuration modes, and cover data sources such as MySQL, Hive, DLC, and Doris.

    Conditions and Restrictions

    1. The source and target data sources have been configured for subsequent tasks.
    2. A DataInLong resource group has been purchased.
    3. Network connectivity between the DataInLong resource group and the data sources has been established.
    4. The data source environment has been prepared. Based on the sync configuration you need, grant the data source account the necessary operation permissions in the database before executing the sync task.
    5. If the database account configured for the data source lacks read and write permissions, the task will fail. Configure an account with appropriate permissions according to the actual read and write scenarios.

    Operation Steps

    Step One: Create a new offline synchronization task and select the configuration mode

    1. On the DataInLong page, click Offline Development > Folder > Workflow > DataInLong > New Task > Offline Synchronization to access the synchronization task list.
    
    
    
    2. Configure the basic task information in the pop-up window, click Confirm, and then you can enter the task configuration page.
    
    
    
    Task Name: Required field.
    Task Type: Select Offline Synchronization.
    Development Mode:
    Form Mode: Provides only read and write nodes; suitable for fixed-field, single-table-to-single-table synchronization, such as ODS-layer data synchronization that does not require data cleaning.
    Canvas Mode: Provides read, write, and transformation nodes; suitable for data links involving cleaning and for many-to-many data links.
    Script Mode: Provides a script configuration page. Users must first select the data source and target; editing is not allowed until both are selected. After selection, the corresponding script template is displayed, in which users can manually write parameters such as the data source and connection information, and write SQL statements, placing the query SQL into the connection.
    Workflow Directory: Select an existing workflow.
    Note:
    Script mode currently supports the following data sources:
    Read: MySQL, TDSQL-C MySQL, TDSQL MySQL, TDSQL PostgreSQL, PostgreSQL, TCHouse-P, SQL Server, Oracle, IBM DB2, DM DB, SAP HANA, SyBase, Doris, Hive, HBase, Clickhouse, DLC, Kudu, HDFS, Greenplum, GaussDB, Impala, Gbase, TBase, MongoDB, COS, FTP, SFTP, REST API, Elasticsearch, Kafka, Iceberg, StarRocks, Graph Database.
    Write: MySQL, TDSQL-C MySQL, TDSQL MySQL, TDSQL PostgreSQL, PostgreSQL, TCHouse-P, SQL Server, Oracle, IBM DB2, DM DB, SAP HANA, Hive, HBase, Clickhouse, DLC, Kudu, HDFS, Greenplum, GaussDB, Gbase, TBase, Impala, COS, FTP, SFTP, Elasticsearch, Redis, MongoDB, Kafka, Iceberg, Doris, StarRocks, Graph Database.
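As an illustration of the script mode described above, a script-mode configuration typically pairs a reader (with connection information and query SQL inside the connection block) with a writer. The structure, plugin names, and key names below (`reader`, `writer`, `connection`, `querySql`, `mysqlreader`, `hivewriter`) are a hypothetical sketch modeled on common sync-engine templates, not the exact template DataInLong generates; consult the template shown in the console after selecting a source and target.

```python
# Hypothetical sketch of a script-mode task configuration. All names and
# the overall layout are illustrative assumptions, not DataInLong's actual
# template.
script_config = {
    "job": {
        "content": [{
            "reader": {
                "name": "mysqlreader",        # assumed reader plugin name
                "parameter": {
                    "username": "sync_user",  # placeholder credentials
                    "password": "******",
                    "connection": [{
                        "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/demo"],
                        # Per the doc, the query SQL goes inside the connection.
                        "querySql": ["SELECT id, name FROM orders"],
                    }],
                },
            },
            "writer": {
                "name": "hivewriter",         # assumed writer plugin name
                "parameter": {"database": "ods", "table": "ods_orders"},
            },
        }],
    },
}

def validate(config):
    """Minimal sanity check: every content entry needs a reader and a writer."""
    return all("reader" in c and "writer" in c for c in config["job"]["content"])

print(validate(script_config))  # True
```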

    Step Two: Data Node Configuration

    Configure Read Node

    
    
    
    Read node configuration includes basic information, data source, and data fields.
    Basic Information: The node name cannot be empty, and duplicate data node names are not allowed within a single task.
    Data Source: Configure the database table objects to be read, the synchronization method, and other information.
    Data Fields: Based on the configured database table objects, the system supports both default pulling of metadata and manual field configuration.
    Default Pulling: For types such as MySQL, Hive, and PostgreSQL, the system automatically pulls metadata fields and types from the database table information, with no manual editing required.
    Manual Configuration: File data sources (e.g., HDFS, COS) and columnar storage data sources (e.g., HBase, MongoDB) do not support automatic metadata pulling. You can click Add Field/Batch Add to manually add field names and types. The read node additionally supports configuring time parameters and constants.
    Batch Addition
    
    Data Type: The current node's data source type.
    Adding Method:
    Additional Fields: Append the newly parsed fields after the table's original fields.
    Overwrite Existing Fields: The newly parsed fields overwrite the current source table's original field information.
    Field Acquisition:
    Text parsing: Parse based on the text content.
    JSON parsing: Input JSON content and quickly parse it based on key/value pairs, e.g., {"age":10,"name":"demo"}.
    Fetch from Homogeneous Table: Specify a table object from a data source and parse its fields.
    Text to Be Parsed: The text content to parse.
    Separator: Used to separate field names and types; supports tab, |, and space, e.g., "age|int".
    Quick Filling of Field Type: Common field types; supports constants, functions, variables, string, boolean, date, datetime, timestamp, time, double, float, tinyint, smallint, tinyint unsigned, int, mediumint, smallint unsigned, bigint, int unsigned, bigint unsigned, double precision, tinyint(1), char, varchar, text, varbinary, blob.
    Parse Data: Parse the input content.
    Preview: Preview the parsing results.
    Batch Delete: After selecting items in the preview list, batch delete the parsing results.
    Field Name: Field name.
    Type: Field type.
    Note:
    Time Parameter Fields: Only the read node of offline tasks supports configuring time parameter fields, commonly used to write the instance run time into the table's primary or multi-level partitions.
    Constant Fields: Only read nodes support configuring constant fields. Constant fields can be used to write a constant value into the target table when the number of fields in the source and target tables does not match.
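The two parse modes of Batch Add can be sketched as follows. `parse_text` splits "name|type" lines on the chosen separator, and `parse_json` infers a coarse field type from each JSON value; both are simplified stand-ins for the console's parser, not its actual implementation.

```python
# Illustrative sketch of Batch Add's text parsing and JSON parsing.
import json

def parse_text(text, sep="|"):
    """Parse lines like 'age|int' into (field_name, field_type) pairs."""
    fields = []
    for line in text.strip().splitlines():
        name, ftype = line.split(sep, 1)
        fields.append((name.strip(), ftype.strip()))
    return fields

def parse_json(payload):
    """Infer a coarse type for each key of a JSON object."""
    type_map = {bool: "boolean", int: "int", float: "double", str: "varchar"}
    return [(k, type_map.get(type(v), "varchar"))
            for k, v in json.loads(payload).items()]

print(parse_text("age|int\nname|varchar"))     # [('age', 'int'), ('name', 'varchar')]
print(parse_json('{"age":10,"name":"demo"}'))  # [('age', 'int'), ('name', 'varchar')]
```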

    Configure Transformation Node

    Transformation node configuration includes basic information, transformation rules, and data fields. The transformation node must be downstream of a read node. After it is created and connected to the read node, the system automatically retrieves field information from the upstream node and transforms the data according to the transformation rules.
    Basic Information: Configure the node name. Node names cannot be empty, and duplicate data node names are not allowed within a single task.
    Transformation Rules: Configure field-level or data-level transformation rules. Field information is inherited from the upstream node; after connecting to the upstream node, the system automatically retrieves its field information.
    Data Fields: By default, all data fields are pulled from the upstream node for subsequent write node mapping.

    Configure Write Node

    
    
    
    Write node configuration includes basic information, data source, data fields, and field mapping. The write node writes the upstream data into the target object based on the connection relationship.
    Basic Information: The node name cannot be empty, and duplicate data node names are not allowed within a single task.
    Data Source: Configure the database table objects to be written, the synchronization method, and other information.
    Data Fields: Based on the configured database table objects, the system supports both default pulling of metadata and manual field configuration.
    Default Pulling: For types such as MySQL, Hive, and PostgreSQL, the system automatically pulls metadata fields and types from the database table information, with no manual editing required.
    Manual Configuration: File data sources (e.g., HDFS, COS) and columnar storage data sources (e.g., HBase, MongoDB) do not support automatic metadata pulling. You can click Field Configuration to add field names and types manually.
    Field Mapping: Compared with the read node, the write node additionally requires field mapping relationships. Field mapping designates the source of each target field's content through links, supporting three configurations: same name mapping, same row mapping, and manual linking between source and target nodes.
    
    
    
    Same Name Mapping: Establish mapping relationships between source table fields and target table fields that share the same field name.
    Same Row Mapping: Establish mapping relationships between source table fields and target table fields in the same row number.
    Clear Mappings: Clear the established mapping relationships between source table fields and target table fields.
    Pin Mapped Fields: Pin established mapping relationships to the top and format their display; this does not affect the table's actual field storage order and is used only to optimize the frontend display.
    Manual Connection Mapping: Manually establish mapping relationships between source table fields and target table fields by drawing connections.
    Source Table
    Field Name: The name of the source table field.
    Type: The type of the source table field.
    Mapping: Quickly create a mapping.
    Target Table
    Field Name: The name of the target table field.
    Type: The type of the target table field.
    Note:
    The prerequisite for configuring field mapping is that the current write node has a connected source (either a read node or a transformation node).
    Target fields without a configured mapping relationship will be empty or remain unchanged.
    If a source field type cannot be converted to the target field type, the task may fail.
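The same name and same row strategies described above can be sketched with fields as (name, type) pairs and a mapping as a list of (source_name, target_name) links. This is a simplified illustration of the two strategies, not the console's own logic.

```python
# Sketch of same-name vs. same-row field mapping.
def map_same_name(src, dst):
    """Link source and target fields that share a field name."""
    dst_names = {name for name, _ in dst}
    return [(name, name) for name, _ in src if name in dst_names]

def map_same_row(src, dst):
    """Link source and target fields by row position."""
    return [(s[0], d[0]) for s, d in zip(src, dst)]

src = [("id", "bigint"), ("user_name", "varchar")]
dst = [("id", "bigint"), ("name", "varchar")]

print(map_same_name(src, dst))  # [('id', 'id')] -- only matching names link
print(map_same_row(src, dst))   # [('id', 'id'), ('user_name', 'name')]
```

Note how the two strategies disagree on `user_name`: same-name mapping leaves it unmapped (so the target field stays empty), while same-row mapping links it to `name` by position.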

    Step Three: Offline Task Attribute Configuration

    Offline task attribute configuration includes Task Attributes, Scheduling Settings, and Resource Configuration:

    Task Attributes

    
    
    
    Set basic task attributes, resources used, and data link channel information.
    Basic Information
    Task Name/Type: Displays the current task's name and type.
    Task Owner: One or more space members responsible for this task; defaults to the task creator.
    Description (optional): Displays the current task's remark information.
    Scheduling Parameters (optional): Scheduling parameters are used during task scheduling. They are automatically replaced according to the business time of the task schedule and the parameter's value format, enabling dynamic value retrieval within the task scheduling time.
    Channel Settings
    Dirty Data Threshold: Dirty data refers to data that fails to be written during synchronization. The dirty data threshold is the maximum number of dirty data entries or bytes that can be tolerated during synchronization; if it is exceeded, the task ends automatically. The default threshold is 0, meaning no dirty data is tolerated.
    Concurrent Number: The maximum number of concurrent operations expected during actual execution. Due to resources, data source types, and task optimization results, the actual concurrency may be less than or equal to this value. The larger this value, the more execution machine resources are pre-allocated.
    Note:
    When concurrency > 1, if the source data supports setting a split key, the split key is mandatory; otherwise, the configured concurrency value will not take effect.
    Sync Rate Limit: Limit the synchronization rate by traffic or number of records to protect the read and write pressure on the data source or data destination. This value is the maximum operating rate; the default -1 means no rate limit.
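Scheduling parameter replacement can be illustrated as follows: a token in the task's SQL is substituted with the instance's business time at schedule time. The `${yyyyMMdd}` token style and the format table are assumptions for illustration; check DataInLong's scheduling parameter reference for the exact syntax it supports.

```python
# Hedged sketch of scheduling-parameter substitution at schedule time.
import re
from datetime import datetime

# Assumed token-to-strftime mapping; illustrative only.
FORMATS = {"yyyyMMdd": "%Y%m%d", "yyyy-MM-dd": "%Y-%m-%d", "HH": "%H"}

def render(sql, business_time):
    """Replace ${...} tokens with the formatted business time."""
    def repl(match):
        fmt = FORMATS.get(match.group(1))
        return business_time.strftime(fmt) if fmt else match.group(0)
    return re.sub(r"\$\{([^}]+)\}", repl, sql)

sql = "SELECT * FROM orders WHERE dt = '${yyyyMMdd}'"
print(render(sql, datetime(2022, 3, 27)))
# SELECT * FROM orders WHERE dt = '20220327'
```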

    Scheduling Settings

    
    
    
    Scheduling Cycle: The execution cycle unit for task scheduling; supports minute, hour, day, week, month, year, and one-time.
    Effective Time: The valid period for the scheduling configuration. The system automatically schedules within this time range according to the time settings and stops scheduling after the period expires.
    Execution Time: Set the interval between executions and the specific start time of each execution. For example, with a 10-minute interval and an effective period of March 27, 2022 to April 27, 2022, the task runs once every 10 minutes from 00:00 to 23:59 each day within that period.
    Scheduling Plan: Automatically generated based on the periodic time settings.
    Self-Dependency: Configure the self-dependency attribute uniformly for computation tasks in the current workflow.
    Workflow Self-Dependency: When enabled, the computation tasks in the current workflow depend on all computation tasks from the workflow's previous cycle. Workflow self-dependency takes effect only when the tasks in the current workflow share the same scheduling cycle and that cycle is daily.
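The 10-minute interval example above can be worked through in a few lines: running from 00:00 to 23:59 every 10 minutes yields 144 instances per day. This is a simplified sketch for illustration, not the scheduler's implementation.

```python
# Enumerate the daily run times for a fixed-interval schedule.
from datetime import datetime, timedelta

def daily_run_times(day, interval_minutes, start="00:00", end="23:59"):
    """Return every run time on `day` between start and end, inclusive."""
    t = datetime.strptime(f"{day} {start}", "%Y-%m-%d %H:%M")
    stop = datetime.strptime(f"{day} {end}", "%Y-%m-%d %H:%M")
    runs = []
    while t <= stop:
        runs.append(t)
        t += timedelta(minutes=interval_minutes)
    return runs

runs = daily_run_times("2022-03-27", 10)
print(len(runs), runs[0].strftime("%H:%M"), runs[-1].strftime("%H:%M"))
# 144 00:00 23:50
```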

    Resource Configuration

    
    
    
    Before submitting the task, you need to select the integrated resource group as the running resource for the task.

    Step Four: Task Test Run and Submission

    After configuration, an offline synchronization task can be test run online or submitted to the production scheduling environment. The task configuration page supports saving, submitting, test running, stopping debugging, locking/unlocking, and navigating to operations and maintenance.
    
    
    
    1. Save: Click the icon to save the current task node.
    2. Submit: Click the icon to submit the task node to the scheduling system (node content and scheduling configuration attributes) and generate a new version record.
    Feature limitation: Tasks can be submitted normally only after their data sources and scheduling conditions are fully configured.
    3. Lock/Unlock: Click the icon to lock/unlock editing of the current file. If the task has been locked by someone else, it cannot be edited.
    4. Running: Click the icon to debug and run the current task node.
    5. Advanced Running: Click the icon to run the current task node with variables. The system automatically pops up the time parameters and custom parameters used in the code.
    6. Stop Running: Click the icon to stop debugging and running the current task node.
    7. Refresh: Click the icon to refresh the content of the current task node.
    8. Project Parameters: Click the icon to display the project's parameter settings. To modify them, go to Project Management.
    9. Task Ops: Click the icon to go to the Task Ops page.
    10. Instance Ops: Click the icon to go to the Instance Ops page.
    11. Form Conversion: Click the icon to convert the task configuration to Form Mode.
    12. Canvas Conversion: Click the icon to convert the task configuration to Canvas Mode.
    13. Script Conversion: Click the icon to convert the task configuration to Script Mode. Once converted, it cannot be reverted to Form/Canvas Mode.

    Task Submission

    Each time a data workflow is edited and submitted for operations, a corresponding workflow version is generated. After confirming there are no errors, choose Submit Version and Start Scheduling, and fill in the Change Description field to publish the offline synchronization task to the operations and maintenance phase.
    
    
    
    