Tencent Cloud

Introduction to Task Development

Last updated: 2024-11-01 16:26:14

    Entering the Task Development Page

    1. Log in to the WeData console.
    2. Click Project List in the left menu and find the target project that requires the task development feature.
    3. After selecting a project, click to enter the Data Development module.
    4. Click Orchestration Space in the left menu.

    Overview of Task Development

    WeData's Task Development orchestrates computing tasks into data workflows for streamlined data processing. It supports flexible development through features such as scheduling policies, event listening, task parameters, self-dependency, and function libraries. It meets user needs for data processing, cleaning, and transformation, and provides a visual configuration interface for easily building and managing complex data processing workflows.

    Data Processing Flow

    Define data flow and conversion rules across tasks to perform operations such as processing, cleaning, and transformation.

    Data Workflow Orchestration

    Orchestrate and organize computing tasks as data processing nodes in the form of data workflows, forming a complete data processing pipeline.

    Scheduling Policy

    Scheduling policies determine when tasks are executed. Workflows can be automatically triggered based on periodic scheduling times and other conditions, ensuring that tasks are processed in predefined sequences and times to meet various business needs.

    Event Listening

    Event listening applies to scenarios where computing tasks depend on an event trigger. It consists of a trigger program, an event trigger, and listening tasks. First, define the event in the project based on business requirements, then write a trigger program to send the event; the listening task runs once it detects the event.
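    The trigger/listener flow above can be sketched in miniature. This is only an in-process illustration of the pattern, not a WeData API: the event bus, event names, and function names below are all hypothetical stand-ins for the console-defined event and the business-side trigger program.

```python
import queue
import threading

# Hypothetical in-process event bus standing in for the platform's event trigger.
event_bus = queue.Queue()

def trigger_program():
    """Simulates the business-side program that sends the event."""
    event_bus.put({"name": "upstream_data_ready", "date": "20241101"})

def listening_task():
    """Simulates a task that runs once it detects the event."""
    event = event_bus.get(timeout=5)
    if event["name"] == "upstream_data_ready":
        # The real task body (e.g. a SQL job) would run here.
        return event
    return None

t = threading.Thread(target=trigger_program)
t.start()
result = listening_task()
t.join()
print(result["name"])  # upstream_data_ready
```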

    Task Parameters and Parameter Passing

    Variable parameters can be used in data workflow design and computing task configuration, and parameters can be passed between tasks. Each computing task can take different input parameters, and a task's output parameters can be passed to the next task, enabling data sharing and interaction between tasks.
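    The output-to-input flow can be sketched as two plain functions. The dict-based hand-off below is an assumption for demonstration only; in WeData the platform itself carries these parameters between scheduled tasks.

```python
# Hypothetical sketch of parameter passing between two task nodes.
def task_a():
    # Upstream task produces output parameters.
    return {"row_count": 1250, "partition": "dt=20241101"}

def task_b(inputs):
    # Downstream task consumes the upstream outputs as its input parameters.
    return f"validated {inputs['row_count']} rows in {inputs['partition']}"

outputs = task_a()
print(task_b(outputs))  # validated 1250 rows in dt=20241101
```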

    Self-Dependency

    Self-dependency is supported during computing task maintenance, meaning that a task's scheduled runs can depend on the execution status of the task's previous cycle.
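    The self-dependency rule can be pictured as a gate on the previous cycle's status. This is a minimal illustration under assumed names; the instance store and status strings below are hypothetical, since WeData enforces this check inside its scheduler.

```python
# Hypothetical sketch: a cycle may start only if the previous cycle succeeded.
def can_run(task_instances, cycle):
    """Return True if the previous cycle succeeded (or this is the first cycle)."""
    previous = task_instances.get(cycle - 1)
    return previous is None or previous == "success"

instances = {1: "success", 2: "failed"}
print(can_run(instances, 2))  # True: cycle 1 succeeded
print(can_run(instances, 3))  # False: cycle 2 failed
```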

    Function

    Provides a feature-rich function library covering commonly used Hive SQL, Spark SQL, and DLC functions and algorithms, such as mathematical functions, data transformation functions, and aggregation functions. User-defined functions (UDFs) are also supported, giving users more flexible and comprehensive data processing and computation capabilities.

    Data Development Collaboration

    In WeData Data Development, you can create, write, and debug development scripts for use with data workflows. Scripts completed in Ad Hoc Development in the development space can be incorporated directly into workflow orchestration as task nodes, achieving code reuse and optimizing the overall process.

    Workflow Introduction

    The orchestration space provides features for the orchestration and configuration of data workflows, supporting users in organizing and developing different types of task code based on the workflow, and submitting them to the scheduling system for periodic execution. A project can contain multiple workflows, and WeData supports placing different workflows into the same folder for convenient and efficient management. A workflow is a collection of various types of task objects, including DataInLong, Computing Tasks (Hive SQL, JDBC SQL, MapReduce, PySpark, Python, Shell, Spark, Spark SQL, DLC SQL, DLC Spark, Impala, TCHouse-P, Trino), and Generic Tasks.

    Workflow Directory

    
    
    
    Directory features:
    Search: Supports searching folders, workflows, and task names.
    Code Search: Supports global directory search with rich retrieval dimensions; keywords can be used to search the orchestration space and the development space for computing tasks and development scripts.
    Refresh:
    - Refresh: refresh the directory tree to get the latest status of the orchestration directory.
    - Locate within the tree node: locate the current tree node with one click.
    - Collapse within the tree node: collapse all expanded directories with one click.
    Batch:
    - Import, Export: batch import and export data workflows and computing tasks in the specified directory.
    - Batch operations: support batch operations on all computing tasks in the orchestration directory, including submitting tasks, deleting tasks, and modifying resource groups, responsible persons, data sources, task parameters, scheduling cycles, advanced scheduling settings, and scheduling parameters. Batch operation records can be viewed.
    - Show/Hide: show or hide task category folders in the workflow.
    Create New: Supports creating new folders and data workflows.

    Workflow Canvas

    
    Canvas features:
    Submit: Click the icon to submit the current workflow to the scheduling system (including node content, configuration properties, and dependency relationships) and generate a new version.
    Refresh: Click the icon to refresh the content on the current workflow canvas.
    Go to Operations: Click the icon to go to the Operations - Workflow List page.
    Workflow Testing: Click the icon to test the current workflow; during testing, click the icon again to stop the test.
    Task Type Directory: In the task type directory, click a computing task type to add a task node to the workflow canvas.
    Locate: Click the icon; in the pop-up filter box, choose and locate the corresponding task.
    Zoom In/Out: Click the icon to scale the workflow canvas.
    Formatting: Click the icon to standardize the layout of tasks in the workflow.
    Selection Box: Click the icon; the cursor switches to selection mode, allowing you to select multiple tasks and perform batch operations.

    General Settings

    Click General Settings in the right sidebar to edit the current workflow's name and person in charge, and to add a description, workflow variables, and Spark SQL configuration parameters (optional). The Spark SQL configuration takes effect only for Spark SQL tasks within the workflow.
    
    
    
    Feature descriptions:
    Workflow Name: Custom workflow name.
    Workflow Person in Charge: Designates the person in charge of the workflow. Subsequent workflow submission and change operations, such as application and approval, are handled by this person.
    Description (optional): Custom workflow description.
    Workflow Parameters (optional): Apply to the parameters of tasks within the current workflow and are set through the workflow's general settings. The rule is variable name = variable value; separate multiple entries with ";", for example a=${yyyyMMdd};b=123;c=456;
    Spark SQL Configuration Parameters (optional): Configure optimization parameters (threads, memory, CPU cores, etc.); applies only to Spark SQL nodes. Separate multiple parameters with English semicolons.
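    The documented parameter format (variable name = variable value, entries separated by ";", with ${yyyyMMdd} expanded to the run date) can be illustrated with a small parser. The parsing code itself is only a sketch of how the string could be interpreted, not WeData's implementation.

```python
import datetime

# Hypothetical sketch: expand a workflow parameter string such as
# "a=${yyyyMMdd};b=123;c=456;" into a dict, substituting the run date.
def parse_workflow_params(raw, run_date):
    params = {}
    for pair in filter(None, raw.split(";")):
        name, _, value = pair.partition("=")
        if value == "${yyyyMMdd}":
            value = run_date.strftime("%Y%m%d")
        params[name.strip()] = value.strip()
    return params

params = parse_workflow_params("a=${yyyyMMdd};b=123;c=456;",
                               datetime.date(2024, 11, 1))
print(params)  # {'a': '20241101', 'b': '123', 'c': '456'}
```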

    Unified Scheduling

    Workflow scheduling supports two types of periodic configuration: regular and crontab. For regular configuration, refer to the scheduling settings for one-time, minute, hour, day, week, month, and year schedules. Crontab configuration is more flexible and is supported only in unified workflow scheduling; under it, all tasks must share the same scheduling time (crontab expression), and neither cross-workflow dependency tasks nor dependencies on regular-configuration tasks are supported.
    Note:
    Unified scheduling works like a batch operation: it changes the scheduling cycle of every task under the current workflow to one uniform cycle. It is recommended when the tasks within the workflow share a consistent scheduling cycle.
    Regular configuration method

    Configuration instructions:
    Scheduling Cycle: The execution cycle unit for task scheduling; supports minute, hour, day, week, month, year, and one-time.
    Time to Take Effect: The validity period of the scheduling configuration. The system automatically schedules within this time range according to the time configuration, and stops automatic scheduling after the validity period ends.
    Execution Time: Sets the interval between executions and the specific start time of each execution. For example, if the interval is 10 minutes, the task runs every 10 minutes from 00:00 to 23:59 each day between March 27, 2022 and April 27, 2022.
    Scheduling Plan: Generated automatically based on the periodic time settings.
    Self-Dependency: Uniformly configures the self-dependency attribute for computing tasks in the current workflow.
    Workflow Self-Dependency: When enabled, computing tasks in the current workflow depend on all computing tasks from the workflow's previous cycle. This takes effect only when all tasks in the workflow share the same daily scheduling cycle.
    Crontab configuration method

    Crontab configuration supports fine-grained settings from year, month, week, day, and hour down to minute and second. Click Configure to enter the configuration page and write the crontab statement; after configuration is complete, you can view the specific execution times.
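    To make the crontab notation concrete, the sketch below checks whether a classic 5-field expression (minute, hour, day, month, weekday) matches a given time. It handles only "*" and plain numbers; real crontab syntax (and WeData's crontab configuration, which adds second and year fields) is much richer, so this is an illustration rather than a full parser.

```python
import datetime

# Minimal sketch of 5-field crontab matching (minute hour day month weekday).
def cron_matches(expr, dt):
    minute, hour, day, month, weekday = expr.split()
    fields = [
        (minute, dt.minute),
        (hour, dt.hour),
        (day, dt.day),
        (month, dt.month),
        (weekday, (dt.weekday() + 1) % 7),  # cron convention: 0 = Sunday
    ]
    return all(f == "*" or int(f) == actual for f, actual in fields)

# "0 2 * * *" means: run at 02:00 every day.
print(cron_matches("0 2 * * *", datetime.datetime(2024, 11, 1, 2, 0)))  # True
print(cron_matches("0 2 * * *", datetime.datetime(2024, 11, 1, 3, 0)))  # False
```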

    History

    Click History in the right sidebar to view the historical operation information of the current workflow, including the operator (executing account), operation time, and specific operation content.
    

    Version

    Each time a data workflow is edited and submitted for operations, a corresponding workflow version is generated. Click Version in the right sidebar to view the historical version information of the current workflow, including version name (version number), saved by (submitter), save time (submission time), and change description.
    Note:
    A workflow version is generated only when the workflow is submitted. Individual task submissions will not generate a workflow version.
    
    You can view the configuration information of the corresponding version by using the view feature in the operation column.
    

    Introduction to Computing Tasks

    
    
    

    Canvas Feature

    Feature descriptions:
    Save: Click the icon to save the current task node.
    Submit: Click the icon to submit the task node to the scheduling system (node content and scheduling configuration attributes) and generate a new version record. Limitation: a task can be submitted only after its data source and scheduling conditions are fully configured.
    Lock/Unlock: Click the icon to lock or unlock editing of the current file. A task locked by someone else cannot be edited.
    Running: Click the icon to debug and run the current task node.
    Advanced Running: Click the icon to run the current task node with variables; the system automatically pops up the time parameters and custom parameters used in the code.
    Stop Running: Click the icon to stop debugging and running the current task node.
    Formatting: Click the icon to standardize the format of code statements in the task.
    Refresh: Click the icon to refresh the content of the current task node.
    Project Variables: Click the icon to view project global variables and use them in tasks.
    Go to Operations: Click the icon to go to the task operations page with the current task automatically filtered.
    Data Source: Select the data source used by the current computing task.
    Execution Resource Group: Select the execution resource group for the current computing task.
    Resource Queue: Select the resource queue used when executing the current computing task.

    Task Attributes

    You can modify the task name, owner, and description, configure task scheduling parameters, and use application parameters. Automatic code variable parsing is provided, along with parameter documentation to assist with the use of scheduling parameters.
    
    
    
    

    Scheduling Settings

    Task scheduling includes configuration items such as scheduling strategy, event scheduling, dependency configuration, upstream dependency task configuration, scheduling priority, and failure strategy.

    Version

    Displays the historical submission records of a computing task in the version panel, including historical versions, submitter, submission time, change type, status, and remarks. You can view a single version or compare two versions.
    
    
    
    Version information exists only for submitted task nodes; otherwise, the version panel is empty.
    Each submission generates a new version and creates a new record in the Version panel.
    Compare: provides pairwise comparison of a computing task's historical versions, displaying key differences in code and task configuration parameters.
    Code: supports viewing the code and configuration parameters of any version of a computing task.
    Rollback: rolls back only the task's script content and configuration, not its dependency relationships. The rollback takes effect only after submission; unsubmitted changes (including code and task configurations) are lost after rollback.

    Metadata Database

    Displays metadata of the data sources connected under the current project. You can obtain database and table information by searching data sources, databases, and tables, facilitating quick use during task development. Quick actions are provided for copying a table's query SQL, a table's DDL, and a table name.
    Note:
    Currently, copying table query SQL and table DDL is supported only for system data sources.
    

    Function

    Displays functions that can be used in task development. Currently, DLC SQL, Hive SQL, and Spark SQL functions are supported; select them according to the engine used by the development task. The function library includes commonly used system functions, such as analysis functions (corr, covar_pop), encryption functions (hash, md5), and logical functions (decode, nvl). User-defined functions are also supported: functions uploaded through resource management or created through function development appear in this library and can be invoked in development tasks.
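    Two of the functions named above can be illustrated with Python analogues. The SQL engines implement these natively, so this is only a sketch of what they compute, not how the engines run them.

```python
import hashlib

def md5(s):
    """Analogue of the SQL md5() encryption function: hex digest of a string."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()

def nvl(value, default):
    """Analogue of the SQL nvl() logical function: default replaces NULL (None)."""
    return default if value is None else value

print(md5("wedata"))  # a 32-character hex digest
print(nvl(None, 0))   # 0
print(nvl("x", 0))    # x
```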
    
    