Entering the Task Development Page
1. Click Project List in the left menu and find the target project that requires the task development feature.
2. After selecting the project, click to enter the Data Development module.
3. Click Orchestration Space in the left menu.
Overview of Task Development
WeData's Task Development orchestrates computing tasks into data workflows for streamlined data processing. It supports flexible data development processes through features like scheduling policy, event listening, task parameters, self-dependency, and function libraries. It meets user needs for data processing, transformation, and conversion, providing a visual configuration interface for easily building and managing complex data processing workflows.
Data Processing Flow
Definition of data flow and conversion rules across different tasks to perform operations like processing, cleaning, and transformation.
Data Workflow Orchestration
Orchestrate and organize computing tasks as data processing nodes in the form of data workflows, forming a complete data processing pipeline.
Scheduling Policy
Scheduling policies determine when tasks are executed. Workflows can be automatically triggered based on periodic scheduling times and other conditions, ensuring that tasks are processed in predefined sequences and times to meet various business needs.
Event Listening
Event listening applies to scenarios where computing tasks depend on an event trigger. It consists of a trigger program, an event trigger, and listening tasks. First, define the event in the project according to business requirements and write a trigger program that sends the event; a listening task then runs when it detects the event.
Task Parameters and Parameter Passing
Variable parameters can be used in data workflow design and computing task configuration, and parameters can be passed between tasks. Each computing task can take different input parameters, and a task's output parameters can be passed to the next task, enabling data sharing and interaction among tasks.
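As a minimal sketch (the database, table, and parameter names below are hypothetical, and the ${...} reference syntax is assumed to match the workflow-parameter example shown later in this document), a computing task could consume a date parameter such as dt=${yyyyMMdd} in its SQL:

```sql
-- Hive SQL sketch: read and write a date partition using a task parameter dt=${yyyyMMdd}.
-- Database, table, and parameter names are hypothetical.
INSERT OVERWRITE TABLE dw.daily_orders PARTITION (ds = '${dt}')
SELECT order_id, user_id, amount
FROM ods.orders
WHERE ds = '${dt}';
```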
Self-Dependency
Self-dependency is supported during computing task maintenance, meaning that tasks can rely on the execution status of the previous cycle during scheduling operations.
Function
Provides a feature-rich function library, including commonly used functions and algorithms for Hive SQL, Spark SQL, and DLC, such as mathematical functions, data transformation functions, and aggregation functions. It also supports user-defined functions (UDFs) to assist users in data processing and computational operations, offering more flexible and comprehensive data processing capabilities.
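For orientation only, registering and calling a UDF in plain Hive SQL looks roughly like the sketch below. In WeData the JAR upload and function creation are handled through the resource management and function development features, so treat this purely as an illustration of the underlying concept; the class name, JAR path, and table are hypothetical.

```sql
-- Hive SQL sketch: register and call a user-defined function backed by an uploaded JAR.
-- Class name, JAR path, database, and table names are hypothetical.
CREATE FUNCTION demo_db.mask_phone AS 'com.example.udf.MaskPhone'
USING JAR 'hdfs:///user/wedata/resources/mask_udf.jar';

SELECT mask_phone(phone_number) AS masked_phone
FROM demo_db.users
LIMIT 10;
```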
Data Development Collaboration
In WeData Data Development, create, write, and debug development scripts to be used in collaboration with data workflows. Scripts completed in Ad Hoc Development within the development space can be directly incorporated into the data workflow orchestration as a task node, achieving code reuse and optimizing the overall process.
Workflow Introduction
The orchestration space provides features for the orchestration and configuration of data workflows, supporting users in organizing and developing different types of task code based on the workflow, and submitting them to the scheduling system for periodic execution. A project can contain multiple workflows, and WeData supports placing different workflows into the same folder for convenient and efficient management. A workflow is a collection of various types of task objects, including DataInLong, Computing Tasks (Hive SQL, JDBC SQL, MapReduce, PySpark, Python, Shell, Spark, Spark SQL, DLC SQL, DLC Spark, Impala, TCHouse-P, Trino), and Generic Tasks.
Workflow Directory
Directory Features:
| Feature | Description |
| --- | --- |
| Search | Supports searching folders, workflows, and task names. |
| Code Search | Supports global directory search with rich retrieval dimensions. Keywords can be used to search the orchestration space and the development space for computing tasks and development scripts. |
| Refresh | Refresh: refreshes the directory tree to get the latest status of the orchestration directory. Locate in the tree node: one-click locate to the current tree node. Collapse the tree nodes: one-click collapse of all expanded directories. |
| Batch | Import/Export: batch import and export data workflows and computing tasks in the specified directory. Batch operations: support batch operations on all computing tasks in the orchestration directory, including submitting tasks (batch), deleting tasks, modifying resource groups, modifying owners, modifying data sources, modifying task parameters, modifying scheduling cycles, modifying advanced scheduling settings, and modifying scheduling parameters; batch operation records can be viewed. Show/Hide: supports showing or hiding task category folders in the workflow. |
| Create New | Supports creating folders and data workflows. |
Workflow Canvas
Canvas Features:

| Feature | Description |
| --- | --- |
| Submit | Click the icon to submit the current workflow to the scheduling system (including node content, configuration properties, and dependency relationships) and generate a new version. |
| Refresh | Click the icon to refresh the content on the current workflow canvas. |
| Go to Operations | Click the icon to go to the Operations - Workflow List page. |
| Workflow Testing | Click the icon to test the current workflow. During testing, click the icon to stop the test. |
| Task Type Directory | In the task type directory, click a computing task type to add task nodes to the workflow canvas. |
| Locate | Click the icon and choose the corresponding task in the pop-up filter box to locate it. |
| Zooming In/Out the Canvas | Click the icon to scale the workflow canvas. |
| Formatting | Click the icon to standardize the layout of tasks in the workflow. |
| Selection Box | Click the icon to switch the cursor to selection mode, allowing you to select multiple tasks at once and perform batch operations. |
General Settings
Click General Settings in the right sidebar to edit the current workflow's name, person in charge, and description, as well as workflow variables and Spark SQL configuration parameters (optional). The Spark SQL configuration takes effect only for Spark SQL tasks within the workflow.
Feature Description:
| Feature | Description |
| --- | --- |
| Workflow Name | Custom workflow name. |
| Workflow Person in Charge | Designates the person in charge of the workflow. During subsequent workflow submission and changes, related permissions and operations such as application and approval are handled by this person. |
| Description (Optional) | Custom workflow description. |
| Workflow Parameters (Optional) | Apply to the parameters of tasks within the current workflow and are set through the parameter item in the workflow's general settings. The setting rule is variable name = variable value; multiple values are separated by ";", for example a=${yyyyMMdd};b=123;c=456;. |
| Spark SQL Configuration Parameters (Optional) | Used to configure optimization parameters (threads, memory, CPU cores, etc.); only applicable to Spark SQL nodes. Multiple parameters are separated by English semicolons. |
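As a hedged example, with the workflow parameters above (a=${yyyyMMdd};b=123;c=456), a Spark SQL task in the workflow could reference them as shown below; the ${name} reference syntax inside task code is assumed, and the table and column names are hypothetical. Spark SQL configuration parameters follow the same key=value;key=value form, for example spark.executor.memory=4g;spark.sql.shuffle.partitions=200 (these keys are standard Spark settings, not values taken from this document).

```sql
-- Spark SQL sketch referencing workflow parameters a and b from General Settings.
SELECT event_id, user_id
FROM demo_db.events
WHERE ds = '${a}'        -- resolves to the scheduled date, e.g. 20240501
  AND channel_id = ${b}; -- resolves to 123
```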
Unified Scheduling
Workflow scheduling supports two types of periodic scheduling configuration: regular and crontab. For regular configuration, refer to the scheduling settings for one-time, minute, hour, day, week, month, and year scheduling. Crontab configuration is more flexible and can only be configured through unified workflow scheduling. Under crontab configuration, all tasks must use the same scheduling time (crontab expression); cross-workflow dependency tasks are not supported, nor are dependencies on tasks that use regular configuration.
Note:
The unified scheduling operation is similar to a batch operation: it changes the scheduling cycles of all tasks under the current workflow to a single, uniform scheduling cycle. It is recommended when the scheduling cycles of the tasks within the workflow are consistent.
Regular Configuration Method
Configuration instructions:
| Configuration Item | Description |
| --- | --- |
| Scheduling Cycle | The execution cycle unit for task scheduling; supports minute, hour, day, week, month, year, and one-time. |
| Effective Time | The valid time period of the scheduling configuration. The system automatically schedules within this time range according to the time configuration and stops scheduling once the period expires. |
| Execution Time | The interval between executions and the specific start time of task execution. For example, if the interval is set to 10 minutes, the scheduled task runs once every 10 minutes from 00:00 to 23:59 every day between March 27, 2022 and April 27, 2022. |
| Scheduling Plan | Generated automatically based on the periodic time settings. |
| Self-Dependency | Configures the self-dependency attribute uniformly for computing tasks in the current workflow. |
| Workflow Self-Dependency | When enabled, the computing tasks in the current workflow depend on all computing tasks from the previous cycle of the current workflow. Workflow self-dependency takes effect only when the tasks in the current workflow share the same scheduling cycle and the cycle is daily. |
Crontab Configuration Method
The scheduling cycle can be configured using crontab statements; click Configure to enter the configuration page. Crontab configuration supports fine-grained settings from year, month, week, and day down to hour, minute, and second. After configuration is complete, you can view the specific execution times.
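For orientation, assuming a Quartz-style seven-field expression (second, minute, hour, day of month, month, day of week, year), crontab configurations look like the sketch below; the exact field order accepted by the console may differ.

```
# Run every day at 02:30:00
0 30 2 * * ? *

# Run every 15 minutes between 09:00 and 18:59
0 0/15 9-18 * * ? *
```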
History
Click History in the right sidebar to view the historical operations on the current workflow, including the operator (executing account), operation time, and specific operation content.
Version
Each time a data workflow is edited and submitted to operations, a corresponding workflow version is generated. Click Version in the right sidebar to view the historical version information of the current workflow, including the version name (version number), saved by (version submitter), save time (submission time), and change description.
Note:
A workflow version is generated only when the workflow is submitted. Individual task submissions will not generate a workflow version.
You can view the configuration information of the corresponding version by using the view feature in the operation column.
Introduction to Computing Tasks
Canvas Features:

| Feature | Description |
| --- | --- |
| Save | Click the icon to save the current task node. |
| Submit | Click the icon to submit the task node to the scheduling system (basic node content and scheduling configuration attributes) and generate a new version record. Feature limitation: a task can be submitted only after its data source and scheduling conditions are fully set. |
| Lock/Unlock | Click the icon to lock or unlock editing of the current file. If the task has been locked by someone else, it cannot be edited. |
| Running | Click the icon to debug and run the current task node. |
| Advanced Running | Click the icon to run the current task node with variables. The system automatically pops up the time parameters and custom parameters used in the code. |
| Stop Running | Click the icon to stop debugging and running the current task node. |
| Formatting | Click the icon to standardize the format of code statements in the task. |
| Refresh | Click the icon to refresh the content of the current task node. |
| Project Variables | Click the icon to view project global variables and use them in tasks. |
| Go to Operations | Click the icon to go to the task operations page with the current task automatically filtered. |
| Data Source | Select the data source used by the current computing task. |
| Execution Resource Group | Select the execution resource group for the current computing task. |
| Resource Queue | Select the resource queue used when executing the current computing task. |
Task Attributes
You can modify the task name, task owner, task description, and task scheduling parameters. Application parameters can be used, automatic parsing of variables in code is provided, and parameter description documentation is available to assist with the use of scheduling parameters.
Scheduling Settings
Task scheduling includes configuration items such as scheduling strategy, event scheduling, dependency configuration, upstream dependency task configuration, scheduling priority, and failure strategy.
Version
The Version panel displays the historical submission records of a computing task, including node historical versions, submitter, submission time, change type, status, and remarks. You can view the information of a single version or compare two versions.
Version information only exists for submitted task nodes; otherwise, the version information is empty.
Each submission generates a new version and creates a new record in the Version panel.
Compare: Provides pairwise comparison of historical versions of computing tasks, displaying key information in terms of code and task configuration parameters.
Code: Supports viewing the code and configuration parameters of any version of a computing task.
Rollback: Only the script content and configuration of the task are rolled back; dependency relationships are not included. The rollback takes effect only after submission. Unsubmitted changes (including code and task configurations) are lost after rollback.
Metadata Database
Displays metadata information of the data sources accessed under the current project. You can obtain database and table information by searching for data sources, databases, and data tables, facilitating quick use during task development. Quick features are provided for copying the table query SQL, the table DDL, and the table name.
Note:
Currently, copying the table query SQL and table DDL is only supported for system data sources.
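For reference, the copied snippets are typically along the lines of the sketch below; the actual statements generated by the console depend on the table definition, and the database and table names here are hypothetical.

```sql
-- Table query SQL (illustrative)
SELECT order_id, user_id, amount, ds
FROM demo_db.orders
LIMIT 100;

-- Table DDL (illustrative)
CREATE TABLE demo_db.orders (
  order_id BIGINT,
  user_id  BIGINT,
  amount   DOUBLE
)
PARTITIONED BY (ds STRING)
STORED AS ORC;
```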
Function
Displays functions that can be used in task development. Currently, DLC SQL, Hive SQL, and Spark SQL functions are supported; select according to the engine targeted by the development task. The function library includes commonly used system functions, such as analysis functions (corr, covar_pop), encryption functions (hash, md5), and logical functions (decode, nvl). User-defined functions are also supported: functions uploaded through the resource management feature or created through the function development feature appear in this function library and can be invoked in development tasks.
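As a brief sketch (database, table, and column names are hypothetical), the functions mentioned above can be used directly in Hive SQL or Spark SQL task code:

```sql
-- Encryption and logical functions
SELECT md5(id_card)              AS id_card_hash,
       nvl(user_name, 'unknown') AS user_name
FROM demo_db.users
LIMIT 10;

-- Analysis (aggregate) function: correlation between order amount and discount per region
SELECT region,
       corr(order_amount, discount_amount) AS amount_discount_corr
FROM demo_db.orders
GROUP BY region;
```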