Variables in data development are divided into three levels: project level, workflow level, and task level. Each level has a different scope:
Project Level: can be used by development scripts, data workflows, and task nodes throughout the project.
Workflow Level: can be used by the workflow and the task nodes within it.
Task Level: can be created and used only by individual task nodes in the development scripts and orchestration space.
Priority: for variables with the same name, upstream task transmission > task level > workflow level > project level (the leftmost takes precedence).
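As a minimal illustration of this precedence (using a hypothetical variable named env): if env is defined as prod at the project level and as dev at the task level, the task-level value wins at runtime:

```sql
-- hypothetical: project level defines env=prod, task level defines env=dev
SELECT * FROM demo_bi.hive_user
WHERE env = '${env}'
-- the task-level value is substituted, so this runs as: WHERE env = 'dev'
```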
Project Level
Variable Configuration
Project variables apply to data development tasks across the project. For specific configuration steps, see Variable Settings.
Variable Usage
Script files created in the development space and computational tasks created in the orchestration space can both use project variables. Here is an example with SQL scripts:
1. In the SQL script, click to view the configured project variables.
2. To reference a variable from the Definition, use the format: ${key}
SELECT * FROM demo_bi.hive_user
WHERE id=${one}
3. As seen in the figure, after referencing the project variable one, the SQL execution substitutes and retrieves the variable's value.
Workflow Level
Variable Configuration
Workflow level variables are applied to the compute task nodes created within the corresponding workflow. Through the General Settings on the right side of the workflow canvas page, you can define workflow variables.
Variable Usage
Create compute task nodes in the data workflow. After entering the task node configuration page, you can use workflow variables during the configuration process. The following example shows the configuration of variables in the demo_workflow workflow and the use of a workflow variable in a Hive SQL compute task:
1. Configure the following workflow parameters in the General Settings of the demo_workflow workflow.
2. Then open the Hive SQL compute task in the workflow and reference the workflow variable by name in the code, using the format: ${key}
SELECT * FROM demo_bi.hive_user
WHERE id=${pg}
3. As seen in the diagram, referencing the workflow variable pg in the compute task results in parameter substitution during runtime.
Task Level
Variable Configuration
Task-level variables apply to compute task nodes in the data workflow; each compute task can configure its own task variables independently. Through the task attributes panel on the right side of the compute task configuration page, you can define that task's variables.
Variable Usage
Enter the compute task node and configure task variables in the scheduling parameters section of the task attributes. The following example shows a Hive SQL task:
1. Configure the following task variables in the scheduling parameters of the quest_hive compute task.
2. Then reference the variable in the quest_hive task's code, using the format: ${variableName}.
SELECT * FROM demo_bi.hive_user
WHERE id=${qt}
3. As seen in the diagram, referencing the task variable qt in the compute task results in parameter substitution during runtime.
4. Additionally, providing a detailed description when configuring a task variable helps users quickly understand and use it. Clicking Automatic Code Variable Parsing during task variable configuration opens a variable list, where you can check whether the variable configuration is as expected.
5. The variable list also shows all parameter variables available to the current task, including project variables, workflow variables, and task variables. It supports modifying the current task's variables and adding new task variables for the current task.
6. Task variables configured on different task nodes in a data workflow can be passed between tasks along their dependency relationships, enabling parameter interoperability.
Application variables
You can retrieve project, workflow, and compute task information through application variables, for example to view task-related information in SQL tasks or on YARN.
The currently supported built-in application variables include:
1. Project Identifier: ${projectIdent}
2. Workflow Name: ${workflowName}
3. Task Name: ${taskName}
4. Task ID: ${taskId}
5. Person in Charge: ${taskInCharge}
6. Task Type: ${taskType}
Application scenario
Use it in SQL to get task information.
You can query application variables in SQL statements to obtain information about the current project, workflow, etc.
Sample SQL:
select
"${projectIdent}" as projectIdent,
"${workflowName}" as workflowName,
"${taskName}" as taskName,
"${taskId}" as taskId,
"${taskInCharge}" as taskInCharge,
"${taskType}" as taskType,
user_id
from
wedata_demo_db.user_info
limit
10;
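At runtime the placeholders are substituted with literal values before the query executes; with hypothetical values for illustration, the submitted SQL would look like:

```sql
-- hypothetical substituted values, for illustration only
select
  "demo_project" as projectIdent,
  "demo_workflow" as workflowName,
  "quest_hive" as taskName,
  "20231101000123" as taskId,
  "alice" as taskInCharge,
  "HIVE_SQL" as taskType,
  user_id
from
  wedata_demo_db.user_info
limit
  10;
```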
Specify task name on YARN.
In SparkSQL, PySpark, and Spark task types, use the --name parameter together with variables to specify the application name on YARN.
Parameter example:
--name ${projectIdent}-${workflowName}-${taskInCharge}
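With hypothetical values for the three variables, this parameter would expand at submission time to something like:

```
--name demo_project-demo_workflow-alice
```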
Note:
When SparkSQL connects to a Kyuubi data source, the application name currently needs to be set through the spark.app.name parameter instead:
spark.app.name=${projectIdent}-${workflowName}-${taskInCharge}
Final effect on YARN:
Other notes
Currently, if a Kyuubi job has been submitted before, subsequent Kyuubi jobs may reuse the previous YARN application ID, resulting in duplicate application names. This issue is being investigated and fixed.