Using Apache DolphinScheduler to Schedule DLC Engine to Submit Tasks
Last updated: 2025-03-21 12:24:55
This article introduces DLC's support for the Apache DolphinScheduler scheduling tool and provides examples demonstrating how to use Apache DolphinScheduler with a DLC engine to submit tasks.

Background

Apache DolphinScheduler is an open-source distributed workflow scheduling system designed to provide efficient task scheduling and management for big data scenarios. It supports visual workflow design, task dependency management, scheduled scheduling, and other features, suitable for data processing, ETL, and machine learning applications. For more information, please visit Apache DolphinScheduler Official Website.

Prerequisites

1. Apache DolphinScheduler Environment Preparation.
1.1. Apache DolphinScheduler has been installed and started. For installation and startup instructions, please refer to Apache DolphinScheduler Quick Start.
2. Data Lake Compute (DLC) environment preparation.
2.1. The Data Lake Compute DLC engine service has been activated.
2.2. If using the SuperSQL engine, prepare the DLC JDBC driver; click to download the JDBC driver.

Key Steps for Connecting the Standard Spark Engine to Apache DolphinScheduler

Adding a Kyuubi or Spark Data Source

Note:
Apache DolphinScheduler versions below 3.2.1 do not support the Kyuubi data source; only the Spark data source can be chosen.



Set the parameters as shown in the figure:



Parameter description:
- Source Name (required): customizable.
- Description (optional): customizable.
- IP hostname (required): an IP address that can access the engine. For intranet access, you can check it in the DLC console as shown below. For public network access, you need to enable public network access for the engine first; please refer to Configuring Public Network Access for the Engine.
- Port (required): 10009.
- Username (required): the engine ID and resource group ID, separated by "&".
- Password (required): the SecretId and SecretKey, separated by "&".
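For illustration only, a filled-in data source might look like the following; every value below is a hypothetical placeholder and should be replaced with your own engine and credential details:
Source Name: dlc-kyuubi-source
IP hostname: 192.168.0.10
Port: 10009
Username: your-engine-id&your-resource-group-id
Password: your-secret-id&your-secret-key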

Creating a Project and Workflow

1. Create a project under Project Management as shown:



2. Enter the project to create a workflow:



3. Create an SQL node in the workflow, select the newly created data source instance as shown, and enter SQL.




4. Save the node and workflow, then publish the workflow before running:



5. In the workflow instance list, you can see historical tasks; click an instance name to view its results, logs, and other information:




Key Steps for Connecting the Standard Presto Engine to Apache DolphinScheduler

DLC currently does not support connecting to the Presto engine directly through the Kyuubi or Presto data sources. Instead, you can access DLC through DolphinScheduler's Python node using the Tencent Cloud SDK.

Configuring PYTHON_HOME

Find the configuration file dolphinscheduler_env.sh in the Apache DolphinScheduler installation path, modify the PYTHON_HOME parameter to the current Python path, or link Python to the specified location:
ln -s /usr/bin/python /opt/soft/python
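For reference, the relevant line in dolphinscheduler_env.sh would then look like the following; /opt/soft/python is the conventional default location, and the exact path may differ in your deployment:
export PYTHON_HOME=/opt/soft/python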

Downloading the Tencent Cloud Python SDK

pip install --upgrade tencentcloud-sdk-python
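To confirm the SDK is importable from the Python installation DolphinScheduler uses, a quick sanity check is:
python -c "import tencentcloud"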

Creating a Project and Workflow

1. Create a project under Project Management as shown:



2. Configure the node:



Sample code:
Note:
This script reads the local sql_file_path/dlc.sql file and submits all SQL statements in the file to the specified DLC Presto engine. Remember to replace the secretId, secretKey, region, engineName, file path, and other parameters in the script.
import json
import base64
from tencentcloud.common import credential
from tencentcloud.common.profile.client_profile import ClientProfile
from tencentcloud.common.profile.http_profile import HttpProfile
from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException
from tencentcloud.dlc.v20210125 import dlc_client, models


def create_task(sql):
    try:
        # Instantiate an authentication object. Pass in the SecretId and SecretKey of your
        # Tencent Cloud account and keep the key pair confidential. Code leakage may expose
        # the SecretId and SecretKey and threaten the security of all resources under the
        # account. The following code example is for reference only; a more secure way to
        # handle the keys is recommended, see: https://cloud.tencent.com/document/product/1278/85305
        # You can get the keys from the official console at https://console.cloud.tencent.com/cam/capi
        cred = credential.Credential("secretId", "secretKey")
        # Instantiate an HTTP option (optional; can be skipped without specific requirements).
        httpProfile = HttpProfile()
        httpProfile.endpoint = "dlc.tencentcloudapi.com"

        # Instantiate a client option (optional; can be skipped without specific requirements).
        clientProfile = ClientProfile()
        clientProfile.httpProfile = httpProfile
        # Instantiate the client object of the requested product. clientProfile is optional.
        client = dlc_client.DlcClient(cred, "region", clientProfile)

        # Instantiate a request object. Each API corresponds to a request object.
        req = models.CreateTasksRequest()
        base64sql = base64.b64encode(sql.encode('utf-8')).decode('utf-8')
        params = {
            "DatabaseName": "db",
            "Tasks": {
                "TaskType": "SQLTask",
                "FailureTolerance": "Terminate",
                "SQL": base64sql,
            },
            "DatasourceConnectionName": "DataLakeCatalog",
            "DataEngineName": "engineName"
        }
        req.from_json_string(json.dumps(params))

        # The returned resp is an instance of the CreateTasksResponse class
        # that corresponds to the request object.
        resp = client.CreateTasks(req)
        # A string return packet in JSON format is output.
        print(resp.to_json_string())
        return resp
    except TencentCloudSDKException as err:
        print(err)


if __name__ == "__main__":
    try:
        sql_file = "sql_file_path/dlc.sql"
        print(sql_file)
        with open(sql_file, 'r') as file:
            sqls = file.read()
        print(sqls)
        create_rsp = create_task(sqls)
    except Exception as main_err:
        print(main_err)
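Note that CreateTasks only submits the SQL; it returns immediately without waiting for execution to finish. The sketch below shows one way you might poll for completion afterwards. It assumes the CreateTasks response exposes a TaskIdSet field and that the DescribeTasks API accepts a "task-id" filter; the state codes are likewise assumptions, so verify all of these against the DLC API reference for your SDK version before relying on them.
import json
import time
from tencentcloud.dlc.v20210125 import models


def wait_for_task(client, task_id, timeout=600, interval=5):
    # Hypothetical helper: poll DescribeTasks until the task leaves the assumed
    # non-terminal states (0 = queued, 1 = running) or the timeout elapses.
    # The "task-id" filter name, the TaskList response field, and the state
    # codes are assumptions to be checked against the DLC API reference.
    deadline = time.time() + timeout
    while time.time() < deadline:
        req = models.DescribeTasksRequest()
        req.from_json_string(json.dumps({
            "Filters": [{"Name": "task-id", "Values": [task_id]}]
        }))
        resp = client.DescribeTasks(req)
        tasks = json.loads(resp.to_json_string()).get("TaskList") or []
        if tasks and tasks[0].get("State") not in (0, 1):
            return tasks[0]
        time.sleep(interval)
    return None


# Example usage (task ID taken from the response's assumed TaskIdSet field):
# task = wait_for_task(client, create_rsp.TaskIdSet[0])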

Key Steps for Connecting the SuperSQL Engine to Apache DolphinScheduler

Specifying Environment Variables





Creating a Project and Workflow

1. Create a project under Project Management as shown:



2. Enter the project to create a workflow:



3. Create a new Python node by dragging the Python component to the editing area.



4. Set the node.



5. Set the script.



Sample code:
import jaydebeapi

jdbc_url = "jdbc:dlc:dlc.tencentcloudapi.com?task_type=SQLTask&datasource_connection_name=DataLakeCatalog&region=ap-guangzhou&data_engine_name=public-engine"
user = "xx"
pwd = "xx"
driver = "com.tencent.cloud.dlc.jdbc.DlcDriver"
jar_file = '/opt/dolphinscheduler/libs/dlc-jdbc-2.2.3-jar-with-dependencies.jar'
sql = "select 1"

# Connect through the DLC JDBC driver and execute the query.
conn = jaydebeapi.connect(driver, jdbc_url, [user, pwd], jar_file)
curs = conn.cursor()
curs.execute(sql)
array_size = curs.arraysize
rowcount = curs.rowcount
print(array_size)
print(rowcount)
if rowcount != 0:
    result = curs.fetchall()
    print(result)

curs.close()
conn.close()
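Note that per the Python DB-API specification, cursor.rowcount can be -1 when the driver cannot determine the number of rows, so for SELECT statements it is more robust to call fetchall() unconditionally rather than gating it on rowcount.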

Parameter description:
- jdbc_url: JDBC connection address and configuration parameters. For details, see DLC JDBC Access.
- user: SecretId.
- pwd: SecretKey.
- driver: the JDBC driver class to load. For details, see DLC JDBC Access.
- jar_file: path where the driver JAR package is stored. For details, see DLC JDBC Access.
6. Click Workflow Save and OK.



7. Deploy the workflow.



8. Execute the workflow.



9. View the execution result.








