Using Apache DolphinScheduler to Schedule DLC Engine to Submit Tasks
Last updated: 2025-03-21 12:24:55
This article introduces DLC's support for the Apache DolphinScheduler scheduling tool and provides examples demonstrating how to use Apache DolphinScheduler with a DLC engine to submit tasks.

Background

Apache DolphinScheduler is an open-source distributed workflow scheduling system designed to provide efficient task scheduling and management for big data scenarios. It supports visual workflow design, task dependency management, scheduled scheduling, and other features, suitable for data processing, ETL, and machine learning applications. For more information, please visit Apache DolphinScheduler Official Website.

Prerequisites

1. Apache DolphinScheduler Environment Preparation.
1.1. Apache DolphinScheduler has been installed and started. For installation and startup instructions, please refer to Apache DolphinScheduler Quick Start.
2. Data Lake Compute (DLC) environment preparation.
2.1. The Data Lake Compute DLC engine service has been activated.
2.2. If using the SuperSQL engine, prepare the DLC JDBC driver; click to download the JDBC driver.

Key Steps for Connecting the Standard Spark Engine to Apache DolphinScheduler

Adding a Kyuubi or Spark Data Source

Note:
Apache DolphinScheduler versions below 3.2.1 do not support the Kyuubi data source; only the Spark data source can be chosen.



Set the parameters as shown in the figure:



Parameter description:
- Source Name (required): customizable.
- Description (optional): customizable.
- IP hostname (required): an IP address that can access the engine. For intranet access, you can check it in the DLC console as shown below. For public network access, you need to enable public network access for the engine first; please refer to Configuring Public Network Access for the Engine.
- Port (required): 10009.
- Username (required): the engine ID and resource group ID, separated by "&".
- Password (required): the SecretId and SecretKey, separated by "&".
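For illustration only, a filled-in data source might look like the following; every value below is a hypothetical placeholder and should be replaced with your own engine and credential details:
Source Name: dlc-kyuubi-source
IP hostname: 192.168.0.10
Port: 10009
Username: your-engine-id&your-resource-group-id
Password: your-secret-id&your-secret-key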

Creating a Project and Workflow

1. Create a project under Project Management as shown:



2. Enter the project to create a workflow:



3. Create an SQL node in the workflow, select the newly created data source instance as shown, and enter SQL.




4. Save the node and workflow, then publish the workflow before running:



5. In the workflow instance list, you can see historical tasks; click an instance name to view its results, logs, and other information:




Key Steps for Connecting the Standard Presto Engine to Apache DolphinScheduler

DLC currently does not support connecting to the Presto engine directly through the Kyuubi or Presto data sources. Instead, you can access DLC through DolphinScheduler's Python node using the Tencent Cloud SDK.

Configuring PYTHON_HOME

Find the configuration file dolphinscheduler_env.sh in the Apache DolphinScheduler installation path, modify the PYTHON_HOME parameter to the current Python path, or link Python to the specified location:
ln -s /usr/bin/python /opt/soft/python
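For reference, the relevant line in dolphinscheduler_env.sh would then look like the following; /opt/soft/python is the conventional default location, and the exact path may differ in your deployment:
export PYTHON_HOME=/opt/soft/python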

Downloading the Tencent Cloud Python SDK

pip install --upgrade tencentcloud-sdk-python
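To confirm the SDK is importable from the Python installation DolphinScheduler uses, a quick sanity check is:
python -c "import tencentcloud"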

Creating a Project and Workflow

1. Create a project under Project Management as shown:



2. Configure the node:



Sample code:
Note:
This script reads the local sql_file_path/dlc.sql file and submits all SQL statements in the file to the specified DLC Presto engine. Remember to replace the secretId, secretKey, region, engineName, file path, and other parameters in the script.
import json
import base64
from tencentcloud.common import credential
from tencentcloud.common.profile.client_profile import ClientProfile
from tencentcloud.common.profile.http_profile import HttpProfile
from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException
from tencentcloud.dlc.v20210125 import dlc_client, models


def create_task(sql):
    try:
        # Instantiate an authentication object. Pass in the SecretId and SecretKey of your
        # Tencent Cloud account and keep the key pair confidential. Code leakage may expose
        # the SecretId and SecretKey and threaten the security of all resources under the
        # account. The following code example is for reference only; a more secure way to
        # handle the keys is recommended, see: https://cloud.tencent.com/document/product/1278/85305
        # You can get the keys from the official console at https://console.cloud.tencent.com/cam/capi
        cred = credential.Credential("secretId", "secretKey")
        # Instantiate an HTTP option (optional; can be skipped without specific requirements).
        httpProfile = HttpProfile()
        httpProfile.endpoint = "dlc.tencentcloudapi.com"

        # Instantiate a client option (optional; can be skipped without specific requirements).
        clientProfile = ClientProfile()
        clientProfile.httpProfile = httpProfile
        # Instantiate the client object of the requested product. clientProfile is optional.
        client = dlc_client.DlcClient(cred, "region", clientProfile)

        # Instantiate a request object. Each API corresponds to a request object.
        req = models.CreateTasksRequest()
        base64sql = base64.b64encode(sql.encode('utf-8')).decode('utf-8')
        params = {
            "DatabaseName": "db",
            "Tasks": {
                "TaskType": "SQLTask",
                "FailureTolerance": "Terminate",
                "SQL": base64sql,
            },
            "DatasourceConnectionName": "DataLakeCatalog",
            "DataEngineName": "engineName"
        }
        req.from_json_string(json.dumps(params))

        # The returned resp is an instance of the CreateTasksResponse class
        # that corresponds to the request object.
        resp = client.CreateTasks(req)
        # A string return packet in JSON format is output.
        print(resp.to_json_string())
        return resp
    except TencentCloudSDKException as err:
        print(err)


if __name__ == "__main__":
    try:
        sql_file = "sql_file_path/dlc.sql"
        print(sql_file)
        with open(sql_file, 'r') as file:
            sqls = file.read()
        print(sqls)
        create_rsp = create_task(sqls)
    except Exception as main_err:
        print(main_err)
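Note that CreateTasks only submits the SQL; it returns immediately without waiting for execution to finish. The sketch below shows one way you might poll for completion afterwards. It assumes the CreateTasks response exposes a TaskIdSet field and that the DescribeTasks API accepts a "task-id" filter; the state codes are likewise assumptions, so verify all of these against the DLC API reference for your SDK version before relying on them.
import json
import time
from tencentcloud.dlc.v20210125 import models


def wait_for_task(client, task_id, timeout=600, interval=5):
    # Hypothetical helper: poll DescribeTasks until the task leaves the assumed
    # non-terminal states (0 = queued, 1 = running) or the timeout elapses.
    # The "task-id" filter name, the TaskList response field, and the state
    # codes are assumptions to be checked against the DLC API reference.
    deadline = time.time() + timeout
    while time.time() < deadline:
        req = models.DescribeTasksRequest()
        req.from_json_string(json.dumps({
            "Filters": [{"Name": "task-id", "Values": [task_id]}]
        }))
        resp = client.DescribeTasks(req)
        tasks = json.loads(resp.to_json_string()).get("TaskList") or []
        if tasks and tasks[0].get("State") not in (0, 1):
            return tasks[0]
        time.sleep(interval)
    return None


# Example usage (task ID taken from the response's assumed TaskIdSet field):
# task = wait_for_task(client, create_rsp.TaskIdSet[0])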

Key Steps for Connecting the SuperSQL Engine to Apache DolphinScheduler

Specifying Environment Variables





Creating a Project and Workflow

1. Create a project under Project Management as shown:



2. Enter the project to create a workflow:



3. Create a new Python node by dragging the Python component to the editing area.



4. Set the node.



5. Set the script.



Sample code:
import jaydebeapi

jdbc_url = "jdbc:dlc:dlc.tencentcloudapi.com?task_type=SQLTask&datasource_connection_name=DataLakeCatalog&region=ap-guangzhou&data_engine_name=public-engine"
user = "xx"
pwd = "xx"
driver = "com.tencent.cloud.dlc.jdbc.DlcDriver"
jar_file = '/opt/dolphinscheduler/libs/dlc-jdbc-2.2.3-jar-with-dependencies.jar'
sql = "select 1"

# Connect through the DLC JDBC driver and execute the query.
conn = jaydebeapi.connect(driver, jdbc_url, [user, pwd], jar_file)
curs = conn.cursor()
curs.execute(sql)
array_size = curs.arraysize
rowcount = curs.rowcount
print(array_size)
print(rowcount)
if rowcount != 0:
    result = curs.fetchall()
    print(result)

curs.close()
conn.close()
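Note that per the Python DB-API specification, cursor.rowcount can be -1 when the driver cannot determine the number of rows, so for SELECT statements it is more robust to call fetchall() unconditionally rather than gating it on rowcount.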

Parameter description:
- jdbc_url: JDBC connection address and configuration parameters. For details, see DLC JDBC Access.
- user: SecretId.
- pwd: SecretKey.
- driver: the JDBC driver class to load. For details, see DLC JDBC Access.
- jar_file: path where the driver JAR package is stored. For details, see DLC JDBC Access.
6. Click Workflow Save and OK.



7. Deploy the workflow.



8. Execute the workflow.



9. View the execution result.








