Python Access
Last updated: 2024-07-31 17:33:57
DLC provides a connector that complies with the Python DB-API 2.0 standard. You can connect to DLC's Presto/Spark engines from Python and conveniently run SQL statements against DLC database tables.

Environment preparations

1. Python 3.9 or later.
2. Install tencentcloud-dlc-connector.
pip install -i https://mirrors.tencent.com/pypi/simple/ tencentcloud-dlc-connector
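
To confirm the installation, you can run a quick import check (a minimal sketch; it only verifies that the package can be imported and needs no DLC credentials):

import tdlc_connector
from tdlc_connector import constants

# If this prints without raising ImportError, the connector is installed correctly
print(tdlc_connector.__name__, "imported successfully")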

Usage Examples

Step 1: Connect to the engine

Code:
import tdlc_connector
import datetime
from tdlc_connector import constants

conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    token=None,
    endpoint=None,
    catalog=constants.Catalog.DATALAKECATALOG,
    engine="<ENGINE>",
    engine_type=constants.EngineType.AUTO,
    result_style=constants.ResultStyles.LIST,
    download=False,
    mode=constants.Mode.LASY,
    database='',
    config={},
    callback=None,
    callback_events=None,
)
Parameter description:

region: Engine region, such as ap-nanjing, ap-beijing, ap-guangzhou, ap-shanghai, ap-chengdu, ap-chongqing, na-siliconvalley, ap-singapore, ap-hongkong
secret_id: Tencent Cloud SecretID
secret_key: Tencent Cloud SecretKey
token: (Optional) Temporary secret token
endpoint: (Optional) Service endpoint to connect to
engine: Name of the engine to use, for example "test_python"
engine_type: (Optional) Engine type, matching the type of the named engine. Default: constants.EngineType.AUTO. Examples: AUTO, PRESTO, SPARK, SPARK_BATCH
result_style: (Optional) Format of returned rows. Options: LIST/DICT
download: (Optional) Whether to download data directly: True/False. See the Download Mode Explanation below
mode: (Optional) Mode. Supports ALL/LASY/STREAM
database: (Optional) Default database
config: (Optional) Configuration submitted to the cluster
callback: (Optional) Callback function with signature def cb(statement_id, state)
callback_events: (Optional) Events that trigger the callback, used together with callback. See the callback mechanism description for details
driver_size: (Optional) Driver node size. Default: constants.PodSize.SMALL (only valid for SPARK_BATCH clusters). Options: SMALL, MEDIUM, LARGE, XLARGE, M_SMALL, M_MEDIUM, M_LARGE, M_XLARGE
executor_size: (Optional) Executor node size. Default: constants.PodSize.SMALL (only valid for SPARK_BATCH clusters). Options: SMALL, MEDIUM, LARGE, XLARGE, M_SMALL, M_MEDIUM, M_LARGE, M_XLARGE
executor_num: (Optional) Number of executor nodes. Default: 1 (only valid for SPARK_BATCH clusters)
executor_max_num: (Optional) Maximum number of executor nodes. If not equal to executor_num, dynamic resource allocation is enabled (only valid for SPARK_BATCH clusters)

Download Mode Explanation:

1. download=False, mode=ALL: Fetch all data through the query interface; data can only be fetched after the query completes.
2. download=False, mode=LASY: Fetch data through the query interface; fetching from the server is deferred according to the amount already consumed.
3. download=False, mode=STREAM: Same as LASY mode.
4. download=True, mode=ALL: Download all results from COS (requires COS read permission) using local temporary storage; recommended for large data volumes.
5. download=True, mode=LASY: Download results from COS (requires COS read permission); files are downloaded lazily according to the amount of data fetched.
6. download=True, mode=STREAM: Read the result stream from COS in real time (requires COS read permission); slower, but with minimal local memory and disk usage.
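
For large result sets, download mode (row 4 above) is usually the better choice because results are pulled from COS rather than through the query interface. A minimal sketch, using placeholder credentials and assuming constants.Mode.ALL is the constant matching the ALL mode in the table:

import tdlc_connector
from tdlc_connector import constants

# Download all results from COS (requires COS read permission); suited to large data volumes
conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    engine="<ENGINE>",
    result_style=constants.ResultStyles.LIST,
    download=True,
    mode=constants.Mode.ALL,
)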

Step 2: Execute SQL

Code:
# Basic operations

cursor = conn.cursor()
count = cursor.execute("SELECT 1")
print(cursor.fetchone())  # Read one row of data
for row in cursor.fetchall():  # Read the remaining rows
    print(row)

# Use the pyformat parameter style

cursor.execute("SELECT * FROM dummy WHERE date < %s", datetime.datetime.now())
cursor.execute("SELECT * FROM dummy WHERE status in %s", (('SUCCESS', 'INIT', 'FAIL'),))
cursor.execute("SELECT * FROM dummy WHERE date < %(date)s AND status = %(status)s", {'date': datetime.datetime.now(), 'status': 'SUCCESS'})

# Use the bulk method

cursor.executemany("INSERT INTO dummy VALUES(%s, %s)", [('Zhang San', 18), ('Li Si', 20)])

Basic Operation Procedure

The code above proceeds as follows:
1. A cursor object is created with conn.cursor().
2. A SQL statement is executed with cursor.execute("SELECT 1"), and its return value is assigned to the variable count.
3. One row of data is read with cursor.fetchone() and printed.
4. The remaining rows are read with cursor.fetchall() and printed in a loop.
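
If you prefer each row as a dictionary keyed by column name, set result_style to the DICT option when connecting. The sketch below assumes constants.ResultStyles.DICT is the dictionary counterpart of the LIST style used above:

import tdlc_connector
from tdlc_connector import constants

conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    engine="<ENGINE>",
    result_style=constants.ResultStyles.DICT,  # rows are returned as dicts instead of lists
)

cursor = conn.cursor()
cursor.execute("SELECT 1 AS one")
print(cursor.fetchone())  # e.g. {'one': 1}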

Features

Description of the callback mechanism

import tdlc_connector
import datetime
from tdlc_connector import constants


def tdlc_connector_callback(statement_id, state):
    '''
    params: statement_id: statement (task) ID
    params: state: task status; the value is a constants.TaskStatus enumeration
    '''
    print(statement_id, state)


conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    engine="<ENGINE>",
    engine_type=constants.EngineType.SPARK,
    result_style=constants.ResultStyles.LIST,
    callback=tdlc_connector_callback,
    callback_events=[constants.CallbackEvent.ON_INIT, constants.CallbackEvent.ON_SUCCESS]
)

cursor = conn.cursor()
cursor.execute("SELECT 1")
cursor.fetchone()

# The callback is invoked when the task is initialized and when it completes successfully

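
As a variation, the callback can be used to time each statement by recording when its first event arrives and reporting the elapsed time on later events. This sketch relies only on the callback signature and events shown above; the timing logic itself is illustrative:

import time

_first_seen = {}

def timing_callback(statement_id, state):
    # Record the first event for each statement, then report elapsed time on subsequent events
    now = time.time()
    first = _first_seen.setdefault(statement_id, now)
    print(f"statement {statement_id} -> {state} (t+{now - first:.2f}s)")

# Pass it to connect() exactly as in the example above:
#   callback=timing_callback,
#   callback_events=[constants.CallbackEvent.ON_INIT, constants.CallbackEvent.ON_SUCCESS]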

Submit the task to the job cluster

Currently, you can submit tasks to the Spark job cluster. For details, see the following example.
import tdlc_connector
from tdlc_connector import constants

conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    engine="<ENGINE>",                      # Select a Spark job (SPARK_BATCH) engine
    result_style=constants.ResultStyles.LIST,
    driver_size=constants.PodSize.SMALL,    # Driver specification
    executor_size=constants.PodSize.SMALL,  # Executor specification
    executor_num=1,                         # Number of executors
    executor_max_num=1,                     # Maximum number of executors; if not equal to executor_num, dynamic resource allocation is enabled
)
Note:
Upgrade the connector to >= 1.1.0 to use this feature.

Automatically infer engine type

You do not need to specify the engine type. The connector will automatically infer the engine type. For details, see the following example.
import tdlc_connector
from tdlc_connector import constants

conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    engine="<ENGINE>",
    engine_type=constants.EngineType.AUTO  # Set this parameter to AUTO, or omit it, to let the connector infer the engine type
)
Note:
Upgrade the connector to >= 1.1.0 to use this feature.

Null conversion

The current result set is stored in CSV format, and the engine converts NULL values to empty strings by default. If you need to distinguish NULL values, specify a NULL marker such as "\\1": the engine will then write "\\1" for NULL values in the query result, and the driver will convert "\\1" fields back to None. See the following example.
import tdlc_connector
from tdlc_connector import constants, formats

formats.FORMAT_STRING_NULL = '\\1'

conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    engine="<ENGINE>",
    result_style=constants.ResultStyles.LIST
)
Note:
Null conversion currently supports SparkSQL clusters only. Upgrade the connector to >= 1.1.3 to use this feature.
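
To confirm the behavior, a query that returns a NULL value should now come back as Python None rather than an empty string. A sketch continuing from the connection above (it assumes the engine is a SparkSQL cluster, as the note requires):

cursor = conn.cursor()
cursor.execute("SELECT NULL AS col")
print(cursor.fetchone())  # the NULL column is expected to come back as None rather than ''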
