Python Access

Last updated: 2024-07-31 17:33:57
DLC provides a connector that complies with the Python DB-API 2.0 standard. You can connect to DLC's Presto/Spark engines from Python and conveniently run SQL against DLC database tables.

    Environment preparations

1. Python 3.9 or later.
    2. Install tencentcloud-dlc-connector.
    pip install -i https://mirrors.tencent.com/pypi/simple/ tencentcloud-dlc-connector

    Usage Examples

    Step 1: Connect to the engine

    Code:
import tdlc_connector
import datetime
from tdlc_connector import constants

conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    token=None,
    endpoint=None,
    catalog=constants.Catalog.DATALAKECATALOG,
    engine="<ENGINE>",
    engine_type=constants.EngineType.AUTO,
    result_style=constants.ResultStyles.LIST,
    download=False,
    mode=constants.Mode.LASY,
    database='',
    config={},
    callback=None,
    callback_events=None,
)
    Parameter description:
region: Engine region, such as ap-nanjing, ap-beijing, ap-guangzhou, ap-shanghai, ap-chengdu, ap-chongqing, na-siliconvalley, ap-singapore, ap-hongkong
secret_id: Tencent Cloud SecretId
secret_key: Tencent Cloud SecretKey
token: (Optional) Temporary secret token
endpoint: (Optional) Service endpoint to connect to
engine: Name of the engine to use, for example "test_python"
engine_type: (Optional) Engine type corresponding to the named engine; default constants.EngineType.AUTO. For example: AUTO, PRESTO, SPARK, SPARK_BATCH
result_style: (Optional) Format of returned results; options are LIST/DICT
download: (Optional) Whether to download data directly, True/False; see the download mode description
mode: (Optional) Mode; supports ALL/LASY/STREAM
database: (Optional) Default database
config: (Optional) Configuration submitted to the cluster
callback: (Optional) Callback function, with signature def cb(statement_id, status)
callback_events: (Optional) Events that trigger the callback; used together with callback, see the callback mechanism description for details
driver_size: (Optional) Driver node size; default constants.PodSize.SMALL (only valid for SPARK_BATCH clusters). Optional values: SMALL, MEDIUM, LARGE, XLARGE, M_SMALL, M_MEDIUM, M_LARGE, M_XLARGE
executor_size: (Optional) Executor node size; default constants.PodSize.SMALL (only valid for SPARK_BATCH clusters). Optional values: SMALL, MEDIUM, LARGE, XLARGE, M_SMALL, M_MEDIUM, M_LARGE, M_XLARGE
executor_num: (Optional) Number of Executor nodes; default 1 (only valid for SPARK_BATCH clusters)
executor_max_num: (Optional) Maximum number of Executor nodes; if not equal to executor_num, dynamic resource allocation is enabled (only valid for SPARK_BATCH clusters)
    
Download mode description:

1. download=False, mode=ALL: Fetch all data from the interface; data can only be fetched after the query completes.
2. download=False, mode=LASY: Fetch data from the interface; fetching from the server is deferred according to how much data has been consumed.
3. download=False, mode=STREAM: Same as LASY mode.
4. download=True, mode=ALL: Download all results from COS (requires COS read permission) using local temporary storage; recommended for large data volumes.
5. download=True, mode=LASY: Download results from COS (requires COS read permission); files are downloaded lazily according to how much data has been fetched.
6. download=True, mode=STREAM: Read the result stream from COS in real time (requires COS read permission); slower, but with extremely low local memory and disk usage.
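
For large result sets, mode 6 above (download=True with mode=STREAM) keeps local resource usage low. A minimal sketch, assuming the calling account has read permission on the query result bucket in COS; the table name is hypothetical:

import tdlc_connector
from tdlc_connector import constants

# Sketch: stream query results from COS instead of buffering them locally.
conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    engine="<ENGINE>",
    result_style=constants.ResultStyles.LIST,
    download=True,                  # read results from COS (requires COS read permission)
    mode=constants.Mode.STREAM,     # stream rows rather than holding the full result in memory
)

cursor = conn.cursor()
cursor.execute("SELECT * FROM some_large_table")  # hypothetical table name
while True:
    rows = cursor.fetchmany(1000)   # standard DB-API batch fetch
    if not rows:
        break
    for row in rows:
        pass  # process each row here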

    Step 2: Execute SQL

    Code:
# Basic operations

cursor = conn.cursor()
count = cursor.execute("SELECT 1")
print(cursor.fetchone())  # Read one row of data
for row in cursor.fetchall():  # Read the remaining rows of data
    print(row)

# Use the pyformat parameter style

cursor.execute("SELECT * FROM dummy WHERE date < %s", datetime.datetime.now())
cursor.execute("SELECT * FROM dummy WHERE status in %s", (('SUCCESS', 'INIT', 'FAIL'),))
cursor.execute("SELECT * FROM dummy WHERE date < %(date)s AND status = %(status)s", {'date': datetime.datetime.now(), 'status': 'SUCCESS'})

# Use the bulk (executemany) method

cursor.executemany("INSERT INTO dummy VALUES(%s, %s)", [('Zhang San', 18), ('Li Si', 20)])

    Basic Operation Procedure

The preceding code does the following:
1. Creates a cursor object with conn.cursor().
2. Executes the SQL statement "SELECT 1" with cursor.execute() and assigns the return value to the variable count.
3. Reads one row of data with cursor.fetchone() and prints it.
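
If the connection is created with result_style=constants.ResultStyles.DICT instead of LIST, each row should come back keyed by column name rather than as a list. A minimal sketch under that assumption:

import tdlc_connector
from tdlc_connector import constants

# Sketch: DICT result style, assumed to return each row as a column-name-to-value mapping.
conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    engine="<ENGINE>",
    result_style=constants.ResultStyles.DICT,
)

cursor = conn.cursor()
cursor.execute("SELECT 1 AS one")
row = cursor.fetchone()
print(row)  # expected to look like {'one': 1}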

Features

Callback mechanism description

import tdlc_connector
import datetime
from tdlc_connector import constants


def tdlc_connector_callback(statement_id, state):
    '''
    params: statement_id  Task ID
    params: state  Task status; the enumeration values are in constants.TaskStatus
    '''
    print(statement_id, state)


conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    engine="<ENGINE>",
    engine_type=constants.EngineType.SPARK,
    result_style=constants.ResultStyles.LIST,
    callback=tdlc_connector_callback,
    callback_events=[constants.CallbackEvent.ON_INIT, constants.CallbackEvent.ON_SUCCESS]
)

cursor = conn.cursor()
cursor.execute("SELECT 1")
cursor.fetchone()

# The callback function is invoked when the task is initialized and when it succeeds
    

    Submit the task to the job cluster

Currently, tasks can be submitted to Spark job clusters. For details, see the following example.
import tdlc_connector
from tdlc_connector import constants

conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    engine="<ENGINE>",  # Select a Spark job engine
    result_style=constants.ResultStyles.LIST,
    driver_size=constants.PodSize.SMALL,     # Driver specification
    executor_size=constants.PodSize.SMALL,   # Executor specification
    executor_num=1,       # Number of Executors
    executor_max_num=1,   # Maximum number of Executors; if not equal to executor_num, dynamic resource allocation is enabled
)
    Note:
    Upgrade the connector to >= 1.1.0 to use this feature.

    Automatically infer engine type

You do not need to specify the engine type; the connector infers it automatically. For details, see the following example.
import tdlc_connector
from tdlc_connector import constants

conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    engine="<ENGINE>",
    engine_type=constants.EngineType.AUTO  # Set this parameter to AUTO or omit it to enable automatic inference
)
    Note:
    Upgrade the connector to >= 1.1.0 to use this feature.

    Null conversion

The result set is currently stored in CSV format, and by default the engine converts null values into empty strings. If you need to distinguish null values, specify a null marker such as "\\1": the engine will write null values as "\\1" in the query result, and the driver will convert "\\1" fields back into None. Refer to the following example.
import tdlc_connector
from tdlc_connector import constants, formats

formats.FORMAT_STRING_NULL = '\\1'

conn = tdlc_connector.connect(
    region="<REGION>",
    secret_id="<SECRET_ID>",
    secret_key="<SECRET_KEY>",
    engine="<ENGINE>",
    result_style=constants.ResultStyles.LIST
)
    Note:
Null conversion currently supports only SparkSQL clusters. Upgrade the connector to >= 1.1.3 to use this feature.
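
With the null marker configured as above, a NULL column should be returned as None while an empty string remains ''. A small sketch reusing the connection from the example; the column aliases are illustrative:

# Sketch: verify that NULL and empty string are now distinguishable.
cursor = conn.cursor()
cursor.execute("SELECT NULL AS a, '' AS b")
row = cursor.fetchone()
print(row)  # expected: [None, ''] with result_style=LIST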
    