SQL Editor
Last updated: 2024-07-17 17:36:45
The SQL editor provided by Data Lake Compute (DLC) supports data querying with unified SQL statements compatible with SparkSQL, so you can complete data query tasks using standard SQL.
You can access the SQL editor through Data Explore, where you can perform simple data management, multi-session data queries, query record management, and download record management.

Data Management

Data management supports adding data sources, managing databases, and managing data tables.

Creating a data catalog

Currently, Data Lake Compute supports the management of COS and EMR Hive data catalogs. The directions are as follows:
1. Log in to the Data Lake Compute console and select the service region. You need to have the admin permission.
2. Select Data Explore on the left sidebar, hover over the icon on the Database & table tab, and click Create catalog.

For detailed directions, see Querying Data from Other Sources.

Managing a database

You can create, delete, and view the details of a database in the SQL editor.
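As a quick illustration, these database operations correspond to standard SparkSQL-compatible statements like the following sketch (the database name demo_db is hypothetical and used only as an example):

-- Create a database (demo_db is an illustrative name)
CREATE DATABASE IF NOT EXISTS demo_db COMMENT 'example database';

-- List databases and view the details of one database
SHOW DATABASES;
DESCRIBE DATABASE demo_db;

-- Delete the database when it is no longer needed
DROP DATABASE IF EXISTS demo_db;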

Managing a data table

You can create, query, and view the details of a data table in the SQL editor.
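For example, typical SparkSQL-compatible statements for these table operations look like the following sketch (the database, table, and column names are hypothetical):

-- Create a partitioned Parquet table (names are illustrative only)
CREATE TABLE IF NOT EXISTS demo_db.orders (
    order_id BIGINT,
    amount   DOUBLE,
    dt       STRING
) USING PARQUET
PARTITIONED BY (dt);

-- List tables, view table details, and query data
SHOW TABLES IN demo_db;
DESCRIBE TABLE demo_db.orders;
SELECT * FROM demo_db.orders LIMIT 10;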

Changing the default database

You can use the SQL editor to specify the default database for query tasks. If no database is specified in a query statement, the query is executed in the default database, as illustrated in the example after the steps below.
1. Log in to the Data Lake Compute console and select the service region.
2. Select Data Explore on the left sidebar, hover over the target database name, click the icon, and then click Set as default database.


3. You can also change the default database in the Default database selection box.
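For example (the names below are hypothetical), if demo_db is set as the default database, a table referenced without a database prefix resolves to demo_db, while a fully qualified name always targets the database it names:

-- Equivalent when demo_db is the default database
SELECT COUNT(*) FROM orders;
SELECT COUNT(*) FROM demo_db.orders;

-- A fully qualified name ignores the default database
SELECT COUNT(*) FROM other_db.orders;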



Data Query

Add Query Page

The SQL editor supports multiple query pages, each with its own configuration (default database, compute engine, query records, etc.), making it easy to run and manage multiple tasks.
You can create a new query page by clicking the icon and switch between query pages on the tab bar.



For your convenience, you can save frequently used query pages by clicking the Save button, and quickly open saved pages by clicking the icon.



For a saved query page, you can click the Refresh button to update and synchronize the saved information, ensuring the query statements are up to date.



The editor supports running multiple SQL statements at the same time. Clicking the Run button executes all SQL statements in the editor and splits them into multiple SQL tasks.
If you only need to run some of the statements, select them and click Partial run, as in the example below.
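For instance, a query page containing the following three statements (all names are illustrative) is split into three SQL tasks when you click Run; selecting only the last statement and clicking Partial run executes just that one:

SHOW DATABASES;
SELECT COUNT(*) FROM demo_db.orders WHERE dt = '2022-01-01';
SELECT dt, SUM(amount) AS total_amount FROM demo_db.orders GROUP BY dt;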




Engine Parameter Configuration

After selecting the data engine, you can configure parameters for it by clicking Add under Advanced settings.


The currently supported configuration parameters are as follows:
SparkSQL parameters:
spark.sql.files.maxRecordsPerFile (default: 0): The maximum number of records that can be written to a single file. If this value is zero or negative, there is no limit.
spark.sql.autoBroadcastJoinThreshold (default: 10MB): The maximum size in bytes of a table that is broadcast to all worker nodes when performing a join. Setting this value to -1 disables broadcasting.
spark.sql.shuffle.partitions (default: 200): The default number of partitions used when shuffling data.
spark.sql.sources.partitionOverwriteMode (default: static): When set to static, all matching partitions are deleted before the overwrite operation runs. For instance, if a partitioned table contains a "2022-01" partition and you use an INSERT OVERWRITE statement to write data to the "2022-02" partition, the data in the "2022-01" partition is also overwritten. When set to dynamic, partitions are not deleted in advance; only the partitions that data is written to at runtime are overwritten.
spark.sql.files.maxPartitionBytes (default: 128MB): The maximum number of bytes to pack into a single partition when reading files.

Presto parameters:
use_mark_distinct (default: true): Determines whether the engine redistributes data when executing the distinct function. If the distinct function is called multiple times in a query, setting this parameter to false is recommended.
USEHIVEFUNCTION (default: true): Determines whether Hive functions are used when executing a query. To use Presto native functions, set this parameter to false.
query_max_execution_time (default: -): Sets a query timeout. If a query's execution time exceeds this value, the query is terminated. Supported units are d (day), h (hour), m (minute), s (second), and ms (millisecond); for example, 1d represents 1 day and 3m represents 3 minutes.
dlc.query.execution.mode (default: async): In async mode, the task performs the complete query calculation, saves the results to COS, and then returns them, so you can download the query results after the query is completed. You can change this value to sync. In sync mode, a query does not necessarily perform the full calculation: once partial results are available, the engine returns them directly without saving them to COS, giving lower query latency and duration, but the results are only kept in the system for 30 seconds. Sync mode is recommended if you do not need to download the complete query results from COS but expect lower query latency and duration, for example during the query exploration phase or for BI result display.
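The following sketch illustrates the spark.sql.sources.partitionOverwriteMode behavior described above; the table names are hypothetical, and the statement assumes demo_db.orders is partitioned by dt and already contains a dt='2022-01' partition:

-- With partitionOverwriteMode=static, this dynamic-partition overwrite clears every existing
-- partition (including dt='2022-01') before writing; with partitionOverwriteMode=dynamic,
-- only the dt='2022-02' partition that actually receives data is overwritten.
INSERT OVERWRITE TABLE demo_db.orders PARTITION (dt)
SELECT order_id, amount, '2022-02' AS dt
FROM demo_db.orders_staging;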

Presto Execution Mode

When the Presto engine is selected, Data Explore lets you choose to run queries in fast mode or full mode.
Fast mode: Faster queries, but the query results cannot be persistently saved. Suitable for the exploration phase.
Full mode: Runs the full query and saves the results to object storage.




Query results

Through the SQL editor, you can directly view the query results. You can expand or collapse the display height of the query results by clicking the icon.



You can configure the query result storage directory via the configuration button in the upper right corner; results can be stored in a COS path or in built-in storage.



The console returns a maximum of 1,000 results for a single task. If you need more results, use the API; for instructions on API-related operations, refer to the API Documentation.
When no COS storage path is specified, query results can be downloaded locally. For detailed instructions, refer to Obtaining Task Results.

Viewing query statistics

Query results from the Presto and SparkSQL engines display statistics for different optimization features.
The SparkSQL engine supports viewing:
1. Data Scanning Volume
2. Cache Acceleration
3. Adaptive Shuffle
4. Materialized View Acceleration
The Presto engine supports viewing:
1. Data Scanning Volume
2. Cache Acceleration
3. Materialized View Acceleration
Click on the Statistics column to review the statistical data and optimization suggestions for the query results.




Historical Queries

Each query page saves the run history for the past three months and supports viewing the query results of the past 24 hours. You can quickly find information about past tasks in the run history. For detailed operations, refer to Task History Records.

Download History Management

Download tasks for query results can be viewed in the Download history, where you can check each task's status and related parameters.



