Export Query Results

Last updated: 2024-06-27 11:07:10
    This document explains how to use the OUTFILE command to export query results.

    Syntax

    The SELECT INTO OUTFILE statement exports query results to a file. Currently, it supports exporting to remote storage such as HDFS, S3, BOS, and COS (Tencent Cloud) through the Broker process, through the S3 protocol, or directly through the HDFS protocol. The syntax is as follows:
    query_stmt
    INTO OUTFILE "file_path"
    [format_as]
    [properties]
    file_path
    file_path points to the path and file name prefix on the remote storage, such as hdfs://path/to/my_file_. The final file name is composed of my_file_, a file sequence number, and the file format suffix. The sequence number starts from 0, and the count equals the number of files the result is split into, for example:
    my_file_abcdefg_0.csv
    my_file_abcdefg_1.csv
    my_file_abcdefg_2.csv
    [format_as]
    FORMAT AS CSV
    Specifies the export format. The default is CSV.
    [properties]
    Specifies related properties. Currently, exporting through the Broker process, the HDFS protocol, or the S3 protocol is supported.
    Broker-related properties must be prefixed with broker..
    HDFS-related properties must be prefixed with hdfs..
    For the S3 protocol, the configuration properties can be specified directly.
    ("broker.prop_key" = "broker.prop_val", ...)
    or
    ("hdfs.fs.defaultFS" = "xxx", "hdfs.hdfs_user" = "xxx")
    or
    ("AWS_ENDPOINT" = "xxx", ...)
    Other properties:
    ("key1" = "val1", "key2" = "val2", ...)
    The following properties are currently supported:
    column_separator: Column separator, only applicable to the CSV format. The default is \t.
    line_delimiter: Line delimiter, only applicable to the CSV format. The default is \n.
    max_file_size: Maximum size of a single file. The default is 1GB, and the valid range is 5MB to 2GB. Files exceeding this size will be split.
    schema: PARQUET file schema information, only applicable to the PARQUET format. When exporting in the PARQUET format, you must specify the schema.
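    Putting these pieces together, here is a minimal sketch of a complete statement using the S3 protocol; the table name, bucket, and credential values are placeholders:
    SELECT k1, k2 FROM example_tbl
    INTO OUTFILE "s3://example_bucket/export/result_"
    FORMAT AS CSV
    PROPERTIES
    (
    "AWS_ENDPOINT" = "http://cos.<REGION>.myqcloud.com",
    "AWS_ACCESS_KEY" = "xxxx",
    "AWS_SECRET_KEY" = "xxxx",
    "AWS_REGION" = "<REGION>",
    "column_separator" = ",",
    "line_delimiter" = "\n",
    "max_file_size" = "512MB"
    );
    With this statement, a result set under 512MB produces a single file such as result_0.csv; larger result sets are split into result_0.csv, result_1.csv, and so on.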

    Concurrent Export

    By default, the export of query result sets is non-concurrent, that is, single-point export. If users wish to export query result sets concurrently, the following conditions must be met:
    1. Concurrent export is enabled through the session variable enable_parallel_outfile: set enable_parallel_outfile = true;
    2. The export goes through the S3 or HDFS protocol, not through a broker.
    3. The query meets the requirements for concurrent export; for example, the top level does not include a single-point node such as sort. (Examples are given later to illustrate which queries cannot export result sets concurrently.)
    Queries that satisfy all three conditions trigger concurrent export, and the degree of concurrency is be_instance_num * parallel_fragment_exec_instance_num, as sketched below.
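    For example, a minimal sketch of the session settings involved (the value 2 is illustrative; parallel_fragment_exec_instance_num is assumed to default to 1):
    -- enable concurrent export for this session
    set enable_parallel_outfile = true;
    -- instances per BE; with N BE nodes, the degree of concurrency becomes N * 2
    set parallel_fragment_exec_instance_num = 2;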

    How to Verify Whether a Result Set Has Been Exported Concurrently

    After enabling concurrent export through the session variable, you can verify whether the current query can be exported concurrently using the following method:
    explain select xxx from xxx where xxx into outfile "s3://xxx" format as csv properties ("AWS_ENDPOINT" = "xxx", ...);
    When you EXPLAIN the query, Doris returns the plan for it. If RESULT FILE SINK appears in PLAN FRAGMENT 1, the result set will be exported concurrently. If RESULT FILE SINK appears in PLAN FRAGMENT 0, the current query cannot be exported concurrently (the query does not satisfy all three conditions of concurrent export).
    Concurrent Export Planning Example:
    +-----------------------------------------------------------------------------+
    | Explain String                                                              |
    +-----------------------------------------------------------------------------+
    | PLAN FRAGMENT 0                                                             |
    |  OUTPUT EXPRS:<slot 2> | <slot 3> | <slot 4> | <slot 5>                     |
    |   PARTITION: UNPARTITIONED                                                  |
    |                                                                             |
    |   RESULT SINK                                                               |
    |                                                                             |
    |   1:EXCHANGE                                                                |
    |                                                                             |
    | PLAN FRAGMENT 1                                                             |
    |  OUTPUT EXPRS:`k1` + `k2`                                                   |
    |   PARTITION: HASH_PARTITIONED: `default_cluster:test`.`multi_tablet`.`k1`   |
    |                                                                             |
    |   RESULT FILE SINK                                                          |
    |   FILE PATH: s3://ml-bd-repo/bpit_test/outfile_1951_                        |
    |   STORAGE TYPE: S3                                                          |
    |                                                                             |
    |   0:OlapScanNode                                                            |
    |     TABLE: multi_tablet                                                     |
    +-----------------------------------------------------------------------------+

    Usage Examples

    1. Example 1: Export using a broker. Export the simple query result to the file hdfs://path/to/result.txt. Specify the export format as CSV. Use Broker_Doris and set the Kerberos authentication information. Specify the column separator as , and the line delimiter as \n.
    SELECT * FROM tbl
    INTO OUTFILE "hdfs://path/to/result_"
    FORMAT AS CSV
    PROPERTIES
    (
    "broker.name" = "Broker_Doris",
    "broker.hadoop.security.authentication" = "kerberos",
    "broker.kerberos_principal" = "doris@YOUR.COM",
    "broker.kerberos_keytab" = "/home/doris/my.keytab",
    "column_separator" = ",",
    "line_delimiter" = "\\n",
    "max_file_size" = "100MB"
    );
    If the final file generated is no larger than 100MB, it will be: result_0.csv. If it is larger than 100MB, it may be: result_0.csv, result_1.csv, ....
    2. Example 2: Export the simple query result to the file hdfs://path/to/result.parquet. Specify the export format as PARQUET. Use Broker_Doris and set the Kerberos authentication information.
    SELECT c1, c2, c3 FROM tbl
    INTO OUTFILE "hdfs://path/to/result_"
    FORMAT AS PARQUET
    PROPERTIES
    (
    "broker.name" = "Broker_Doris",
    "broker.hadoop.security.authentication" = "kerberos",
    "broker.kerberos_principal" = "doris@YOUR.COM",
    "broker.kerberos_keytab" = "/home/doris/my.keytab",
    "schema"="required,int32,c1;required,byte_array,c2;required,byte_array,c2"
    );
    Exporting query results to a parquet file requires explicitly specifying the schema.
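    The schema string describes each exported column as repetition_type,data_type,column_name, with column descriptions separated by semicolons. A minimal sketch with hypothetical column names, using only the type names shown in the examples:
    "schema" = "required,int32,order_id;required,byte_array,remark"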
    3. Example 3: Export the query result of the CTE statement to the file hdfs://path/to/result.txt. The default export format is CSV. Use Broker_Doris and set the HDFS high availability information. Use the default row and column separators.
    WITH
    x1 AS
    (SELECT k1, k2 FROM tbl1),
    x2 AS
    (SELECT k3 FROM tbl2)
    SELECT k1 FROM x1 UNION SELECT k3 FROM x2
    INTO OUTFILE "hdfs://path/to/result_"
    PROPERTIES
    (
    "broker.name" = "Broker_Doris",
    "broker.username"="user",
    "broker.password"="passwd",
    "broker.dfs.nameservices" = "my_ha",
    "broker.dfs.ha.namenodes.my_ha" = "my_namenode1, my_namenode2",
    "broker.dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port",
    "broker.dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port",
    "broker.dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
    );
    If the final file generated is no larger than 1GB, it will be: result_0.csv. If it is larger than 1GB, it may be: result_0.csv, result_1.csv, ....
    4. Example 4: Export the query results of the UNION statement to the file bos://bucket/result.txt. Specify the export format as PARQUET. Use Broker_Doris and set the BOS connection information. The PARQUET format does not need a column separator, and the schema must be specified explicitly.
    SELECT k1 FROM tbl1 UNION SELECT k2 FROM tbl1
    INTO OUTFILE "bos://bucket/result_"
    FORMAT AS PARQUET
    PROPERTIES
    (
    "broker.name" = "Broker_Doris",
    "broker.bos_endpoint" = "http://bj.bcebos.com",
    "broker.bos_accesskey" = "xxxxxxxxxxxxxxxxxxxxxxxxxx",
    "broker.bos_secret_accesskey" = "yyyyyyyyyyyyyyyyyyyyyyyyyy",
    "schema"="required,int32,k1;required,byte_array,k2"
    );
    5. Example 5: Export the query results of the SELECT statement to the file cosn://my_bucket/export/my_file_. Specify the export format as CSV. After the export is completed, an identification file will be generated.
    select k1,k2,v1 from tbl1 limit 100000
    into outfile "cosn://my_bucket/export/my_file_"
    FORMAT AS CSV
    PROPERTIES
    (
    "broker.name" = "Broker_Doris",
    "broker.fs.cosn.userinfo.secretId" = "xx", "broker.fs.cosn.userinfo.secretKey" = "xx", "broker.fs.cosn.bucket.endpoint_suffix" = "cos.<REGION>.myqcloud.com",
    "column_separator" = ",",
    "line_delimiter" = "\\n",
    "max_file_size" = "1024MB",
    "success_file_name" = "SUCCESS"
    );
    If the final file is no larger than 1GB, it will be: my_file_0.csv. If it is larger than 1GB, it may be: my_file_0.csv, my_file_1.csv, ....
    Paths that do not exist will be automatically created.
    Confirm the access key, secret key, and endpoint values with COS support. In particular, the endpoint value does not need to include the bucket_name.
    6. Example 6: Export using the S3 protocol, and enable concurrent export.
    set enable_parallel_outfile = true;
    select k1 from tb1 limit 1000
    into outfile "s3://my_bucket/export/my_file_"
    format as csv
    properties
    (
    "AWS_ENDPOINT" = "http://cos.<REGION>.mycloud.com",
    "AWS_ACCESS_KEY" = "xxxx",
    "AWS_SECRET_KEY" = "xxx",
    "AWS_REGION" = "<REGION>"
    );
    The prefix of the final generated file is my_file_{fragment_instance_id}_.
    7. Example 7: Export using the S3 protocol, with the concurrent export session variable enabled.
    set enable_parallel_outfile = true;
    select k1 from tb1 order by k1 limit 1000
    into outfile "s3://my_bucket/export/my_file_"
    format as csv
    properties
    (
    "AWS_ENDPOINT" = "http://cos.<REGION>.mycloud.com",
    "AWS_ACCESS_KEY" = "xxxx",
    "AWS_SECRET_KEY" = "xxx",
    "AWS_REGION" = "<REGION>"
    );
    However, since the query statement has a top-level sort node, this query cannot be exported concurrently even though the concurrent export session variable is enabled.
    8. Example 8: Export using the HDFS protocol. Export a simple query result to the file hdfs://path/to/result.txt. Specify the export format as CSV and set the Kerberos authentication information.
    SELECT * FROM tbl
    INTO OUTFILE "hdfs://path/to/result_"
    FORMAT AS CSV
    PROPERTIES
    (
    "hdfs.fs.defaultFS" = "hdfs://namenode:port",
    "hdfs.hadoop.security.authentication" = "kerberos",
    "hdfs.kerberos_principal" = "doris@YOUR.COM",
    "hdfs.kerberos_keytab" = "/home/doris/my.keytab"
    );

Returned Result

    The export command is synchronous. When the command returns, the operation has finished, and one row of results is returned to show the outcome of the export. If the export succeeds, the result is as follows:
    mysql> select * from tbl1 limit 10 into outfile "file:///home/work/path/result_";
    +------------+-----------+----------+--------------------------------------------------------------------+
    | FileNumber | TotalRows | FileSize | URL                                                                |
    +------------+-----------+----------+--------------------------------------------------------------------+
    | 1          | 2         | 8        | file:///192.168.1.10/home/work/path/result_{fragment_instance_id}_ |
    +------------+-----------+----------+--------------------------------------------------------------------+
    1 row in set (0.05 sec)
    FileNumber: The total number of files eventually generated.
    TotalRows: Number of rows in the result set.
    FileSize: The total size of the exported files. Unit: bytes.
    URL: If it is exported to a local disk, it shows specifically which Compute Node it was exported to.
    If a concurrent export was performed, multiple rows of data will be returned.
    +------------+-----------+----------+--------------------------------------------------------------------+
    | FileNumber | TotalRows | FileSize | URL                                                                |
    +------------+-----------+----------+--------------------------------------------------------------------+
    | 1          | 3         | 7        | file:///192.168.1.10/home/work/path/result_{fragment_instance_id}_ |
    | 1          | 2         | 4        | file:///192.168.1.11/home/work/path/result_{fragment_instance_id}_ |
    +------------+-----------+----------+--------------------------------------------------------------------+
    2 rows in set (2.218 sec)
    If an error occurs, an error message is returned, such as:
    mysql> SELECT * FROM tbl INTO OUTFILE ...
    ERROR 1064 (HY000): errCode = 2, detailMessage = Open broker writer failed ...

    Notes

    If concurrency is not enabled for exporting, the results are exported by a single BE node in a single thread. Thus, exporting time is directly proportional to the size of the result set. Enabling concurrency for export can reduce the required time.
    The export command will not check whether the files or file paths exist. Whether paths will be created automatically or existing files will be overwritten depends entirely on the semantics of the remote storage system.
    If there are errors during the export process, some residual export files may be left on the remote storage system. Doris will not clean these files. Users need to clean them manually.
    The timeout of the export command is the same as the query timeout. It can be set using SET query_timeout=xxx.
    For queries where the result set is empty, a file of size zero will still be generated.
    File splitting ensures that a row of data is completely stored in a single file, so the file size doesn't strictly equal max_file_size.
    For some functions that output non-printable characters, such as the BITMAP and HLL types, the output is \N, that is, NULL.
    Currently, some geographic information functions, such as ST_Point, have an output type of VARCHAR, but the actual output values are encoded binary characters, which may produce garbled output. For geographic functions, please use ST_AsText for output, as sketched below.
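    For instance, a minimal sketch of wrapping a geographic value with ST_AsText before export; the table, columns, and connection values are hypothetical:
    -- ST_AsText converts the binary point value into readable WKT text such as POINT(x y)
    SELECT ST_AsText(ST_Point(lng, lat)) FROM geo_tbl
    INTO OUTFILE "s3://example_bucket/export/geo_"
    FORMAT AS CSV
    PROPERTIES
    (
    "AWS_ENDPOINT" = "xxxx",
    "AWS_ACCESS_KEY" = "xxxx",
    "AWS_SECRET_KEY" = "xxxx",
    "AWS_REGION" = "xxxx"
    );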