
DLC Data Import Guide

Last updated: 2024-07-31 17:23:10

    External Table Data Import via COS

    DLC supports querying and analyzing data directly on COS without migrating it. You only need to import the data into COS to start analyzing it seamlessly with DLC, fully decoupling data storage from computation. Supported file formats include ORC, Parquet, Avro, JSON, CSV, and plain text. COS offers a variety of data import methods; you can choose from the following based on your situation.
    Log in to COS and upload files directly. For related operating steps, see Uploading an Object.
    Import data using various upload tools provided by COS. For a list of supported tools, see Tool Overview.
    Import data using SDKs or APIs provided by the COS service. For service-related instructions, see Upload Interface Documentation.
    If you need to analyze logs from CLS, you can deliver them to COS by partition and then query and analyze them directly through DLC. For related operations, see Using DLC (Hive) to Analyze CLS Logs.
    If you need to import data from other cloud services (such as CDB databases) into COS, you can use DataInLong to perform the import. When creating a data synchronization link, select the source cloud service as the data source and COS as the destination to complete the import.
    If you encounter any issues during data import, you can consult us by submitting a ticket. After importing data into COS, you can run SQL queries through the DLC console, API, or SDKs to create tables, analyze data, and export results. For detailed operations, see Quick Start with Data Analytics in Data Lake Compute.
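    As a minimal sketch of this workflow (the database name, table schema, and COS path below are hypothetical placeholders; substitute your own bucket and columns), you can create an external table over files already uploaded to COS with the SparkSQL engine and query the data in place:

    -- Hypothetical external table over Parquet files stored in COS
    CREATE EXTERNAL TABLE demo_db.cos_orders (
        order_id BIGINT,
        amount   DOUBLE,
        dt       STRING
    )
    STORED AS PARQUET
    LOCATION 'cosn://examplebucket-125xxxxxxxx/orders/';

    -- Analyze the data directly on COS; no migration required
    SELECT dt, SUM(amount) AS total_amount
    FROM demo_db.cos_orders
    GROUP BY dt;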

    Data import into native tables

    To provide better data query performance, DLC also supports importing data into native tables for query analysis. DLC native tables use the Iceberg table format and optimize the data during import. Native tables are recommended for the following use cases:
    Data warehouse analysis scenarios where you want to leverage Iceberg indexes for better analytical performance.
    Data that needs to be updated; DLC supports UPSERT operations through SQL or data jobs (see the sketch after this list).
    Data that is written or updated in near real time through DataInLong, Flink, SCS, or Spark Streaming, with concurrent reads and writes that require transactional guarantees.
    Workloads that benefit from Iceberg table features such as time travel, multi-version snapshots, hidden partitions, partition evolution, and other advanced data lake features.
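    As an illustration of the UPSERT capability mentioned above, here is a minimal sketch with the SparkSQL engine, assuming a hypothetical native table innertable and a staging table updates_table that share an id key:

    -- Upsert into an Iceberg native table: update matching rows, insert the rest
    MERGE INTO innertable t
    USING updates_table s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *;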
    If you need to import data into a native table, you can choose one of the following methods based on your situation.
    Directly import through the DLC console.
    Caution
    Importing data through the console has certain restrictions; it is mainly intended for rapid testing and is not recommended for production use.
    If your original data is in services such as MySQL or Kafka and you need to write or update MySQL binlog and message middleware data to DLC in near real time, this can be achieved through DataInLong's real-time import capability, or by writing through SCS or Flink. For operational guidance, you can contact us by submitting a ticket.
    If the original data is in data services such as MySQL, Kafka, or MongoDB, you can use DataInLong offline synchronization tasks to transfer data to native tables. During data warehouse modeling, external tables serve as the source layer of the original data. While transferring data to native tables, you can reorganize business-specific data distributions (for example, by building sparse indexes) to achieve excellent query analysis performance on native tables. If guidance is needed, you can contact us.
    Use INSERT INTO ... SELECT SQL statements to query data from the external table and write it into the native table. For example, after creating a native table in DLC with the same table structure as the external table, you can complete the transfer by executing the following SQL with the SparkSQL engine:
    -- External table name: outtertable; native table name: innertable
    INSERT INTO innertable SELECT * FROM outtertable;
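    For completeness, here is a hedged sketch of the prerequisite step of creating the native table. The column list is a hypothetical placeholder and must mirror your external table's schema; depending on your engine configuration, you may need to declare the Iceberg format explicitly (DLC native tables use Iceberg):

    -- Hypothetical schema; mirror the columns of outtertable
    CREATE TABLE innertable (
        id   BIGINT,
        name STRING,
        dt   STRING
    ) USING iceberg;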
    If you encounter any issues during data import, you can consult us for solutions by submitting a ticket.

    Multiple data sources federated query analysis

    If you do not wish to import data into COS or DLC native tables, DLC also offers federated query analysis, which supports rapid association and analysis of data from multiple data sources through SQL without relocating the data. Currently supported data sources include MySQL, SQL Server, ClickHouse, PostgreSQL, EMR on HDFS, and EMR on COS. When using federated analysis, the data source and the data engine must be on the same network to ensure connectivity. For configuration details, see Engine Network Configuration.
    When querying EMR data through DLC federated analysis, query performance is on par with, or even exceeds, that of EMR, making it suitable for production environments. This allows you to take full advantage of DLC's fully managed elastic capabilities to reduce costs and increase efficiency without relocating EMR services.
    Federated analysis enables quick unification and analysis of data from multiple data sources, providing a convenient method for data insights and rapid analysis, while DLC's fully managed elastic capabilities effectively reduce the cost of use. It also supports INSERT INTO / INSERT OVERWRITE syntax to write federated data into DLC native tables, completing data import, as sketched below.
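    As a hedged sketch (the catalog, database, and table names are hypothetical; the actual qualified name depends on how the data source is connected in DLC), writing federated query results into a native table can look like this:

    -- Import rows from a federated MySQL source into a DLC native table
    INSERT INTO innertable
    SELECT id, name, dt
    FROM mysql_catalog.demo_db.orders
    WHERE dt = '2024-07-01';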
    Because federated analysis synchronizes data to DLC for computation, there is some performance loss compared with querying the original data sources directly. If high query performance is required, import the data into native tables for analysis; for details, see Data import into native tables.