tencent cloud

Feedback

Iceberg External Table

Last updated: 2024-07-04 10:15:06
    Note:
    The content showcased in this document is only applicable to Tencent Cloud TCHouse-D 1.1 and below versions. It is recommended to use the Multi-Catalog feature to interface with external catalogs in subsequent versions.
    The Iceberg External Table of Doris provides the capability for Doris to directly access Iceberg external tables. External tables eliminate the tedious data import work and leverage the OLAP capabilities of Doris for analyzing data from Iceberg table:
    1. Support for connecting the Iceberg data source to Doris.
    2. Support for joint queries between Doris and the tables in the Iceberg data source for more complex analysis operations.
    This document mainly introduces the usage and considerations of this feature.

    Usage

    Creating an Iceberg External Table in Doris

    There are two ways to create an Iceberg external table in Doris. You do not need to claim the column definition when creating an external table, as Doris can automatically convert it based on the column definition in the Iceberg table.
    1. Creating an independent external table for mounting an Iceberg table. For the specific syntax, please see CREATE TABLE.
    -- Syntax
    CREATE [EXTERNAL] TABLE table_name
    ENGINE = ICEBERG
    [COMMENT "comment"]
    PROPERTIES (
    "iceberg.database" = "iceberg_db_name",
    "iceberg.table" = "icberg_table_name",
    "iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
    "iceberg.catalog.type" = "HIVE_CATALOG"
    );
    
    
    -- Example 1: Mounting the 'iceberg_table' in 'iceberg_db' in Iceberg
    CREATE TABLE `t_iceberg`
    ENGINE = ICEBERG
    PROPERTIES (
    "iceberg.database" = "iceberg_db",
    "iceberg.table" = "iceberg_table",
    "iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
    "iceberg.catalog.type" = "HIVE_CATALOG"
    );
    
    
    -- Example 2: Mounting the 'iceberg_table' in 'iceberg_db' in Iceberg with HDFS HA enabled
    CREATE TABLE `t_iceberg`
    ENGINE = ICEBERG
    PROPERTIES (
    "iceberg.database" = "iceberg_db",
    "iceberg.table" = "iceberg_table",
    "iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
    "iceberg.catalog.type" = "HIVE_CATALOG",
    "dfs.nameservices"="HDFS8000463",
    "dfs.ha.namenodes.HDFS8000463"="nn2,nn1",
    "dfs.namenode.rpc-address.HDFS8000463.nn2"="172.21.16.5:4007",
    "dfs.namenode.rpc-address.HDFS8000463.nn1"="172.21.16.26:4007",
    "dfs.client.failover.proxy.provider.HDFS8000463"="org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
    );
    2. Creating an Iceberg database for mounting the corresponding Iceberg database in the remote end, and all the tables under the database. For the specific syntax, please see CREATE DATABASE.
    -- Syntax
    CREATE DATABASE db_name
    [COMMENT "comment"]
    PROPERTIES (
    "iceberg.database" = "iceberg_db_name",
    "iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
    "iceberg.catalog.type" = "HIVE_CATALOG"
    );
    
    -- Example: Mounting 'iceberg_db' in Iceberg and all tables under this db
    CREATE DATABASE `iceberg_test_db`
    PROPERTIES (
    "iceberg.database" = "iceberg_db",
    "iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
    "iceberg.catalog.type" = "HIVE_CATALOG"
    );
    The progress of creating the table in iceberg_test_db can be viewed by HELP SHOW TABLE CREATION.
    You can also create an Iceberg external table with a specific column definition according to your need.
    1. Creating an Iceberg External Table
    -- Syntax
    CREATE [EXTERNAL] TABLE table_name (
    col_name col_type [NULL | NOT NULL] [COMMENT "comment"]
    ) ENGINE = ICEBERG
    [COMMENT "comment"]
    PROPERTIES (
    "iceberg.database" = "iceberg_db_name",
    "iceberg.table" = "icberg_table_name",
    "iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
    "iceberg.catalog.type" = "HIVE_CATALOG"
    );
    
    -- Example 1: Mounting the 'iceberg_table' in 'iceberg_db' in Iceberg
    CREATE TABLE `t_iceberg` (
    `id` int NOT NULL COMMENT "id number",
    `name` varchar(10) NOT NULL COMMENT "user name"
    ) ENGINE = ICEBERG
    PROPERTIES (
    "iceberg.database" = "iceberg_db",
    "iceberg.table" = "iceberg_table",
    "iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
    "iceberg.catalog.type" = "HIVE_CATALOG"
    );
    
    -- Example 2: Mounting the 'iceberg_table' in 'iceberg_db' in Iceberg with HDFS HA enabled
    CREATE TABLE `t_iceberg` (
    `id` int NOT NULL COMMENT "id number",
    `name` varchar(10) NOT NULL COMMENT "user name"
    ) ENGINE = ICEBERG
    PROPERTIES (
    "iceberg.database" = "iceberg_db",
    "iceberg.table" = "iceberg_table",
    "iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
    "iceberg.catalog.type" = "HIVE_CATALOG",
    "dfs.nameservices"="HDFS8000463",
    "dfs.ha.namenodes.HDFS8000463"="nn2,nn1",
    "dfs.namenode.rpc-address.HDFS8000463.nn2"="172.21.16.5:4007",
    "dfs.namenode.rpc-address.HDFS8000463.nn1"="172.21.16.26:4007",
    "dfs.client.failover.proxy.provider.HDFS8000463"="org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
    );

    Parameter description:

    External Table Column
    Column names should correspond to the Iceberg table one by one.
    The order of the columns needs to match the Iceberg Table.
    The ENGINE needs to be specified as ICEBERG.
    PROPERTIES:
    iceberg.hive.metastore.uris: Hive Metastore server address.
    iceberg.database: The database name corresponding to the mounted Iceberg.
    iceberg.table: The table name corresponding to the mounted Iceberg, no need to specify when mounting Iceberg database.
    iceberg.catalog.type: The catalog method used in Iceberg, which is HIVE_CATALOG by default. At present, only this method is supported, and more Iceberg catalog methods will be supported later.

    Display Table Structure

    You can view the displayed table structure through SHOW CREATE TABLE.

    Synchronous Mounting

    When the Schema of Iceberg Table changes, you can manually synchronize it using the REFRESH command, which will delete and rebuild the Iceberg External Table in Doris. For detailed information, see HELP REFRESH.
    -- Synchronizing Iceberg Table
    REFRESH TABLE t_iceberg;
    
    -- Synchronizing Iceberg Database
    REFRESH DATABASE iceberg_test_db;

    Type Matching

    The Iceberg Column Type supported and the corresponding relationship with Doris are as follows:
    Iceberg
    Doris
    Description
    BOOLEAN
    BOOLEAN
    -
    INTEGER
    INT
    -
    LONG
    BIGINT
    -
    FLOAT
    FLOAT
    -
    DOUBLE
    DOUBLE
    -
    DATE
    DATE
    -
    TIMESTAMP
    DATETIME
    Converting timestamp to datetime may result in loss of precision
    STRING
    STRING
    -
    UUID
    VARCHAR
    Use VARCHAR as a substitute
    DECIMAL
    DECIMAL
    -
    TIME
    -
    Not supported
    FIXED
    -
    Not supported
    BINARY
    -
    Not supported
    STRUCT
    -
    Not supported
    LIST
    -
    Not supported
    MAP
    -
    Not supported
    Note
    The Schema of Iceberg Table changes will not automatically synchronize, it is necessary to synchronize the Iceberg external table or database in Doris using the REFRESH command.
    The current supported Iceberg versions are 0.12.0 and 0.13.2 by default. Other versions have not been tested. More versions will be supported in the future.

    Query Usage

    You can use the Iceberg external tables in Doris the same way as using Doris internal tables, except that the Doris data models (Rollup, Pre-Aggregation, and Materialized Views, etc.), it is no different from a regular Doris OLAP table.
    select * from t_iceberg where k1 > 1000 and k3 ='term' or k4 like '%doris';

    Related System Configuration

    FE Configuration

    The following configurations belong to the system-level configurations of the Iceberg external table. You could modify the fe.conf or use ADMIN SET CONFIG to configure.
    iceberg_table_creation_strict_mode Create an Iceberg table, and the strict mode is enabled by default. The strict mode involves a strict filter on the column types of the Iceberg table. If there are data types that Doris does not currently support, the creation of the external table fails.
    iceberg_table_creation_interval_second The interval between executing background tasks when Iceberg table is automatically created. The default value is 10s.
    max_iceberg_table_creation_record_size The maximum value retained when an Iceberg table is created. The defaut value is 2000. This only applies to the creation of Iceberg database records.
    Contact Us

    Contact our sales team or business advisors to help your business.

    Technical Support

    Open a ticket if you're looking for further assistance. Our Ticket is 7x24 avaliable.

    7x24 Phone Support