Hudi Catalog

Last updated: 2024-06-27 11:15:30
    With Multi-Catalog, Doris can access Hudi external tables directly, without a tedious data import task, and apply its own OLAP capabilities to analyze Hudi data:
    1. Hudi data source can be connected to Doris.
    2. Joint queries across Doris and Hudi data sources are supported, which allows more complex analysis.
    This document describes how to use Hudi Catalog and the precautions to take when using it.
    Note:
    This feature is applicable to TCHouse-D 1.2 and later versions.
    Hudi currently only supports Snapshot Query for Copy On Write tables, and Read Optimized Query for Merge On Read tables.

    Creation Method

    Data on HDFS:
    CREATE CATALOG hudi PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.xxx:7004',
    'hadoop.username' = 'hadoop',
    'dfs.nameservices'='your-nameservice',
    'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
    'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.xxx:4007',
    'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.xxx:4007',
    'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
    );
    Data on COS:
    CREATE CATALOG `hudi_cos` PROPERTIES (
    "AWS_ENDPOINT" = "cos.ap-guangzhou.myqcloud.com",
    "AWS_REGION" = "ap-guangzhou",
    "hive.metastore.uris" = "thrift://172.16.xxxx:7004",
    "type" = "hms",
    "AWS_SECRET_KEY" = "Wu9ByN6g4D8seHj0770jJxxxx",
    "AWS_ACCESS_KEY" = "AKIDaWJcCi9Rc4TqjV9hYHn9NRxxxxx"
    );
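    After either statement succeeds, the new catalog can be inspected from the same session. The following is a minimal sketch, assuming the catalog was created with the name hudi as in the HDFS example; hudi_db is a hypothetical database name used only for illustration.

```sql
-- List all catalogs known to Doris; the new catalog should appear next to the built-in "internal" catalog.
SHOW CATALOGS;

-- Make the Hudi catalog the current catalog for this session.
SWITCH hudi;

-- List the databases that the Hive Metastore exposes through this catalog.
SHOW DATABASES;

-- List the tables in one of those databases (hudi_db is a hypothetical name).
SHOW TABLES FROM hudi_db;
```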

    Type Matching

    The following table shows the corresponding relationships between supported column types of Hudi and Doris:
    | HMS Type | Doris Type | Comment |
    | boolean | boolean | - |
    | tinyint | tinyint | - |
    | smallint | smallint | - |
    | int | int | - |
    | bigint | bigint | - |
    | date | date | - |
    | timestamp | datetime | - |
    | float | float | - |
    | double | double | - |
    | char | char | - |
    | varchar | varchar | - |
    | decimal | decimal | - |
    | array<type> | array<type> | Nested arrays, such as array<array<int>>, are supported. |
    | map<KeyType, ValueType> | map<KeyType, ValueType> | Nested structure is not supported; KeyType and ValueType must be basic types. |
    | struct<col1: Type1, col2: Type2, ...> | struct<col1: Type1, col2: Type2, ...> | Nested structure is not supported; Type1, Type2, ... must be basic types. |
    | other | unsupported | - |

    Query Usage

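    Querying a Hudi table works the same way as querying a regular Doris OLAP table. Reference the table by its fully qualified name, in the form catalog.database.table: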
    select * from hudi_catalog_name.database_name.table_name;
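    The joint queries mentioned in the introduction can be written by qualifying each table with its catalog. The following is a sketch under stated assumptions: internal is the built-in Doris catalog, hudi is the catalog created above, and the database, table, and column names (sales_db.orders, hudi_db.user_profile, user_id, region, amount) are hypothetical.

```sql
-- Join a native Doris table with a Hudi external table in a single query.
-- All database, table, and column names below are hypothetical placeholders.
SELECT
    p.region,
    COUNT(*)      AS order_count,
    SUM(o.amount) AS total_amount
FROM internal.sales_db.orders AS o
JOIN hudi.hudi_db.user_profile AS p
    ON o.user_id = p.user_id
GROUP BY p.region
ORDER BY total_amount DESC;
```

    Fully qualified names keep the query independent of the session's current catalog, so it behaves the same whether or not SWITCH has been run beforehand.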
    