tencent cloud

Accessing Iceberg Data with Hive

Last updated: 2024-10-30 11:41:59

    Development Preparation

    Make sure you have activated Tencent Cloud and created an EMR cluster. For more details, see Creating a Cluster.
    When creating the EMR cluster, select the Hive, Spark, and Iceberg components on the software configuration page.

    Using Spark to Create an Iceberg Table

    Log in to the Master node, switch to the hadoop user, and run the following command to start the Spark SQL CLI:
    spark-sql --master 'local[*]' \
      --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
      --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
      --conf spark.sql.catalog.local.type=hadoop \
      --conf spark.sql.catalog.local.warehouse=/usr/hive/warehouse \
      --jars /usr/local/service/iceberg/iceberg-spark-runtime-3.2_2.12-0.13.0.jar
    
    Note:
    The Iceberg JAR files are located in /usr/local/service/iceberg/. Their versions differ between EMR versions (in iceberg-spark-runtime-3.2_2.12-0.13.0.jar, 3.2 is the Spark version, 2.12 the Scala version, and 0.13.0 the Iceberg version), so list that directory and pass the JAR that matches your cluster to --jars.
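    If you script the session start, the following Python sketch looks up the runtime JAR instead of hard-coding its version and then assembles the same spark-sql arguments as the command above. It rests on assumptions: the glob pattern iceberg-spark-runtime-*.jar matches the file name used in this guide but may not match the naming in other EMR versions, and find_runtime_jar and spark_sql_argv are hypothetical helpers, not part of EMR or Iceberg.

```python
from __future__ import annotations

import glob
import os

# Directory named in this guide; other EMR versions may use another layout.
ICEBERG_DIR = "/usr/local/service/iceberg"


def find_runtime_jar(pattern: str, directory: str = ICEBERG_DIR) -> str:
    """Return the single JAR in `directory` whose name matches `pattern`.

    Hypothetical helper: fails loudly if zero or several JARs match,
    so a version is never picked silently.
    """
    matches = sorted(glob.glob(os.path.join(directory, pattern)))
    if not matches:
        raise FileNotFoundError(f"no JAR matches {pattern} in {directory}")
    if len(matches) > 1:
        raise ValueError(f"several JARs match {pattern}: {matches}")
    return matches[0]


def spark_sql_argv(jar: str, catalog: str = "local",
                   warehouse: str = "/usr/hive/warehouse") -> list[str]:
    """Build the spark-sql argument list shown in this guide.

    `catalog` is the name used in table identifiers such as
    local.default.t1; every catalog setting is keyed under
    spark.sql.catalog.<catalog>.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    confs = {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse,
    }
    argv = ["spark-sql", "--master", "local[*]"]
    for key, value in confs.items():
        argv += ["--conf", f"{key}={value}"]
    argv += ["--jars", jar]
    return argv
```

    On the Master node, as the hadoop user, subprocess.run(spark_sql_argv(find_runtime_jar("iceberg-spark-runtime-*.jar"))) would start the same session. Passing a list to subprocess.run bypasses the shell, so local[*] needs no quoting there. The same lookup with a pattern such as iceberg-hive-runtime-*.jar can locate the JAR that Hive needs.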
    Create a table:
    spark-sql> CREATE TABLE local.default.t1 (id int, name string) USING iceberg;
    Time taken: 2.752 seconds
    Insert data:
    spark-sql> INSERT INTO local.default.t1 VALUES (1, 'tom');
    Time taken: 2.71 seconds
    Query data:
    spark-sql> SELECT * FROM local.default.t1;
    1 tom
    Time taken: 0.558 seconds, Fetched 1 row(s)

    Using Hive to View Iceberg Data

    Log in to the Master node, switch to the hadoop user, and execute the following command to connect to Hive:
    hive
    Add the Iceberg Hive runtime JAR to the session (use the version available in /usr/local/service/iceberg/ on your cluster):
    hive> add jar /usr/local/service/iceberg/iceberg-hive-runtime-0.13.0.jar;
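    Create an external table that points to the directory where Spark wrote the Iceberg table, that is, the warehouse path configured for the local catalog followed by the database and table names: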
    hive> CREATE EXTERNAL TABLE t1
    STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
    LOCATION '/usr/hive/warehouse/default/t1'
    TBLPROPERTIES ('iceberg.catalog'='location_based_table');
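    The LOCATION above is the path that the Spark Hadoop catalog chose for local.default.t1: the configured warehouse, then the namespace (database), then the table name. The following Python sketch derives that path and renders the matching Hive DDL for another table. It assumes a single-level namespace and the warehouse value from the spark-sql command; the function names are hypothetical helpers for illustration.

```python
import posixpath


def hadoop_catalog_table_location(warehouse: str, namespace: str, table: str) -> str:
    """Path where an Iceberg Hadoop catalog stores `namespace.table`.

    Assumes a single-level namespace: <warehouse>/<namespace>/<table>.
    """
    return posixpath.join(warehouse, namespace, table)


def hive_external_table_ddl(table: str, location: str) -> str:
    """Hive DDL that reads an existing Iceberg table directly from its path."""
    return (
        f"CREATE EXTERNAL TABLE {table}\n"
        "STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('iceberg.catalog'='location_based_table');"
    )


# Values from this guide: warehouse /usr/hive/warehouse, table local.default.t1.
location = hadoop_catalog_table_location("/usr/hive/warehouse", "default", "t1")
print(hive_external_table_ddl("t1", location))
```

    Setting 'iceberg.catalog'='location_based_table' tells Hive to load the Iceberg table from the LOCATION path rather than from a catalog, which is why the path must match where Spark wrote the table.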
    Query the record count of the t1 table:
    hive> select count(*) from t1;
    OK
    1
    Time taken: 26.255 seconds, Fetched: 1 row(s)