Hive is a data warehouse framework built on the Hadoop file system. It provides a range of data warehouse management features, including ETL (Extract, Transform, Load) tooling, data storage management, and the ability to query and analyze large datasets. Hive also defines a SQL-like language (HiveQL) that maps structured data files to database tables and supports simple SQL queries against them.
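To make this concrete, below is a minimal HiveQL sketch that maps a comma-delimited file to a table and runs a simple query. The table name, columns, delimiter, and file path are illustrative assumptions rather than EMR defaults.

```sql
-- Illustrative HiveQL: map a comma-delimited text file to a table, then query it.
-- Table, column names, and the input path are hypothetical examples.
CREATE TABLE IF NOT EXISTS page_views (
  user_id   BIGINT,
  url       STRING,
  view_time TIMESTAMP
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load a local data file into the table (path is an example).
LOAD DATA LOCAL INPATH '/tmp/page_views.csv' INTO TABLE page_views;

-- Query with familiar SQL syntax.
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```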
In EMR, Hive is installed under /usr/local/service/hive on the cluster nodes.
Hive Service Roles
| Role | Description |
| --- | --- |
| HiveServer2 | Hive's Thrift server. It receives client query requests, compiles and parses SQL, and supports concurrent clients and authentication. An EMR cluster can deploy multiple HiveServer2 instances, which can be scaled out to Router nodes and configured with load balancing. |
| Hive MetaStore | Hive's metadata service. It maintains metadata for Hive databases and tables, and its metadata management capability is also used by engines such as Spark and Trino. An EMR cluster can deploy multiple Hive MetaStore instances, with support for scaling out to Router nodes. |
| Hive Client | The Hive client provides tools such as Beeline and JDBC, which let users submit SQL jobs to HiveServer2. The client is installed on every node where the Hive service is deployed. |
| Hive WebHCat | WebHCat provides a REST API for HCatalog, allowing Hive commands to be executed and MapReduce tasks to be submitted through REST calls. Multiple WebHCat instances can be deployed within a cluster, with support for scaling out to Router nodes. |
Internal Table and External Table in Hive
Internal Table: Hive manages both the metadata and the actual data of an internal table. When you use the DROP command to delete an internal table, both the metadata and the corresponding data are deleted. After an internal table is created, HDFS files are mapped to the table, and Hive's data warehouse generates a corresponding directory for it. The default warehouse path in EMR is /user/hive/warehouse/${tablename} on HDFS, where ${tablename} is the name of the table you create.
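As a sketch of this behavior, the HiveQL below creates a managed (internal) table, inspects its storage location, and then drops it. The table name is hypothetical, and the exact warehouse path depends on your cluster configuration.

```sql
-- Internal (managed) table: its data lives under the Hive warehouse directory.
CREATE TABLE orders_internal (
  order_id BIGINT,
  amount   DOUBLE
)
STORED AS ORC;

-- Inspect the table's storage location and whether it is MANAGED or EXTERNAL.
DESCRIBE FORMATTED orders_internal;
-- The Location field should point under the warehouse path,
-- e.g. a directory named orders_internal (exact path depends on configuration).

-- Dropping an internal table removes both the metadata and the HDFS data.
DROP TABLE orders_internal;
```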
External Table: An external table is defined much like an internal table, but its data is not stored in the table's own warehouse directory; instead, it is stored at a location you specify. The benefit of this is that if you delete the external table, the data it points to is not deleted; only the metadata corresponding to the external table is removed.
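A minimal sketch of an external table follows, assuming the data already resides at a hypothetical HDFS path; dropping the table leaves those files in place.

```sql
-- External table: the table points at data stored outside the warehouse directory.
-- The HDFS path below is a hypothetical example.
CREATE EXTERNAL TABLE orders_external (
  order_id BIGINT,
  amount   DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs:///data/orders/';

-- Dropping an external table removes only the metadata;
-- the files under hdfs:///data/orders/ are left untouched.
DROP TABLE orders_external;
```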
Hive Syntax