Basic Hive Operations

This document demonstrates how to use Hive on EMR to create databases and tables, import data, and perform basic queries.
Development Preparation
Make sure you have activated Tencent Cloud and created an EMR cluster. For more details, see Creating a Cluster.
During the creation of an EMR cluster, select the Hive component in the software configuration interface.
In the example, there are optional contents that require access to Tencent Cloud Object Storage (COS). You can see Creating a Bucket to create a bucket in COS, and enable COS authorization on the Instance Information page of the EMR console.
Note:
You can see Logging in to a Linux Instance for the method to log in to an EMR node. On the cluster details page, select Cluster Resources > Resource Management, click the corresponding node resource ID to enter the CVM list, and click Login on the right to use WebShell to log in to the instance.
The default username for logging in to a Linux instance is root, and the password is the one set by the user during the creation of the EMR. After entering the correct information, you will be taken to the command line interface.
All operations in this document are performed as the hadoop user. Switch the user identity after logging in to the command line interface.
Preparing Sample Data
Log in to the Master node and use the following command in the EMR command line to switch to the hadoop user and navigate to the Hive folder:
su hadoop
cd /usr/local/service/hive
Create a bash script file named gen_data.sh and add the following code to it:
#!/bin/bash
MAXROW=1000000 #Specifies the number of rows of data to generate.
for((i = 0; i < $MAXROW; i++))
do
 echo $RANDOM, \\"$RANDOM\\"
done
Grant permissions and execute the script as follows. This script will generate 1,000,000 pairs of random numbers and save them to the file hive_test.data:
chmod +x script name
./gen_data.sh > hive_test.data
Use the following command to upload the generated test data to HDFS, where ${hdfspath} is the path on HDFS where you want to store the file:
hdfs dfs -put ./hive_test.data /${hdfspath}
You can also use data stored in COS. Upload the data to COS. If the data is stored locally, you can use the COS console to upload it. If the data is in the EMR cluster, use the following command to upload the data, where ${bucketname} is the name of the COS bucket you created:
hdfs dfs -put ./hive_test.data cosn://${bucketname}/
Hive Basic Operations
Log in to the Master node of the EMR cluster, switch to the hadoop user, and enter the Hive command line using the Hive client:
hive
Creating Databases and Tables
Use the SHOW syntax to display all current databases:
hive> show databases;
OK
default
Time taken: 0.26 seconds, Fetched: 1 row(s)
Use the CREATE DATABASE syntax to create a database named test:
hive> create database if not exists test;
OK
Time taken: 0.176 seconds
Use the USE syntax to switch to the created test database:
hive> use test;
OK
Time taken: 0.176 seconds
Use the CREATE TABLE syntax to create an internal table named hive_test under the test database:
hive> create table hive_test (a int, b string)
hive> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
-- Create the hive_test data table and specify the column delimiter as commas (,).
OK
Time taken: 0.204 seconds
Finally, use the SHOW TABLES syntax to check if the table has been created successfully:
hive> show tables;
OK
hive_test
Time taken: 0.176 seconds, Fetched: 1 row(s)
Importing Data
For data stored in HDFS, use the following command to import it into the table:
hive> load data inpath "/${hdfspath}/hive_test.data" into table hive_test;
For data stored in COS, use the following command to import it into the table:
hive> load data inpath "cosn://${bucketname}/hive_test.data" into table hive_test;
﻿
Data stored locally in the EMR cluster can also be imported into Hive using the following command:
hive>load data local inpath "/${localpath}/hive_test.data" into table hive_test;
Note:
${hdfspath} is the path where your file is stored on HDFS, ${bucketname} is your COS bucket name, and ${localpath} is the path where your data is stored locally on the EMR cluster. After the import is completed, the source data will be deleted.
Executing Queries
Query the first 10 elements in the table:
hive> select * from hive_test limit 10;
OK
30847	"31583"
14887	"32053"
19741	"16590"
8104	"20321"
29030	"32724"
27274	"5231"
10028	"22594"
924	"32569"
10603	"27927"
4018	"30518"
Time taken:2.133 seconds, Fetched:10 row(s)
Count the total number of rows in the table:
hive> select count(*) from hive_test;
OK
1000000
Time taken:18.504 seconds, Fetched:1 row(s)
Deleting Databases and Tables
Use the DROP TABLE syntax to delete a Hive table:
hive> drop table if exists hive_test;
Moved: 'hdfs://HDFS/usr/hive/warehouse/hive_test' to trash at: hdfs://HDFS/user/hadoop/.Trash/Current
OK
Time taken: 2.327 seconds
Use the DROP DATABASE syntax to delete a Hive database:
hive> drop database if exists test;
OK
Time taken: 0.531 seconds

Was this page helpful?

You can also Contact Sales or Submit a Ticket for help.

Yes

tencent cloud

Development Preparation

Preparing Sample Data

Hive Basic Operations

Creating Databases and Tables

Importing Data

Executing Queries

Deleting Databases and Tables

About Tencent Cloud

Help & Support

Resources

User Center

tencent cloud

Sign Up

Log in

Compute

Microservice

Data Migration

Database SaaS Tool

Data Security

Application Security

Big Data

Tencent Big Model

Internet of Things

Stream Services

Cloud Real-time Rendering

Cloud Resource Management

More

Edge Computing

Serverless

Relational Database

Networking

Business Security

Domains & Websites

Face Recognition

AI Platform Service

Middleware

Media On-Demand

Game Services

Management and Audit Tools

Container

Essential Storage Service

Enterprise Distributed DBMS

CDN and Acceleration

Security Services

Enterprise Applications

Voice Technology

Natural Language Processing

Communication

Media Process Services

Education Sevices

Developer Tools

Distributed cloud

Data Process and Analysis

NoSQL Database

Network Security

Cloud Security

Office Collaboration

Image Creation

Optical Character Recognition

Interactive Video Services

Media SDK

Medical Services

Monitor and Operation

Development Preparation

Preparing Sample Data

Hive Basic Operations

Creating Databases and Tables

Importing Data

Executing Queries

Deleting Databases and Tables

About Tencent Cloud

Help & Support

Resources

User Center