Log in to the CVM instance for the EMR cluster. The username is root, and the password is the one you set when creating the EMR cluster. Once the correct credentials are entered, you can enter the command line interface. Then switch to the Hadoop user and go to the /data/Impala directory of the instance:

[root@10 ~]# su hadoop
[hadoop@10 root]$ cd /data/Impala/
Create a file named gen_data.sh and add the following code to it:

#!/bin/bash
MAXROW=1000000 # Specify the number of data rows to be generated
for((i = 0; i < $MAXROW; i++))
do
    echo $RANDOM, \"$RANDOM\"
done

Make the script executable and run it to generate the test data:

[hadoop@10 ~]$ chmod +x gen_data.sh
[hadoop@10 ~]$ ./gen_data.sh > impala_test.data

The generated data is saved to the impala_test.data file.
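To sanity-check the output before uploading it (an optional step, not part of the original guide), you can preview a few lines and confirm the row count:

[hadoop@10 ~]$ head -3 impala_test.data   # each line looks like: 12345, "6789"
[hadoop@10 ~]$ wc -l impala_test.data     # should print 1000000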
Then, upload the generated test data to HDFS by running the following commands:

[hadoop@10 ~]$ hdfspath="/impala_test_dir"
[hadoop@10 ~]$ hdfs dfs -mkdir $hdfspath
[hadoop@10 ~]$ hdfs dfs -put ./impala_test.data $hdfspath
Here, $hdfspath is the path of your file in HDFS. Finally, run the following command to verify that the data has been properly placed in HDFS:

[hadoop@10 ~]$ hdfs dfs -ls $hdfspath
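Optionally (not part of the original steps), you can also confirm the row count of the uploaded file directly from HDFS:

[hadoop@10 ~]$ hdfs dfs -cat $hdfspath/impala_test.data | wc -l   # should match MAXROW (1000000)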
The impala-shell path and the default communication port vary by Impala version, as shown below:

Impala Version | impala-shell Path  | Default Communication Port of impala-shell
4.1.0/4.0.0    | /data/Impala/shell | 27009
3.4.0          | /data/Impala/shell | 27001
2.10.0         | /data/Impala/bin   | 27001
Run the following command to log in to impala-shell:

[root@10 Impala]# cd /data/Impala/shell;./impala-shell -i $core_ip:27001

Here, $core_ip is the IP of a core node of the EMR cluster; the IP of a task node can also be used. If your cluster runs a different Impala version, use the path and port from the table above. After login succeeds, the following will be displayed:

Connected to $core_ip:27001
Server version: impalad version 3.4.1-RELEASE RELEASE (build Could not obtain git hash)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell 3.4.1-RELEASE (ebled66) built on Tue Nov 20 17:28:10 CST 2021)
The SET command shows the current value of all shell and query options.
***********************************************************************************
[$core_ip:27001] >
If you are logged in to a core or task node, you can also connect locally:

cd /data/Impala/shell;./impala-shell -i localhost:27001
View the database list:

[10.1.0.215:27001] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| default          | Default Hive database                        |
+------------------+----------------------------------------------+
Fetched 2 row(s) in 0.09s
Use the create command to create a database:

[localhost:27001] > create database experiments;
Query: create database experiments
Fetched 0 row(s) in 0.41s
Use the use command to switch to the experiments database you just created:

[localhost:27001] > use experiments;
Query: use experiments
To check which database you are currently in, run:

select current_database();
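impala-shell can also run statements non-interactively with the -q option (and -d to pick a database), which is convenient for scripting. A minimal sketch, reusing the paths and port from above:

[hadoop@10 shell]$ ./impala-shell -i localhost:27001 -d experiments -q "select current_database();"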
Use the create command to create an internal table named t1 in the experiments database:

[localhost:27001] > create table t1 (a int, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Query: create table t1 (a int, b string)
Fetched 0 row(s) in 0.13s
[localhost:27001] > show tables;
Query: show tables
+------+
| name |
+------+
| t1   |
+------+
Fetched 1 row(s) in 0.01s
[localhost:27001] > desc t1;
Query: describe t1
+------+--------+---------+
| name | type   | comment |
+------+--------+---------+
| a    | int    |         |
| b    | string |         |
+------+--------+---------+
Fetched 2 row(s) in 0.01s
Import the test data from HDFS into the table:

LOAD DATA INPATH '$hdfspath/impala_test.data' INTO TABLE t1;

Here, $hdfspath is the path of your file in HDFS. After the import is completed, the source data file is moved out of the import path in HDFS and stored under the internal table's path, /usr/hive/warehouse/experiments.db/t1. You can also create an external table by running the following statement:

CREATE EXTERNAL TABLE t2 (
  a INT,
  b string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/impala_test_dir';
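Note that the LOAD DATA statement above moves impala_test.data out of /impala_test_dir, so the external table's LOCATION is empty at this point. A hypothetical follow-up, if you want t2 to return rows, is to re-upload the file and refresh the table's metadata:

[hadoop@10 ~]$ hdfs dfs -put ./impala_test.data /impala_test_dir
[localhost:27001] > refresh experiments.t2;
[localhost:27001] > select count(*) from experiments.t2;

Unlike an internal table, dropping an external table removes only its metadata; the files under LOCATION are left in place.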
Verify the import by counting the rows in the table:

[localhost:27001] > select count(*) from experiments.t1;
Query: select count(*) from experiments.t1
Query submitted at: 2019-03-01 11:20:20 (Coordinator: http://10.1.0.215:20004)
Query progress can be monitored at: http://10.1.0.215:20004/query_plan?query_id=f1441478dba3a1c5:fa7a8eef00000000
+----------+
| count(*) |
+----------+
| 1000000  |
+----------+
Fetched 1 row(s) in 0.63s
After the test, you can delete the table. Since t1 is an internal table, dropping it also deletes its data under the warehouse path:

[localhost:27001] > drop table experiments.t1;
Query: drop table experiments.t1
To connect to Impala through JDBC, you need $hs2host and $hsport, where $hs2host is the IP of any core node or task node in the EMR cluster and $hsport can be viewed in the conf/impalad.flgs configuration file under the Impala directory of the corresponding node:

[root@10 ~]# su hadoop
[hadoop@10 root]$ cd /data/Impala/
[hadoop@10 Impala]$ grep hs2_port conf/impalad.flgs
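With the host and port in hand, you can test the HiveServer2-compatible endpoint from the command line. A minimal sketch using Beeline, assuming Beeline is available on the cluster (it ships with Hive) and that the cluster is unauthenticated (the auth=noSasl URL option is an assumption):

[hadoop@10 Impala]$ beeline -u "jdbc:hive2://$hs2host:$hsport/default;auth=noSasl" -e "show databases;"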