Hive migration involves two parts: data migration and metadata migration. Hive table data is primarily stored on HDFS, so data migration mainly occurs at the HDFS level. Hive metadata is mainly stored in relational databases and can be smoothly migrated to TencentDB on the cloud to ensure high availability.
Hive Metadata Migration
1. Dump the source Hive Metastore.
mysqldump -hX.X.X.X -uroot -pXXXX --single-transaction --set-gtid-purged=OFF hivemetastore > hivemetastore-src.sql
2. Confirm the default storage path of Hive table data in HDFS for the target cluster.
The default storage path of Hive table data in HDFS is specified by the hive.metastore.warehouse.dir configuration item in hive-site.xml. If the storage path of Hive tables in HDFS for the target cluster needs to match that of the source cluster, you can modify the configuration file accordingly. For example, in the source cluster's hive-site.xml, hive.metastore.warehouse.dir is set as follows:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/apps/hive/warehouse</value>
</property>
In the target cluster's hive-site.xml, hive.metastore.warehouse.dir is set as follows:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/usr/hive/warehouse</value>
</property>
If the target cluster's Hive tables need to keep the same storage location in HDFS as the source cluster, modify hive-site.xml in the target cluster to set hive.metastore.warehouse.dir to:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/apps/hive/warehouse</value>
</property>
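To double-check which value is actually in effect, the property can be pulled out of hive-site.xml from the shell. This is a minimal sketch: the sample file below stands in for the real hive-site.xml, whose path varies by cluster, and the grep/sed pipeline assumes the name/value pair sits on adjacent lines as in the snippets above.

```shell
# Sample file standing in for the cluster's real hive-site.xml (path is an assumption).
cat > /tmp/hive-site-sample.xml <<'EOF'
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/usr/hive/warehouse</value>
</property>
</configuration>
EOF
# Print the <value> on the line that follows the matching <name> line.
grep -A1 '<name>hive.metastore.warehouse.dir</name>' /tmp/hive-site-sample.xml \
  | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'
```

Running this against the sample prints /usr/hive/warehouse; point it at the real file to compare source and target values.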
3. Verify the target Hive metadata SDS.LOCATION and DBS.DB_LOCATION_URI fields. Use the following queries to get the current values of the SDS.LOCATION and DBS.DB_LOCATION_URI fields.
SELECT DB_LOCATION_URI from DBS;
SELECT LOCATION from SDS;
The query results are similar to the following:
mysql> SELECT LOCATION from SDS;
+--------------------------------------------------+
| LOCATION |
+--------------------------------------------------+
| hdfs://HDFS2648/usr/hive/warehouse/hitest.db/t1 |
| hdfs://HDFS2648/usr/hive/warehouse/wyp |
+--------------------------------------------------+
mysql> SELECT DB_LOCATION_URI from DBS;
+-----------------------------------------------+
| DB_LOCATION_URI |
+-----------------------------------------------+
| hdfs://HDFS2648/usr/hive/warehouse |
| hdfs://HDFS2648/usr/hive/warehouse/hitest.db |
+-----------------------------------------------+
hdfs://HDFS2648 is the default HDFS file system name, specified by the fs.defaultFS parameter in core-site.xml.
<property>
<name>fs.defaultFS</name>
<value>hdfs://HDFS2648</value>
</property>
/usr/hive/warehouse is the default storage path for Hive tables in HDFS, which is also the value specified by hive.metastore.warehouse.dir in hive-site.xml. Therefore, we need to modify the SDS.LOCATION and DBS.DB_LOCATION_URI fields in the source Hive metadata SQL file, ensuring that these two fields in the imported Hive Metastore use the correct paths. You can use the following sed commands to batch-modify the SQL file.
Replace the IP address: sed -i 's/oldcluster-ip:4007/newcluster-ip:4007/g' hivemetastore-src.sql
Replace the defaultFS: sed -i 's/old-defaultFS/new-defaultFS/g' hivemetastore-src.sql
Here, oldcluster-ip and newcluster-ip are the IP addresses of the source and target cluster NameNodes respectively, while old-defaultFS and new-defaultFS are the fs.defaultFS configuration values of the source and target clusters.
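As a sanity check before importing, you can run the same kind of substitution against a small sample dump and confirm the old defaultFS no longer appears. In this sketch, HDFS2648 is the source defaultFS from the query output above, while HDFS17000 is an assumed example target name, not a value from this document.

```shell
# Two sample dump lines mimicking SDS.LOCATION and DBS.DB_LOCATION_URI rows.
cat > /tmp/hivemetastore-sample.sql <<'EOF'
INSERT INTO `SDS` VALUES (1,'hdfs://HDFS2648/usr/hive/warehouse/hitest.db/t1');
INSERT INTO `DBS` VALUES (1,'hdfs://HDFS2648/usr/hive/warehouse');
EOF
# Same batch replacement as the migration step, applied to the sample file.
# HDFS17000 is a hypothetical target fs.defaultFS name.
sed -i 's/HDFS2648/HDFS17000/g' /tmp/hivemetastore-sample.sql
# Count rewritten locations; the old name should be gone entirely.
grep -c 'hdfs://HDFS17000' /tmp/hivemetastore-sample.sql
```

On the real dump, a final `grep old-defaultFS hivemetastore-src.sql` returning nothing is a quick way to confirm the rewrite is complete.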
Note
If components such as Kudu or HBase are used and rely on the Metastore for metadata services, the corresponding location fields in the target Hive metadata should also be modified accordingly.
4. Stop the target Hive services: MetaStore, HiveServer2, and WebHCatalog.
5. Back up the target Hive Metastore.
mysqldump -hX.X.X.X -uroot -pXXXX --single-transaction --set-gtid-purged=OFF hivemetastore > hivemetastore-target.sql
6. Drop/Create the target Hive metadata.
mysql> drop database hivemetastore;
mysql> create database hivemetastore;
7. Import the source Hive Metastore into the target database.
mysql -hX.X.X.X -uroot -pXXXX hivemetastore < hivemetastore-src.sql
8. Upgrade the Hive metadata. If the target and source Hive versions are the same, this step can be skipped. Otherwise, query the Hive version in both the source and target clusters.
The Hive upgrade scripts are located in the /usr/local/service/hive/scripts/metastore/upgrade/mysql/ directory. Hive does not support skipping versions during an upgrade; for example, upgrading Hive from 1.2.0 to 2.3.0 requires executing each intermediate upgrade script sequentially:
upgrade-1.2.0-to-2.0.0.mysql.sql -> upgrade-2.0.0-to-2.1.0.mysql.sql -> upgrade-2.1.0-to-2.2.0.mysql.sql -> upgrade-2.2.0-to-2.3.0.mysql.sql
The upgrade scripts mainly perform operations such as creating tables, adding fields, and modifying content. If a table or field already exists, the resulting "already exists" exceptions during the upgrade can be ignored. For example, to upgrade Hive from version 2.3.3 to version 3.1.1, run:
mysql> source upgrade-2.3.0-to-3.0.0.mysql.sql;
mysql> source upgrade-3.0.0-to-3.1.0.mysql.sql;
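The no-version-skipping rule above can be sketched as a small loop that emits the required scripts in order; each emitted file would then be applied with `mysql ... hivemetastore < script`. The version chain here is the 2.3.0 to 3.1.0 example from this step; substitute your own source and target versions.

```shell
# Adjacent version pairs in the chain each map to one
# upgrade-<from>-to-<to>.mysql.sql script under the upgrade/mysql/ directory.
chain="2.3.0 3.0.0 3.1.0"
prev=""
for v in $chain; do
  [ -n "$prev" ] && echo "upgrade-${prev}-to-${v}.mysql.sql"
  prev="$v"
done
```

For the chain above this prints upgrade-2.3.0-to-3.0.0.mysql.sql followed by upgrade-3.0.0-to-3.1.0.mysql.sql, matching the two `source` commands shown.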
9. If there are Phoenix tables in the source Hive, modify the Phoenix tables' ZooKeeper address in the target Hive metadata. Use the following query to get the phoenix.zookeeper.quorum configuration of the Phoenix tables.
mysql> SELECT PARAM_VALUE from TABLE_PARAMS where PARAM_KEY = 'phoenix.zookeeper.quorum';
+--------------------------------------------------+
| PARAM_VALUE |
+--------------------------------------------------+
| 172.17.64.57,172.17.64.78,172.17.64.54 |
+--------------------------------------------------+
Check the ZooKeeper address of the target cluster, which is specified by the hbase.zookeeper.quorum field in the hive-site.xml configuration file.
<property>
<name>hbase.zookeeper.quorum</name>
<value>172.17.64.98:2181,172.17.64.112:2181,172.17.64.223:2181</value>
</property>
Modify the ZooKeeper address of the Phoenix table in the target Hive metadata to match the ZooKeeper address of the target cluster.
mysql> UPDATE TABLE_PARAMS set PARAM_VALUE = '172.17.64.98,172.17.64.112,172.17.64.223' where PARAM_KEY = 'phoenix.zookeeper.quorum';
10. Check the case format of table names in the target Hive metadata; if any table names are lowercase, rename them to uppercase as in the following example:
alter table metastore_db_properties rename to METASTORE_DB_PROPERTIES;
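When several tables need renaming, the ALTER statements can be generated from a name list instead of typed one by one. A minimal sketch; the single table name below is just the example from this step, and in practice the list would come from querying information_schema for lowercase names.

```shell
# Generate an uppercase-rename statement for each listed lowercase table name.
for t in metastore_db_properties; do
  echo "alter table ${t} rename to $(echo "$t" | tr '[:lower:]' '[:upper:]');"
done
```

The emitted statements can then be pasted into (or piped to) the mysql client against the hivemetastore database.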
11. Start the target Hive services: MetaStore, HiveServer2, and WebHCatalog.
12. Finally, you can verify by running a simple Hive SQL query.