Apache Oozie is an open-source workflow engine that orchestrates the tasks of Hadoop ecosystem components into workflows and then schedules, executes, and monitors them. This document briefly describes how to use Oozie in EMR. For detailed directions, see the official Oozie website. We recommend using Oozie through Hue's GUI, as described in the Hue development documentation.
Before you begin, make sure that you have created an EMR Hadoop cluster and selected the Oozie service. For more information, see Creating EMR Cluster.
Because the EMR cluster comes with ShareLib preinstalled, you do not need to install it before submitting a workflow job with Oozie. You can still edit and update ShareLib as follows:
Go to the Oozie installation directory:
cd /usr/local/service/oozie
Decompress the ShareLib package:
tar -xf oozie-sharelib.tar.gz
Add the JAR package to the directory of the action to be supported in the `share` directory generated by the decompression.
Create the ShareLib from the local `share` directory:
bin/oozie-setup.sh sharelib create -fs hdfs://active-namenode-ip:4007 -locallib share
Update the ShareLib on the Oozie server:
oozie admin --oozie http://oozie-server-ip:12000/oozie -sharelibupdate
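The ShareLib update sequence can be collected into one script. The following is a minimal sketch, not an official tool: the `run` helper, the `update_sharelib` function, and the `RUN`, `OOZIE_HOME`, `NAMENODE_URI`, and `OOZIE_URL` variables are hypothetical names introduced here, and the host names are the placeholders used in this document. By default the script only prints the commands; set `RUN=1` to execute them.

```shell
set -eu

# Placeholders taken from this document; override them through the environment.
OOZIE_HOME="${OOZIE_HOME:-/usr/local/service/oozie}"
NAMENODE_URI="${NAMENODE_URI:-hdfs://active-namenode-ip:4007}"
OOZIE_URL="${OOZIE_URL:-http://oozie-server-ip:12000/oozie}"

# run (hypothetical helper): print the command in dry-run mode, execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

update_sharelib() {
  run cd "$OOZIE_HOME"
  run tar -xf oozie-sharelib.tar.gz
  # Copy any extra JAR packages into the action's directory under share/ at this point.
  run bin/oozie-setup.sh sharelib create -fs "$NAMENODE_URI" -locallib share
  run oozie admin --oozie "$OOZIE_URL" -sharelibupdate
}

update_sharelib
```

Running the script without `RUN=1` first is a cheap way to check the NameNode and Oozie server addresses before anything is uploaded to HDFS.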
Decompress the oozie-examples.tar.gz file in the Oozie installation directory /usr/local/service/oozie. It provides sample workflows for the components supported by Oozie:
tar -xf oozie-examples.tar.gz
Take action hive2 as an example:
Modify job.properties:
Set namenode to the value of fs.defaultFS in core-site.xml.
Set the ResourceManager property to the value of yarn.resourcemanager.ha.rm-ids in yarn-site.xml in HA mode, or to the value of yarn.resourcemanager.address in non-HA mode.
Set jdbcURL to jdbc:hive2://hive2-server:7001/default.
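The job.properties changes can also be scripted. The sketch below is only an illustration under stated assumptions: `set_prop` is a hypothetical helper, the key names `nameNode` and `resourceManager` are assumed from Oozie's bundled examples (check the job.properties in the hive2 directory for the exact keys in your version), and all values are placeholders. It edits a temporary stand-in file rather than the real job.properties.

```shell
set -eu

# set_prop FILE KEY VALUE (hypothetical helper):
# drop any existing KEY line, then append KEY=VALUE. Other keys are kept.
set_prop() {
  file=$1
  key=$2
  value=$3
  tmp="$file.tmp.$$"
  grep -v "^${key}=" "$file" > "$tmp" || true   # grep exits 1 when no lines remain
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}

# Stand-in for the job.properties shipped with the hive2 example.
props=$(mktemp)
printf 'nameNode=placeholder\nqueueName=default\n' > "$props"

# Placeholder values: use fs.defaultFS from core-site.xml and the
# ResourceManager value from yarn-site.xml on your cluster.
set_prop "$props" nameNode "hdfs://active-namenode-ip:4007"
set_prop "$props" resourceManager "value-from-yarn-site.xml"   # assumed key name
set_prop "$props" jdbcURL "jdbc:hive2://hive2-server:7001/default"

cat "$props"
```

Rewriting through a temporary file avoids `sed -i`, whose syntax differs between GNU and BSD systems. Note that the key is matched as a regular expression, so keys containing dots match loosely.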
For a cluster with Kerberos authentication enabled, take action hive2 as an example again. Check the README file in the hive2 directory for other notes.
Modify job.properties:
Set namenode to the value of fs.defaultFS in core-site.xml.
Set the ResourceManager property to the value of yarn.resourcemanager.ha.rm-ids in yarn-site.xml in HA mode, or to the value of yarn.resourcemanager.address in non-HA mode.
Set jdbcURL to jdbc:hive2://hive2-server:7001/default.
Set jdbcPrincipal to the value of hive.server2.authentication.kerberos.principal.
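The jdbcPrincipal value can be read from the Hive configuration instead of being copied by hand. The following is a rough sketch: `get_xml_prop` is a hypothetical helper, the XML file here is a temporary stand-in with a made-up principal, and the location of the real hive-site.xml on your cluster is not assumed. The helper relies on the usual Hadoop layout where `<value>` follows its `<name>` on the same or a later line; it is not a general XML parser.

```shell
set -eu

# get_xml_prop FILE KEY (hypothetical helper): print the <value> of property KEY.
get_xml_prop() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1 }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}

# Stand-in hive-site.xml so the sketch runs anywhere; the principal is made up.
site=$(mktemp)
cat > "$site" <<'EOF'
<configuration>
  <property>
    <name>hive.server2.authentication.kerberos.principal</name>
    <value>hadoop/_HOST@EXAMPLE.COM</value>
  </property>
</configuration>
EOF

principal=$(get_xml_prop "$site" hive.server2.authentication.kerberos.principal)
echo "jdbcPrincipal=$principal"
rm -f "$site"
```

On a real cluster, pass the path of the actual hive-site.xml as the first argument and copy the printed line into job.properties.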