Apache Oozie – A Hadoop Job Scheduler
What is Apache Oozie
Apache Oozie is a workflow scheduler system for managing
Apache Hadoop jobs.
Its main task is to schedule jobs to run at particular times or
intervals, or when the data they depend on becomes available.
Oozie is not limited to one kind of Hadoop job: it supports
Java MR jobs, streaming MR jobs, Pig, Hive, Sqoop, and certain
Java jobs.
A very nice definition of Oozie that I found on the internet:
Oozie is a system for
describing the workflow of a job, where that job may contain a set of map
reduce jobs, pig scripts, fs operations etc., and it supports forking and
joining of the data flow.
It doesn't, however, allow you to stream the output of one MR
job as the input to another. The map-reduce action in Oozie still requires an
output format of some type, typically a file-based one, so the output from job
1 will still be serialized via HDFS before being processed by job 2.
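To make that last point concrete, here is a minimal sketch of a workflow in which the second map-reduce action reads the directory that the first one wrote to HDFS. The workflow and action names, the output file name workflow.xml, and the variables ${inputDir}, ${stageDir} and ${outputDir} are illustrative assumptions. The mapper and reducer class properties are left out, so this will not run on a cluster as-is.

```shell
#!/bin/sh
# Sketch only: writes a two-step workflow definition to ./workflow.xml.
# All names (two-step-wf, job1, job2, stageDir, ...) are hypothetical.
# The quoted 'EOF' stops the shell from expanding ${...}; Oozie resolves
# those variables itself, e.g. from job.properties.
cat > workflow.xml <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.2" name="two-step-wf">
    <start to="job1"/>
    <action name="job1">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <!-- mapper/reducer class properties omitted in this sketch -->
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${stageDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="job2"/>
        <error to="fail"/>
    </action>
    <action name="job2">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <!-- job 2 reads the HDFS directory that job 1 wrote -->
                <property>
                    <name>mapred.input.dir</name>
                    <value>${stageDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed at [${wf:lastErrorNode()}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
EOF
```

The only link between the two jobs is the shared ${stageDir} path: job 1 must finish and write its output there before Oozie starts job 2.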
Installation
If you are using Cloudera, get the Oozie version recommended for
your Hadoop version.
I tested with oozie-3.3.0-cdh4.2.2 from the Cloudera
repository.
If you work directly from the Oozie source release rather than the
Cloudera-provided package, the only difference is that you need to build
Oozie yourself.
To build Oozie yourself (not needed for the Cloudera-provided setup):
$ cd oozie-3.3.2/bin
$ ./mkdistro.sh -DskipTests
- Prepare the war once:
bin/oozie-setup.sh prepare-war
bin/oozie-setup.sh -extjs libext/ext-2.2.zip
- Now create the Oozie schema (using MySQL rather than the default Derby).
Before running the command, set these properties in conf/oozie-site.xml:
<property>
  <name>oozie.service.JPAService.jdbc.driver</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.url</name>
  <value>jdbc:mysql://localhost:3306/oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.username</name>
  <value>oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.password</name>
  <value>oozie</value>
</property>
Then run the command:
bin/ooziedb.sh create -sqlfile oozie.sql -run
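- Update Hadoop access for the oozie user in conf/oozie-site.xml. On the Hadoop side, the matching proxy-user settings (hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups) live in Hadoop's core-site.xml.
- Now start Oozie: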
bin/oozied.sh start
- To check the Oozie status, open the Oozie console at http://localhost:11000/oozie, or run from the command line:
bin/oozie admin -oozie http://localhost:11000/oozie -status
- Run an Oozie example:
oozie job -oozie http://localhost:11000/oozie -config /home/training/tmp/examples/apps/map-reduce/job.properties -run
- Check the status of the job from the command line:
oozie job -oozie http://localhost:11000/oozie -info 64-256532451321-oozie-tucu
Alternatively, check the status in the Oozie console at http://localhost:11000/oozie
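The last three steps (check the server, submit the example, check the job) can be strung together in a small script. This is a rough sketch, not part of Oozie: the function names, retry count, and OOZIE_POLL_SECONDS variable are made up for illustration, and the parsing assumes the CLI prints "System mode: NORMAL" for a healthy server, "job: <id>" on submission, and a "Status : ..." line in the -info output. Check these against your version's output.

```shell
#!/bin/sh
# Sketch: wait for the Oozie server, submit a workflow, and read its status.
# Assumptions (not from the original post): function names, retry counts,
# and the exact CLI output formats parsed below.

OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"

# Succeeds once "oozie admin -status" reports NORMAL mode.
# $1 = number of attempts (default 10).
wait_for_oozie() {
    tries="${1:-10}"
    while [ "$tries" -gt 0 ]; do
        if oozie admin -oozie "$OOZIE_URL" -status 2>/dev/null | grep -q 'NORMAL'; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "${OOZIE_POLL_SECONDS:-5}"
    done
    return 1
}

# Submits a job and prints only its ID.
# $1 = path to job.properties.
submit_job() {
    oozie job -oozie "$OOZIE_URL" -config "$1" -run | sed -n 's/^job: *//p'
}

# Prints the value of the first "Status" line of "oozie job -info".
# $1 = job ID.
job_status() {
    oozie job -oozie "$OOZIE_URL" -info "$1" |
        awk -F': *' '/^Status/ { print $2; exit }'
}
```

A typical use would be: wait_for_oozie 12 && id=$(submit_job /home/training/tmp/examples/apps/map-reduce/job.properties) && job_status "$id"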
Problems during setup
Oozie impersonate user error
Always set the Oozie user name correctly in
conf/oozie-site.xml. You must be sure of the name of the user that runs Oozie;
if the name does not match, impersonation fails.
The main point: after changing the settings,
restart the Hadoop name node, data nodes and all other slave daemons.
It should then work.
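If the error is Hadoop's "is not allowed to impersonate" message (an assumption, since the exact text is not quoted above), the settings involved are Hadoop's proxy-user properties, which Hadoop reads from its own core-site.xml. The sketch below prints that fragment for a given user name. The function name is hypothetical, and the "*" values are a permissive placeholder that should be narrowed on a real cluster.

```shell
#!/bin/sh
# Sketch: print the Hadoop proxy-user properties for the user running Oozie.
# proxyuser_properties is a hypothetical helper; "*" means any host / any
# group and is only a placeholder.
# $1 = name of the user that runs the Oozie server.
proxyuser_properties() {
    user="$1"
    cat <<EOF
<property>
  <name>hadoop.proxyuser.${user}.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.${user}.groups</name>
  <value>*</value>
</property>
EOF
}
```

Running proxyuser_properties oozie prints a fragment to paste inside the <configuration> element of core-site.xml before restarting the Hadoop daemons. The user name in the property keys has to match the account that actually runs Oozie, which is why a wrong name breaks impersonation.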
Could not load Oozie service classes
org.apache.oozie.service.ServiceException: E0103: Could not load service classes, Cannot create PoolableConnectionFactory
If startup fails with this error, re-run the setup command:
bin/oozie-setup.sh -extjs libext/ext-2.2.zip
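MySQL Oozie connection refused
Check the JDBC settings in conf/oozie-site.xml (oozie.service.JPAService.jdbc.url, username and password) and change them so that they point to a running MySQL instance.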
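Before editing the file, it can help to confirm whether anything is listening at the address in the JDBC URL. This is a small sketch under a few assumptions: the function names are invented, port 3306 is assumed when the URL gives none, and port_open relies on the nc (netcat) utility being installed with support for the -z flag.

```shell
#!/bin/sh
# Sketch: pull host and port out of a jdbc:mysql:// URL and probe the port.
# Function names are hypothetical helpers, not part of Oozie or MySQL.

# Prints the host part of the URL.
jdbc_host() {
    echo "$1" | sed -e 's|^jdbc:mysql://||' -e 's|[:/].*$||'
}

# Prints the port part of the URL, or 3306 if the URL has no port.
jdbc_port() {
    port=$(echo "$1" | sed -n 's|^jdbc:mysql://[^:/]*:\([0-9][0-9]*\).*$|\1|p')
    echo "${port:-3306}"
}

# Returns 0 if something accepts TCP connections on host ($1), port ($2).
# Assumes nc is available and supports -z (connect without sending data).
port_open() {
    nc -z "$1" "$2" >/dev/null 2>&1
}
```

For example: url=jdbc:mysql://localhost:3306/oozie; port_open "$(jdbc_host "$url")" "$(jdbc_port "$url")" || echo "MySQL is not reachable at that address"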