Friday, April 25, 2014

Apache Oozie - Hadoop Job Scheduler Problems & installation

Apache Oozie – A Hadoop job Scheduler

What is Apache Oozie

Apache Oozie is a workflow scheduler system for managing Apache Hadoop jobs.
Its main task is to schedule jobs so that they run at a given time or frequency, or when their input data becomes available.
Oozie is not limited to one kind of Hadoop job: it also supports Java MR jobs, streaming MR jobs, Pig, Hive, Sqoop and some specific Java jobs.

A very nice definition of Oozie that I found on the internet:
Oozie is a system for describing the workflow of a job, where that job may contain a set of map-reduce jobs, Pig scripts, fs operations, etc., and it supports forking and joining of the data flow.
It doesn't, however, allow you to stream the output of one MR job directly as the input to another. The map-reduce action in Oozie still requires an output format of some type, typically a file-based one, so the output of job 1 will still be serialized via HDFS before being processed by job 2.
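
To make the fork and join part of that definition concrete, here is a minimal sketch of a workflow definition. It is only an illustration, not one of the bundled examples: the workflow name, the node names and the /tmp/sketch paths are all made up, and ${nameNode} is assumed to be defined in the job's properties file. Two fs actions run in parallel after the fork, and the join waits for both before the workflow ends.

```xml
<!-- workflow.xml: illustrative sketch; all names and paths are hypothetical -->
<workflow-app name="fork-join-sketch" xmlns="uri:oozie:workflow:0.4">
  <start to="split"/>

  <!-- fork: start both branches in parallel -->
  <fork name="split">
    <path start="prepare-a"/>
    <path start="prepare-b"/>
  </fork>

  <!-- branch A: a simple fs operation (assumes ${nameNode} is set in job.properties) -->
  <action name="prepare-a">
    <fs>
      <mkdir path="${nameNode}/tmp/sketch/a"/>
    </fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>

  <!-- branch B: a second fs operation running alongside branch A -->
  <action name="prepare-b">
    <fs>
      <mkdir path="${nameNode}/tmp/sketch/b"/>
    </fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>

  <!-- join: wait for both branches before moving on -->
  <join name="merge" to="end"/>

  <kill name="fail">
    <message>Workflow failed at [${wf:lastErrorNode()}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

In a real workflow the two branches would more likely be map-reduce, Pig or Hive actions. The fs actions are used here only to keep the sketch short.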

Installation

If you are using Cloudera, get the Oozie version recommended for your Hadoop version.
I tried oozie-3.3.0-cdh4.2.2, which is provided in the Cloudera repository.
If you work from the plain Oozie source release instead of the Cloudera-provided package, the only difference is that you need to build Oozie first.
To build Oozie (not needed for the Cloudera-provided setup):

 $ cd oozie-3.3.2/bin  
 $ ./mkdistro.sh -DskipTests  

  • Prepare the war once:

 bin/oozie-setup.sh prepare-war  



  • I wanted to see the Oozie output in the web console, so I also ran the following command to add the ExtJS library:


 bin/oozie-setup.sh -extjs libext/ext-2.2.zip  



  • Now create the Oozie schema (using MySQL rather than the default Derby).

Before running the command, change conf/oozie-site.xml:

 <property>
   <name>oozie.service.JPAService.jdbc.driver</name>
   <value>com.mysql.jdbc.Driver</value>
 </property>
 <property>
   <name>oozie.service.JPAService.jdbc.url</name>
   <value>jdbc:mysql://localhost:3306/oozie</value>
 </property>
 <property>
   <name>oozie.service.JPAService.jdbc.username</name>
   <value>oozie</value>
 </property>
 <property>
   <name>oozie.service.JPAService.jdbc.password</name>
   <value>oozie</value>
 </property>

Then run the command:

 bin/ooziedb.sh create -sqlfile oozie.sql -run  

  • Update the Hadoop access settings for the oozie user in conf/oozie-site.xml.

  • Now start Oozie:
 bin/oozied.sh start  


  • To check the Oozie status from the command line:
 bin/oozie admin -oozie http://localhost:11000/oozie -status  

  • Run an Oozie example:
 oozie job -oozie http://localhost:11000/oozie -config /home/training/tmp/examples/apps/map-reduce/job.properties -run  


  • Check the status of the job from the command line:
 oozie job -oozie http://localhost:11000/oozie -info 64-256532451321-oozie-tucu  


Alternatively, you can check the status in the web console at http://localhost:11000/oozie

Problems during setup

Oozie user impersonation error

Always set the Oozie user name correctly in conf/oozie-site.xml. You must be sure of the name of your Oozie user, otherwise the setup will break.
Then, the main point:
restart the Hadoop name node, data node and all other slaves, and then it should work.
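
For reference, this kind of error usually means that Hadoop itself is not allowing the Oozie server user to act on behalf of other users. The fragment below is a sketch of the Hadoop-side proxy-user settings, not a copy of my configuration. It assumes the Oozie server runs as a user named oozie and that the properties go into Hadoop's core-site.xml. The * values are permissive placeholders and should be narrowed to real hosts and groups.

```xml
<!-- Hadoop core-site.xml (assumption: the Oozie server runs as user "oozie") -->
<property>
  <!-- hosts from which the oozie user may impersonate other users -->
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>*</value>
</property>
<property>
  <!-- groups whose members the oozie user may impersonate -->
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>*</value>
</property>
```

If the user name in these property names does not match the user that actually runs Oozie, the impersonation check still fails, which fits the point above about getting the user name right. Hadoop reads these settings at startup, which is why the restart matters.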

Couldn’t load Oozie service classes

org.apache.oozie.service.ServiceException: E0103: Could not load service classes, Cannot create PoolableConnectionFactory

Run the command:
 bin/oozie-setup.sh -extjs libext/ext-2.2.zip  


MySQL Oozie connection refused

Then change oozie-site.xml


