Apache Oozie
Apache Oozie is a workflow manager designed especially for running Hadoop MapReduce jobs.
It has two parts:
- workflow engine: stores and runs workflows composed of Hadoop jobs (MapReduce, Pig, Hive)
- coordinator engine: runs workflow jobs on a predefined schedule or when their input data becomes available
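To make the coordinator side concrete, a coordinator definition that starts one workflow per day might look roughly like the sketch below. The schema version (0.4), the application name, and the ${startTime}, ${endTime} and ${workflowAppPath} parameters are assumptions made for this example.

```xml
<!-- Sketch of a coordinator.xml; the name and all ${...} parameters are hypothetical -->
<coordinator-app xmlns="uri:oozie:coordinator:0.4" name="daily-example"
                 frequency="${coord:days(1)}"
                 start="${startTime}" end="${endTime}" timezone="UTC">
  <action>
    <!-- The workflow application (a directory on HDFS) to run at each tick -->
    <workflow>
      <app-path>${workflowAppPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
```

The coordinator only decides when to run; the work itself is still described by the workflow it points to.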
It’s a service:
- Oozie runs as a service on the cluster
- the client submits only workflow definitions
- so, unlike Hadoop’s JobControl, which runs on the client machine, the client doesn’t submit the jobs itself
Workflow
A workflow is a DAG (directed acyclic graph) of action nodes and control-flow nodes.
Action Nodes
- perform workflow tasks
- e.g. running Hadoop MapReduce, Pig or Hive jobs
- can also be an arbitrary shell script or a Java program
Control Flow Nodes
- conditional logic (decision nodes)
- parallel execution (fork and join nodes)
An Oozie workflow is written in XML using the Hadoop Process Definition Language (hPDL).
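As a rough illustration, a workflow definition that combines action nodes and control-flow nodes might look like the sketch below. The schema version (0.5), the node names, and every ${...} parameter are assumptions made for this example; their values would normally be supplied with the job when it is submitted.

```xml
<!-- Sketch of a workflow.xml; node names and all ${...} parameters are hypothetical -->
<workflow-app xmlns="uri:oozie:workflow:0.5" name="example-wf">
  <!-- Cluster endpoints shared by all actions -->
  <global>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
  </global>

  <start to="check-input"/>

  <!-- Control-flow node: conditional logic -->
  <decision name="check-input">
    <switch>
      <case to="split">${fs:exists(inputDir)}</case>
      <default to="end"/>
    </switch>
  </decision>

  <!-- Control-flow node: start two branches in parallel -->
  <fork name="split">
    <path start="count-words"/>
    <path start="run-pig"/>
  </fork>

  <!-- Action node: a MapReduce job -->
  <action name="count-words">
    <map-reduce>
      <configuration>
        <property><name>mapred.mapper.class</name><value>${mapperClass}</value></property>
        <property><name>mapred.reducer.class</name><value>${reducerClass}</value></property>
        <property><name>mapred.input.dir</name><value>${inputDir}</value></property>
        <property><name>mapred.output.dir</name><value>${outputDir}</value></property>
      </configuration>
    </map-reduce>
    <ok to="merge"/>
    <error to="fail"/>
  </action>

  <!-- Action node: a Pig script -->
  <action name="run-pig">
    <pig>
      <script>${pigScript}</script>
    </pig>
    <ok to="merge"/>
    <error to="fail"/>
  </action>

  <!-- Control-flow node: wait for both branches to finish -->
  <join name="merge" to="end"/>

  <kill name="fail">
    <message>Workflow failed at [${wf:lastErrorNode()}]</message>
  </kill>

  <end name="end"/>
</workflow-app>
```

In this sketch the decision node skips all work when the input directory is missing, and the join node waits for both forked branches before the workflow reaches its end node.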
Editors
Hue has an Oozie workflow editor, so workflows can be designed in a graphical editor rather than by writing the XML by hand
- see http://gethue.com/new-apache-oozie-workflow-coordinator-bundle-editors/