Apache Oozie

Apache Oozie is a workflow manager designed especially for running Hadoop jobs such as MapReduce

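It consists of two parts:

  • workflow engine: stores and runs workflows composed of different types of jobs (MapReduce, Pig, Hive)
  • coordinator engine: runs workflow jobs on predefined schedules and when their input data becomes available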


It's a service:

  • Oozie runs as a service on the cluster
  • the client only submits workflow definitions to it
  • so, unlike Hadoop's JobControl, which runs on the client machine, the client does not submit the jobs itself
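
To illustrate how little the client does, here is a rough sketch of a submission through Oozie's HTTP web services API: the client posts a small XML configuration that points at a workflow application already stored on HDFS. The host name, HDFS path and user name are hypothetical, and the request is only built, not sent. The endpoint and property names should be checked against the documentation of the installed Oozie version.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_submit_request(oozie_url, app_path, user):
    """Build (but do not send) an HTTP request that submits and starts a workflow job."""
    config = ET.Element("configuration")
    for name, value in (("user.name", user),
                        ("oozie.wf.application.path", app_path)):
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    body = ET.tostring(config, encoding="utf-8")
    return urllib.request.Request(
        url=oozie_url.rstrip("/") + "/v1/jobs?action=start",
        data=body,
        headers={"Content-Type": "application/xml;charset=UTF-8"},
        method="POST",
    )


# Hypothetical host, HDFS path and user name, for illustration only.
request = build_submit_request(
    "http://oozie-host:11000/oozie",
    "hdfs://namenode:8020/user/alice/my-workflow",
    "alice",
)
print(request.get_method(), request.full_url)
```

The request carries only the location of the workflow definition. The Oozie service, not the client, then launches the individual jobs.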


Workflow

A workflow is a DAG (directed acyclic graph) of action nodes and control-flow nodes

Action Nodes

  • perform workflow tasks
  • e.g. running Hadoop MapReduce, Pig or Hive jobs
  • can also run an arbitrary shell script or a Java program


Control Flow Nodes

  • conditional logic (decision nodes)
  • parallel execution (fork and join nodes)
  • start, end and failure (kill) of the workflow
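
The DAG structure can be sketched in a few lines of Python. This is only a toy model, not how Oozie is implemented: the node names are hypothetical, and the `fork` and `join` entries stand in for Oozie's fork and join control-flow nodes. It computes one valid execution order and rejects graphs that contain a cycle.

```python
from collections import deque

# Hypothetical workflow: node name -> names of the nodes it transitions to.
workflow = {
    "start": ["fork"],
    "fork": ["mr-job", "pig-job"],   # control-flow node: starts parallel branches
    "mr-job": ["join"],              # action node
    "pig-job": ["join"],             # action node
    "join": ["hive-job"],            # control-flow node: waits for both branches
    "hive-job": ["end"],             # action node
    "end": [],
}


def execution_order(graph):
    """Return one valid execution order (Kahn's algorithm); raise ValueError on a cycle."""
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1
    ready = deque(node for node, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in graph[node]:
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)
    if len(order) != len(graph):
        raise ValueError("workflow contains a cycle, so it is not a DAG")
    return order


print(execution_order(workflow))
# ['start', 'fork', 'mr-job', 'pig-job', 'join', 'hive-job', 'end']
```

In this model `mr-job` and `pig-job` have no dependency on each other, so a real engine could run them in parallel. `hive-job` only becomes ready once both have finished.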

An Oozie workflow is written in XML, using the Hadoop Process Definition Language (hPDL)
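
As a rough illustration of what such a definition looks like, the Python sketch below generates a minimal workflow with a single MapReduce action. The workflow and action names are hypothetical, the schema version in the namespace is an assumption, and the action's job configuration (mapper, reducer, input and output paths) is left out, so the result shows the shape of the XML rather than a complete, runnable workflow.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Namespace of the workflow schema; the version (0.5) is an assumption and
# should match what the target Oozie installation supports.
NS = "uri:oozie:workflow:0.5"


def build_workflow(name, action_name):
    """Build a one-action workflow: start -> map-reduce action -> end (or kill on error)."""
    app = ET.Element("workflow-app", {"name": name, "xmlns": NS})
    ET.SubElement(app, "start", {"to": action_name})

    action = ET.SubElement(app, "action", {"name": action_name})
    mr = ET.SubElement(action, "map-reduce")
    # ${jobTracker} and ${nameNode} are parameters supplied when the job is submitted.
    ET.SubElement(mr, "job-tracker").text = "${jobTracker}"
    ET.SubElement(mr, "name-node").text = "${nameNode}"
    # Every action names the node to go to on success and on failure.
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})

    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Action failed"
    ET.SubElement(app, "end", {"name": "end"})
    return app


# Hypothetical workflow and action names.
app = build_workflow("demo-wf", "mr-job")
print(minidom.parseString(ET.tostring(app)).toprettyxml(indent="  "))
```

The `start`, `kill` and `end` elements are control-flow nodes, and the `action` element is an action node whose `ok` and `error` transitions form the edges of the DAG.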


Editors

Hue has an Oozie workflow editor, so workflows can be designed graphically instead of writing the XML by hand


Sources