Oozie

Apache Oozie

Apache Oozie is a workflow manager designed especially for running Hadoop jobs such as MapReduce, Pig, and Hive.

It has two parts:

  • workflow engine: stores and runs workflows made up of Hadoop jobs (MapReduce, Pig, Hive)
  • coordinator engine: triggers workflow jobs based on time schedules and data availability

Oozie runs as a service:

  • the Oozie server runs on the cluster
  • the client submits only workflow definitions
  • so, unlike Hadoop JobControl, which runs on the client machine, the client does not submit the jobs itself
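
As a rough illustration of "the client submits only workflow definitions", the sketch below builds the kind of request a client could send to the Oozie Web Services API: a Hadoop-style configuration XML that points at a workflow application already stored in HDFS. The host name, user name, and HDFS path are hypothetical placeholders, and the request is only constructed, never sent.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_job_config(properties):
    """Serialize properties as a Hadoop-style <configuration> document (bytes)."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="utf-8")


def build_submit_request(oozie_url, properties):
    """Build (but do not send) a POST request that submits and starts a job."""
    return urllib.request.Request(
        url=oozie_url.rstrip("/") + "/v1/jobs?action=start",
        data=build_job_config(properties),
        headers={"Content-Type": "application/xml;charset=UTF-8"},
        method="POST",
    )


# Hypothetical values: replace with a real Oozie server, user, and HDFS path.
request = build_submit_request(
    "http://oozie.example.com:11000/oozie",
    {
        "user.name": "alice",
        # Directory in HDFS that contains workflow.xml
        "oozie.wf.application.path": "hdfs://namenode.example.com:8020/user/alice/demo-wf",
    },
)
```

The request body carries no job code at all, only a pointer to the workflow definition; the Oozie server is the one that launches the Hadoop jobs. Sending it would be a matter of passing `request` to `urllib.request.urlopen`, assuming a reachable server.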

Workflow

A "workflow" is a DAG (directed acyclic graph) of "action nodes" and "control-flow nodes"

Action Nodes

  • perform workflow tasks
  • e.g. running Hadoop MapReduce, Pig or Hive jobs
  • can also be an arbitrary shell script or a Java program

Control Flow Nodes

  • conditional logic (decision nodes, similar to if/else)
  • parallel execution (fork and join nodes)
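
To make the parallel-execution idea concrete, here is a small sketch that models a workflow as a DAG in plain Python and groups the nodes into "waves" that could run at the same time. This is not how Oozie is implemented; it only illustrates the dependency structure of a fork followed by a join. The node names are made up, conditional branching is not modelled, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each node maps to the nodes that must finish before it can start.
# Hypothetical workflow: a fork starts two actions, a join waits for both.
PREDECESSORS = {
    "start": [],
    "fork": ["start"],
    "pig-step": ["fork"],
    "hive-step": ["fork"],
    "join": ["pig-step", "hive-step"],
    "end": ["join"],
}


def execution_waves(predecessors):
    """Return a list of waves; all nodes in one wave are ready simultaneously."""
    sorter = TopologicalSorter(predecessors)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


waves = execution_waves(PREDECESSORS)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

The two actions after the fork land in the same wave, and the join node becomes ready only once both of them are done.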

An Oozie workflow is written in XML, using the Hadoop Process Definition Language (hPDL)
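
A minimal sketch of what such a definition can look like is shown below, embedded in a Python string so that it can be parsed and inspected. The workflow itself is hypothetical: the node names, `clean.pig`, `org.example.DemoMapper`, and the `inputDir`, `jobTracker`, and `nameNode` parameters are placeholders, and the schema version `0.5` is an assumption. The helper only lists each node's outgoing transitions; it does not validate the document against the Oozie schema.

```python
import xml.etree.ElementTree as ET

WORKFLOW_XML = """
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="check-input"/>
  <decision name="check-input">
    <switch>
      <case to="split">${fs:exists(inputDir)}</case>
      <default to="fail"/>
    </switch>
  </decision>
  <fork name="split">
    <path start="pig-step"/>
    <path start="mr-step"/>
  </fork>
  <action name="pig-step">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>clean.pig</script>
    </pig>
    <ok to="merge"/>
    <error to="fail"/>
  </action>
  <action name="mr-step">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.mapper.class</name>
          <value>org.example.DemoMapper</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="merge"/>
    <error to="fail"/>
  </action>
  <join name="merge" to="end"/>
  <kill name="fail">
    <message>Workflow failed at ${wf:lastErrorNode()}</message>
  </kill>
  <end name="end"/>
</workflow-app>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def transitions(xml_text):
    """Map each top-level workflow node to the nodes it can transition to."""
    root = ET.fromstring(xml_text)
    edges = {}
    for node in root:
        # <start> has no name attribute, so fall back to its tag name.
        name = node.get("name", local_name(node.tag))
        targets = []
        for elem in node.iter():  # the node itself plus all descendants
            if local_name(elem.tag) == "path":
                targets.append(elem.get("start"))  # fork branches
            elif elem.get("to"):
                targets.append(elem.get("to"))  # start, join, ok, error, case, default
        edges[name] = targets
    return edges


edges = transitions(WORKFLOW_XML)
for name, targets in edges.items():
    print(name, "->", targets)
```

Every action declares both an `ok` and an `error` transition, which is how the DAG edges are expressed in the XML; the `kill` and `end` nodes have no outgoing transitions.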

Editors

Hue has an Oozie workflow editor, so workflows can be designed in a graphical interface instead of writing the XML by hand

  • see http://gethue.com/new-apache-oozie-workflow-coordinator-bundle-editors/