Latest revision as of 13:37, 23 November 2015

Apache Oozie

Apache Oozie is a workflow manager, designed especially for running Hadoop MapReduce jobs

It consists of two parts:

  • workflow engine: stores and runs workflows composed of Hadoop jobs (MapReduce, Pig, Hive)
  • coordinator engine: runs workflow jobs based on predefined schedules and data availability


It's a service:

  • Oozie is a service that runs on the cluster
  • the client only submits workflow definitions; Oozie runs them
  • so, unlike Hadoop JobControl, which runs on the client machine that submits the jobs, the client doesn't run the jobs itself
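
To make the "client only submits" point concrete, here is a minimal sketch of what a submission could look like from the client side. It assumes the Oozie Web Services API (a `POST` to `/v1/jobs` with an XML `<configuration>` body); the host name, user name and HDFS path are hypothetical, and the request is only built, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET


def job_configuration(properties):
    """Serialize job properties as a Hadoop-style <configuration> XML document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="utf-8")


def submission_request(oozie_url, properties):
    """Build (but do not send) the HTTP request that submits a workflow job."""
    return urllib.request.Request(
        url=oozie_url.rstrip("/") + "/v1/jobs",
        data=job_configuration(properties),
        headers={"Content-Type": "application/xml;charset=UTF-8"},
        method="POST",
    )


# Hypothetical Oozie server, user and workflow location in HDFS.
request = submission_request(
    "http://oozie.example.com:11000/oozie",
    {
        "user.name": "alice",
        "oozie.wf.application.path": "hdfs://namenode.example.com:8020/user/alice/example-wf",
    },
)
```

The payload only points at a workflow application stored in HDFS; the jobs themselves are launched by the Oozie service, not by this client.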


Workflow

A workflow is a DAG of action nodes and control-flow nodes
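
As a rough illustration of the DAG idea, the sketch below models a workflow as a mapping from each node to the nodes it transitions to, and computes one valid execution order. The node names are hypothetical, and this is plain Python (3.9+ for `graphlib`), not how Oozie itself represents workflows.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical workflow: node -> nodes it transitions to.
# "fork" and "join" stand in for control-flow nodes, the rest for action nodes.
WORKFLOW = {
    "start": ["fork"],
    "fork": ["pig-clean", "hive-load"],
    "pig-clean": ["join"],
    "hive-load": ["join"],
    "join": ["mr-aggregate"],
    "mr-aggregate": ["end"],
    "end": [],
}


def execution_order(transitions):
    """Return one valid order of nodes, or raise ValueError if there is a cycle."""
    # TopologicalSorter expects node -> predecessors, so invert the transitions.
    predecessors = {node: set() for node in transitions}
    for node, targets in transitions.items():
        for target in targets:
            predecessors.setdefault(target, set()).add(node)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as exc:
        raise ValueError("workflow is not a DAG") from exc


if __name__ == "__main__":
    print(execution_order(WORKFLOW))
```

The two branches after `fork` have no ordering constraint between them, which is what allows them to run in parallel.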

Action Nodes

  • perform workflow tasks
  • e.g. running Hadoop MapReduce, Pig or Hive jobs
  • can also be an arbitrary shell script or a Java program


Control Flow Nodes

  • Conditional logic (if, else, etc)
  • Parallel execution

An Oozie workflow is written in XML using the Hadoop Process Definition Language (hPDL)
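
To give a feel for the format, the sketch below embeds a small hPDL definition in a Python string and lists its transitions. The workflow name, node names and kill message are hypothetical, and the `map-reduce` action is trimmed: a real one would also carry a `configuration` with the mapper, reducer and input/output settings.

```python
import xml.etree.ElementTree as ET

# Namespace of the workflow schema version used in this example.
NS = {"wf": "uri:oozie:workflow:0.1"}

# Hypothetical single-action workflow; ${...} values come from job properties.
HPDL = """\
<workflow-app xmlns="uri:oozie:workflow:0.1" name="example-wf">
  <start to="mr-step"/>
  <action name="mr-step">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>MapReduce step failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
"""


def transitions(hpdl_text):
    """Return (from, to) pairs for the start node and every action node."""
    root = ET.fromstring(hpdl_text)
    edges = [("start", root.find("wf:start", NS).get("to"))]
    for action in root.findall("wf:action", NS):
        name = action.get("name")
        edges.append((name, action.find("wf:ok", NS).get("to")))
        edges.append((name, action.find("wf:error", NS).get("to")))
    return edges


if __name__ == "__main__":
    for source, target in transitions(HPDL):
        print(source, "->", target)
```

Here `start`, `kill` and `end` are control-flow nodes, while `mr-step` is an action node whose `ok` and `error` elements name the next node for each outcome.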


Editors

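Hue has an Oozie workflow editor, so workflows can be designed in a graphical editor rather than by writing the XML by hand

  • see http://gethue.com/new-apache-oozie-workflow-coordinator-bundle-editors/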


Sources

  • Hadoop: The Definitive Guide (book)