ML Wiki
Machine Learning Wiki - A collection of ML concepts, algorithms, and resources.

Pig

Pig Latin is a SQL-like, high-level data-flow language that runs on top of Hadoop.

Pig Latin

  • needs the data model in the form of a UDF (user-defined function)
  • first, it generates a query plan
  • then it compiles the plan into a set of MapReduce (MR) jobs
  • some optimizations are applied
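
The compilation step in the list above can be illustrated with a small Python sketch. It is a deliberately simplified model, not Pig's actual compiler: it assumes a linear plan given as a list of operator names, and that only `GROUP` and `COGROUP` require a shuffle. The function name `split_into_mr_jobs` and the job layout are hypothetical.

```python
# Simplified sketch: cut a linear operator plan into MapReduce jobs.
# Assumption: only (CO)GROUP needs a shuffle; this is not Pig's real compiler.
SHUFFLE_OPS = {"GROUP", "COGROUP"}


def split_into_mr_jobs(plan):
    """Return a list of jobs, each a dict with map, shuffle and reduce parts."""
    jobs = []
    current = {"map": [], "shuffle": None, "reduce": []}
    for op in plan:
        if op in SHUFFLE_OPS:
            if current["shuffle"] is not None:
                # A second shuffle cannot fit in the same job: start a new one.
                jobs.append(current)
                current = {"map": [], "shuffle": None, "reduce": []}
            current["shuffle"] = op
        elif current["shuffle"] is None:
            current["map"].append(op)      # runs before the shuffle
        else:
            current["reduce"].append(op)   # runs after the shuffle
    jobs.append(current)
    return jobs


# Operator sequence of a COGROUP / FOREACH / GROUP / FOREACH script.
example_plan = ["LOAD", "COGROUP", "FOREACH", "GROUP", "FOREACH"]
jobs = split_into_mr_jobs(example_plan)
```

Under these assumptions the plan is cut into two jobs, one per grouping operator, with each `FOREACH` placed on the reduce side of the grouping that precedes it.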

Example

SQL:

SELECT SUM(s.Sale), c.City 
FROM Sales s, Cities c
WHERE s.AddrId = c.AddrId
GROUP BY City;

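Pig Latin:

```text
-- co-group both relations on the join key
tmp = COGROUP Sales BY AddrId, Cities BY AddrId;
-- flattening both bags yields the join
joined = FOREACH tmp GENERATE FLATTEN(Sales), FLATTEN(Cities);
grp = GROUP joined BY City;
-- emit the city (the group key) with its total sales
res = FOREACH grp GENERATE group AS City, SUM(joined.Sale);
```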

In Pig, FOREACH $\approx$ Map: it processes each tuple independently, like a map function.
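
To see what each of the four Pig Latin statements in the example produces, the same data flow can be emulated in plain Python. This is only a sketch of the semantics on made-up toy data; the rows, the city names, and the variable names are assumptions for illustration, and nothing here uses Pig or Hadoop.

```python
from collections import defaultdict

# Toy input relations (made-up values for illustration only).
sales = [
    {"AddrId": 1, "Sale": 10},
    {"AddrId": 1, "Sale": 5},
    {"AddrId": 2, "Sale": 7},
]
cities = [
    {"AddrId": 1, "City": "Berlin"},
    {"AddrId": 2, "City": "Paris"},
]

# 1. COGROUP: for every AddrId, collect one bag of rows per relation.
cogrouped = defaultdict(lambda: ([], []))
for s in sales:
    cogrouped[s["AddrId"]][0].append(s)
for c in cities:
    cogrouped[c["AddrId"]][1].append(c)

# 2. FOREACH ... FLATTEN: cross product of the two bags per key (the join).
joined = [
    {**s, **c}
    for sales_bag, cities_bag in cogrouped.values()
    for s in sales_bag
    for c in cities_bag
]

# 3. GROUP ... BY City: collect the joined rows per city.
grouped = defaultdict(list)
for row in joined:
    grouped[row["City"]].append(row)

# 4. FOREACH ... SUM: one output value per group.
result = {city: sum(r["Sale"] for r in rows) for city, rows in grouped.items()}
```

Steps 2 and 4 are the FOREACH steps: each handles one tuple (or one group) at a time without looking at the others, which is why FOREACH behaves like a map.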
