== MapReduce/Joins ==
How to implement a Join from Relational Algebra using MapReduce?
There are several types of joins:
* broadcast join
* reduce-side join
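A broadcast join (also called a map-side join) is usually described as follows: the smaller relation is copied to every mapper and held in memory, so each record of the larger relation can be joined inside the map function without a shuffle or reduce stage. The sketch below only illustrates that idea in plain Python. The helper names `build_lookup` and `broadcast_join_mapper` are hypothetical, and the sample rows assume an employee relation of (name, department ID) and a department relation of (department ID, department name).

```python
from collections import defaultdict

def build_lookup(small_relation, key_index):
    """Index the small (broadcast) relation by its join key."""
    lookup = defaultdict(list)
    for row in small_relation:
        lookup[row[key_index]].append(row)
    return lookup

def broadcast_join_mapper(record, lookup, key_index):
    """Join one record of the large relation against the in-memory lookup."""
    for match in lookup.get(record[key_index], []):
        yield record + match

# Assumed sample data: departments is the small relation sent to every mapper.
departments = [(999, "Accounts"), (777, "Sales"), (777, "Marketing")]
employees = [("Sue", 999), ("Tony", 777)]

lookup = build_lookup(departments, key_index=0)
joined = [
    row
    for emp in employees
    for row in broadcast_join_mapper(emp, lookup, key_index=1)
]
```

This approach only works when the smaller relation fits into the memory of a single mapper. Otherwise a reduce-side join is needed.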
Suppose we have the following schema:
We want to have the following join:
Our tagged dataset (each record is prefixed with the name of its source relation):
Emp | Sue | 999 |
Emp | Tony | 777 |
Dep | 999 | Accounts |
Dep | 777 | Sales |
Dep | 777 | Marketing |
After applying map, each record is emitted under its join key:
999 | (Emp, Sue, 999) |
777 | (Emp, Tony, 777) |
999 | (Dep, 999, Accounts) |
777 | (Dep, 777, Sales) |
777 | (Dep, 777, Marketing) |
Finally, at the reduce stage, the records are grouped by key:
key=999 | [(Emp, Sue, 999), (Dep, 999, Accounts)] |
key=777 | [(Emp, Tony, 777), (Dep, 777, Sales), (Dep, 777, Marketing)] |
Source: [1]
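The map, group and reduce steps above can be reproduced locally with a small simulation. This is a minimal sketch, not a real MapReduce job: `map_record`, `shuffle` and `reduce_group` are hypothetical helpers, the shuffle is an in-memory dictionary, and the shape of the joined output (employee name, key, department name) is an assumption, since the tables above stop at the grouped lists.

```python
from collections import defaultdict

# The tagged dataset from the tables above.
tagged = [
    ("Emp", "Sue", 999),
    ("Emp", "Tony", 777),
    ("Dep", 999, "Accounts"),
    ("Dep", 777, "Sales"),
    ("Dep", 777, "Marketing"),
]

def map_record(record):
    """Emit (join key, record); the key position depends on the tag."""
    key = record[2] if record[0] == "Emp" else record[1]
    return key, record

def shuffle(pairs):
    """Stand-in for the framework's shuffle: group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_group(key, values):
    """Split the group by tag and pair every Emp with every Dep."""
    emps = [v for v in values if v[0] == "Emp"]
    deps = [v for v in values if v[0] == "Dep"]
    return [(e[1], key, d[2]) for e in emps for d in deps]

groups = shuffle(map_record(r) for r in tagged)
joined = [
    row
    for key in sorted(groups)
    for row in reduce_group(key, groups[key])
]
```

Here `groups` matches the grouped lists shown for keys 999 and 777, and `joined` holds one row per matching employee and department pair.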
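import itertools
import operator

def mapper(record):
    # record = (tag, join_key, ...): the join key is the second field
    key = record[1]
    emit(key, record)

def reducer(key, list_of_values):
    # groupby only merges adjacent items, so sort by tag first
    by_tag = operator.itemgetter(0)
    grouped = itertools.groupby(sorted(list_of_values, key=by_tag), by_tag)
    g = {k: list(v) for (k, v) in grouped}
    order = g['order'][0]
    for line_item in g['line_item']:
        emit(order + line_item)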
From AIM3: