public final class GroupByJoinPlan extends MapReducePlan
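select r((kx,ky),sum(z))
from x in X, y in Y, z = (x,y)
where jx(x) = jy(y)
group by (kx,ky): (gx(x),gy(y));
where jx and jy are the left and right join key functions, gx and gy are the left and right group-by functions, and r is the reduce function applied to each group key and its aggregated value.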
select ( s(z), i, j )
from (x,i,k) in X, (y,k,j) in Y, z = (x,y)
group by (i,j);
where the summation s is based on the accumulator acc(c,(x,y))=c+x*y and zero=0
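The query over (x,i,k) and (y,k,j) triples has the shape of a sparse matrix multiplication. As a rough illustration of its meaning only, and not of how the plan executes it, the following in-memory sketch joins on k, groups by (i,j), and folds each pair with the accumulator c+x*y starting from zero=0. The class and method names (`MatrixMultiplyQuery`, `Entry`, `multiply`) are hypothetical and are not part of the MRQL API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names exist in MRQL.
final class MatrixMultiplyQuery {
    // One sparse matrix entry, matching the (x,i,k) and (y,k,j) triples in the query.
    static final class Entry {
        final double value;
        final int row;
        final int col;
        Entry(double value, int row, int col) {
            this.value = value;
            this.row = row;
            this.col = col;
        }
    }

    // Naive nested-loop evaluation: join on k, group by (i,j), accumulate c + x*y.
    static Map<List<Integer>, Double> multiply(List<Entry> X, List<Entry> Y) {
        Map<List<Integer>, Double> H = new HashMap<>();
        for (Entry x : X) {
            for (Entry y : Y) {
                if (x.col == y.row) {                       // join condition on k
                    List<Integer> key = Arrays.asList(x.row, y.col); // group-by key (i,j)
                    double c = H.getOrDefault(key, 0.0);    // zero = 0
                    H.put(key, c + x.value * y.value);      // acc(c,(x,y)) = c + x*y
                }
            }
        }
        return H;
    }
}
```

Calling `multiply` on the entries of two 2x2 matrices returns a map from (i,j) to the corresponding entry of their product.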
mapX ( x )
  for i = 0,n-1
    emit ( ((hash(gx(x)) % m)+m*i, jx(x), 1), (1,x) )

mapY ( y )
  for i = 0,m-1
    emit ( ((hash(gy(y)) % n)*m+i, jy(y), 2), (2,y) )

mapper output key: (partition,joinkey,tag), value: (tag,data)
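The partition arithmetic in the two mappers can be read as placing reducers on an n-by-m grid, numbered row*m+column. Each X record is sent to one column and replicated over all n rows; each Y record is sent to one row and replicated over all m columns, so any X record and any Y record meet in exactly one partition. The sketch below only reproduces that arithmetic. `GridPartitioner` and its methods are hypothetical names, and the use of `Math.floorMod` (to keep the index non-negative for negative hash codes) is an assumption not stated in the pseudocode.

```java
import java.util.Arrays;

// Hypothetical illustration of the mapper partition arithmetic; not MRQL code.
final class GridPartitioner {
    // Partitions receiving an X record: column (hash % m), copied to every row i in 0..n-1.
    static int[] partitionsForX(int groupKeyHash, int n, int m) {
        int column = Math.floorMod(groupKeyHash, m); // assumption: floorMod instead of %
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = column + m * i;
        }
        return out;
    }

    // Partitions receiving a Y record: row (hash % n), copied to every column i in 0..m-1.
    static int[] partitionsForY(int groupKeyHash, int n, int m) {
        int row = Math.floorMod(groupKeyHash, n);    // assumption: floorMod instead of %
        int[] out = new int[m];
        for (int i = 0; i < m; i++) {
            out[i] = row * m + i;
        }
        return out;
    }

    public static void main(String[] args) {
        int n = 2, m = 3;
        System.out.println(Arrays.toString(partitionsForX(7, n, m))); // [1, 4]
        System.out.println(Arrays.toString(partitionsForY(5, n, m))); // [3, 4, 5]
    }
}
```

In the sample run, the X record lands in partitions 1 and 4 and the Y record in partitions 3, 4 and 5, so the pair is combined only in partition 4.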
reduce ( (p,_,_), s )
  if p != current_partition
    flush()
    current_partition = p
  read x from s first and store it to xs
  for each y from the rest of s
    for each x in xs
      H[(gx(x),gy(y))] = acc( H[(gx(x),gy(y))], (x,y) )

where flush() is:
  for each ((kx,ky),v) in H: emit r((kx,ky),v)
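The reducer logic can be sketched in plain Java as follows. This is a simplified, in-memory illustration under several assumptions: values arrive with X records (tag 1) ahead of Y records (tag 2), the group-by keys are plain integers, the accumulator is fixed to c+x*y with zero=0, and the caller invokes `flush()` once more after the last call to `reduce`. `ReducerSketch` and `Tagged` are hypothetical names, not MRQL classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the reducer pseudocode; not MRQL code.
final class ReducerSketch {
    // Tagged value: tag 1 = from X, tag 2 = from Y; groupKey stands for gx(x) or gy(y).
    static final class Tagged {
        final int tag;
        final int groupKey;
        final double data;
        Tagged(int tag, int groupKey, double data) {
            this.tag = tag;
            this.groupKey = groupKey;
            this.data = data;
        }
    }

    private final Map<List<Integer>, Double> H = new LinkedHashMap<>();
    private final List<String> output = new ArrayList<>();
    private int currentPartition = -1;

    // One call per (partition, join key) group; H is kept across calls within a partition.
    void reduce(int partition, List<Tagged> s) {
        if (partition != currentPartition) {
            flush();
            currentPartition = partition;
        }
        List<Tagged> xs = new ArrayList<>();
        for (Tagged v : s) {
            if (v.tag == 1) {          // X values come first and are buffered
                xs.add(v);
                continue;
            }
            for (Tagged x : xs) {      // v is a Y value: combine it with every buffered x
                List<Integer> key = Arrays.asList(x.groupKey, v.groupKey);
                H.put(key, H.getOrDefault(key, 0.0) + x.data * v.data);
            }
        }
    }

    // Emits every accumulated group and clears the table.
    void flush() {
        for (Map.Entry<List<Integer>, Double> e : H.entrySet()) {
            output.add(e.getKey() + "=" + e.getValue());
        }
        H.clear();
    }

    List<String> output() {
        return output;
    }
}
```

Keeping H alive across calls within one partition is what lets records with different join keys contribute to the same (kx,ky) group before anything is emitted.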
Plan.MRContainerKeyComparator
cache, conf, counter_container, counter_key, max_input_files, temporary_paths, value_container
Constructor and Description |
---|
GroupByJoinPlan() |
Modifier and Type | Method and Description |
---|---|
static DataSet | groupByJoin(Tree left_join_key_fnc, Tree right_join_key_fnc, Tree left_groupby_fnc, Tree right_groupby_fnc, Tree accumulator_fnc, Tree zero, Tree reduce_fnc, DataSet X, DataSet Y, int num_reducers, int n, int m, String stop_counter) The GroupByJoin operation: an equi-join combined with a group-by, implemented using hashing |
aggregate, closure, outputRecords, repeat, setupSplits, setupSplits
binarySource, binarySource, clean, collect, collect, distribute_compiled_arguments, fileCache, functional_argument, generator, generator, getCache, merge, merge, new_path, parsedSource, parsedSource, print_stream, setCache, size
public static final DataSet groupByJoin(Tree left_join_key_fnc, Tree right_join_key_fnc, Tree left_groupby_fnc, Tree right_groupby_fnc, Tree accumulator_fnc, Tree zero, Tree reduce_fnc, DataSet X, DataSet Y, int num_reducers, int n, int m, String stop_counter) throws Exception
Parameters:
left_join_key_fnc - left join key function from a to k
right_join_key_fnc - right join key function from b to k
left_groupby_fnc - left group-by function from a to k1
right_groupby_fnc - right group-by function from b to k2
accumulator_fnc - accumulator function from (c,(a,b)) to c
zero - the left zero of the accumulator, of type c
reduce_fnc - reduce function from ((k1,k2),c) to d
X - left data set of type {a}
Y - right data set of type {b}
num_reducers - number of reducers
n - left dimension of the reducer grid
m - right dimension of the reducer grid
stop_counter - optional counter used in repeat operation
Throws:
Exception
Copyright © 2013-2015 The Apache Software Foundation. All Rights Reserved.