org.apache.mahout.clustering.minhash
Class MinHashMapper
java.lang.Object
org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable>
org.apache.mahout.clustering.minhash.MinHashMapper
public class MinHashMapper
- extends org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable>
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Mapper |
org.apache.hadoop.mapreduce.Mapper.Context |
Method Summary |
void |
map(org.apache.hadoop.io.Text item,
org.apache.hadoop.io.Writable features,
org.apache.hadoop.mapreduce.Mapper.Context context)
Hash all items with each function and retain the minimum value for each iteration. |
protected void |
setup(org.apache.hadoop.mapreduce.Mapper.Context context)
|
Methods inherited from class org.apache.hadoop.mapreduce.Mapper |
cleanup, run |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
MinHashMapper
public MinHashMapper()
setup
protected void setup(org.apache.hadoop.mapreduce.Mapper.Context context)
throws java.io.IOException,
java.lang.InterruptedException
- Overrides:
setup
in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable>
- Throws:
java.io.IOException
java.lang.InterruptedException
map
public void map(org.apache.hadoop.io.Text item,
org.apache.hadoop.io.Writable features,
org.apache.hadoop.mapreduce.Mapper.Context context)
throws java.io.IOException,
java.lang.InterruptedException
- Hash all items with each hash function and retain the minimum value for each iteration.
We end up with X minhash signatures.
Then, depending on the number of key-groups (1 - 4), concatenate that many
minhash values to form the cluster-id, which is emitted as the 'key', with the item-id as the 'value'.
- Overrides:
map
in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable>
- Throws:
java.io.IOException
java.lang.InterruptedException
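The steps described for map can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the Mahout implementation: features are taken to be plain ints, the hash functions are a hypothetical linear family h_i(x) = (a_i * x + b_i) mod p, and cluster-ids are assumed to be built from a sliding window of keyGroups consecutive minhash values joined by '-'. The class name MinHashSketch and the methods signature and clusterIds are illustrative names that do not appear in the API above.

```java
import java.util.ArrayList;
import java.util.List;

class MinHashSketch {

    // Assumed modulus for the hypothetical hash family (a Mersenne prime).
    static final long PRIME = 2147483647L;

    // For each hash function i, hash every feature and keep the minimum value.
    // The result has one minhash value per hash function.
    static long[] signature(int[] features, long[] a, long[] b) {
        long[] mins = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            long min = Long.MAX_VALUE;
            for (int f : features) {
                long h = Math.floorMod(a[i] * f + b[i], PRIME);
                if (h < min) {
                    min = h;
                }
            }
            mins[i] = min;
        }
        return mins;
    }

    // Assumed grouping: for each position i, concatenate keyGroups consecutive
    // minhash values (wrapping around) into one cluster-id string.
    static List<String> clusterIds(long[] signature, int keyGroups) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < signature.length; i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < keyGroups; j++) {
                sb.append(signature[(i + j) % signature.length]).append('-');
            }
            sb.setLength(sb.length() - 1); // drop the trailing '-'
            ids.add(sb.toString());
        }
        return ids;
    }
}
```

In a mapper, each string returned by clusterIds would be written as the output key with the item-id as the value, so items that share a cluster-id meet in the same reduce group.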
Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.