org.apache.mahout.clustering.minhash
Class MinHashMapper

java.lang.Object
  extended by org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.Text,VectorWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable>
      extended by org.apache.mahout.clustering.minhash.MinHashMapper

public class MinHashMapper
extends org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.Text,VectorWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable>


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Mapper
org.apache.hadoop.mapreduce.Mapper.Context
 
Constructor Summary
MinHashMapper()
           
 
Method Summary
 void map(org.apache.hadoop.io.Text item, VectorWritable features, org.apache.hadoop.mapreduce.Mapper.Context context)
          Hash all items with each function and retain the minimum value for each iteration.
protected  void setup(org.apache.hadoop.mapreduce.Mapper.Context context)
           
 
Methods inherited from class org.apache.hadoop.mapreduce.Mapper
cleanup, run
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MinHashMapper

public MinHashMapper()
Method Detail

setup

protected void setup(org.apache.hadoop.mapreduce.Mapper.Context context)
              throws IOException,
                     InterruptedException
Overrides:
setup in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.Text,VectorWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable>
Throws:
IOException
InterruptedException

map

public void map(org.apache.hadoop.io.Text item,
                VectorWritable features,
                org.apache.hadoop.mapreduce.Mapper.Context context)
         throws IOException,
                InterruptedException
Hash all items with each function and retain the minimum value for each iteration. We end up with X minhash signatures.

Now, depending on the number of key-groups (1 to 4), concatenate that many minhash values to form the cluster-id, which is used as the 'key', with the item-id as the 'value'.
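
The two steps described above can be illustrated with a minimal, standalone sketch. It is not the code of this class: the class name MinHashSketch, the linear hash family (a * x + b) mod p, the '-' separator, and the wrap-around choice of which minhash values are concatenated are all assumptions made for illustration, and the Hadoop types (Text, VectorWritable, Context) are replaced by plain Java arrays and lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names and hash family are hypothetical.
final class MinHashSketch {

  // Modulus for the assumed hash family: 2^31 - 1.
  private static final long PRIME = 2147483647L;

  // Assumed hash function: h(x) = (a * x + b) mod PRIME.
  static long hash(long a, long b, int feature) {
    return Math.floorMod(a * feature + b, PRIME);
  }

  // Step 1: hash every feature with each function and keep the minimum
  // value seen per function. The result has one entry per hash function.
  static long[] signature(int[] features, long[] a, long[] b) {
    long[] mins = new long[a.length];
    Arrays.fill(mins, Long.MAX_VALUE);
    for (int feature : features) {
      for (int i = 0; i < a.length; i++) {
        mins[i] = Math.min(mins[i], hash(a[i], b[i], feature));
      }
    }
    return mins;
  }

  // Step 2: build cluster ids by concatenating keyGroups minhash values.
  // Assumption: a window starts at each position and wraps around the end.
  static List<String> clusterIds(long[] signature, int keyGroups) {
    List<String> ids = new ArrayList<>();
    for (int i = 0; i < signature.length; i++) {
      StringBuilder id = new StringBuilder();
      for (int j = 0; j < keyGroups; j++) {
        if (j > 0) {
          id.append('-');
        }
        id.append(signature[(i + j) % signature.length]);
      }
      ids.add(id.toString());
    }
    return ids;
  }
}
```

In a mapper, each returned cluster id would be written as the key with the item id as the value, so that items sharing a cluster id meet in the same reduce group. A larger key-group count makes each cluster id more specific, so fewer items collide on it.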

Overrides:
map in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.Text,VectorWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable>
Throws:
IOException
InterruptedException


Copyright © 2008-2012 The Apache Software Foundation. All Rights Reserved.