org.apache.mahout.clustering.canopy
Class Canopy

java.lang.Object
  extended by org.apache.mahout.clustering.ClusterBase
      extended by org.apache.mahout.clustering.canopy.Canopy
All Implemented Interfaces:
org.apache.hadoop.io.Writable, Printable

public class Canopy
extends ClusterBase

This class models a canopy as three pieces of state: a center point, the number of points the canopy contains according to some distance metric, and a point total. The point total is the sum of all contained points and is used to compute the centroid when needed.
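
The bookkeeping described above can be illustrated with a minimal, self-contained sketch. It is not the Mahout implementation: `CanopySketch` is a hypothetical stand-in, plain `double[]` arrays replace `Vector`, and it assumes the seed point passed to the constructor counts toward both the point total and the point count.

```java
import java.util.Arrays;

// Hypothetical stand-in for Canopy, for illustration only; not the Mahout class.
final class CanopySketch {
    private final int id;
    private final double[] center;      // the point that seeded the canopy
    private final double[] pointTotal;  // component-wise sum of all contained points
    private int numPoints;              // how many points have been added

    // Assumption: the seed point is itself counted as the first contained point.
    CanopySketch(double[] point, int id) {
        this.id = id;
        this.center = point.clone();
        this.pointTotal = point.clone();
        this.numPoints = 1;
    }

    // Adds a point by accumulating it into the running total.
    void addPoint(double[] point) {
        if (point.length != pointTotal.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        for (int i = 0; i < point.length; i++) {
            pointTotal[i] += point[i];
        }
        numPoints++;
    }

    // The centroid is the point total divided by the number of points.
    double[] computeCentroid() {
        double[] centroid = new double[pointTotal.length];
        for (int i = 0; i < centroid.length; i++) {
            centroid[i] = pointTotal[i] / numPoints;
        }
        return centroid;
    }

    int getId() { return id; }
    int getNumPoints() { return numPoints; }
    double[] getCenter() { return center.clone(); }

    public static void main(String[] args) {
        CanopySketch canopy = new CanopySketch(new double[] {1.0, 1.0}, 0);
        canopy.addPoint(new double[] {3.0, 5.0});
        System.out.println(Arrays.toString(canopy.computeCentroid())); // prints [2.0, 3.0]
    }
}
```

Keeping a running total rather than the individual points means the centroid can be recomputed at any time without storing the points themselves.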


Constructor Summary
Canopy()
          Used for deserialization as a Writable
Canopy(Vector point, int canopyId)
          Create a new Canopy containing the given point and canopyId
 
Method Summary
 void addPoint(Vector point)
          Add a point to the canopy
 java.lang.String asFormatString()
           
 Vector computeCentroid()
          Compute the centroid by averaging the pointTotals
static Canopy decodeCanopy(java.lang.String formattedString)
          Decodes and returns a Canopy from the formattedString
 void emitPoint(Vector point, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,Vector> collector)
          Emit the point to the collector, keyed by the canopy's formatted representation
static java.lang.String formatCanopy(Canopy canopy)
          Format the canopy for output
 java.lang.String getIdentifier()
           
 void readFields(java.io.DataInput in)
          Reads in the id, nothing else
 java.lang.String toString()
           
 void write(java.io.DataOutput out)
          Simply writes out the id, and that's it!
 
Methods inherited from class org.apache.mahout.clustering.ClusterBase
asFormatString, asJsonString, formatVector, getCenter, getId, getNumPoints, getPointTotal, setCenter, setId, setNumPoints, setPointTotal
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Canopy

public Canopy()
Used for deserialization as a Writable


Canopy

public Canopy(Vector point,
              int canopyId)
Create a new Canopy containing the given point and canopyId

Parameters:
point - a point in vector space
canopyId - an int identifying the canopy local to this process only
Method Detail

write

public void write(java.io.DataOutput out)
           throws java.io.IOException
Description copied from class: ClusterBase
Simply writes out the id, and that's it!

Specified by:
write in interface org.apache.hadoop.io.Writable
Overrides:
write in class ClusterBase
Parameters:
out - The DataOutput
Throws:
java.io.IOException

readFields

public void readFields(java.io.DataInput in)
                throws java.io.IOException
Description copied from class: ClusterBase
Reads in the id, nothing else

Specified by:
readFields in interface org.apache.hadoop.io.Writable
Overrides:
readFields in class ClusterBase
Throws:
java.io.IOException
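
As a rough illustration of the inherited descriptions of write and readFields ("writes out the id", "reads in the id, nothing else"), the sketch below serializes only an id through `java.io.DataOutput` and restores it through `java.io.DataInput`. `IdOnlyRecord` is a hypothetical class, and encoding the id as an `int` is an assumption. Both descriptions are copied from ClusterBase, so whether the Canopy overrides read or write additional state is not stated on this page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record that mimics the write/readFields shape of a Writable.
final class IdOnlyRecord {
    private int id;
    private int numPoints; // deliberately NOT serialized in this sketch

    // No-arg constructor: lets a framework create an empty instance, then call readFields.
    IdOnlyRecord() { }

    IdOnlyRecord(int id, int numPoints) {
        this.id = id;
        this.numPoints = numPoints;
    }

    // Writes the id and nothing else (assumed to be encoded as an int).
    void write(DataOutput out) throws IOException {
        out.writeInt(id);
    }

    // Reads back exactly what write produced.
    void readFields(DataInput in) throws IOException {
        id = in.readInt();
    }

    int getId() { return id; }
    int getNumPoints() { return numPoints; }

    // Serializes to an in-memory buffer and deserializes into a fresh instance.
    static IdOnlyRecord roundTrip(IdOnlyRecord original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            IdOnlyRecord copy = new IdOnlyRecord();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        IdOnlyRecord copy = roundTrip(new IdOnlyRecord(7, 42));
        // The id survives the round trip; the unserialized count falls back to its default.
        System.out.println(copy.getId() + " " + copy.getNumPoints()); // prints 7 0
    }
}
```

The round trip shows the practical consequence of id-only serialization: any state that is not written must be rebuilt or transmitted some other way.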

formatCanopy

public static java.lang.String formatCanopy(Canopy canopy)
Format the canopy for output


asFormatString

public java.lang.String asFormatString()
Specified by:
asFormatString in class ClusterBase
Returns:

decodeCanopy

public static Canopy decodeCanopy(java.lang.String formattedString)
Decodes and returns a Canopy from the formattedString

Parameters:
formattedString - a String produced by formatCanopy
Returns:
a new Canopy
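
The contract between formatCanopy and decodeCanopy is a round trip: a string produced by one can be parsed by the other. The sketch below shows that contract with an invented text format, `C<id>: x, y, ...`. The format, the class `CanopyFormatSketch`, and its methods are all assumptions for illustration; the actual string layout used by Mahout is not given on this page.

```java
// Hypothetical format/decode pair; the "C<id>: x, y" layout is invented for this sketch.
final class CanopyFormatSketch {
    final int id;
    final double[] center;

    CanopyFormatSketch(int id, double[] center) {
        this.id = id;
        this.center = center.clone();
    }

    // Formats as "C<id>:" followed by the comma-separated center components.
    static String format(CanopyFormatSketch canopy) {
        StringBuilder sb = new StringBuilder("C").append(canopy.id).append(':');
        for (int i = 0; i < canopy.center.length; i++) {
            sb.append(i == 0 ? " " : ", ").append(canopy.center[i]);
        }
        return sb.toString();
    }

    // Parses a string produced by format and returns a new instance.
    static CanopyFormatSketch decode(String formatted) {
        int colon = formatted.indexOf(':');
        if (!formatted.startsWith("C") || colon < 0) {
            throw new IllegalArgumentException("not a formatted canopy: " + formatted);
        }
        int id = Integer.parseInt(formatted.substring(1, colon));
        String rest = formatted.substring(colon + 1).trim();
        String[] parts = rest.isEmpty() ? new String[0] : rest.split(",");
        double[] center = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            center[i] = Double.parseDouble(parts[i].trim());
        }
        return new CanopyFormatSketch(id, center);
    }

    public static void main(String[] args) {
        String text = format(new CanopyFormatSketch(3, new double[] {1.5, -2.0}));
        System.out.println(text); // prints C3: 1.5, -2.0
        CanopyFormatSketch decoded = decode(text);
        System.out.println(decoded.id + " " + decoded.center.length); // prints 3 2
    }
}
```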

addPoint

public void addPoint(Vector point)
Add a point to the canopy

Parameters:
point - some point to add

emitPoint

public void emitPoint(Vector point,
                      org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,Vector> collector)
               throws java.io.IOException
Emit the point to the collector, keyed by the canopy's formatted representation

Parameters:
point - a point to emit
collector - the OutputCollector that receives the point
Throws:
java.io.IOException
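
To show what "keyed by the canopy's formatted representation" means in practice, the sketch below emits points into a collector that groups them by key. Everything here is hypothetical: `Collector` only imitates the collect(key, value) shape of a Hadoop output collector, `MiniCanopy` is not the Mahout class, and the key `C<id>` is an assumed format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EmitSketch {
    // Hypothetical stand-in for the collect(key, value) shape of an output collector.
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    // Hypothetical miniature canopy; only the id matters for keying.
    static final class MiniCanopy {
        private final int id;

        MiniCanopy(int id) { this.id = id; }

        // Assumed key format; the real formatted representation is not given here.
        String formatted() { return "C" + id; }

        // Emits the point under this canopy's formatted key.
        void emitPoint(double[] point, Collector<String, double[]> collector) {
            collector.collect(formatted(), point);
        }
    }

    // Emits three (key, point) pairs and returns them grouped by canopy key.
    static Map<String, List<double[]>> demo() {
        Map<String, List<double[]>> grouped = new LinkedHashMap<>();
        Collector<String, double[]> collector =
            (key, value) -> grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(value);

        MiniCanopy a = new MiniCanopy(0);
        MiniCanopy b = new MiniCanopy(1);
        double[] p = {1.0, 2.0};
        double[] q = {9.0, 9.0};

        a.emitPoint(p, collector);
        a.emitPoint(q, collector);
        b.emitPoint(q, collector); // the same point may be emitted by more than one canopy
        return grouped;
    }

    public static void main(String[] args) {
        demo().forEach((key, points) ->
            System.out.println(key + " -> " + points.size() + " point(s)"));
    }
}
```

Because the key identifies the canopy, a downstream grouping step sees all points emitted by one canopy together.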

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

getIdentifier

public java.lang.String getIdentifier()
Specified by:
getIdentifier in class ClusterBase

computeCentroid

public Vector computeCentroid()
Compute the centroid by averaging the pointTotals

Specified by:
computeCentroid in class ClusterBase
Returns:
a RandomAccessSparseVector (required by Mapper) which is the new centroid


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.