Package org.apache.mahout.clustering

This package provides several clustering algorithm implementations.

Interface Summary
Printable - Implementations of this interface have a printable representation.
 

Class Summary
ClusterBase  
 

Package org.apache.mahout.clustering Description

This package provides several clustering algorithm implementations. Clustering partitions a set of objects into groups of similar items. The definition of similarity is usually up to you; for text documents, cosine distance (or cosine similarity) is recommended. Mahout also provides other distance measures, such as Euclidean distance.
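
To make the difference between the two measures concrete, here is a minimal plain-Java sketch. The class name DistanceSketch and its methods are hypothetical and written only for illustration; they are not Mahout's own distance measure classes, which should be preferred in real code.

```java
/** Plain-Java sketch of two distance measures; these are not Mahout's own classes. */
final class DistanceSketch {

    /** Euclidean distance: square root of the sum of squared component differences. */
    static double euclideanDistance(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Cosine distance: 1 minus the cosine of the angle between the two vectors. */
    static double cosineDistance(double[] a, double[] b) {
        requireSameLength(a, b);
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine distance is undefined for a zero vector");
        }
        return 1.0 - dot / Math.sqrt(normA * normB);
    }

    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
    }

    public static void main(String[] args) {
        double[] shortDoc = {1.0, 2.0, 0.0};
        double[] longDoc = {10.0, 20.0, 0.0}; // same direction, ten times the magnitude
        System.out.println("Euclidean: " + euclideanDistance(shortDoc, longDoc));
        System.out.println("Cosine:    " + cosineDistance(shortDoc, longDoc));
    }
}
```

The two example vectors point in the same direction but differ in length, so they are far apart under Euclidean distance while their cosine distance is approximately zero. This is one reason cosine distance tends to suit text, where document length varies but the mix of terms matters.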
The input to each clustering algorithm is a set of vectors representing your items. For text, these are typically TF-IDF or bag-of-words representations of the documents.
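
As a rough illustration of what such input vectors look like, the sketch below builds bag-of-words and TF-IDF vectors in plain Java. Everything in it is an assumption made for the example: the class TextVectorSketch is hypothetical, documents are assumed to be already tokenized, and the weighting used (raw term count times ln(N / df)) is only one common TF-IDF variant, not necessarily the one Mahout applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Plain-Java sketch of turning tokenized documents into vectors; not Mahout code. */
final class TextVectorSketch {

    /** Sorted list of distinct terms; position i is dimension i of every vector. */
    static List<String> vocabulary(List<String[]> docs) {
        TreeSet<String> terms = new TreeSet<>();
        for (String[] doc : docs) {
            terms.addAll(Arrays.asList(doc));
        }
        return new ArrayList<>(terms);
    }

    /** Bag of words: component i counts how often vocabulary term i occurs in doc. */
    static double[] bagOfWords(String[] doc, List<String> vocab) {
        double[] counts = new double[vocab.size()];
        for (String token : doc) {
            int index = vocab.indexOf(token);
            if (index >= 0) {
                counts[index] += 1.0;
            }
        }
        return counts;
    }

    /** TF-IDF (assumed variant): raw term count times ln(N / document frequency). */
    static double[][] tfIdf(List<String[]> docs, List<String> vocab) {
        int n = docs.size();
        double[][] vectors = new double[n][];
        int[] docFreq = new int[vocab.size()];
        for (int d = 0; d < n; d++) {
            vectors[d] = bagOfWords(docs.get(d), vocab);
            for (int t = 0; t < docFreq.length; t++) {
                if (vectors[d][t] > 0.0) {
                    docFreq[t]++;
                }
            }
        }
        for (double[] vector : vectors) {
            for (int t = 0; t < vector.length; t++) {
                // A zero count stays zero; docFreq[t] is at least 1 whenever the count is positive.
                if (vector[t] > 0.0) {
                    vector[t] *= Math.log((double) n / docFreq[t]);
                }
            }
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
                new String[] {"apple", "banana", "apple"},
                new String[] {"banana", "cherry"});
        List<String> vocab = vocabulary(docs);
        System.out.println(vocab);
        System.out.println(Arrays.toString(bagOfWords(docs.get(0), vocab)));
        System.out.println(Arrays.deepToString(tfIdf(docs, vocab)));
    }
}
```

Under this weighting, a term that appears in every document gets weight zero, so only terms that help tell documents apart contribute to the distance between them.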
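The output of each clustering algorithm is either a hard or a soft assignment of items to clusters: a hard assignment places each item in exactly one cluster, while a soft assignment gives each item a degree of membership in several clusters.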
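
The two kinds of output can be sketched in plain Java as follows. The class AssignmentSketch is hypothetical, and the soft weighting shown (membership proportional to the inverse squared distance, which is the standard fuzzy c-means membership formula with fuzziness 2) is an assumption chosen for illustration rather than a description of any particular Mahout implementation.

```java
/** Plain-Java sketch of hard versus soft cluster assignment; not Mahout code. */
final class AssignmentSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Hard assignment: the index of the single nearest centroid. */
    static int hardAssign(double[] point, double[][] centroids) {
        if (centroids.length == 0) {
            throw new IllegalArgumentException("at least one centroid is required");
        }
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int j = 0; j < centroids.length; j++) {
            double d2 = squaredDistance(point, centroids[j]);
            if (d2 < bestDistance) {
                bestDistance = d2;
                best = j;
            }
        }
        return best;
    }

    /** Soft assignment: one membership weight per centroid, summing to 1. */
    static double[] softAssign(double[] point, double[][] centroids) {
        if (centroids.length == 0) {
            throw new IllegalArgumentException("at least one centroid is required");
        }
        double[] membership = new double[centroids.length];
        double total = 0.0;
        for (int j = 0; j < centroids.length; j++) {
            double d2 = squaredDistance(point, centroids[j]);
            if (d2 == 0.0) {
                // The point coincides with this centroid: give it full membership.
                double[] exact = new double[centroids.length];
                exact[j] = 1.0;
                return exact;
            }
            membership[j] = 1.0 / d2; // assumed weighting: inverse squared distance
            total += membership[j];
        }
        for (int j = 0; j < membership.length; j++) {
            membership[j] /= total;
        }
        return membership;
    }

    public static void main(String[] args) {
        double[] point = {1.0, 0.0};
        double[][] centroids = {{0.0, 0.0}, {4.0, 0.0}};
        System.out.println("hard: " + hardAssign(point, centroids));
        System.out.println("soft: " + java.util.Arrays.toString(softAssign(point, centroids)));
    }
}
```

For the example point, the hard assignment picks the first centroid outright, while the soft assignment still records a small membership in the second, more distant cluster.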



Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.