org.apache.mahout.fpm.pfpgrowth
Class PFPGrowth

java.lang.Object
  extended by org.apache.mahout.fpm.pfpgrowth.PFPGrowth

public final class PFPGrowth
extends java.lang.Object

Driver class for Parallel FP-Growth (PFP). Runs each stage of PFPGrowth as described in the paper at http://infolab.stanford.edu/~echang/recsys08-69.pdf


Field Summary
static java.util.regex.Pattern SPLITTER
           
 
Method Summary
static java.util.List<Pair<java.lang.String,java.lang.Long>> deserializeList(Parameters params, java.lang.String key, org.apache.hadoop.conf.Configuration conf)
          Generates the fList from the serialized string representation
static java.util.Map<java.lang.String,java.lang.Long> deserializeMap(Parameters params, java.lang.String key, org.apache.hadoop.conf.Configuration conf)
          Generates the gList (the Group ID mapping of the frequent Features) Map from its serialized representation
static java.util.List<Pair<java.lang.String,java.lang.Long>> readFList(Parameters params)
          Read the feature frequency List, which is built at the end of the Parallel counting job
static java.util.List<Pair<java.lang.String,TopKStringPatterns>> readFrequentPattern(Parameters params)
          Read the Frequent Patterns generated from Text
static void runPFPGrowth(Parameters params)
           
static void startAggregating(Parameters params)
          Run the aggregation Job, which merges the different TopK patterns, groups each Pattern by the features present in it, and so calculates the final Top K frequent Patterns for each feature
static void startGroupingItems(Parameters params)
          Group the given Features into g groups as defined by the numGroups parameter in params
static void startParallelCounting(Parameters params)
          Count the frequencies of various features in parallel using Map/Reduce
static void startParallelFPGrowth(Parameters params)
          Run the Parallel FPGrowth Map/Reduce Job to calculate the Top K features of group-dependent shards
static void startTransactionSorting(Parameters params)
          Run the transaction sorting Map/Reduce Job, which prepares the input transactions for the Parallel FPGrowth Job
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SPLITTER

public static final java.util.regex.Pattern SPLITTER
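
This page does not show the regular expression behind SPLITTER. As a minimal sketch of how such a field is typically used to break one transaction line into features, the example below assumes a hypothetical pattern that splits on a comma or a tab with optional surrounding whitespace. The class name, method name, and pattern are all illustrative and are not taken from Mahout.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: the real PFPGrowth.SPLITTER pattern is not documented here.
final class SplitterSketch {

  // Hypothetical delimiter: a comma or tab, with optional whitespace around it.
  static final Pattern SPLITTER = Pattern.compile("\\s*[,\\t]\\s*");

  // Split one transaction line into its features.
  static List<String> features(String line) {
    return Arrays.asList(SPLITTER.split(line.trim()));
  }

  public static void main(String[] args) {
    System.out.println(features("milk, bread\teggs"));
  }
}
```

Keeping the compiled Pattern in a static final field avoids recompiling the expression for every input line.
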
Method Detail

deserializeList

public static java.util.List<Pair<java.lang.String,java.lang.Long>> deserializeList(Parameters params,
                                                                                    java.lang.String key,
                                                                                    org.apache.hadoop.conf.Configuration conf)
                                                                             throws java.io.IOException
Generates the fList from the serialized string representation

Parameters:
params -
key -
conf -
Returns:
Deserialized Feature Frequency List
Throws:
java.io.IOException

deserializeMap

public static java.util.Map<java.lang.String,java.lang.Long> deserializeMap(Parameters params,
                                                                            java.lang.String key,
                                                                            org.apache.hadoop.conf.Configuration conf)
                                                                     throws java.io.IOException
Generates the gList (the Group ID mapping of the frequent Features) Map from its serialized representation

Parameters:
params -
key -
conf -
Returns:
Deserialized Group List
Throws:
java.io.IOException

readFList

public static java.util.List<Pair<java.lang.String,java.lang.Long>> readFList(Parameters params)
                                                                       throws java.io.IOException
Read the feature frequency List, which is built at the end of the Parallel counting job

Parameters:
params -
Returns:
Feature Frequency List
Throws:
java.io.IOException

readFrequentPattern

public static java.util.List<Pair<java.lang.String,TopKStringPatterns>> readFrequentPattern(Parameters params)
                                                                                     throws java.io.IOException
Read the Frequent Patterns generated from Text

Parameters:
params -
Returns:
List of TopK patterns for each string frequent feature
Throws:
java.io.IOException

runPFPGrowth

public static void runPFPGrowth(Parameters params)
                         throws java.io.IOException,
                                java.lang.InterruptedException,
                                java.lang.ClassNotFoundException
Parameters:
params - should contain the input and output locations as string values; additional parameters include minSupport (3), maxHeapSize (50), and numGroups (1000)
Throws:
java.io.IOException
java.lang.ClassNotFoundException
java.lang.InterruptedException
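
The parameter description above can be illustrated with a small stand-in. The sketch below models the parameters as a plain java.util.Map rather than the Mahout Parameters class, and it assumes two things the page does not state: that the location keys are literally named "input" and "output", and that the bracketed numbers (3, 50, 1000) act as fallback values when a key is absent. The helper names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for parameter handling; this does not use the Mahout Parameters class.
final class PfpParamsSketch {

  // Read an integer parameter, using the fallback when the key is missing.
  static int intParam(Map<String, String> params, String key, int fallback) {
    String raw = params.get(key);
    return raw == null ? fallback : Integer.parseInt(raw.trim());
  }

  // Fail early if either location is missing (key names are assumptions).
  static void requireLocations(Map<String, String> params) {
    for (String key : new String[] {"input", "output"}) {
      if (params.get(key) == null) {
        throw new IllegalArgumentException("missing parameter: " + key);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, String> params = new HashMap<>();
    params.put("input", "/tmp/transactions.txt"); // placeholder path
    params.put("output", "/tmp/patterns");        // placeholder path
    params.put("minSupport", "5");
    requireLocations(params);
    System.out.println(intParam(params, "minSupport", 3));
    System.out.println(intParam(params, "maxHeapSize", 50));
    System.out.println(intParam(params, "numGroups", 1000));
  }
}
```
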

startAggregating

public static void startAggregating(Parameters params)
                             throws java.io.IOException,
                                    java.lang.InterruptedException,
                                    java.lang.ClassNotFoundException
Run the aggregation Job, which merges the different TopK patterns, groups each Pattern by the features present in it, and so calculates the final Top K frequent Patterns for each feature

Parameters:
params -
Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
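
The aggregation described above can be sketched in a single JVM, without Map/Reduce. The example below indexes every pattern under each feature it contains and keeps the K best per feature. Ranking by support alone is an assumption made for brevity (the actual job may use further tie-breakers), and ScoredPattern and topKPerFeature are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-JVM illustration of per-feature top-K aggregation; not Mahout code.
final class AggregationSketch {

  // A frequent pattern together with its support count.
  static final class ScoredPattern {
    final List<String> items;
    final long support;

    ScoredPattern(long support, String... items) {
      this.support = support;
      this.items = Arrays.asList(items);
    }
  }

  static Map<String, List<ScoredPattern>> topKPerFeature(List<ScoredPattern> patterns, int k) {
    Map<String, List<ScoredPattern>> byFeature = new TreeMap<>();
    // Group each pattern under every feature present in it.
    for (ScoredPattern pattern : patterns) {
      for (String feature : pattern.items) {
        byFeature.computeIfAbsent(feature, x -> new ArrayList<>()).add(pattern);
      }
    }
    // Keep only the k patterns with the highest support for each feature.
    for (List<ScoredPattern> candidates : byFeature.values()) {
      candidates.sort(Comparator.comparingLong((ScoredPattern sp) -> sp.support).reversed());
      if (candidates.size() > k) {
        candidates.subList(k, candidates.size()).clear();
      }
    }
    return byFeature;
  }
}
```

A bounded heap would avoid holding every candidate in memory; the full sort is used here only to keep the sketch short.
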

startGroupingItems

public static void startGroupingItems(Parameters params)
                               throws java.io.IOException
Group the given Features into g groups as defined by the numGroups parameter in params

Parameters:
params -
Throws:
java.io.IOException
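
One plausible way to split features into numGroups groups is to walk the frequency-ordered feature list and hand out group IDs in contiguous blocks. The sketch below does exactly that. The block-wise assignment and the names assignGroups and GroupingSketch are assumptions for illustration; the page only states that features are grouped into g groups.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative grouping of features into at most numGroups groups; not Mahout code.
final class GroupingSketch {

  // Map each feature to a group ID in the range [0, numGroups).
  static Map<String, Long> assignGroups(List<String> features, int numGroups) {
    if (numGroups <= 0) {
      throw new IllegalArgumentException("numGroups must be positive");
    }
    // Ceiling division: the largest number of features any one group receives.
    int maxPerGroup = (features.size() + numGroups - 1) / numGroups;
    Map<String, Long> gList = new LinkedHashMap<>();
    for (int i = 0; i < features.size(); i++) {
      gList.put(features.get(i), (long) (i / maxPerGroup));
    }
    return gList;
  }
}
```

With five features and numGroups set to 2, the first three features land in group 0 and the remaining two in group 1.
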

startParallelCounting

public static void startParallelCounting(Parameters params)
                                  throws java.io.IOException,
                                         java.lang.InterruptedException,
                                         java.lang.ClassNotFoundException
Count the frequencies of various features in parallel using Map/Reduce

Parameters:
params -
Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
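
To make the counting stage concrete, the following single-JVM sketch mirrors its map and reduce steps: emit one count per feature occurrence, sum the counts per feature, drop features below a minimum support, and order the rest by descending frequency to form a feature frequency list. Filtering by minSupport at this point and breaking ties by feature name are assumptions; buildFList and FeatureCountSketch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Single-JVM stand-in for the parallel counting stage; not Mahout code.
final class FeatureCountSketch {

  static List<Map.Entry<String, Long>> buildFList(List<List<String>> transactions, long minSupport) {
    Map<String, Long> counts = new HashMap<>();
    for (List<String> transaction : transactions) {
      for (String feature : transaction) {
        // "Map" emits (feature, 1); "reduce" sums the ones per feature.
        counts.merge(feature, 1L, Long::sum);
      }
    }
    List<Map.Entry<String, Long>> fList = new ArrayList<>();
    for (Map.Entry<String, Long> entry : counts.entrySet()) {
      if (entry.getValue() >= minSupport) {
        fList.add(Map.entry(entry.getKey(), entry.getValue()));
      }
    }
    // Most frequent first; ties are broken by feature name for a stable order.
    fList.sort(Map.Entry.<String, Long>comparingByValue().reversed()
        .thenComparing(Map.Entry.<String, Long>comparingByKey()));
    return fList;
  }
}
```
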

startTransactionSorting

public static void startTransactionSorting(Parameters params)
                                    throws java.io.IOException,
                                           java.lang.InterruptedException,
                                           java.lang.ClassNotFoundException
Run the transaction sorting Map/Reduce Job, which prepares the input transactions for the Parallel FPGrowth Job

Parameters:
params -
Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
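
The method name suggests a stage that reorders each transaction before mining. A common FP-Growth preprocessing step, assumed here rather than confirmed by this page, is to drop items that are absent from the feature frequency list and sort the rest by their position in that list. The sketch below shows that step for one transaction; sortByFList and TransactionSortSketch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative per-transaction sort by feature rank; not Mahout code.
final class TransactionSortSketch {

  // fList is expected to be ordered from most to least frequent feature.
  static List<String> sortByFList(List<String> transaction, List<String> fList) {
    Map<String, Integer> rank = new HashMap<>();
    for (int i = 0; i < fList.size(); i++) {
      rank.put(fList.get(i), i);
    }
    // Keep only items that appear in the feature frequency list.
    List<String> kept = new ArrayList<>();
    for (String item : transaction) {
      if (rank.containsKey(item)) {
        kept.add(item);
      }
    }
    // Order the surviving items by their rank in fList.
    kept.sort(Comparator.comparingInt(rank::get));
    return kept;
  }
}
```
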

startParallelFPGrowth

public static void startParallelFPGrowth(Parameters params)
                                  throws java.io.IOException,
                                         java.lang.InterruptedException,
                                         java.lang.ClassNotFoundException
Run the Parallel FPGrowth Map/Reduce Job to calculate the Top K features of group-dependent shards

Parameters:
params -
Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
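
The phrase "group dependent shards" refers to the idea, described in the paper linked in the class description, of sending each group only the part of a transaction it needs. The sketch below is one reading of that step for a single transaction: scan a frequency-sorted transaction from right to left and, the first time an item of a given group is met, emit the prefix ending at that item for that group. It is a simplified illustration under that reading, not the Mahout mapper; shardsFor and ShardSketch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative construction of group-dependent shards for one transaction.
final class ShardSketch {

  // sortedTransaction: items ordered from most to least frequent.
  // gList: feature -> group ID; items without a group are ignored as triggers.
  static Map<Long, List<String>> shardsFor(List<String> sortedTransaction, Map<String, Long> gList) {
    Map<Long, List<String>> shards = new LinkedHashMap<>();
    for (int j = sortedTransaction.size() - 1; j >= 0; j--) {
      Long group = gList.get(sortedTransaction.get(j));
      // Only the right-most item of each group produces a shard.
      if (group != null && !shards.containsKey(group)) {
        shards.put(group, new ArrayList<>(sortedTransaction.subList(0, j + 1)));
      }
    }
    return shards;
  }
}
```

Each group therefore receives at most one prefix per transaction, which is what lets the groups be mined independently of one another.
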


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.