org.apache.mahout.fpm.pfpgrowth
Class PFPGrowth

java.lang.Object
  extended by org.apache.mahout.fpm.pfpgrowth.PFPGrowth

public final class PFPGrowth
extends java.lang.Object

Driver class for Parallel FP-Growth (PFP). Runs each stage of PFPGrowth as described in the paper http://infolab.stanford.edu/~echang/recsys08-69.pdf


Field Summary
static java.lang.String ENCODING
           
static java.lang.String F_LIST
           
static java.lang.String FILE_PATTERN
           
static java.lang.String FPGROWTH
           
static java.lang.String FREQUENT_PATTERNS
           
static java.lang.String G_LIST
           
static java.lang.String INPUT
           
static java.lang.String MAX_HEAPSIZE
           
static java.lang.String MIN_SUPPORT
           
static java.lang.String NUM_GROUPS
           
static java.lang.String OUTPUT
           
static java.lang.String PARALLEL_COUNTING
           
static java.lang.String PFP_PARAMETERS
           
static java.lang.String SORTED_OUTPUT
           
static java.lang.String SPLIT_PATTERN
           
static java.util.regex.Pattern SPLITTER
           
static java.lang.String TREE_CACHE_SIZE
           
 
Method Summary
static java.util.List<Pair<java.lang.String,java.lang.Long>> deserializeList(Parameters params, java.lang.String key, org.apache.hadoop.conf.Configuration conf)
          Generates the fList from the serialized string representation
static java.util.Map<java.lang.String,java.lang.Long> deserializeMap(Parameters params, java.lang.String key, org.apache.hadoop.conf.Configuration conf)
          Generates the gList (the mapping of frequent features to group IDs) from its serialized representation
static java.util.List<Pair<java.lang.String,java.lang.Long>> readFList(Parameters params)
          Read the feature frequency list that is built at the end of the parallel counting job
static java.util.List<Pair<java.lang.String,TopKStringPatterns>> readFrequentPattern(Parameters params)
          Read the frequent patterns generated from text
static void runPFPGrowth(Parameters params)
           
static void startAggregating(Parameters params)
          Run the aggregation job, which groups the TopK patterns by each feature they contain and computes the final top K frequent patterns for each feature
static void startGroupingItems(Parameters params)
          Group the given features into g groups, where g is the numGroups parameter in params
static void startParallelCounting(Parameters params)
          Count the frequencies of various features in parallel using Map/Reduce
static void startParallelFPGrowth(Parameters params)
          Run the Parallel FPGrowth Map/Reduce Job to calculate the Top K features of group dependent shards
static void startTransactionSorting(Parameters params)
          Run the transaction sorting Map/Reduce job
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ENCODING

public static final java.lang.String ENCODING
See Also:
Constant Field Values

F_LIST

public static final java.lang.String F_LIST
See Also:
Constant Field Values

G_LIST

public static final java.lang.String G_LIST
See Also:
Constant Field Values

NUM_GROUPS

public static final java.lang.String NUM_GROUPS
See Also:
Constant Field Values

OUTPUT

public static final java.lang.String OUTPUT
See Also:
Constant Field Values

MIN_SUPPORT

public static final java.lang.String MIN_SUPPORT
See Also:
Constant Field Values

MAX_HEAPSIZE

public static final java.lang.String MAX_HEAPSIZE
See Also:
Constant Field Values

INPUT

public static final java.lang.String INPUT
See Also:
Constant Field Values

PFP_PARAMETERS

public static final java.lang.String PFP_PARAMETERS
See Also:
Constant Field Values

FILE_PATTERN

public static final java.lang.String FILE_PATTERN
See Also:
Constant Field Values

FPGROWTH

public static final java.lang.String FPGROWTH
See Also:
Constant Field Values

FREQUENT_PATTERNS

public static final java.lang.String FREQUENT_PATTERNS
See Also:
Constant Field Values

PARALLEL_COUNTING

public static final java.lang.String PARALLEL_COUNTING
See Also:
Constant Field Values

SORTED_OUTPUT

public static final java.lang.String SORTED_OUTPUT
See Also:
Constant Field Values

SPLIT_PATTERN

public static final java.lang.String SPLIT_PATTERN
See Also:
Constant Field Values

TREE_CACHE_SIZE

public static final java.lang.String TREE_CACHE_SIZE
See Also:
Constant Field Values

SPLITTER

public static final java.util.regex.Pattern SPLITTER

Method Detail

deserializeList

public static java.util.List<Pair<java.lang.String,java.lang.Long>> deserializeList(Parameters params,
                                                                                    java.lang.String key,
                                                                                    org.apache.hadoop.conf.Configuration conf)
                                                                             throws java.io.IOException
Generates the fList from the serialized string representation

Parameters:
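params - parameters holding the serialized fList
key - key under which the serialized list is stored in params
conf - Hadoop configuration used for deserialization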
Returns:
Deserialized Feature Frequency List
Throws:
java.io.IOException

deserializeMap

public static java.util.Map<java.lang.String,java.lang.Long> deserializeMap(Parameters params,
                                                                            java.lang.String key,
                                                                            org.apache.hadoop.conf.Configuration conf)
                                                                     throws java.io.IOException
Generates the gList (the mapping of frequent features to group IDs) from its serialized representation

Parameters:
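params - parameters holding the serialized gList
key - key under which the serialized map is stored in params
conf - Hadoop configuration used for deserialization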
Returns:
Deserialized Group List
Throws:
java.io.IOException

readFList

public static java.util.List<Pair<java.lang.String,java.lang.Long>> readFList(Parameters params)
                                                                       throws java.io.IOException
Read the feature frequency list that is built at the end of the parallel counting job

Returns:
Feature Frequency List
Throws:
java.io.IOException

readFrequentPattern

public static java.util.List<Pair<java.lang.String,TopKStringPatterns>> readFrequentPattern(Parameters params)
                                                                                     throws java.io.IOException
Read the frequent patterns generated from text

Returns:
List of TopK patterns for each string frequent feature
Throws:
java.io.IOException

runPFPGrowth

public static void runPFPGrowth(Parameters params)
                         throws java.io.IOException,
                                java.lang.InterruptedException,
                                java.lang.ClassNotFoundException
Parameters:
params - must contain the input and output locations as string values. Additional parameters include minSupport (3), maxHeapSize (50), and numGroups (1000).
Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
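
The parameter description above can be illustrated with a small stand-alone sketch that resolves required and optional settings. It does not use the Mahout Parameters class: a plain Map stands in for it, the key names ("input", "output", "minSupport", "maxHeapSize", "numGroups") are assumed rather than taken from the constant values, and the numbers in parentheses are assumed to be defaults. The class name PfpSettingsSketch and the paths are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class PfpSettingsSketch {
  final String input;
  final String output;
  final int minSupport;
  final int maxHeapSize;
  final int numGroups;

  private PfpSettingsSketch(String input, String output,
                            int minSupport, int maxHeapSize, int numGroups) {
    this.input = input;
    this.output = output;
    this.minSupport = minSupport;
    this.maxHeapSize = maxHeapSize;
    this.numGroups = numGroups;
  }

  /** Resolves settings from a plain map; key names are assumptions, not PFPGrowth constants. */
  static PfpSettingsSketch from(Map<String, String> params) {
    String input = params.get("input");
    String output = params.get("output");
    if (input == null || output == null) {
      throw new IllegalArgumentException("input and output locations are required");
    }
    // Fallback values mirror the numbers quoted in the parameter description.
    return new PfpSettingsSketch(input, output,
        intOrDefault(params, "minSupport", 3),
        intOrDefault(params, "maxHeapSize", 50),
        intOrDefault(params, "numGroups", 1000));
  }

  private static int intOrDefault(Map<String, String> params, String key, int fallback) {
    String raw = params.get(key);
    return raw == null ? fallback : Integer.parseInt(raw.trim());
  }

  public static void main(String[] args) {
    Map<String, String> params = new HashMap<>();
    params.put("input", "/tmp/transactions.dat"); // hypothetical path
    params.put("output", "/tmp/patterns");        // hypothetical path
    params.put("minSupport", "5");
    PfpSettingsSketch s = from(params);
    System.out.println(s.minSupport + " " + s.maxHeapSize + " " + s.numGroups); // prints "5 50 1000"
  }
}
```

Failing fast on a missing input or output location keeps a misconfigured run from starting any job.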

startAggregating

public static void startAggregating(Parameters params)
                             throws java.io.IOException,
                                    java.lang.InterruptedException,
                                    java.lang.ClassNotFoundException
Run the aggregation job, which groups the TopK patterns by each feature they contain and computes the final top K frequent patterns for each feature

Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
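
A single-process sketch of the aggregation idea described above: each pattern is filed under every feature it contains, and only the k patterns with the highest support are kept per feature. This is an illustration under assumptions, not the Map/Reduce job itself. AggregationSketch and ScoredPattern are hypothetical names, and ScoredPattern is only a stand-in for whatever TopKStringPatterns stores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AggregationSketch {
  /** A pattern (assumed to hold distinct items) together with its support count. */
  static final class ScoredPattern {
    final long support;
    final List<String> items;

    ScoredPattern(long support, String... items) {
      this.support = support;
      this.items = Arrays.asList(items);
    }
  }

  /** Groups patterns by each feature they contain and keeps the k best per feature. */
  static Map<String, List<ScoredPattern>> topKPerFeature(List<ScoredPattern> patterns, int k) {
    Map<String, List<ScoredPattern>> byFeature = new HashMap<>();
    for (ScoredPattern p : patterns) {
      for (String item : p.items) {
        byFeature.computeIfAbsent(item, key -> new ArrayList<>()).add(p);
      }
    }
    for (Map.Entry<String, List<ScoredPattern>> e : byFeature.entrySet()) {
      List<ScoredPattern> list = e.getValue();
      // Highest support first; the sort is stable, so ties keep input order.
      list.sort((a, b) -> Long.compare(b.support, a.support));
      if (list.size() > k) {
        e.setValue(new ArrayList<>(list.subList(0, k)));
      }
    }
    return byFeature;
  }
}
```

A real job would bound memory with a fixed-size heap per feature instead of sorting full lists; the full sort is used here only to keep the sketch short.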

startGroupingItems

public static void startGroupingItems(Parameters params)
                               throws java.io.IOException
Group the given features into g groups, where g is the numGroups parameter in params

Parameters:
params - job parameters, including numGroups
Throws:
java.io.IOException
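
One plausible way to divide features into g groups is sketched below: features are taken in list order and placed into contiguous blocks of at most ceil(n / g) entries. Whether PFPGrowth uses exactly this assignment is an assumption; the class name GroupingSketch and the method assignGroups are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupingSketch {
  /** Maps each feature to a group id in [0, numGroups), filling groups in list order. */
  static Map<String, Long> assignGroups(List<String> features, int numGroups) {
    if (numGroups <= 0) {
      throw new IllegalArgumentException("numGroups must be positive");
    }
    // ceil(features.size() / numGroups): the most features any one group may hold
    int maxPerGroup = (features.size() + numGroups - 1) / numGroups;
    Map<String, Long> gList = new LinkedHashMap<>();
    long groupId = 0;
    int inGroup = 0;
    for (String feature : features) {
      if (inGroup == maxPerGroup) { // current group is full, start the next one
        groupId++;
        inGroup = 0;
      }
      gList.put(feature, groupId);
      inGroup++;
    }
    return gList;
  }

  public static void main(String[] args) {
    // prints {a=0, b=0, c=0, d=1, e=1}
    System.out.println(assignGroups(Arrays.asList("a", "b", "c", "d", "e"), 2));
  }
}
```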

startParallelCounting

public static void startParallelCounting(Parameters params)
                                  throws java.io.IOException,
                                         java.lang.InterruptedException,
                                         java.lang.ClassNotFoundException
Count the frequencies of various features in parallel using Map/Reduce

Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
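
To make the counting stage concrete, here is a single-process sketch of what such a job computes: the number of transactions in which each feature occurs, then a list ordered by descending count. The splitter pattern is an assumption (the real one is whatever SPLITTER defines), counting a feature at most once per transaction is also an assumption, and FeatureCountSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

final class FeatureCountSketch {
  // Assumed delimiter set; not taken from PFPGrowth.SPLITTER.
  static final Pattern ASSUMED_SPLITTER = Pattern.compile("[ ,\t]+");

  /** Counts, for each feature, the number of transactions that contain it. */
  static Map<String, Long> countFeatures(List<String> transactions) {
    Map<String, Long> counts = new HashMap<>();
    for (String line : transactions) {
      // "map" side: split the transaction and de-duplicate its features
      Set<String> features = new LinkedHashSet<>();
      for (String token : ASSUMED_SPLITTER.split(line.trim())) {
        if (!token.isEmpty()) {
          features.add(token);
        }
      }
      // "reduce" side: add one per feature per transaction
      for (String feature : features) {
        counts.merge(feature, 1L, Long::sum);
      }
    }
    return counts;
  }

  /** Orders features by descending count, breaking ties alphabetically. */
  static List<Map.Entry<String, Long>> toFrequencyList(Map<String, Long> counts) {
    List<Map.Entry<String, Long>> list = new ArrayList<>(counts.entrySet());
    list.sort((a, b) -> {
      int byCount = Long.compare(b.getValue(), a.getValue());
      return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
    });
    return list;
  }
}
```

For the transactions "a b c", "a,b" and "a a d", this yields a=3, b=2, c=1, d=1.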

startTransactionSorting

public static void startTransactionSorting(Parameters params)
                                    throws java.io.IOException,
                                           java.lang.InterruptedException,
                                           java.lang.ClassNotFoundException
Run the transaction sorting Map/Reduce job

Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
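
FP-growth implementations commonly reorder each transaction by the global feature frequency list before building trees. The sketch below shows that step in isolation, assuming that infrequent items are dropped and duplicates collapsed; whether this job does exactly the same is not stated here. TransactionSortSketch and sortTransaction are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class TransactionSortSketch {
  /** Returns the transaction's frequent items, ordered by their position in fList. */
  static List<String> sortTransaction(List<String> transaction, List<String> fList) {
    // Position of each frequent feature in the frequency-ordered list
    Map<String, Integer> rank = new HashMap<>();
    for (int i = 0; i < fList.size(); i++) {
      rank.put(fList.get(i), i);
    }
    // A TreeSet keeps the ranks sorted and collapses duplicate items
    TreeSet<Integer> ranks = new TreeSet<>();
    for (String item : transaction) {
      Integer r = rank.get(item);
      if (r != null) { // items missing from fList are treated as infrequent and dropped
        ranks.add(r);
      }
    }
    List<String> sorted = new ArrayList<>();
    for (int r : ranks) {
      sorted.add(fList.get(r));
    }
    return sorted;
  }
}
```

With fList [a, b, c], the transaction [c, x, a, c] becomes [a, c].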

startParallelFPGrowth

public static void startParallelFPGrowth(Parameters params)
                                  throws java.io.IOException,
                                         java.lang.InterruptedException,
                                         java.lang.ClassNotFoundException
Run the Parallel FPGrowth Map/Reduce Job to calculate the Top K features of group dependent shards

Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.