org.apache.mahout.fpm.pfpgrowth
Class PFPGrowth

java.lang.Object
  extended by org.apache.mahout.fpm.pfpgrowth.PFPGrowth

public final class PFPGrowth
extends Object

Driver class for Parallel FP-Growth (PFP). Runs each stage of PFPGrowth as described in the paper http://infolab.stanford.edu/~echang/recsys08-69.pdf


Field Summary
static String ENCODING
           
static String F_LIST
           
static String FILE_PATTERN
           
static String FPGROWTH
           
static String FREQUENT_PATTERNS
           
static String G_LIST
           
static String INPUT
           
static String MAX_HEAPSIZE
           
static String MIN_SUPPORT
           
static String NUM_GROUPS
           
static String OUTPUT
           
static String PARALLEL_COUNTING
           
static String PFP_PARAMETERS
           
static String SORTED_OUTPUT
           
static String SPLIT_PATTERN
           
static Pattern SPLITTER
           
 
Method Summary
static List<Pair<String,Long>> deserializeList(Parameters params, String key, org.apache.hadoop.conf.Configuration conf)
          Generates the fList from the serialized string representation
static Map<String,Long> deserializeMap(Parameters params, String key, org.apache.hadoop.conf.Configuration conf)
          Generates the gList (group ID mapping of the various frequent features) map from the corresponding serialized representation
static List<Pair<String,Long>> readFList(Parameters params)
          Reads the feature frequency list that is built at the end of the parallel counting job
static List<Pair<String,TopKStringPatterns>> readFrequentPattern(Parameters params)
          Read the Frequent Patterns generated from Text
static void runPFPGrowth(Parameters params)
          Runs the stages of PFPGrowth using the given parameters
static void startAggregating(Parameters params)
          Run the aggregation Job: group the different Top K patterns by the features present in them, then calculate the final Top K frequent patterns for each feature
static void startGroupingItems(Parameters params)
          Group the given Features into g groups as defined by the numGroups parameter in params
static void startParallelCounting(Parameters params)
          Count the frequencies of various features in parallel using Map/Reduce
static void startParallelFPGrowth(Parameters params)
          Run the Parallel FPGrowth Map/Reduce Job to calculate the Top K features of group-dependent shards
static void startTransactionSorting(Parameters params)
          Run the transaction sorting Map/Reduce Job over the input transactions
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ENCODING

public static final String ENCODING
See Also:
Constant Field Values

F_LIST

public static final String F_LIST
See Also:
Constant Field Values

G_LIST

public static final String G_LIST
See Also:
Constant Field Values

NUM_GROUPS

public static final String NUM_GROUPS
See Also:
Constant Field Values

OUTPUT

public static final String OUTPUT
See Also:
Constant Field Values

MIN_SUPPORT

public static final String MIN_SUPPORT
See Also:
Constant Field Values

MAX_HEAPSIZE

public static final String MAX_HEAPSIZE
See Also:
Constant Field Values

INPUT

public static final String INPUT
See Also:
Constant Field Values

PFP_PARAMETERS

public static final String PFP_PARAMETERS
See Also:
Constant Field Values

FILE_PATTERN

public static final String FILE_PATTERN
See Also:
Constant Field Values

FPGROWTH

public static final String FPGROWTH
See Also:
Constant Field Values

FREQUENT_PATTERNS

public static final String FREQUENT_PATTERNS
See Also:
Constant Field Values

PARALLEL_COUNTING

public static final String PARALLEL_COUNTING
See Also:
Constant Field Values

SORTED_OUTPUT

public static final String SORTED_OUTPUT
See Also:
Constant Field Values

SPLIT_PATTERN

public static final String SPLIT_PATTERN
See Also:
Constant Field Values

SPLITTER

public static final Pattern SPLITTER

Method Detail

deserializeList

public static List<Pair<String,Long>> deserializeList(Parameters params,
                                                      String key,
                                                      org.apache.hadoop.conf.Configuration conf)
                                               throws IOException
Generates the fList from the serialized string representation

Returns:
Deserialized Feature Frequency List
Throws:
IOException

deserializeMap

public static Map<String,Long> deserializeMap(Parameters params,
                                              String key,
                                              org.apache.hadoop.conf.Configuration conf)
                                       throws IOException
Generates the gList (group ID mapping of the various frequent features) map from the corresponding serialized representation

Returns:
Deserialized Group List
Throws:
IOException

readFList

public static List<Pair<String,Long>> readFList(Parameters params)
Reads the feature frequency list that is built at the end of the parallel counting job

Returns:
Feature Frequency List

readFrequentPattern

public static List<Pair<String,TopKStringPatterns>> readFrequentPattern(Parameters params)
                                                                 throws IOException
Read the Frequent Patterns generated from Text

Returns:
List of TopK patterns for each string frequent feature
Throws:
IOException

runPFPGrowth

public static void runPFPGrowth(Parameters params)
                         throws IOException,
                                InterruptedException,
                                ClassNotFoundException
Parameters:
params - must contain the input and output locations as string values; additional parameters include minSupport (3), maxHeapSize (50) and numGroups (1000)
Throws:
IOException
InterruptedException
ClassNotFoundException
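
As a rough illustration of the parameter contract described above, the following sketch uses a plain java.util.Map as a stand-in for Parameters. The class PfpParamsSketch, its helper methods, the key strings and the example paths are all hypothetical; the keys Mahout actually uses are the values of the INPUT, OUTPUT, MIN_SUPPORT, MAX_HEAPSIZE and NUM_GROUPS constants. The fallback values 3, 50 and 1000 come from the parameter description, on the assumption that the numbers in parentheses are defaults.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Parameters object; this is not Mahout code.
final class PfpParamsSketch {

  // Assumed key names, used here for illustration only.
  static final String INPUT = "input";
  static final String OUTPUT = "output";
  static final String MIN_SUPPORT = "minSupport";
  static final String MAX_HEAPSIZE = "maxHeapSize";
  static final String NUM_GROUPS = "numGroups";

  // Builds a parameter map holding only the two required locations.
  static Map<String, String> newParams(String input, String output) {
    Map<String, String> params = new HashMap<>();
    params.put(INPUT, input);
    params.put(OUTPUT, output);
    return params;
  }

  // Reads an integer parameter, falling back to the given value when the key is absent.
  static int intParam(Map<String, String> params, String key, int fallback) {
    String value = params.get(key);
    return value == null ? fallback : Integer.parseInt(value);
  }

  public static void main(String[] args) {
    Map<String, String> params = newParams("/data/transactions", "/data/patterns");
    params.put(MIN_SUPPORT, "5"); // override one optional parameter
    System.out.println(intParam(params, MIN_SUPPORT, 3));
    System.out.println(intParam(params, MAX_HEAPSIZE, 50));
    System.out.println(intParam(params, NUM_GROUPS, 1000));
  }
}
```

With the real class, the equivalent step would be to populate a Parameters instance and pass it to runPFPGrowth.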

startAggregating

public static void startAggregating(Parameters params)
                             throws IOException,
                                    InterruptedException,
                                    ClassNotFoundException
Run the aggregation Job: group the different Top K patterns by the features present in them, then calculate the final Top K frequent patterns for each feature

Throws:
IOException
InterruptedException
ClassNotFoundException
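
The aggregation idea described above can be sketched in memory as follows. This is only an illustration of "group each pattern by its features, then keep the K best per feature"; it is not the Map/Reduce job that Mahout runs. The classes AggregationSketch and FrequentPattern, and the method topKPerFeature, are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration; not the Mahout aggregation job.
final class AggregationSketch {

  // A pattern (set of features) together with its support count.
  static final class FrequentPattern {
    final List<String> items;
    final long support;

    FrequentPattern(long support, String... items) {
      this.support = support;
      this.items = Arrays.asList(items);
    }
  }

  // For every feature, keeps the k patterns with the highest support that contain it.
  static Map<String, List<FrequentPattern>> topKPerFeature(List<FrequentPattern> patterns, int k) {
    Map<String, List<FrequentPattern>> byFeature = new TreeMap<>();
    // Step 1: group each pattern under every feature present in it.
    for (FrequentPattern p : patterns) {
      for (String feature : p.items) {
        byFeature.computeIfAbsent(feature, f -> new ArrayList<>()).add(p);
      }
    }
    // Step 2: sort each group by descending support and truncate to k entries.
    for (List<FrequentPattern> group : byFeature.values()) {
      group.sort(Comparator.comparingLong((FrequentPattern p) -> p.support).reversed());
      if (group.size() > k) {
        group.subList(k, group.size()).clear();
      }
    }
    return byFeature;
  }
}
```

A distributed version would presumably emit one (feature, pattern) pair per feature in the map phase and apply the truncation in the reduce phase, but that mapping is an assumption rather than something stated in this page.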

startGroupingItems

public static void startGroupingItems(Parameters params)
                               throws IOException
Group the given Features into g groups as defined by the numGroups parameter in params

Parameters:
params - job parameters; the numGroups entry sets the number of groups
Throws:
IOException
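
One simple way to split a feature list into g groups is to assign consecutive features to the same group, as sketched below. This is an assumption made for illustration and not necessarily the scheme Mahout applies; GroupingSketch and groupFeatures are hypothetical names. The result type mirrors the Map<String,Long> returned by deserializeMap.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of grouping features; not Mahout code.
final class GroupingSketch {

  // Maps each feature to a group ID in the range [0, numGroups).
  // Features that are adjacent in the input list share a group.
  static Map<String, Long> groupFeatures(List<String> features, int numGroups) {
    if (numGroups <= 0) {
      throw new IllegalArgumentException("numGroups must be positive");
    }
    // Ceiling division: the largest number of features any one group receives.
    int perGroup = (features.size() + numGroups - 1) / numGroups;
    Map<String, Long> groupOf = new LinkedHashMap<>();
    for (int i = 0; i < features.size(); i++) {
      groupOf.put(features.get(i), (long) (i / perGroup));
    }
    return groupOf;
  }
}
```

For example, five features split into two groups gives three features in group 0 and two in group 1.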

startParallelCounting

public static void startParallelCounting(Parameters params)
                                  throws IOException,
                                         InterruptedException,
                                         ClassNotFoundException
Count the frequencies of various features in parallel using Map/Reduce

Throws:
IOException
InterruptedException
ClassNotFoundException
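
The counting step can be pictured with a small single-machine sketch: split each transaction into features, count how many transactions contain each feature, and drop features below the minimum support. The class CountingSketch, the method countFeatures and the separator pattern are hypothetical and chosen only for this example; the pattern used by Mahout is the one held in SPLITTER, and the real job distributes this work with Map/Reduce.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical single-machine illustration; not the Mahout counting job.
final class CountingSketch {

  // Assumed separator: runs of spaces, commas or tabs.
  static final Pattern ITEM_SEPARATOR = Pattern.compile("[ ,\\t]+");

  // Returns the features that occur in at least minSupport transactions,
  // together with the number of transactions containing them.
  static Map<String, Long> countFeatures(List<String> transactions, long minSupport) {
    Map<String, Long> counts = new TreeMap<>();
    for (String transaction : transactions) {
      // "Map" step: a set, so each feature counts once per transaction.
      Set<String> items = new HashSet<>(Arrays.asList(ITEM_SEPARATOR.split(transaction.trim())));
      for (String item : items) {
        if (!item.isEmpty()) {
          counts.merge(item, 1L, Long::sum); // "Reduce" step: sum the ones
        }
      }
    }
    // Discard features that do not reach the minimum support.
    counts.values().removeIf(count -> count < minSupport);
    return counts;
  }
}
```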

startTransactionSorting

public static void startTransactionSorting(Parameters params)
                                    throws IOException,
                                           InterruptedException,
                                           ClassNotFoundException
Run the transaction sorting Map/Reduce Job over the input transactions

Throws:
IOException
InterruptedException
ClassNotFoundException

startParallelFPGrowth

public static void startParallelFPGrowth(Parameters params)
                                  throws IOException,
                                         InterruptedException,
                                         ClassNotFoundException
Run the Parallel FPGrowth Map/Reduce Job to calculate the Top K features of group-dependent shards

Throws:
IOException
InterruptedException
ClassNotFoundException


Copyright © 2008-2011 The Apache Software Foundation. All Rights Reserved.