Class Summary | |
---|---|
AggregatorMapper | outputs each pattern once for every item it contains, so that the reducer can group patterns by item and select the top K frequent patterns |
AggregatorReducer | groups all Frequent Patterns containing an item and outputs the top K patterns containing that particular item |
FPGrowthDriver | |
MultiTransactionTreeIterator | Iterates over multiple transaction trees to produce a single iterator of transactions |
ParallelCountingMapper | maps all items in a particular transaction, in the same way as the Hadoop WordCount example |
ParallelCountingReducer | sums up the item counts and outputs each item with its count. This can also be used as a local Combiner. |
ParallelFPGrowthCombiner | takes each group of dependent transactions and compacts it in a TransactionTree structure |
ParallelFPGrowthMapper | maps each transaction to all unique items groups in the transaction. |
ParallelFPGrowthReducer | takes each group of transactions, runs Vanilla FPGrowth on it, and outputs the Top K frequent Patterns for each group. |
PFPGrowth | Parallel FP Growth Driver Class. |
TransactionSortingMapper | maps each transaction to all unique items groups in the transaction. |
TransactionSortingReducer | takes each group of transactions, runs Vanilla FPGrowth on it, and outputs the Top K frequent Patterns for each group. |
TransactionTree | A compact representation of transactions, modeled along the lines of FPTree. This saves considerable space and speeds up the Map/Reduce of the PFPGrowth algorithm by reducing the amount of data passed from the Mapper to the Reducer, where FPGrowth mining is done. |
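
The counting classes in the table above are described as following the WordCount pattern: emit a count of one per item, then sum the counts per item. The sketch below illustrates that idea in memory, without Hadoop. The class `ItemCountSketch` and its methods are hypothetical names used only for illustration and are not part of Mahout. The `minSupport` filter and the descending-count ordering are assumptions about how a frequency list is typically built from such counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory sketch of WordCount-style item counting; not Mahout code. */
public class ItemCountSketch {

  /** Counts every item occurrence across all transactions (map and reduce in one pass). */
  public static Map<String, Long> countItems(List<List<String>> transactions) {
    Map<String, Long> counts = new TreeMap<>();
    for (List<String> transaction : transactions) {
      for (String item : transaction) {
        // Equivalent to emitting (item, 1) and summing per key.
        counts.merge(item, 1L, Long::sum);
      }
    }
    return counts;
  }

  /** Keeps items whose count is at least minSupport, most frequent first (ties by name). */
  public static List<String> frequentList(Map<String, Long> counts, long minSupport) {
    List<String> items = new ArrayList<>();
    for (Map.Entry<String, Long> entry : counts.entrySet()) {
      if (entry.getValue() >= minSupport) {
        items.add(entry.getKey());
      }
    }
    items.sort((a, b) -> {
      int byCount = Long.compare(counts.get(b), counts.get(a));
      return byCount != 0 ? byCount : a.compareTo(b);
    });
    return items;
  }

  public static void main(String[] args) {
    List<List<String>> transactions = Arrays.asList(
        Arrays.asList("a", "b", "c"),
        Arrays.asList("a", "b"),
        Arrays.asList("a", "d"));
    Map<String, Long> counts = countItems(transactions);
    System.out.println(frequentList(counts, 2)); // prints [a, b]
  }
}
```

Because summing is associative, the same reduction can run on partial results first, which is why the table notes that the counting reducer can double as a local Combiner.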

    FPGrowth<String> fp = new FPGrowth<String>();
    Set<String> features = new HashSet<String>();
    fp.generateTopKStringFrequentPatterns(
        new StringRecordIterator(
            new FileLineIterable(new File(input), encoding, false), pattern),
        fp.generateFList(
            new StringRecordIterator(
                new FileLineIterable(new File(input), encoding, false), pattern),
            minSupport),
        minSupport,
        maxHeapSize,
        features,
        new StringOutputConvertor(
            new SequenceFileOutputCollector<Text, TopKStringPatterns>(writer))
    );

The command line launcher for string transaction data, org.apache.mahout.fpm.pfpgrowth.FPGrowthJob, has other features, including specifying the regex pattern for splitting a string line of a transaction into its constituent features.
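
As an illustration of regex-based splitting, the sketch below breaks one transaction line into features with `java.util.regex.Pattern`. The class `TransactionLineSplitter` is a hypothetical helper, and the comma-or-tab pattern is an assumed example, not necessarily the default used by FPGrowthJob.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper showing regex-based splitting of a transaction line; not Mahout code. */
public class TransactionLineSplitter {

  /** Splits a line on the given pattern and drops empty tokens. */
  public static List<String> split(String line, Pattern splitter) {
    List<String> features = new ArrayList<>();
    for (String token : splitter.split(line)) {
      String trimmed = token.trim();
      if (!trimmed.isEmpty()) {
        features.add(trimmed);
      }
    }
    return features;
  }

  public static void main(String[] args) {
    // Assumed example pattern: a comma or tab, with optional surrounding whitespace.
    Pattern commaOrTab = Pattern.compile("\\s*[,\\t]\\s*");
    System.out.println(split("milk, bread,\tbutter", commaOrTab)); // prints [milk, bread, butter]
  }
}
```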
The numGroups parameter in FPGrowthJob specifies the number of groups into which the transactions are decomposed. The numTreeCacheEntries parameter specifies how many generated conditional FP-Trees are kept in memory so that they need not be regenerated. Increasing this number increases memory consumption but may improve speed up to a certain point, which depends entirely on the dataset in question. A value of 5-10 is recommended for mining up to the top 100 patterns for each feature.
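
The memory-versus-recomputation trade-off behind numTreeCacheEntries can be sketched with a small bounded cache. This is only an illustration under assumptions: `BoundedTreeCache` is a hypothetical class, the least-recently-used eviction policy is assumed, and Mahout's actual caching of conditional FP-Trees may work differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical bounded cache with least-recently-used eviction; not Mahout code. */
public class BoundedTreeCache<K, V> {
  private final int maxEntries;
  private final Map<K, V> entries;
  private int misses;

  public BoundedTreeCache(int maxEntries) {
    this.maxEntries = maxEntries;
    // accessOrder = true makes iteration order follow recency of access.
    this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > BoundedTreeCache.this.maxEntries;
      }
    };
  }

  /** Returns the cached value, or builds, stores, and returns it on a miss. */
  public V get(K key, Function<K, V> builder) {
    V value = entries.get(key);
    if (value == null) {
      misses++; // each miss stands for one regeneration of an expensive structure
      value = builder.apply(key);
      entries.put(key, value);
    }
    return value;
  }

  public int misses() {
    return misses;
  }

  public int size() {
    return entries.size();
  }

  public static void main(String[] args) {
    BoundedTreeCache<String, String> cache = new BoundedTreeCache<>(2);
    for (String item : new String[] {"a", "b", "a", "c", "b"}) {
      cache.get(item, k -> "tree-for-" + k);
    }
    System.out.println(cache.misses() + " rebuilds, " + cache.size() + " entries held");
  }
}
```

A larger capacity holds more entries in memory and avoids more rebuilds, but only while the access pattern keeps revisiting the same keys, which is consistent with the speedup levelling off at a dataset-dependent point.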