org.apache.mahout.text
Class SequenceFilesFromMailArchives

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.mahout.common.AbstractJob
          extended by org.apache.mahout.text.SequenceFilesFromMailArchives
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public final class SequenceFilesFromMailArchives
extends AbstractJob

Converts a directory of gzipped mail archives into SequenceFiles of a specified chunk size (chunkSize). This class is similar to SequenceFilesFromDirectory, except that it writes block-compressed SequenceFiles and parses the subject and body text of each mail message into a separate key/value pair.
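
To illustrate what "one key/value pair per mail message" means, the following is a minimal, standard-library-only sketch. It is not Mahout's implementation. The class name MailArchiveSketch, the helper gzip, the key scheme (prefix plus a running index), and the assumption that the archive is in mbox format (each message starting with a "From " line) are all hypothetical choices made for the example. The real class writes Hadoop SequenceFiles rather than an in-memory map.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; not part of org.apache.mahout.text.
class MailArchiveSketch {

    /**
     * Reads a gzipped, mbox-style archive and returns one entry per message.
     * Key: keyPrefix + "/" + message index (an assumed key scheme).
     * Value: subject, a newline, then the body text.
     */
    static Map<String, String> parse(InputStream gzipped, String keyPrefix) throws IOException {
        Map<String, String> pairs = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String subject = "";
            StringBuilder body = new StringBuilder();
            boolean inMessage = false;
            boolean inHeaders = false;
            int count = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("From ")) {
                    // An mbox separator line: flush the previous message, if any.
                    if (inMessage) {
                        pairs.put(keyPrefix + "/" + count++, subject + "\n" + body);
                    }
                    subject = "";
                    body.setLength(0);
                    inMessage = true;
                    inHeaders = true;
                } else if (!inMessage) {
                    // Ignore anything that precedes the first separator line.
                    continue;
                } else if (inHeaders) {
                    if (line.isEmpty()) {
                        inHeaders = false; // a blank line ends the header block
                    } else if (line.startsWith("Subject:")) {
                        subject = line.substring("Subject:".length()).trim();
                    }
                } else {
                    body.append(line).append('\n');
                }
            }
            // Flush the final message.
            if (inMessage) {
                pairs.put(keyPrefix + "/" + count, subject + "\n" + body);
            }
        }
        return pairs;
    }

    /** Test helper: gzips a string so the sketch can be exercised without files. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }
}
```

The sketch keeps every pair in memory for simplicity. A chunked writer would instead append each pair to an output file and start a new file once the configured chunk size is reached.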


Nested Class Summary
 class SequenceFilesFromMailArchives.PrefixAdditionFilter
           
 
Field Summary
 
Fields inherited from class org.apache.mahout.common.AbstractJob
argMap, inputFile, inputPath, outputFile, outputPath, tempPath
 
Constructor Summary
SequenceFilesFromMailArchives()
           
 
Method Summary
 void createSequenceFiles(MailOptions options)
           
static void main(String[] args)
           
 int run(String[] args)
           
 
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getCombinedTempPath, getDimensions, getGroup, getInputFile, getInputPath, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf
 

Constructor Detail

SequenceFilesFromMailArchives

public SequenceFilesFromMailArchives()

Method Detail

createSequenceFiles

public void createSequenceFiles(MailOptions options)
                         throws IOException
Throws:
IOException

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

run

public int run(String[] args)
        throws Exception
Throws:
Exception


Copyright © 2008-2012 The Apache Software Foundation. All Rights Reserved.