org.apache.mahout.text
Class SequenceFilesFromCsvFilter

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.mahout.common.AbstractJob
          extended by org.apache.mahout.text.SequenceFilesFromDirectory
              extended by org.apache.mahout.text.SequenceFilesFromDirectoryFilter
                  extended by org.apache.mahout.text.SequenceFilesFromCsvFilter
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.fs.PathFilter, org.apache.hadoop.util.Tool

public final class SequenceFilesFromCsvFilter
extends SequenceFilesFromDirectoryFilter

An example parser that converts CSV files to sequence files.


Field Summary
static String[] KEY_COLUMN_OPTION
           
static String[] VALUE_COLUMN_OPTION
           
 
Fields inherited from class org.apache.mahout.text.SequenceFilesFromDirectoryFilter
charset, conf, fs, options, prefix, writer
 
Fields inherited from class org.apache.mahout.text.SequenceFilesFromDirectory
CHARSET_OPTION, CHUNK_SIZE_OPTION, FILE_FILTER_CLASS_OPTION, KEY_PREFIX_OPTION
 
Constructor Summary
SequenceFilesFromCsvFilter(org.apache.hadoop.conf.Configuration conf, String keyPrefix, Map<String,String> options, ChunkedWriter writer)
           
 
Method Summary
 void addOptions()
          Override this method to add options to the command line of the SequenceFilesFromDirectory job.
static void main(String[] args)
           
 Map<String,String> parseOptions()
          Override this method in order to parse your additional options from the command line.
protected  void process(org.apache.hadoop.fs.FileStatus fst, org.apache.hadoop.fs.Path current)
           
 
Methods inherited from class org.apache.mahout.text.SequenceFilesFromDirectoryFilter
accept, getOptions
 
Methods inherited from class org.apache.mahout.text.SequenceFilesFromDirectory
run, run
 
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, getInputPath, getOption, getOutputPath, hasOption, keyFor, maybePut, parseArguments, parseDirectories, prepareJob, shouldRunNextPhase
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

KEY_COLUMN_OPTION

public static final String[] KEY_COLUMN_OPTION

VALUE_COLUMN_OPTION

public static final String[] VALUE_COLUMN_OPTION

Constructor Detail

SequenceFilesFromCsvFilter

public SequenceFilesFromCsvFilter(org.apache.hadoop.conf.Configuration conf,
                                  String keyPrefix,
                                  Map<String,String> options,
                                  ChunkedWriter writer)
                           throws IOException
Throws:
IOException

Method Detail

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

addOptions

public void addOptions()
Description copied from class: SequenceFilesFromDirectory
Override this method to add options to the command line of the SequenceFilesFromDirectory job. Remember to call super.addOptions(); otherwise none of the standard options (input/output directories, etc.) will be available.

Overrides:
addOptions in class SequenceFilesFromDirectory
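
The override contract described above can be sketched with plain-Java stand-ins. BaseJob and CsvJob are hypothetical classes, not the Mahout types, and the option names are illustrative assumptions. The sketch only shows why the subclass delegates to the superclass method first: the inherited options stay registered and the new ones are appended.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a base job that registers the standard options.
class BaseJob {
    protected final List<String> registered = new ArrayList<>();

    public void addOptions() {
        registered.add("input");
        registered.add("output");
    }

    public List<String> registeredOptions() {
        return registered;
    }
}

// Hypothetical subclass that adds its own options on top of the inherited ones.
class CsvJob extends BaseJob {
    @Override
    public void addOptions() {
        super.addOptions();            // keep the standard options
        registered.add("keyColumn");   // illustrative option name (assumed)
        registered.add("valueColumn"); // illustrative option name (assumed)
    }
}
```

If the call to super.addOptions() were omitted in CsvJob, only the two CSV-specific entries would be registered.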

parseOptions

public Map<String,String> parseOptions()
                                throws IOException
Description copied from class: SequenceFilesFromDirectory
Override this method to parse your additional options from the command line. Remember to call super.parseOptions(); otherwise the standard options (input/output directories, etc.) will not be available.

Overrides:
parseOptions in class SequenceFilesFromDirectory
Throws:
IOException
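
A minimal sketch of the same pattern for option parsing, again with hypothetical stand-ins rather than the Mahout classes. BaseOptionsParser, CsvOptionsParser, the option names, and the default values are all assumptions made for illustration. The subclass starts from the map returned by the superclass and adds its own entries to it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical base class: returns the standard options it parsed.
class BaseOptionsParser {
    protected final Map<String, String> commandLine;

    BaseOptionsParser(Map<String, String> commandLine) {
        this.commandLine = commandLine;
    }

    public Map<String, String> parseOptions() {
        Map<String, String> options = new HashMap<>();
        // "charset" is an illustrative standard option with an assumed default.
        options.put("charset", commandLine.getOrDefault("charset", "UTF-8"));
        return options;
    }
}

// Hypothetical subclass: extends the inherited map with CSV-specific options.
class CsvOptionsParser extends BaseOptionsParser {
    CsvOptionsParser(Map<String, String> commandLine) {
        super(commandLine);
    }

    @Override
    public Map<String, String> parseOptions() {
        Map<String, String> options = super.parseOptions(); // standard options first
        options.put("keyColumn", commandLine.getOrDefault("keyColumn", "0"));
        options.put("valueColumn", commandLine.getOrDefault("valueColumn", "1"));
        return options;
    }
}
```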

process

protected void process(org.apache.hadoop.fs.FileStatus fst,
                       org.apache.hadoop.fs.Path current)
                throws IOException
Specified by:
process in class SequenceFilesFromDirectoryFilter
Throws:
IOException
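
This page does not describe what process does with each file. Given the key and value column options, one plausible reading is that each CSV row contributes one key/value pair. The sketch below illustrates that idea in plain Java under that assumption; CsvKeyValueSketch and extract are hypothetical names, the comma delimiter is assumed, and the naive split does not handle quoted fields. It is not the Mahout implementation and does not use the Hadoop file system or ChunkedWriter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CsvKeyValueSketch {
    // Maps column keyColumn (with a prefix) to column valueColumn for each row.
    // Empty lines and rows with too few columns are skipped.
    static Map<String, String> extract(String csvText, String keyPrefix,
                                       int keyColumn, int valueColumn) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String line : csvText.split("\\R")) {
            if (line.isEmpty()) {
                continue;
            }
            // Limit -1 keeps trailing empty columns.
            String[] columns = line.split(",", -1);
            if (keyColumn >= columns.length || valueColumn >= columns.length) {
                continue;
            }
            pairs.put(keyPrefix + columns[keyColumn], columns[valueColumn]);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints the extracted pairs in input order.
        System.out.println(extract("id1,hello\nid2,world\n", "/docs/", 0, 1));
    }
}
```

A real converter would write each pair to the sequence file instead of collecting it in a map, and would likely use a proper CSV parser for quoted fields.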


Copyright © 2008-2011 The Apache Software Foundation. All Rights Reserved.