org.apache.mahout.common
Class AbstractJob

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.mahout.common.AbstractJob
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
Direct Known Subclasses:
CanopyDriver, CollocDriver, DirichletDriver, DistributedLanczosSolver.DistributedLanczosSolverJob, EigencutsDriver, EigenVerificationJob, FuzzyKMeansDriver, ItemSimilarityJob, KMeansDriver, LDADriver, MatrixMultiplicationJob, MeanShiftCanopyDriver, MinHashDriver, RecommenderJob, RecommenderJob, RowSimilarityJob, SlopeOneAverageDiffsJob, SpectralKMeansDriver, TransposeJob

public abstract class AbstractJob
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool

Superclass of many Mahout Hadoop "jobs". A job drives configuration and launch of one or more maps and reduces in order to accomplish some task.

Command line arguments available to all subclasses are:

In addition, note some key command line parameters that are parsed by Hadoop, which jobs may need to set:

Note that because of how Hadoop parses arguments, all "-D" arguments must appear before all other arguments.
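The ordering rule can be checked mechanically before a job is launched. The following is a minimal sketch, not part of Mahout or Hadoop: the class GenericArgOrder and its method genericOptionsFirst are hypothetical names, and the sketch assumes that "-D" arguments are written either as -Dkey=value or as -D key=value.

```java
// Hypothetical helper for illustration only; not part of Mahout or Hadoop.
class GenericArgOrder {

  /**
   * Returns true if every "-D" argument appears before the first
   * argument that is not a "-D" argument.
   */
  static boolean genericOptionsFirst(String[] args) {
    boolean seenOther = false;
    for (int i = 0; i < args.length; i++) {
      String arg = args[i];
      if (arg.startsWith("-D")) {
        if (seenOther) {
          return false; // a "-D" argument follows a job-specific argument
        }
        if (arg.equals("-D")) {
          i++; // assumption: "-D key=value" form, so skip the key=value token
        }
      } else {
        seenOther = true;
      }
    }
    return true;
  }

  public static void main(String[] unused) {
    String[] good = {"-Dmapred.input.dir=in", "--output", "out"};
    String[] bad = {"--output", "out", "-Dmapred.input.dir=in"};
    System.out.println(genericOptionsFirst(good)); // prints "true"
    System.out.println(genericOptionsFirst(bad));  // prints "false"
  }
}
```

A check of this kind only reports the ordering problem; it does not reorder the arguments.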


Constructor Summary
protected AbstractJob()
           
 
Method Summary
protected  void addFlag(java.lang.String name, java.lang.String shortName, java.lang.String description)
          Add an option with no argument whose presence can be checked for using the containsKey method on the map returned by parseArguments(String[]).
protected  void addInputOption()
          Add the default input directory option, '-i' which takes a directory name as an argument.
protected  org.apache.commons.cli2.Option addOption(org.apache.commons.cli2.Option option)
          Add an arbitrary option to the set of options this job will parse when parseArguments(String[]) is called.
protected  void addOption(java.lang.String name, java.lang.String shortName, java.lang.String description)
          Add an option to the set of options this job will parse when parseArguments(String[]) is called.
protected  void addOption(java.lang.String name, java.lang.String shortName, java.lang.String description, boolean required)
          Add an option to the set of options this job will parse when parseArguments(String[]) is called.
protected  void addOption(java.lang.String name, java.lang.String shortName, java.lang.String description, java.lang.String defaultValue)
          Add an option to the set of options this job will parse when parseArguments(String[]) is called.
protected  void addOutputOption()
          Add the default output directory option, '-o' which takes a directory name as an argument.
protected  org.apache.hadoop.fs.Path getInputPath()
          Returns the input path established by a call to parseArguments(String[]).
 java.lang.String getOption(java.lang.String optionName)
           
protected  org.apache.hadoop.fs.Path getOutputPath()
          Returns the output path established by a call to parseArguments(String[]).
 boolean hasOption(java.lang.String optionName)
           
static java.lang.String keyFor(java.lang.String optionName)
          Build the option key (--name) from the option name
protected static void maybePut(java.util.Map<java.lang.String,java.lang.String> args, org.apache.commons.cli2.CommandLine cmdLine, org.apache.commons.cli2.Option... opt)
           
 java.util.Map<java.lang.String,java.lang.String> parseArguments(java.lang.String[] args)
          Parse the arguments specified based on the options defined using the various addOption methods.
protected  void parseDirectories(org.apache.commons.cli2.CommandLine cmdLine)
          Obtain input and output directories from command-line options or hadoop properties.
protected  org.apache.hadoop.mapreduce.Job prepareJob(org.apache.hadoop.fs.Path inputPath, org.apache.hadoop.fs.Path outputPath, java.lang.Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormat, java.lang.Class<? extends org.apache.hadoop.mapreduce.Mapper> mapper, java.lang.Class<? extends org.apache.hadoop.io.Writable> mapperKey, java.lang.Class<? extends org.apache.hadoop.io.Writable> mapperValue, java.lang.Class<? extends org.apache.hadoop.mapreduce.Reducer> reducer, java.lang.Class<? extends org.apache.hadoop.io.Writable> reducerKey, java.lang.Class<? extends org.apache.hadoop.io.Writable> reducerValue, java.lang.Class<? extends org.apache.hadoop.mapreduce.OutputFormat> outputFormat)
           
protected static boolean shouldRunNextPhase(java.util.Map<java.lang.String,java.lang.String> args, java.util.concurrent.atomic.AtomicInteger currentPhase)
           
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.util.Tool
run
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Constructor Detail

AbstractJob

protected AbstractJob()
Method Detail

getInputPath

protected org.apache.hadoop.fs.Path getInputPath()
Returns the input path established by a call to parseArguments(String[]). The source of the path may be an input option added using addInputOption() or it may be the value of the mapred.input.dir configuration property.


getOutputPath

protected org.apache.hadoop.fs.Path getOutputPath()
Returns the output path established by a call to parseArguments(String[]). The source of the path may be an output option added using addOutputOption() or it may be the value of the mapred.output.dir configuration property.


addFlag

protected void addFlag(java.lang.String name,
                       java.lang.String shortName,
                       java.lang.String description)
Add an option with no argument whose presence can be checked for using the containsKey method on the map returned by parseArguments(String[]).


addOption

protected void addOption(java.lang.String name,
                         java.lang.String shortName,
                         java.lang.String description)
Add an option to the set of options this job will parse when parseArguments(String[]) is called. This option has an argument with null as its default value.


addOption

protected void addOption(java.lang.String name,
                         java.lang.String shortName,
                         java.lang.String description,
                         boolean required)
Add an option to the set of options this job will parse when parseArguments(String[]) is called.

Parameters:
required - if true, parseArguments(String[]) will fail with an error and usage message if this option is not specified on the command line.

addOption

protected void addOption(java.lang.String name,
                         java.lang.String shortName,
                         java.lang.String description,
                         java.lang.String defaultValue)
Add an option to the set of options this job will parse when parseArguments(String[]) is called. If this option is not specified on the command line, the default value will be used.

Parameters:
defaultValue - the default argument value if this argument is not found on the command-line. null is allowed.

addOption

protected org.apache.commons.cli2.Option addOption(org.apache.commons.cli2.Option option)
Add an arbitrary option to the set of options this job will parse when parseArguments(String[]) is called. If this option has no argument, use containsKey on the map returned by parseArguments to check for its presence. Otherwise, the string value of the option will be placed in the map using a key equal to this option's long name preceded by '--'.

Returns:
the option added.

addInputOption

protected void addInputOption()
Add the default input directory option, '-i' which takes a directory name as an argument. When parseArguments(String[]) is called, the inputPath will be set based upon the value for this option. If this method is called, the input is required.


addOutputOption

protected void addOutputOption()
Add the default output directory option, '-o' which takes a directory name as an argument. When parseArguments(String[]) is called, the outputPath will be set based upon the value for this option. If this method is called, the output is required.


parseArguments

public java.util.Map<java.lang.String,java.lang.String> parseArguments(java.lang.String[] args)
Parse the arguments specified based on the options defined using the various addOption methods. If -h is specified or an exception is encountered, print help and return null. As a side effect, sets inputPath and outputPath when addInputOption or addOutputOption has been called, or when mapred.input.dir or mapred.output.dir is present in the Configuration.

Returns:
a Map containing options and their argument values. The presence of a flag can be tested using containsKey, while argument values can be retrieved using get(optionName). The names used for keys are the option name parameter prefixed by '--'.
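As an illustration of reading the returned map, the sketch below builds a stand-in map by hand rather than calling parseArguments, so it runs without Mahout on the classpath. The class ParsedArgsSketch, its helper methods, the "overwrite" flag, and the path value are all hypothetical; storing a flag with a null value is an assumption made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not part of Mahout.
class ParsedArgsSketch {

  // Mirrors the documented key format: the option name prefixed by "--".
  static String key(String optionName) {
    return "--" + optionName;
  }

  // Flags carry no argument, so only their presence in the map is meaningful.
  static boolean isFlagSet(Map<String, String> parsed, String flagName) {
    return parsed.containsKey(key(flagName));
  }

  // Options with an argument are read with get(); null means no value is available.
  static String valueOf(Map<String, String> parsed, String optionName) {
    return parsed.get(key(optionName));
  }

  public static void main(String[] args) {
    // Stand-in for the map that parseArguments would return.
    Map<String, String> parsed = new HashMap<>();
    parsed.put("--input", "/data/in");
    parsed.put("--overwrite", null); // hypothetical flag, stored without a value

    System.out.println(isFlagSet(parsed, "overwrite")); // prints "true"
    System.out.println(valueOf(parsed, "input"));       // prints "/data/in"
    System.out.println(valueOf(parsed, "output"));      // prints "null"
  }
}
```

Testing flags with containsKey rather than get matters here: a flag has no argument, so get cannot distinguish a present flag from an absent one if the flag is stored without a value.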

keyFor

public static java.lang.String keyFor(java.lang.String optionName)
Build the option key (--name) from the option name


getOption

public java.lang.String getOption(java.lang.String optionName)
Returns:
the requested option, or null if it has not been specified

hasOption

public boolean hasOption(java.lang.String optionName)
Returns:
true if the requested option has been specified

parseDirectories

protected void parseDirectories(org.apache.commons.cli2.CommandLine cmdLine)
Obtain input and output directories from command-line options or hadoop properties. If addInputOption or addOutputOption has been called, this method will throw an IllegalArgumentException if no source (command-line or property) for that value is present. Otherwise, inputPath or outputPath will be non-null only if specified as a hadoop property. Command-line options take precedence over hadoop properties.

Parameters:
cmdLine -
Throws:
java.lang.IllegalArgumentException - if inputOption is present and neither --input nor -Dmapred.input.dir is specified, or if outputOption is present and neither --output nor -Dmapred.output.dir is specified.
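
The lookup order described above (command line first, then hadoop property, then failure if the option was registered) can be modelled without any Hadoop classes. This is a minimal sketch under stated assumptions, not Mahout's implementation: the class DirectoryResolutionSketch, the method resolve, and the use of a plain Map in place of a Hadoop Configuration are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the documented lookup order; not Mahout's code.
class DirectoryResolutionSketch {

  /**
   * Resolves a directory: the command-line value wins, then the property.
   * If the option was registered and neither source supplies a value,
   * an IllegalArgumentException is thrown. If the option was not
   * registered, the result is non-null only when the property is set.
   */
  static String resolve(boolean optionRegistered, String commandLineValue,
                        Map<String, String> properties, String propertyName) {
    String path = commandLineValue;
    if (path == null) {
      path = properties.get(propertyName);
    }
    if (path == null && optionRegistered) {
      throw new IllegalArgumentException(
          "No value given on the command line or in " + propertyName);
    }
    return path;
  }

  public static void main(String[] args) {
    // A plain map stands in for the Hadoop Configuration (assumption).
    Map<String, String> props = new HashMap<>();
    props.put("mapred.input.dir", "/from/property");

    // The command-line value takes precedence over the property.
    System.out.println(resolve(true, "/from/cli", props, "mapred.input.dir"));
    // With no command-line value, the property is used.
    System.out.println(resolve(true, null, props, "mapred.input.dir"));
    // Option not registered and property unset: the result is null.
    System.out.println(resolve(false, null, props, "mapred.output.dir"));
  }
}
```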

maybePut

protected static void maybePut(java.util.Map<java.lang.String,java.lang.String> args,
                               org.apache.commons.cli2.CommandLine cmdLine,
                               org.apache.commons.cli2.Option... opt)

shouldRunNextPhase

protected static boolean shouldRunNextPhase(java.util.Map<java.lang.String,java.lang.String> args,
                                            java.util.concurrent.atomic.AtomicInteger currentPhase)

prepareJob

protected org.apache.hadoop.mapreduce.Job prepareJob(org.apache.hadoop.fs.Path inputPath,
                                                     org.apache.hadoop.fs.Path outputPath,
                                                     java.lang.Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormat,
                                                     java.lang.Class<? extends org.apache.hadoop.mapreduce.Mapper> mapper,
                                                     java.lang.Class<? extends org.apache.hadoop.io.Writable> mapperKey,
                                                     java.lang.Class<? extends org.apache.hadoop.io.Writable> mapperValue,
                                                     java.lang.Class<? extends org.apache.hadoop.mapreduce.Reducer> reducer,
                                                     java.lang.Class<? extends org.apache.hadoop.io.Writable> reducerKey,
                                                     java.lang.Class<? extends org.apache.hadoop.io.Writable> reducerValue,
                                                     java.lang.Class<? extends org.apache.hadoop.mapreduce.OutputFormat> outputFormat)
                                              throws java.io.IOException
Throws:
java.io.IOException


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.