org.apache.mahout.cf.taste.hadoop.pseudo
Class RecommenderJob

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.mahout.common.AbstractJob
          extended by org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public final class RecommenderJob
extends AbstractJob

This job runs a "pseudo-distributed" recommendation process on Hadoop. It merely runs many Recommender instances on Hadoop, where each instance is a normal non-distributed implementation.

This class configures and runs a RecommenderReducer using Hadoop.

Command line arguments specific to this class are:

  1. -Dmapred.input.dir=(path): Location of a data model file containing preference data, suitable for use with FileDataModel
  2. -Dmapred.output.dir=(path): output path where recommender output should go
  3. --recommenderClassName (string): Fully-qualified class name of Recommender to use to make recommendations. Note that it must have a constructor which takes a DataModel argument.
  4. --numRecommendations (integer): Number of recommendations to compute per user
  5. --usersFile (path): file containing user IDs to recommend for (optional)
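The class named by --recommenderClassName must have a constructor taking a single DataModel argument. A minimal sketch of such a class, assuming Mahout's user-based recommender components (the class name your.project.MyRecommender and the neighborhood size of 10 are illustrative choices, not requirements):

```java
package your.project;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

// A Recommender exposing the required DataModel-only constructor,
// delegating to a standard user-based recommender.
public final class MyRecommender extends GenericUserBasedRecommender {

  public MyRecommender(DataModel model) throws TasteException {
    this(model, new PearsonCorrelationSimilarity(model));
  }

  private MyRecommender(DataModel model, UserSimilarity similarity) throws TasteException {
    super(model, new NearestNUserNeighborhood(10, similarity, model), similarity);
  }
}
```

Any Recommender implementation works here; only the DataModel constructor signature matters, since the job instantiates the class reflectively.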

General command line options are documented in AbstractJob.

Note that because of how Hadoop parses arguments, all "-D" arguments must appear before all other arguments.

For example, to try this out, set up Hadoop in a pseudo-distributed manner as described at http://hadoop.apache.org/common/docs/current/quickstart.html. You can stop at the point where it instructs you to copy files into HDFS.

Assume your preference data file is input.csv. You will also need to create a file listing the user IDs to recommend for, say users.txt. Place these inputs on HDFS like so:

hadoop fs -put input.csv input/input.csv
hadoop fs -put users.txt input/users.txt
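For reference, FileDataModel reads comma-separated lines of the form userID,itemID[,preference], and the users file lists one user ID per line. The values below are illustrative sample data only:

```
# input.csv
1,101,5.0
1,102,3.0
2,101,2.0

# users.txt
1
2
```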

Build the Mahout code with mvn package in the core/ directory. Locate target/mahout-core-X.Y-SNAPSHOT.job. This is a JAR file; copy it to a convenient location and rename it recommender.jar.
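The build and copy steps above might look like the following (paths are assumptions; substitute your actual Mahout version for X.Y):

```shell
# Build the Mahout core job file
cd core
mvn package

# Copy the resulting job JAR out under a convenient name
cp target/mahout-core-X.Y-SNAPSHOT.job ~/recommender.jar
```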

Now add your own custom recommender code and its dependencies. Your build produced compiled .class files somewhere; these need to be packaged into the job file as well:

jar uf recommender.jar -C (your classes directory) .

And launch:

hadoop jar recommender.jar org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob \
  -Dmapred.input.dir=input/input.csv \
  -Dmapred.output.dir=output \
  --recommenderClassName your.project.Recommender \
  --numRecommendations 10 \
  --usersFile input/users.txt
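When the job completes, the recommendations can be read back from HDFS; output files follow the standard Hadoop part-* naming convention:

```shell
# List and inspect the recommender output
hadoop fs -ls output
hadoop fs -cat 'output/part-*'
```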


Constructor Summary
RecommenderJob()
           
 
Method Summary
static void main(String[] args)
           
 int run(String[] args)
           
 
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getCombinedTempPath, getGroup, getInputPath, getOption, getOption, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, setS3SafeCombinedInputPath, shouldRunNextPhase
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Constructor Detail

RecommenderJob

public RecommenderJob()
Method Detail

run

public int run(String[] args)
        throws IOException,
               ClassNotFoundException,
               InterruptedException
Throws:
IOException
ClassNotFoundException
InterruptedException

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception


Copyright © 2008-2012 The Apache Software Foundation. All Rights Reserved.