org.apache.mahout.cf.taste.hadoop.pseudo
Class RecommenderJob
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.mahout.common.AbstractJob
org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob
All Implemented Interfaces: org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public final class RecommenderJob extends AbstractJob
This job runs a "pseudo-distributed" recommendation process on Hadoop. It merely runs many Recommender instances on Hadoop, where each instance is a normal, non-distributed implementation. This class configures and runs a RecommenderReducer using Hadoop.
Command line arguments specific to this class are:
- -Dmapred.input.dir=(path): location of a data model file containing preference data, suitable for use with FileDataModel
- -Dmapred.output.dir=(path): output path where recommender output should go
- --recommenderClassName (string): fully qualified class name of the Recommender to use to make recommendations. Note that it must have a constructor which takes a DataModel argument; a sketch follows this list.
- --numRecommendations (integer): number of recommendations to compute per user
- --usersFile (path): file containing user IDs to recommend for (optional)
General command line options are documented in AbstractJob. Note that because of how Hadoop parses arguments, all "-D" arguments must appear before all other arguments.
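For illustration, here is a minimal sketch of a custom Recommender with the required DataModel constructor. It simply delegates to Mahout's non-distributed user-based implementation; the package and class name (your.project.MyRecommender) are placeholders, and the Pearson similarity and 10-nearest-neighbor neighborhood are arbitrary example choices, not requirements of this job:

package your.project;

import java.util.Collection;
import java.util.List;

import org.apache.mahout.cf.taste.common.Refreshable;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.IDRescorer;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

// Hypothetical custom recommender; the one-argument DataModel constructor
// below is what RecommenderJob requires.
public final class MyRecommender implements Recommender {

  private final Recommender delegate;

  public MyRecommender(DataModel dataModel) throws TasteException {
    // Example configuration: user-based CF with Pearson correlation
    // similarity and a 10-nearest-neighbor neighborhood.
    UserSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
    UserNeighborhood neighborhood =
        new NearestNUserNeighborhood(10, similarity, dataModel);
    delegate = new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
  }

  @Override
  public List<RecommendedItem> recommend(long userID, int howMany) throws TasteException {
    return delegate.recommend(userID, howMany);
  }

  @Override
  public List<RecommendedItem> recommend(long userID, int howMany, IDRescorer rescorer)
      throws TasteException {
    return delegate.recommend(userID, howMany, rescorer);
  }

  @Override
  public float estimatePreference(long userID, long itemID) throws TasteException {
    return delegate.estimatePreference(userID, itemID);
  }

  @Override
  public void setPreference(long userID, long itemID, float value) throws TasteException {
    delegate.setPreference(userID, itemID, value);
  }

  @Override
  public void removePreference(long userID, long itemID) throws TasteException {
    delegate.removePreference(userID, itemID);
  }

  @Override
  public DataModel getDataModel() {
    return delegate.getDataModel();
  }

  @Override
  public void refresh(Collection<Refreshable> alreadyRefreshed) {
    delegate.refresh(alreadyRefreshed);
  }
}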
For example, to get started trying this out, set up Hadoop in a pseudo-distributed manner as described at http://hadoop.apache.org/common/docs/current/quickstart.html. You can stop at the point where it instructs you to copy files into HDFS.
Assume your preference data file is input.csv. You will also need to create a file containing all user IDs to write recommendations for, as something like users.txt. Place this input on HDFS like so:

hadoop fs -put input.csv input/input.csv
hadoop fs -put users.txt input/users.txt
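For reference, FileDataModel reads simple comma-separated text, one preference per line, in the form userID,itemID,preferenceValue (the preference value can be omitted for boolean-preference data), and the users file contains one user ID per line. The IDs and values below are made-up examples:

input.csv:
1,101,5.0
1,102,3.0
2,101,2.5
2,103,4.0

users.txt:
1
2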
Build the Mahout code with mvn package in the core/ directory, and locate target/mahout-core-X.Y-SNAPSHOT.job. This is a JAR file; copy it out to a convenient location and name it recommender.jar.
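For example, from the core/ directory, keeping the version placeholder used above and a placeholder destination:

cp target/mahout-core-X.Y-SNAPSHOT.job /some/convenient/location/recommender.jar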
Now add your own custom recommender code and dependencies. Your IDE has produced compiled .class files somewhere, and they need to be packaged up as well:

jar uf recommender.jar -C (your classes directory) .
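To verify that your classes were added, you can list the jar's contents; your/project here is a placeholder for your own package path:

jar tf recommender.jar | grep your/project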
And launch:

hadoop jar recommender.jar \
  org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob \
  -Dmapred.input.dir=input/input.csv \
  -Dmapred.output.dir=output \
  --recommenderClassName your.project.MyRecommender \
  --numRecommendations 10 \
  --usersFile input/users.txt
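When the job completes, the recommendations are written as text files under the output path, one user per line (a user ID followed by its recommended items with estimated preference values; check your own output for the exact formatting). They can be inspected with:

hadoop fs -cat output/part-*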
Methods inherited from class org.apache.mahout.common.AbstractJob:
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getCombinedTempPath, getDimensions, getGroup, getInputFile, getInputPath, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase

Methods inherited from class org.apache.hadoop.conf.Configured:
getConf

Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.hadoop.conf.Configurable:
getConf
RecommenderJob
public RecommenderJob()
run
public int run(String[] args)
throws IOException,
ClassNotFoundException,
InterruptedException
- Throws:
IOException
ClassNotFoundException
InterruptedException
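Because RecommenderJob implements org.apache.hadoop.util.Tool, run(String[]) can also be invoked programmatically through Hadoop's ToolRunner. A minimal sketch, using placeholder paths and the hypothetical recommender class from the example above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob;

public class RecommenderJobDriver {
  public static void main(String[] args) throws Exception {
    // Placeholder paths and class name; ToolRunner's GenericOptionsParser
    // consumes the -D arguments before run(String[]) sees the rest.
    String[] jobArgs = {
        "-Dmapred.input.dir=input/input.csv",
        "-Dmapred.output.dir=output",
        "--recommenderClassName", "your.project.MyRecommender",
        "--numRecommendations", "10",
        "--usersFile", "input/users.txt"
    };
    int exitCode = ToolRunner.run(new Configuration(), new RecommenderJob(), jobArgs);
    System.exit(exitCode);
  }
}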
main
public static void main(String[] args)
throws Exception
- Throws:
Exception