org.apache.mahout.classifier.bayes
Class WikipediaDatasetCreatorDriver

java.lang.Object
  extended by org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver

public final class WikipediaDatasetCreatorDriver
extends java.lang.Object

Create and run the Wikipedia Dataset Creator.


Method Summary
static void main(java.lang.String[] args)
          Takes two arguments: (1) the input Path where the input documents live, and (2) the output Path where the classifier is written as a SequenceFile.
static void runJob(java.lang.String input, java.lang.String output, java.lang.String catFile, boolean exactMatchOnly, java.lang.Class<? extends org.apache.lucene.analysis.Analyzer> analyzerClass)
          Runs the Wikipedia dataset creator job.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.io.IOException
Takes two arguments:
  1. The input Path where the input documents live
  2. The output Path where the classifier is written as a SequenceFile

Parameters:
args - the command-line arguments described above
Throws:
java.io.IOException

runJob

public static void runJob(java.lang.String input,
                          java.lang.String output,
                          java.lang.String catFile,
                          boolean exactMatchOnly,
                          java.lang.Class<? extends org.apache.lucene.analysis.Analyzer> analyzerClass)
                   throws java.io.IOException
Runs the Wikipedia dataset creator job.

Parameters:
input - the input pathname String
output - the output pathname String
catFile - the file containing the Wikipedia categories
exactMatchOnly - if true, a Wikipedia category must match exactly rather than merely contain the category string
analyzerClass - the Lucene Analyzer implementation to use for the job
Throws:
java.io.IOException
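
The difference between the two exactMatchOnly modes may be easier to see in a small standalone sketch. The class CategoryMatchSketch and its matches method below are hypothetical and are not part of Mahout; they only illustrate "exact match" versus "contains the category string" as described for the parameter, and they assume plain case-sensitive string comparison, which the actual job may or may not use.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of Mahout: it only illustrates the two
// matching modes described for the exactMatchOnly parameter.
final class CategoryMatchSketch {

    private CategoryMatchSketch() { }

    /**
     * Returns true if documentCategory matches any entry in wanted.
     * With exactMatchOnly the strings must be equal; otherwise it is
     * enough for documentCategory to contain the wanted string.
     * Assumption: comparison is case-sensitive and done on raw strings.
     */
    static boolean matches(String documentCategory,
                           List<String> wanted,
                           boolean exactMatchOnly) {
        for (String category : wanted) {
            boolean hit = exactMatchOnly
                    ? documentCategory.equals(category)
                    : documentCategory.contains(category);
            if (hit) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of a category file (made-up values).
        List<String> wanted = Arrays.asList("france", "physics");

        System.out.println(matches("history of france", wanted, false)); // true
        System.out.println(matches("history of france", wanted, true));  // false
        System.out.println(matches("physics", wanted, true));            // true
    }
}
```

Under these assumptions, passing true for exactMatchOnly narrows the selection to documents whose category is identical to an entry in the category file, while false also admits broader categories that merely mention one.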


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.