org.apache.mahout.text
Class WikipediaToSequenceFile

java.lang.Object
  extended by org.apache.mahout.text.WikipediaToSequenceFile

public final class WikipediaToSequenceFile
extends java.lang.Object

Create and run the Wikipedia Dataset Creator.


Method Summary
static void main(java.lang.String[] args)
          Takes in two arguments: the input Path where the input documents live, and the output Path where to write the classifier as a SequenceFile.
static void runJob(java.lang.String input, java.lang.String output, java.lang.String catFile, boolean exactMatchOnly, boolean all)
          Runs the job with the given input, output, and category-selection settings.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.io.IOException
Takes in two arguments:
  1. The input Path where the input documents live
  2. The output Path where to write the classifier as a SequenceFile

Parameters:
args - the command-line arguments: the input Path, followed by the output Path
Throws:
java.io.IOException

runJob

public static void runJob(java.lang.String input,
                          java.lang.String output,
                          java.lang.String catFile,
                          boolean exactMatchOnly,
                          boolean all)
                   throws java.io.IOException
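Runs the job that reads the Wikipedia documents under input, selects them by category, and writes the result to output.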

Parameters:
input - the input pathname String
output - the output pathname String
catFile - the file containing the Wikipedia categories
exactMatchOnly - if true, then the Wikipedia category must match exactly instead of simply containing the category string
all - if true, select all categories
Throws:
java.io.IOException
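
The interaction of exactMatchOnly and all can be easier to see in code. The following is a minimal, self-contained sketch of the selection rule as the parameter descriptions state it; it is not taken from the Mahout source. The class CategoryMatchSketch and the method selects are hypothetical names, and the sketch assumes plain case-sensitive string comparison, which the real job may not use.

```java
import java.util.Set;

// Hypothetical illustration only; not part of org.apache.mahout.text.
final class CategoryMatchSketch {

  private CategoryMatchSketch() {
  }

  /**
   * Decides whether a document category is selected.
   *
   * @param docCategory    a category string found on a Wikipedia document
   * @param wanted         the categories of interest (e.g. loaded from catFile)
   * @param exactMatchOnly if true, docCategory must equal a wanted category
   * @param all            if true, every category is selected
   */
  static boolean selects(String docCategory,
                         Set<String> wanted,
                         boolean exactMatchOnly,
                         boolean all) {
    if (all) {
      // "all" overrides the category list entirely.
      return true;
    }
    if (exactMatchOnly) {
      // Exact match: the whole category string must be in the wanted set.
      return wanted.contains(docCategory);
    }
    // Otherwise it is enough for the category to contain a wanted string.
    // Assumption: comparison is case-sensitive in this sketch.
    for (String w : wanted) {
      if (docCategory.contains(w)) {
        return true;
      }
    }
    return false;
  }
}
```

Under these assumptions, a wanted category of "machine learning" selects the document category "machine learning algorithms" when exactMatchOnly is false, but not when it is true. A call of the form runJob(input, output, catFile, false, false) would correspond to the substring case, and passing true for all would bypass the category list.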


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.