java.lang.Object
  extended by org.apache.mahout.classifier.BayesFileFormatter
public final class BayesFileFormatter
Flattens a file into a format that can be read by the Bayes M/R job.
Each document occupies one line: the first token is the label, followed by a tab, and the rest of the line consists of the document's terms.
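As an illustration of the line format described above, here is a minimal sketch that builds one output line from a label and an already-tokenized document. The class and method names (`BayesLineFormat`, `toLine`) are hypothetical. The description above does not say how terms are separated, so joining them with single spaces is an assumption. The guard that rejects tabs and newlines in the label also belongs to this sketch alone, not to the documented class.

```java
import java.util.Arrays;
import java.util.List;

public class BayesLineFormat {

    /**
     * Builds one line in the Bayes input format: label, a tab, then the terms.
     * Joining the terms with single spaces is an assumption of this sketch.
     */
    static String toLine(String label, List<String> terms) {
        // Sketch-only guard: a tab or newline inside the label would break the line format.
        if (label.contains("\t") || label.contains("\n")) {
            throw new IllegalArgumentException("label must not contain tabs or newlines");
        }
        return label + '\t' + String.join(" ", terms);
    }

    public static void main(String[] args) {
        // Prints "sports", a tab, then "the match ended early"
        System.out.println(toLine("sports", Arrays.asList("the", "match", "ended", "early")));
    }
}
```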
Method Summary | |
---|---|
static void | collapse(java.lang.String label, org.apache.lucene.analysis.Analyzer analyzer, java.io.File inputDir, java.nio.charset.Charset charset, java.io.File outputFile): Collapses all the files in inputDir into a single file in the proper Bayes format, one document per line. |
static void | format(java.lang.String label, org.apache.lucene.analysis.Analyzer analyzer, java.io.File input, java.nio.charset.Charset charset, java.io.File outDir): Writes the input files to outDir, one output file per input file. |
static void | main(java.lang.String[] args): Runs the BayesFileFormatter from the command line. |
static java.lang.String[] | readerToDocument(org.apache.lucene.analysis.Analyzer analyzer, java.io.Reader reader): Converts the contents of a Reader into a document, the array of terms produced by the Analyzer. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Method Detail |
---|
public static void collapse(java.lang.String label, org.apache.lucene.analysis.Analyzer analyzer, java.io.File inputDir, java.nio.charset.Charset charset, java.io.File outputFile) throws java.io.IOException
Parameters:
label - The label, written as the first token of each output line
analyzer - The analyzer to use
inputDir - The input directory
charset - The charset of the input files
outputFile - The file to collapse to
Throws:
java.io.IOException
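The following is a minimal sketch of what collapsing a directory amounts to: every regular file in the input directory becomes one "label, tab, terms" line in a single output file. It is not Mahout's implementation. It swaps the Lucene Analyzer for a hypothetical `tokenize` helper that lowercases text and splits it on non-alphanumeric characters, and it uses `java.nio.file.Path` instead of `java.io.File`. Writing the output as UTF-8, skipping subdirectories, and sorting the files are also assumptions; the documented method does not specify an ordering.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CollapseSketch {

    // Hypothetical stand-in for a Lucene Analyzer: lowercase, split on non-letters/digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Writes each regular file in inputDir to outputFile as one "label\tterms" line. */
    static void collapse(String label, Path inputDir, Charset charset, Path outputFile)
            throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> dir = Files.newDirectoryStream(inputDir)) {
            for (Path p : dir) {
                if (Files.isRegularFile(p)) { // subdirectories are skipped in this sketch
                    files.add(p);
                }
            }
        }
        files.sort(null); // natural Path order, so the output is deterministic
        try (BufferedWriter out = Files.newBufferedWriter(outputFile, StandardCharsets.UTF_8)) {
            for (Path p : files) {
                String text = new String(Files.readAllBytes(p), charset);
                out.write(label);
                out.write('\t');
                out.write(String.join(" ", tokenize(text)));
                out.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempDirectory("bayes-in");
        Files.write(in.resolve("a.txt"), "Hello, World!".getBytes(StandardCharsets.UTF_8));
        Files.write(in.resolve("b.txt"), "Second document".getBytes(StandardCharsets.UTF_8));
        Path out = Files.createTempFile("bayes", ".txt");
        collapse("greetings", in, StandardCharsets.UTF_8, out);
        Files.readAllLines(out, StandardCharsets.UTF_8).forEach(System.out::println);
    }
}
```

Sorting the files is a design choice for reproducible output. It costs one pass over the file list and makes the collapsed file stable across runs.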
public static void format(java.lang.String label, org.apache.lucene.analysis.Analyzer analyzer, java.io.File input, java.nio.charset.Charset charset, java.io.File outDir) throws java.io.IOException
Parameters:
label - The label of the file
analyzer - The analyzer to use
input - The input file or directory. Must not be null
charset - The character set of the input files
outDir - The output directory. Each output file is written there under the same name as its input file
Throws:
java.io.IOException
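For contrast with collapse, here is a minimal sketch of the one-output-file-per-input-file behaviour. It accepts either a single file or a directory, as the input parameter allows, and writes each result to outDir under the input file's name. This is not Mahout's code. The hypothetical `tokenize` helper replaces the Lucene Analyzer, `Path` replaces `java.io.File`, and the output is written as UTF-8. The sketch also skips subdirectories, and whether the real method recurses into them is not documented here.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Objects;

public class FormatSketch {

    // Hypothetical stand-in for a Lucene Analyzer: lowercase, split on non-letters/digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Writes one "label\tterms" file into outDir per input file, keeping the file name. */
    static void format(String label, Path input, Charset charset, Path outDir) throws IOException {
        Objects.requireNonNull(input, "input must not be null");
        Files.createDirectories(outDir);
        if (Files.isDirectory(input)) {
            try (DirectoryStream<Path> dir = Files.newDirectoryStream(input)) {
                for (Path p : dir) {
                    if (Files.isRegularFile(p)) { // no recursion in this sketch
                        writeOne(label, p, charset, outDir);
                    }
                }
            }
        } else {
            writeOne(label, input, charset, outDir);
        }
    }

    private static void writeOne(String label, Path file, Charset charset, Path outDir)
            throws IOException {
        String text = new String(Files.readAllBytes(file), charset);
        String line = label + '\t' + String.join(" ", tokenize(text)) + '\n';
        Path target = outDir.resolve(file.getFileName().toString()); // same name as the input
        Files.write(target, line.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempDirectory("fmt-in");
        Files.write(in.resolve("doc1.txt"), "Bayes rules".getBytes(StandardCharsets.UTF_8));
        Path out = Files.createTempDirectory("fmt-out");
        format("spam", in, StandardCharsets.UTF_8, out);
        System.out.print(new String(Files.readAllBytes(out.resolve("doc1.txt")), StandardCharsets.UTF_8));
    }
}
```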
public static java.lang.String[] readerToDocument(org.apache.lucene.analysis.Analyzer analyzer, java.io.Reader reader) throws java.io.IOException
Parameters:
analyzer - The Analyzer to use
reader - The reader to feed to the Analyzer
Throws:
java.io.IOException
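To show the shape of the conversion from a Reader to a String[] document, here is a minimal sketch that drains a Reader and returns its terms. The documented method feeds the Reader through the supplied Lucene Analyzer. This sketch instead uses a hypothetical tokenizer that keeps runs of letters and digits and lowercases them, so it runs without Lucene. It also works per `char` (UTF-16 unit), so supplementary characters are treated as separators. As in the documented signature, the caller owns the Reader and is responsible for closing it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReaderToDocumentSketch {

    /** Reads all characters and returns lowercase runs of letters/digits as terms. */
    static String[] readerToDocument(Reader reader) throws IOException {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isLetterOrDigit(c)) {
                current.appendCodePoint(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A separator ends the current term.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the last term
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        String[] doc = readerToDocument(new StringReader("The quick, brown fox."));
        System.out.println(String.join(" ", doc)); // the quick brown fox
    }
}
```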
public static void main(java.lang.String[] args) throws java.lang.Exception
Parameters:
args - The command-line arguments. Run with -h to see the help
Throws:
java.lang.ClassNotFoundException - if the Analyzer class cannot be found
java.lang.IllegalAccessException - if the Analyzer cannot be constructed
java.lang.InstantiationException - if the Analyzer cannot be constructed
java.io.IOException - if the input or output files cannot be read or written
java.lang.Exception