Interface Summary | |
---|---|
SplitBayesInput.SplitCallback | Used to pass information back to a caller once a file has been split without the need for a data object |
Class Summary | |
---|---|
PrepareTwentyNewsgroups | Prepare the 20 Newsgroups files for training using the BayesFileFormatter. |
SplitBayesInput | A utility for splitting files in the input format used by the Bayes classifiers into training and test sets in order to perform cross-validation. |
WikipediaDatasetCreatorDriver | Create and run the Wikipedia Dataset Creator. |
WikipediaDatasetCreatorMapper | Maps over the Wikipedia XML format and outputs all documents that have a category listed in the input category file. |
WikipediaDatasetCreatorOutputFormat | This class extends MultipleOutputFormat, allowing the output data to be written to different output files in sequence file output format. |
WikipediaDatasetCreatorReducer | Can also be used as a local Combiner |
WikipediaXmlSplitter | Splits the Wikipedia XML file into chunks of the size specified by a command-line parameter. |
XmlInputFormat | Reads records that are delimited by a specific begin/end tag. |
XmlInputFormat.XmlRecordReader | XmlRecordReader class that reads through a given XML document and outputs XML blocks as records, as delimited by the start tag and end tag. |
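
The XmlInputFormat rows above describe reading records delimited by a begin tag and an end tag. The following is a minimal, in-memory sketch of that idea only. It is not the Mahout implementation, which operates on Hadoop input splits; the class name `TagRecordSketch` and the method `extractRecords` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of Mahout or Hadoop.
final class TagRecordSketch {

  /**
   * Returns every block of text that begins with startTag and ends with the
   * next endTag, tags included. Text outside such blocks is skipped.
   */
  static List<String> extractRecords(String text, String startTag, String endTag) {
    List<String> records = new ArrayList<>();
    int from = 0;
    while (true) {
      int begin = text.indexOf(startTag, from);
      if (begin < 0) {
        break; // no further start tag
      }
      int end = text.indexOf(endTag, begin + startTag.length());
      if (end < 0) {
        break; // unterminated record: ignore it and stop
      }
      end += endTag.length();
      records.add(text.substring(begin, end));
      from = end; // continue scanning after the record just emitted
    }
    return records;
  }

  public static void main(String[] args) {
    String xml = "<mediawiki><page>A</page>junk<page>B</page></mediawiki>";
    for (String record : extractRecords(xml, "<page>", "</page>")) {
      System.out.println(record);
    }
  }
}
```

With the sample string in `main`, the two `<page>` blocks are emitted as separate records and the surrounding text is dropped.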
See PrepareTwentyNewsgroups for details on running the trainer and on formatting the Twenty Newsgroups data properly for training.
ant extract-20news-18828
This runs the arg line:
-p ${working.dir}/20news-18828/ -o ${working.dir}/20news-18828-collapse -a ${analyzer} -c UTF-8
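
Once the data has been formatted, the class summary describes SplitBayesInput as dividing input files into training and test sets for cross-validation. The sketch below shows one simple way such a split can work, assuming one example per line and a seeded random shuffle. Both are assumptions made for illustration; the class name `TrainTestSplitSketch`, the method `split`, and its parameters are hypothetical and do not reflect the SplitBayesInput API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration; not the SplitBayesInput implementation.
final class TrainTestSplitSketch {

  /**
   * Shuffles the lines with a fixed seed and returns two lists:
   * index 0 holds the training lines, index 1 holds the test lines.
   */
  static List<List<String>> split(List<String> lines, double testFraction, long seed) {
    if (testFraction < 0.0 || testFraction > 1.0) {
      throw new IllegalArgumentException("testFraction must be between 0 and 1");
    }
    List<String> shuffled = new ArrayList<>(lines); // leave the caller's list untouched
    Collections.shuffle(shuffled, new Random(seed));

    int testSize = (int) Math.round(shuffled.size() * testFraction);
    List<String> test = new ArrayList<>(shuffled.subList(0, testSize));
    List<String> train = new ArrayList<>(shuffled.subList(testSize, shuffled.size()));

    List<List<String>> result = new ArrayList<>();
    result.add(train);
    result.add(test);
    return result;
  }

  public static void main(String[] args) {
    List<String> lines = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      lines.add("label" + (i % 2) + "\tdocument " + i);
    }
    List<List<String>> parts = split(lines, 0.2, 42L);
    System.out.println("training lines: " + parts.get(0).size());
    System.out.println("test lines: " + parts.get(1).size());
  }
}
```

Fixing the seed keeps the split reproducible across runs, which makes evaluation results comparable.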