org.apache.mahout.classifier.bayes
Class XmlInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
      extended by org.apache.hadoop.mapred.TextInputFormat
          extended by org.apache.mahout.classifier.bayes.XmlInputFormat
All Implemented Interfaces:
org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>, org.apache.hadoop.mapred.JobConfigurable

public class XmlInputFormat
extends org.apache.hadoop.mapred.TextInputFormat

Reads records that are delimited by a specific start tag and end tag.
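
The idea of tag-delimited records can be illustrated with a minimal, self-contained sketch. This is not the Mahout implementation: the class name TagRecordSketch and the method extractRecords are hypothetical, and the sketch works on an in-memory String, whereas the real record reader works on a file split. It only shows what "one record per start/end tag pair" means.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; hypothetical names, not the Mahout XmlRecordReader.
class TagRecordSketch {

    // Returns every substring that begins with startTag and ends with the
    // first endTag found after it. Text outside the tag pairs is skipped.
    static List<String> extractRecords(String input, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = input.indexOf(startTag, pos);
            if (start < 0) {
                break; // no further start tag
            }
            int end = input.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // unterminated record: dropped in this sketch
            }
            int stop = end + endTag.length();
            records.add(input.substring(start, stop));
            pos = stop; // continue scanning after the end tag
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<docs><doc>first</doc>\n<doc>second</doc></docs>";
        // Prints "<doc>first</doc>" and then "<doc>second</doc>".
        for (String record : extractRecords(xml, "<doc>", "</doc>")) {
            System.out.println(record);
        }
    }
}
```

In this sketch the tags are passed as arguments. In XmlInputFormat they would presumably come from the job configuration under START_TAG_KEY and END_TAG_KEY.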


Nested Class Summary
static class XmlInputFormat.XmlRecordReader
          Record reader that scans an XML document and emits each block enclosed by the specified start tag and end tag as one record.
 
Field Summary
static java.lang.String END_TAG_KEY
           
static java.lang.String START_TAG_KEY
           
 
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
LOG
 
Constructor Summary
XmlInputFormat()
           
 
Method Summary
 org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> getRecordReader(org.apache.hadoop.mapred.InputSplit inputSplit, org.apache.hadoop.mapred.JobConf jobConf, org.apache.hadoop.mapred.Reporter reporter)
           
 
Methods inherited from class org.apache.hadoop.mapred.TextInputFormat
configure, isSplitable
 
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, getSplits, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

START_TAG_KEY

public static final java.lang.String START_TAG_KEY
See Also:
Constant Field Values

END_TAG_KEY

public static final java.lang.String END_TAG_KEY
See Also:
Constant Field Values
Constructor Detail

XmlInputFormat

public XmlInputFormat()
Method Detail

getRecordReader

public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> getRecordReader(
        org.apache.hadoop.mapred.InputSplit inputSplit,
        org.apache.hadoop.mapred.JobConf jobConf,
        org.apache.hadoop.mapred.Reporter reporter)
    throws java.io.IOException
Specified by:
getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Overrides:
getRecordReader in class org.apache.hadoop.mapred.TextInputFormat
Throws:
java.io.IOException


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.