org.apache.accumulo.examples.wikisearch.ingest
Class WikipediaMapper
java.lang.Object
  org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,Mutation>
    org.apache.accumulo.examples.wikisearch.ingest.WikipediaMapper
public class WikipediaMapper
extends org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,Mutation>
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Mapper
org.apache.hadoop.mapreduce.Mapper.Context
Method Summary
static int getPartitionId(ArticleExtractor.Article article, int numPartitions)
    Partitions the documents based on the document id.
protected void map(org.apache.hadoop.io.LongWritable key, org.apache.hadoop.io.Text value, org.apache.hadoop.mapreduce.Mapper.Context context)
void setup(org.apache.hadoop.mapreduce.Mapper.Context context)
Methods inherited from class org.apache.hadoop.mapreduce.Mapper
cleanup, run
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
UTF8
public static final Charset UTF8
DOCUMENT_COLUMN_FAMILY
public static final String DOCUMENT_COLUMN_FAMILY
- See Also:
- Constant Field Values
METADATA_EVENT_COLUMN_FAMILY
public static final String METADATA_EVENT_COLUMN_FAMILY
- See Also:
- Constant Field Values
METADATA_INDEX_COLUMN_FAMILY
public static final String METADATA_INDEX_COLUMN_FAMILY
- See Also:
- Constant Field Values
TOKENS_FIELD_NAME
public static final String TOKENS_FIELD_NAME
- See Also:
- Constant Field Values
Constructor Detail
WikipediaMapper
public WikipediaMapper()
Method Detail
setup
public void setup(org.apache.hadoop.mapreduce.Mapper.Context context)
- Overrides:
setup in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,Mutation>
getPartitionId
public static int getPartitionId(ArticleExtractor.Article article,
                                 int numPartitions)
                          throws IllegalFormatException
- Partitions the documents based on the document id.
- Parameters:
article - the article to assign to a partition
numPartitions - the total number of partitions
- Returns:
- The number of the partition for a given article.
- Throws:
IllegalFormatException
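The description above says only that documents are partitioned by document id; it does not state the formula. The following is a minimal sketch of one common way to do this, assuming the id is an integer that is reduced modulo the partition count. The class `PartitionSketch` and the method `partitionFor` are hypothetical names used for illustration and are not part of WikipediaMapper or its documented API.

```java
// Hypothetical sketch: an assumed modulo scheme, not the WikipediaMapper source.
final class PartitionSketch {
    private PartitionSketch() {}

    /** Maps a document id onto a partition number in the range [0, numPartitions). */
    static int partitionFor(int documentId, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod keeps the result non-negative even when the id is negative.
        return Math.floorMod(documentId, numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(partitionFor(10, 4)); // prints 2
        System.out.println(partitionFor(-1, 4)); // prints 3
    }
}
```

Under this assumption the same document id always lands in the same partition, and ids that are spread evenly are distributed evenly across partitions.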
map
protected void map(org.apache.hadoop.io.LongWritable key,
                   org.apache.hadoop.io.Text value,
                   org.apache.hadoop.mapreduce.Mapper.Context context)
            throws IOException,
                   InterruptedException
- Overrides:
map in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,Mutation>
- Throws:
IOException
InterruptedException
Copyright © 2012 The Apache Software Foundation. All Rights Reserved.