org.apache.accumulo.examples.wikisearch.ingest
Class WikipediaMapper

java.lang.Object
  extended by org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,Mutation>
      extended by org.apache.accumulo.examples.wikisearch.ingest.WikipediaMapper

public class WikipediaMapper
extends org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,Mutation>


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Mapper
org.apache.hadoop.mapreduce.Mapper.Context
 
Field Summary
static String DOCUMENT_COLUMN_FAMILY
           
static String METADATA_EVENT_COLUMN_FAMILY
           
static String METADATA_INDEX_COLUMN_FAMILY
           
static String TOKENS_FIELD_NAME
           
static Charset UTF8
           
 
Constructor Summary
WikipediaMapper()
           
 
Method Summary
static int getPartitionId(ArticleExtractor.Article article, int numPartitions)
          Partitions documents by document id.
protected  void map(org.apache.hadoop.io.LongWritable key, org.apache.hadoop.io.Text value, org.apache.hadoop.mapreduce.Mapper.Context context)
           
 void setup(org.apache.hadoop.mapreduce.Mapper.Context context)
           
 
Methods inherited from class org.apache.hadoop.mapreduce.Mapper
cleanup, run
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

UTF8

public static final Charset UTF8

DOCUMENT_COLUMN_FAMILY

public static final String DOCUMENT_COLUMN_FAMILY
See Also:
Constant Field Values

METADATA_EVENT_COLUMN_FAMILY

public static final String METADATA_EVENT_COLUMN_FAMILY
See Also:
Constant Field Values

METADATA_INDEX_COLUMN_FAMILY

public static final String METADATA_INDEX_COLUMN_FAMILY
See Also:
Constant Field Values

TOKENS_FIELD_NAME

public static final String TOKENS_FIELD_NAME
See Also:
Constant Field Values

Constructor Detail

WikipediaMapper

public WikipediaMapper()

Method Detail

setup

public void setup(org.apache.hadoop.mapreduce.Mapper.Context context)
Overrides:
setup in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,Mutation>

getPartitionId

public static int getPartitionId(ArticleExtractor.Article article,
                                 int numPartitions)
                          throws IllegalFormatException
Partitions documents by document id.

Parameters:
article - the article to assign to a partition
numPartitions - the total number of partitions
Returns:
The number of the partition for a given article.
Throws:
IllegalFormatException
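
Example:
The page does not show how the partition number is derived from the document id. A minimal sketch of one plausible scheme, id modulo partition count, follows. The class PartitionSketch, its nested Article stand-in, and the method partitionFor are hypothetical names used only for illustration; they are not part of WikipediaMapper or ArticleExtractor, and the real method may compute its result differently.

```java
public class PartitionSketch {

    /** Hypothetical stand-in for ArticleExtractor.Article; only the id matters here. */
    static final class Article {
        private final int id;

        Article(int id) {
            this.id = id;
        }

        int getId() {
            return id;
        }
    }

    /**
     * Assumed scheme: document id modulo the partition count.
     * Math.floorMod keeps the result in [0, numPartitions) even for a negative id.
     */
    static int partitionFor(Article article, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        return Math.floorMod(article.getId(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        // Consecutive ids spread across the partitions in round-robin order.
        for (int id : new int[] {10, 11, 12, 13}) {
            System.out.println("article " + id + " -> partition "
                    + partitionFor(new Article(id), numPartitions));
        }
    }
}
```

Under this assumption every article with the same id always lands in the same partition, and sequential ids are distributed evenly, which is the usual reason to partition on a numeric document id.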

map

protected void map(org.apache.hadoop.io.LongWritable key,
                   org.apache.hadoop.io.Text value,
                   org.apache.hadoop.mapreduce.Mapper.Context context)
            throws IOException,
                   InterruptedException
Overrides:
map in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,Mutation>
Throws:
IOException
InterruptedException


Copyright © 2012 The Apache Software Foundation. All Rights Reserved.