java.lang.Object
  org.apache.accumulo.core.iterators.user.IntersectingIterator
    org.apache.accumulo.core.iterators.user.IndexedDocIterator
public class IndexedDocIterator
This iterator facilitates document-partitioned indexing. It is an example of extending the IntersectingIterator to customize the placement of the term and docID. As with the IntersectingIterator, documents are grouped together and indexed into a single row of an Accumulo table. This allows a tablet server to perform boolean AND operations on terms in the index. This iterator also stores the document contents in a separate column family in the same row so that the full document can be returned with each query.

The table structure should have the following form:

    row: shardID, colfam: docColf\0doctype, colqual: docID, value: doc
    row: shardID, colfam: indexColf, colqual: term\0doctype\0docID\0info, value: (empty)

When you configure this iterator with a set of terms, it will return only the docIDs and docs that appear with all of the specified terms. The result will have the following form:

    row: shardID, colfam: indexColf, colqual: doctype\0docID\0info, value: doc

This iterator is commonly used with BatchScanner or AccumuloInputFormat, to parallelize the search over all shardIDs.
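The row layout and the AND semantics described above can be modeled without Accumulo. The following is a minimal in-memory sketch in plain Java, not the iterator's actual implementation. The class `ShardRowSketch`, its methods, and the simple tokenizer are hypothetical names introduced for illustration, and the sketch assumes that every index entry for a given document carries the same info field and that info contains no null byte.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory model of a single shard row (hypothetical; not Accumulo code). */
final class ShardRowSketch {
    // Index entries (colfam: indexColf). Each element is a column qualifier of
    // the form term\0doctype\0docID\0info; the value is empty, so a set suffices.
    private final TreeSet<String> indexQualifiers = new TreeSet<>();
    // Document entries (colfam: docColf\0doctype, colqual: docID),
    // keyed here by doctype\0docID for simplicity.
    private final Map<String, String> docs = new TreeMap<>();

    /** Stores the document and one index entry per distinct term in it. */
    void addDocument(String doctype, String docID, String info, String content) {
        docs.put(doctype + "\0" + docID, content);
        for (String term : content.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                indexQualifiers.add(term + "\0" + doctype + "\0" + docID + "\0" + info);
            }
        }
    }

    /** Returns doctype\0docID\0info -> doc for documents that contain every term. */
    Map<String, String> query(List<String> terms) {
        Set<String> hits = null;
        for (String term : terms) {
            String prefix = term + "\0";
            Set<String> forTerm = new TreeSet<>();
            // tailSet starts at the first qualifier >= prefix, much like a seek
            // in sorted data; entries sharing the prefix are contiguous.
            for (String qualifier : indexQualifiers.tailSet(prefix)) {
                if (!qualifier.startsWith(prefix)) {
                    break;
                }
                forTerm.add(qualifier.substring(prefix.length()));
            }
            if (hits == null) {
                hits = forTerm;
            } else {
                hits.retainAll(forTerm); // boolean AND across terms
            }
        }
        Map<String, String> result = new TreeMap<>();
        if (hits == null) {
            return result; // no terms were given
        }
        for (String hit : hits) {
            // Drop the trailing \0info to recover the doctype\0docID lookup key.
            String docKey = hit.substring(0, hit.lastIndexOf('\0'));
            result.put(hit, docs.get(docKey));
        }
        return result;
    }

    public static void main(String[] args) {
        ShardRowSketch row = new ShardRowSketch();
        row.addDocument("text", "docA", "", "accumulo iterators run on tablet servers");
        row.addDocument("text", "docB", "", "iterators intersect sorted term lists");
        Map<String, String> result = row.query(Arrays.asList("iterators", "sorted"));
        for (Map.Entry<String, String> e : result.entrySet()) {
            // Null bytes are replaced with '|' only to make the key printable.
            System.out.println(e.getKey().replace('\0', '|') + " -> " + e.getValue());
        }
    }
}
```

Because the index qualifiers and the documents live in the same row, the intersection and the document lookup both stay local to one shard, which is the property that lets a tablet server answer the query on its own.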
| Nested Class Summary |
|---|
| Nested classes/interfaces inherited from class org.apache.accumulo.core.iterators.user.IntersectingIterator |
|---|
| IntersectingIterator.TermSource |
| Field Summary | |
|---|---|
| static org.apache.hadoop.io.Text | DEFAULT_DOC_COLF |
| static org.apache.hadoop.io.Text | DEFAULT_INDEX_COLF |
| SortedKeyValueIterator<Key,Value> | docSource |
| Fields inherited from class org.apache.accumulo.core.iterators.user.IntersectingIterator |
|---|
| currentDocID, currentPartition, inclusive, log, nullText, seekColumnFamilies, topKey, value |
| Constructor Summary | |
|---|---|
| IndexedDocIterator() | |
| Method Summary | |
|---|---|
| protected void | advanceToIntersection() |
| protected Key | buildDocKey() |
| protected Key | buildKey(org.apache.hadoop.io.Text partition, org.apache.hadoop.io.Text term) |
| protected Key | buildKey(org.apache.hadoop.io.Text partition, org.apache.hadoop.io.Text term, org.apache.hadoop.io.Text docID) |
| SortedKeyValueIterator<Key,Value> | deepCopy(IteratorEnvironment env): Creates a deep copy of this iterator as though seek had not yet been called. |
| protected org.apache.hadoop.io.Text | getDocID(Key key) |
| protected org.apache.hadoop.io.Text | getTerm(Key key) |
| void | init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options, IteratorEnvironment env): Initializes the iterator. |
| static org.apache.hadoop.io.Text | parseDocID(Key key) |
| void | seek(Range range, Collection<ByteSequence> seekColumnFamilies, boolean inclusive): Seeks to the first key in the Range, restricting the resulting K,V pairs to those with the specified columns. |
| static void | setColfs(IteratorSetting is, String indexColf, String docColfPrefix): A convenience method for setting the index column family and document column family prefix. |
| static void | setDocColfPrefix(IteratorSetting is, String docColfPrefix): A convenience method for setting the document column family prefix. |
| static void | setIndexColf(IteratorSetting is, String indexColf): A convenience method for setting the index column family. |
| Methods inherited from class org.apache.accumulo.core.iterators.user.IntersectingIterator |
|---|
| addSource, buildFollowingPartitionKey, decodeBooleans, decodeColumns, encodeBooleans, encodeColumns, getPartition, getTopKey, getTopValue, hasTop, next, setColumnFamilies, setColumnFamilies, stringTopKey |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final org.apache.hadoop.io.Text DEFAULT_INDEX_COLF
public static final org.apache.hadoop.io.Text DEFAULT_DOC_COLF
public SortedKeyValueIterator<Key,Value> docSource
| Constructor Detail |
|---|
public IndexedDocIterator()
| Method Detail |
|---|
protected Key buildKey(org.apache.hadoop.io.Text partition,
                       org.apache.hadoop.io.Text term,
                       org.apache.hadoop.io.Text docID)
  Overrides: buildKey in class IntersectingIterator

protected Key buildKey(org.apache.hadoop.io.Text partition,
                       org.apache.hadoop.io.Text term)
  Overrides: buildKey in class IntersectingIterator

protected org.apache.hadoop.io.Text getDocID(Key key)
  Overrides: getDocID in class IntersectingIterator

public static org.apache.hadoop.io.Text parseDocID(Key key)

protected org.apache.hadoop.io.Text getTerm(Key key)
  Overrides: getTerm in class IntersectingIterator
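
The term and docID accessors operate on index column qualifiers of the form term\0doctype\0docID\0info given in the class description. As an illustration of that layout only, the sketch below splits such a qualifier into its four fields using plain Java strings. `IndexQualifierSketch` is a hypothetical class, and nothing here is claimed to match what `getTerm`, `getDocID`, or `parseDocID` actually return.

```java
/** Hypothetical holder for the four fields of an index column qualifier. */
final class IndexQualifierSketch {
    final String term;
    final String doctype;
    final String docID;
    final String info;

    private IndexQualifierSketch(String term, String doctype, String docID, String info) {
        this.term = term;
        this.doctype = doctype;
        this.docID = docID;
        this.info = info;
    }

    /** Splits term\0doctype\0docID\0info at the first three null bytes. */
    static IndexQualifierSketch parse(String colq) {
        int first = colq.indexOf('\0');
        int second = first < 0 ? -1 : colq.indexOf('\0', first + 1);
        int third = second < 0 ? -1 : colq.indexOf('\0', second + 1);
        if (third < 0) {
            throw new IllegalArgumentException("expected term\\0doctype\\0docID\\0info");
        }
        // Everything after the third null byte is treated as info,
        // so info may be empty.
        return new IndexQualifierSketch(
                colq.substring(0, first),
                colq.substring(first + 1, second),
                colq.substring(second + 1, third),
                colq.substring(third + 1));
    }

    public static void main(String[] args) {
        IndexQualifierSketch q = parse("accumulo\0text\0docA\0");
        System.out.println(q.term + " " + q.doctype + " " + q.docID + " [" + q.info + "]");
    }
}
```

Rejecting qualifiers with fewer than three null bytes keeps malformed index entries from being silently misread as valid ones.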

public void init(SortedKeyValueIterator<Key,Value> source,
                 Map<String,String> options,
                 IteratorEnvironment env)
          throws IOException
  Description copied from interface: SortedKeyValueIterator
  Initializes the iterator.
  Specified by: init in interface SortedKeyValueIterator<Key,Value>
  Overrides: init in class IntersectingIterator
  Parameters:
    source - SortedKeyValueIterator source to read data from.
    options - Map map of string option names to option values.
    env - IteratorEnvironment environment in which iterator is being run.
  Throws:
    IOException - unused.

public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env)
  Description copied from interface: SortedKeyValueIterator
  Creates a deep copy of this iterator as though seek had not yet been called.
  Specified by: deepCopy in interface SortedKeyValueIterator<Key,Value>
  Overrides: deepCopy in class IntersectingIterator
  Parameters:
    env - IteratorEnvironment environment in which iterator is being run.

public void seek(Range range,
                 Collection<ByteSequence> seekColumnFamilies,
                 boolean inclusive)
          throws IOException
  Description copied from interface: SortedKeyValueIterator
  Seeks to the first key in the Range, restricting the resulting K,V pairs to those with the specified columns.
  Specified by: seek in interface SortedKeyValueIterator<Key,Value>
  Overrides: seek in class IntersectingIterator
  Parameters:
    range - Range of keys to iterate over.
    seekColumnFamilies - Collection of column families to include or exclude.
    inclusive - boolean that indicates whether to include (true) or exclude (false) column families.
  Throws:
    IOException - if an I/O error occurs.

protected void advanceToIntersection()
                              throws IOException
  Overrides: advanceToIntersection in class IntersectingIterator
  Throws:
    IOException

protected Key buildDocKey()

public static void setIndexColf(IteratorSetting is,
                                String indexColf)
  A convenience method for setting the index column family.
  Parameters:
    is - IteratorSetting object to configure.
    indexColf - the index column family

public static void setDocColfPrefix(IteratorSetting is,
                                    String docColfPrefix)
  A convenience method for setting the document column family prefix.
  Parameters:
    is - IteratorSetting object to configure.
    docColfPrefix - the prefix of the document column family (colf will be of the form docColfPrefix\0doctype)

public static void setColfs(IteratorSetting is,
                            String indexColf,
                            String docColfPrefix)
  A convenience method for setting the index column family and document column family prefix.
  Parameters:
    is - IteratorSetting object to configure.
    indexColf - the index column family
    docColfPrefix - the prefix of the document column family (colf will be of the form docColfPrefix\0doctype)