java.lang.Object
  org.apache.lucene.index.DocTermOrds
public class DocTermOrds
This class enables fast access to multiple term ords for
a specified field across all docIDs.
Like FieldCache, it uninverts the index and holds a
packed data structure in RAM to enable fast access.
Unlike FieldCache, it can handle multi-valued fields,
and it does not hold the term bytes in RAM. Instead, you
must obtain a TermsEnum from the getOrdTermsEnum(org.apache.lucene.index.AtomicReader)
method and then seek by ord to get the term's bytes.
Term ords are normally of type long, but in this API they are
int because the internal representation here cannot address
more than Integer.MAX_VALUE unique terms. This
class is also typically used on fields with relatively few unique terms
compared with the number of documents. In addition, there is an
internal limit (16 MB) on how many bytes each chunk of
documents may consume. If you exceed this limit, an
IllegalStateException is thrown.
Deleted documents are skipped during uninversion, and if
you look one up you get no ords back.
The returned per-document ords do not retain their
original order in the document. Instead they are returned
in sorted order (by ord, i.e. by the term's BytesRef comparator). They
are also de-duplicated (i.e. if a document has the same term more than once
in this field, you get that ord back only once).
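
The ordering and de-duplication described above can be illustrated with a minimal sketch in plain Java. It does not use Lucene and is not how this class is implemented; the class name `SortedOrdsSketch`, the method `ordsForDoc`, and the use of `String` natural ordering in place of the BytesRef comparator are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only: models "ords" as positions in a sorted String array.
public class SortedOrdsSketch {

    /**
     * Maps each term of a document to its position (ord) in the sorted
     * dictionary and returns the ords sorted and de-duplicated.
     * Terms missing from the dictionary are ignored.
     */
    public static int[] ordsForDoc(String[] sortedDictionary, List<String> docTerms) {
        TreeSet<Integer> ords = new TreeSet<>(); // sorted set: orders and de-duplicates
        for (String term : docTerms) {
            int ord = Arrays.binarySearch(sortedDictionary, term);
            if (ord >= 0) {
                ords.add(ord);
            }
        }
        int[] result = new int[ords.size()];
        int i = 0;
        for (int ord : ords) {
            result[i++] = ord;
        }
        return result;
    }

    public static void main(String[] args) {
        String[] dictionary = {"apple", "banana", "cherry"};
        // The document lists "cherry" twice and before "apple".
        List<String> doc = Arrays.asList("cherry", "apple", "cherry");
        System.out.println(Arrays.toString(ordsForDoc(dictionary, doc))); // prints [0, 2]
    }
}
```

The original term order ("cherry" first) and the repeated "cherry" are both lost, which matches the behaviour described for the per-document ords.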
This class tests whether the provided reader is able to
retrieve terms by ord (i.e. it is a single segment and it
uses an ord-capable terms index). If not, this class
creates its own term index internally, which lets it
build a wrapped TermsEnum that can handle ord. The
getOrdTermsEnum(org.apache.lucene.index.AtomicReader)
method then returns this
wrapped enum, if necessary.
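
One plausible way to support lookup by ord with a small in-memory term index is to keep every 2^n-th term and scan forward from the nearest sampled term. The sketch below shows that idea in plain Java under stated assumptions: the class `SampledTermIndexSketch` and its members are hypothetical, a sorted `List<String>` of unique terms stands in for the terms dictionary, and none of this is taken from the Lucene source.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: a sampled term index over a sorted, unique term list.
public class SampledTermIndexSketch {
    private final List<String> allTerms;  // stand-in for the full terms dictionary
    private final String[] indexedTerms;  // every (1 << intervalBits)-th term, kept in memory
    private final int intervalBits;

    public SampledTermIndexSketch(List<String> sortedUniqueTerms, int intervalBits) {
        this.allTerms = sortedUniqueTerms;
        this.intervalBits = intervalBits;
        int interval = 1 << intervalBits;
        int count = (sortedUniqueTerms.size() + interval - 1) / interval;
        this.indexedTerms = new String[count];
        for (int i = 0; i < count; i++) {
            indexedTerms[i] = sortedUniqueTerms.get(i << intervalBits);
        }
    }

    public int indexedTermCount() {
        return indexedTerms.length;
    }

    /** Finds the term for an ord: seek to the nearest sampled term, then step forward. */
    public String lookupTerm(int ord) {
        if (ord < 0 || ord >= allTerms.size()) {
            throw new IndexOutOfBoundsException("ord " + ord);
        }
        String sampled = indexedTerms[ord >>> intervalBits];
        // Seek by term, as a terms dictionary without ord support would.
        int pos = Collections.binarySearch(allTerms, sampled);
        // Remaining distance; adding it stands in for calling next() that many times.
        int steps = ord & ((1 << intervalBits) - 1);
        return allTerms.get(pos + steps);
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h", "i", "j");
        SampledTermIndexSketch index = new SampledTermIndexSketch(terms, 2); // every 4th term
        System.out.println(index.lookupTerm(6)); // prints g
    }
}
```

A larger interval keeps fewer terms in memory at the cost of more forward steps per lookup, which is the trade-off a configurable index interval exposes.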
The RAM consumption of this class can be high!
Nested Class Summary | |
---|---|
class | DocTermOrds.TermOrdsIterator |
Field Summary | |
---|---|
static int | DEFAULT_INDEX_INTERVAL_BITS |
protected DocsEnum | docsEnum |
protected String | field |
protected int[] | index |
protected BytesRef[] | indexedTermsArray |
protected int | maxTermDocFreq |
protected int | numTermsInField |
protected int | ordBase |
protected int | phase1_time |
protected BytesRef | prefix |
protected long | sizeOfIndexedStrings |
protected long | termInstances |
protected byte[][] | tnums |
protected int | total_time |
Constructor Summary | | |
---|---|---|
public | DocTermOrds(AtomicReader reader, String field) | Inverts all terms. |
public | DocTermOrds(AtomicReader reader, String field, BytesRef termPrefix) | Inverts only terms starting with the prefix. |
public | DocTermOrds(AtomicReader reader, String field, BytesRef termPrefix, int maxTermDocFreq) | Inverts only terms starting with the prefix, and only terms whose docFreq (not taking deletions into account) is <= maxTermDocFreq. |
public | DocTermOrds(AtomicReader reader, String field, BytesRef termPrefix, int maxTermDocFreq, int indexIntervalBits) | Inverts only terms starting with the prefix, and only terms whose docFreq (not taking deletions into account) is <= maxTermDocFreq, with a custom indexing interval (default is every 128th term). |
protected | DocTermOrds(String field, int maxTermDocFreq, int indexIntervalBits) | Subclasses initialize with this constructor, and must then call uninvert exactly once. |
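
The effect of the termPrefix and maxTermDocFreq constructor arguments can be pictured as a filter over the field's terms. The following is a minimal sketch in plain Java, not Lucene code: the class `TermFilterSketch`, the method `termsToUninvert`, the map of document frequencies, and the treatment of a null prefix as "all terms" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: selects which terms would be uninverted.
public class TermFilterSketch {

    /**
     * Keeps the terms that start with the prefix (a null prefix keeps every term)
     * and whose document frequency is at most maxTermDocFreq.
     */
    public static List<String> termsToUninvert(
            TreeMap<String, Integer> docFreqByTerm, String prefix, int maxTermDocFreq) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : docFreqByTerm.entrySet()) {
            boolean prefixMatches = prefix == null || entry.getKey().startsWith(prefix);
            if (prefixMatches && entry.getValue() <= maxTermDocFreq) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> docFreqs = new TreeMap<>();
        docFreqs.put("apple", 3);
        docFreqs.put("apricot", 10);
        docFreqs.put("banana", 1);
        // "apricot" is too frequent and "banana" lacks the prefix.
        System.out.println(termsToUninvert(docFreqs, "ap", 5)); // prints [apple]
    }
}
```

Capping the document frequency keeps very common terms out of the in-memory structure, where they would cost the most space.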
Method Summary | | |
---|---|---|
TermsEnum | getOrdTermsEnum(AtomicReader reader) | Returns a TermsEnum that implements ord. |
boolean | isEmpty() | |
DocTermOrds.TermOrdsIterator | lookup(int doc, DocTermOrds.TermOrdsIterator reuse) | Returns an iterator to step through the term ords for this document. |
BytesRef | lookupTerm(TermsEnum termsEnum, int ord) | |
int | numTerms() | |
long | ramUsedInBytes() | |
protected void | setActualDocFreq(int termNum, int df) | |
protected void | uninvert(AtomicReader reader, BytesRef termPrefix) | |
protected void | visitTerm(TermsEnum te, int termNum) | Subclasses can override this. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int DEFAULT_INDEX_INTERVAL_BITS
protected final int maxTermDocFreq
protected final String field
protected int numTermsInField
protected long termInstances
protected int total_time
protected int phase1_time
protected int[] index
protected byte[][] tnums
protected long sizeOfIndexedStrings
protected BytesRef[] indexedTermsArray
protected BytesRef prefix
protected int ordBase
protected DocsEnum docsEnum
Constructor Detail |
---|
public DocTermOrds(AtomicReader reader, String field) throws IOException
Throws: IOException
public DocTermOrds(AtomicReader reader, String field, BytesRef termPrefix) throws IOException
Throws: IOException
public DocTermOrds(AtomicReader reader, String field, BytesRef termPrefix, int maxTermDocFreq) throws IOException
Throws: IOException
public DocTermOrds(AtomicReader reader, String field, BytesRef termPrefix, int maxTermDocFreq, int indexIntervalBits) throws IOException
Throws: IOException
protected DocTermOrds(String field, int maxTermDocFreq, int indexIntervalBits)
Method Detail |
---|
public long ramUsedInBytes()
public TermsEnum getOrdTermsEnum(AtomicReader reader) throws IOException
NOTE: you must pass the same reader that was used when creating this instance.
Throws: IOException
public int numTerms()
public boolean isEmpty()
Returns: whether this DocTermOrds instance is empty.
protected void visitTerm(TermsEnum te, int termNum) throws IOException
Throws: IOException
protected void setActualDocFreq(int termNum, int df) throws IOException
Throws: IOException
protected void uninvert(AtomicReader reader, BytesRef termPrefix) throws IOException
Throws: IOException
public DocTermOrds.TermOrdsIterator lookup(int doc, DocTermOrds.TermOrdsIterator reuse)
public BytesRef lookupTerm(TermsEnum termsEnum, int ord) throws IOException
Throws: IOException