org.apache.lucene.codecs
Class BlockTreeTermsReader

java.lang.Object
  extended by org.apache.lucene.index.Fields
      extended by org.apache.lucene.codecs.FieldsProducer
          extended by org.apache.lucene.codecs.BlockTreeTermsReader
All Implemented Interfaces:
Closeable
Direct Known Subclasses:
AppendingTermsReader

public class BlockTreeTermsReader
extends FieldsProducer

A block-based terms index and dictionary that assigns terms to variable-length blocks according to the prefixes they share. The terms index is a prefix trie whose leaves are term blocks. The advantage of this approach is that seekExact can often determine that a term does not exist without doing any IO, and intersection with automata is very fast. Note that this terms dictionary has its own fixed terms index (i.e., it does not support a pluggable terms index implementation).
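
The following is a minimal, illustrative sketch of why a prefix-trie index can reject a term without IO. It is not the Lucene implementation: the class PrefixTrieIndexSketch, its methods, and the blockLoads counter are hypothetical names invented for this example, and blocks are modeled as in-memory lists of suffixes rather than on-disk data.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only (hypothetical names): an in-memory prefix trie whose leaves own term blocks. */
final class PrefixTrieIndexSketch {

    /** Trie node; a non-null block marks a leaf holding the suffixes of its terms. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        List<String> block;
    }

    private final Node root = new Node();

    /** Counts simulated block loads, standing in for disk IO. */
    int blockLoads = 0;

    /** Registers a block that holds the suffixes of all terms starting with the given prefix. */
    void addBlock(String prefix, List<String> suffixes) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.block = suffixes;
    }

    /** Returns true if the term exists; a block is loaded only when the trie leads to one. */
    boolean seekExact(String term) {
        Node node = root;
        for (int i = 0; i <= term.length(); i++) {
            if (node.block != null) {
                blockLoads++; // the only step that would touch disk
                return node.block.contains(term.substring(i));
            }
            if (i == term.length()) {
                break; // term ended before reaching any block
            }
            node = node.children.get(term.charAt(i));
            if (node == null) {
                return false; // rejected by the in-memory index alone, no block load
            }
        }
        return false;
    }
}
```

In this sketch, looking up a term whose prefix has no path in the trie returns false while blockLoads stays unchanged; only a lookup that reaches a leaf increments it.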

NOTE: this terms dictionary does not support index divisor when opening an IndexReader. Instead, you can change the min/maxItemsPerBlock during indexing.

The data structure used by this implementation is very similar to a burst trie (http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3499), but with added logic to break up too-large blocks of all terms sharing a given prefix into smaller ones.
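
As a rough illustration of breaking up a too-large block, the sketch below chunks a sorted list of terms that share a prefix into consecutive runs of bounded size. This is an assumption made for illustration only: the class BlockSplitSketch and its split method are hypothetical, and the actual splitting policy used by BlockTreeTermsWriter is not described on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only (hypothetical names): splits an oversized block into smaller ones. */
final class BlockSplitSketch {

    /**
     * Splits terms sharing a prefix into consecutive blocks of at most
     * maxItemsPerBlock entries, preserving the sorted order of the input.
     */
    static List<List<String>> split(List<String> sortedTerms, int maxItemsPerBlock) {
        if (maxItemsPerBlock < 1) {
            throw new IllegalArgumentException("maxItemsPerBlock must be >= 1");
        }
        List<List<String>> blocks = new ArrayList<>();
        for (int start = 0; start < sortedTerms.size(); start += maxItemsPerBlock) {
            int end = Math.min(start + maxItemsPerBlock, sortedTerms.size());
            // Copy the sublist so each block is independent of the input list.
            blocks.add(new ArrayList<>(sortedTerms.subList(start, end)));
        }
        return blocks;
    }
}
```

With five terms and a limit of two items per block, this sketch yields three blocks of sizes two, two, and one.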

Use CheckIndex with the -verbose option to see summary statistics on the blocks in the dictionary. See BlockTreeTermsWriter.

WARNING: This API is experimental and might change in incompatible ways in the next release.

Nested Class Summary
 class BlockTreeTermsReader.FieldReader
           
static class BlockTreeTermsReader.Stats
          BlockTree statistics for a single field returned by BlockTreeTermsReader.FieldReader.computeStats().
 
Field Summary
protected  long dirOffset
           
protected  long indexDirOffset
           
 
Fields inherited from class org.apache.lucene.index.Fields
EMPTY_ARRAY
 
Constructor Summary
BlockTreeTermsReader(Directory dir, FieldInfos fieldInfos, String segment, PostingsReaderBase postingsReader, IOContext ioContext, String segmentSuffix, int indexDivisor)
           
 
Method Summary
 void close()
           
 FieldsEnum iterator()
          Returns an iterator that will step through all field names.
protected  void readHeader(IndexInput input)
           
protected  void readIndexHeader(IndexInput input)
           
protected  void seekDir(IndexInput input, long dirOffset)
           
 int size()
          Returns the number of terms for all fields, or -1 if this measure isn't stored by the codec.
 Terms terms(String field)
          Get the Terms for this field.
 
Methods inherited from class org.apache.lucene.index.Fields
getUniqueTermCount
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

dirOffset

protected long dirOffset

indexDirOffset

protected long indexDirOffset

Constructor Detail

BlockTreeTermsReader

public BlockTreeTermsReader(Directory dir,
                            FieldInfos fieldInfos,
                            String segment,
                            PostingsReaderBase postingsReader,
                            IOContext ioContext,
                            String segmentSuffix,
                            int indexDivisor)
                     throws IOException
Throws:
IOException

Method Detail

readHeader

protected void readHeader(IndexInput input)
                   throws IOException
Throws:
IOException

readIndexHeader

protected void readIndexHeader(IndexInput input)
                        throws IOException
Throws:
IOException

seekDir

protected void seekDir(IndexInput input,
                       long dirOffset)
                throws IOException
Throws:
IOException

close

public void close()
           throws IOException
Specified by:
close in interface Closeable
Specified by:
close in class FieldsProducer
Throws:
IOException

iterator

public FieldsEnum iterator()
Description copied from class: Fields
Returns an iterator that will step through all field names. This will not return null.

Specified by:
iterator in class Fields

terms

public Terms terms(String field)
            throws IOException
Description copied from class: Fields
Get the Terms for this field. This will return null if the field does not exist.

Specified by:
terms in class Fields
Throws:
IOException

size

public int size()
Description copied from class: Fields
Returns the number of terms for all fields, or -1 if this measure isn't stored by the codec. Note that, just like other term measures, this measure does not take deleted documents into account.

Specified by:
size in class Fields


Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.