org.apache.lucene.codecs.block
Class BlockPostingsFormat

java.lang.Object
  extended by org.apache.lucene.codecs.PostingsFormat
      extended by org.apache.lucene.codecs.block.BlockPostingsFormat
All Implemented Interfaces:
NamedSPILoader.NamedSPI

public final class BlockPostingsFormat
extends PostingsFormat

Block postings format, which encodes postings in packed integer blocks for faster decoding.

NOTE: this format is still experimental and subject to change without backwards compatibility.
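
The packed-block idea in the class description can be illustrated with a minimal sketch. This is not Lucene's implementation: the class `PackedBlockSketch` and its methods are hypothetical names, and the layout (values stored back to back in 64-bit words, lowest bits first) is an assumption chosen for clarity. It only shows why a block of small integers decodes quickly: every value in the block uses the same bit width, so decoding is a fixed pattern of shifts and masks.

```java
/** Hypothetical illustration of fixed-width bit packing; not Lucene's code. */
final class PackedBlockSketch {

    /** Smallest bit width (at least 1) that can hold every non-negative value. */
    static int bitsRequired(int[] values) {
        int or = 0;
        for (int v : values) {
            if (v < 0) {
                throw new IllegalArgumentException("negative value: " + v);
            }
            or |= v;
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    /** Packs the values back to back, bitsPerValue bits each, into 64-bit words. */
    static long[] pack(int[] values, int bitsPerValue) {
        long[] words = new long[(values.length * bitsPerValue + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bitsPerValue;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            words[word] |= ((long) values[i]) << shift;
            if (shift + bitsPerValue > 64) {
                // The value straddles two words: store its high bits in the next word.
                words[word + 1] |= ((long) values[i]) >>> (64 - shift);
            }
        }
        return words;
    }

    /** Reverses pack(): reads count values of bitsPerValue bits each. */
    static int[] unpack(long[] words, int count, int bitsPerValue) {
        int[] values = new int[count];
        long mask = (1L << bitsPerValue) - 1;
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bitsPerValue;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = words[word] >>> shift;
            if (shift + bitsPerValue > 64) {
                v |= words[word + 1] << (64 - shift);
            }
            values[i] = (int) (v & mask);
        }
        return values;
    }
}
```

In this sketch a block whose largest value fits in 10 bits needs 10 bits per entry rather than 32, and the decoder never branches on a per-value length.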

Basic idea:

Files and detailed format:

Term Dictionary

The .tim file format is quite similar to that of Lucene40PostingsFormat, with a minor difference in MetadataBlock.

Notes:

Term Index

The .tip file format is described in Lucene40PostingsFormat:TermIndex.

Frequencies and Skip Data

The .doc file contains the list of documents that contain each term, along with the frequency of the term in each document (except when frequencies are omitted: FieldInfo.IndexOptions.DOCS_ONLY). When the document list is longer than the packed block size, it also stores skip data at the beginning of each packed or VInt block.
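
The split between packed blocks and VInt blocks can be sketched as follows. This is an illustration under stated assumptions, not the actual .doc writer: `DocBlockSketch` and its methods are hypothetical names, the assumption is that full blocks of `blockSize` entries are packed and any remainder is written as VInts, and skip data is left out entirely. The VInt layout shown (7 payload bits per byte, high bit set when more bytes follow) is the usual variable-length integer encoding; document numbers are stored as gaps from the previous document so that the values stay small.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical illustration of delta and VInt coding for a document list. */
final class DocBlockSketch {

    /** Number of complete blocks of blockSize entries in a list of docFreq documents. */
    static int fullBlocks(int docFreq, int blockSize) {
        return docFreq / blockSize;
    }

    /** Number of trailing entries that do not fill a complete block. */
    static int tailLength(int docFreq, int blockSize) {
        return docFreq % blockSize;
    }

    /** Converts ascending document numbers into gaps from the previous document. */
    static int[] toDeltas(int[] sortedDocIds) {
        int[] deltas = new int[sortedDocIds.length];
        int previous = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
            deltas[i] = sortedDocIds[i] - previous;
            previous = sortedDocIds[i];
        }
        return deltas;
    }

    /** Writes each non-negative value as a VInt: 7 bits per byte, low bits first. */
    static byte[] writeVInts(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80); // high bit marks a continuation byte
                v >>>= 7;
            }
            out.write(v);
        }
        return out.toByteArray();
    }

    /** Reads count VInts back from the byte array. */
    static int[] readVInts(byte[] bytes, int count) {
        int[] values = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            byte b = bytes[pos++];
            int v = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = bytes[pos++];
                v |= (b & 0x7F) << shift;
            }
            values[i] = v;
        }
        return values;
    }
}
```

With a hypothetical block size of 128, a term that occurs in 300 documents would yield two full blocks and a tail of 44 entries under this sketch's assumption.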

Notes:

Positions

The .pos file contains the list of positions at which each term occurs within documents. It sometimes also stores some of the payloads and offsets, for speed.

Notes:

Payloads and Offsets

The .pay file stores payloads and offsets associated with certain term-document positions. Some payloads and offsets are stored in the .pos file instead, for speed.

Notes:

WARNING: This API is experimental and might change in incompatible ways in the next release.

Field Summary
static int BLOCK_SIZE
          Fixed packed block size, number of integers encoded in a single packed block.
static String DOC_EXTENSION
          Filename extension for document numbers, frequencies, and skip data.
static String PAY_EXTENSION
          Filename extension for payloads and offsets.
static String POS_EXTENSION
          Filename extension for positions.
 
Fields inherited from class org.apache.lucene.codecs.PostingsFormat
EMPTY
 
Constructor Summary
BlockPostingsFormat()
           
BlockPostingsFormat(int minTermBlockSize, int maxTermBlockSize)
           
 
Method Summary
 FieldsConsumer fieldsConsumer(SegmentWriteState state)
           
 FieldsProducer fieldsProducer(SegmentReadState state)
           
 String toString()
           
 
Methods inherited from class org.apache.lucene.codecs.PostingsFormat
availablePostingsFormats, forName, getName, reloadPostingsFormats
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

DOC_EXTENSION

public static final String DOC_EXTENSION
Filename extension for document numbers, frequencies, and skip data. See chapter: Frequencies and Skip Data

See Also:
Constant Field Values

POS_EXTENSION

public static final String POS_EXTENSION
Filename extension for positions. See chapter: Positions

See Also:
Constant Field Values

PAY_EXTENSION

public static final String PAY_EXTENSION
Filename extension for payloads and offsets. See chapter: Payloads and Offsets

See Also:
Constant Field Values

BLOCK_SIZE

public static final int BLOCK_SIZE
Fixed packed block size, number of integers encoded in a single packed block.

See Also:
Constant Field Values

Constructor Detail

BlockPostingsFormat

public BlockPostingsFormat()

BlockPostingsFormat

public BlockPostingsFormat(int minTermBlockSize,
                           int maxTermBlockSize)

Method Detail

toString

public String toString()
Overrides:
toString in class PostingsFormat

fieldsConsumer

public FieldsConsumer fieldsConsumer(SegmentWriteState state)
                              throws IOException
Specified by:
fieldsConsumer in class PostingsFormat
Throws:
IOException

fieldsProducer

public FieldsProducer fieldsProducer(SegmentReadState state)
                              throws IOException
Specified by:
fieldsProducer in class PostingsFormat
Throws:
IOException


Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.