java.lang.Object
  org.apache.lucene.codecs.StoredFieldsFormat
    org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat
      org.apache.lucene.codecs.lucene41.Lucene41StoredFieldsFormat
public final class Lucene41StoredFieldsFormat
Lucene 4.1 stored fields format.
Principle
This StoredFieldsFormat
compresses blocks of 16KB of documents in
order to improve the compression ratio compared to document-level
compression. It uses the LZ4
compression algorithm, which is fast to compress and very fast to decompress
data. Although LZ4 favors speed over compression ratio, it should still achieve
useful compression ratios on redundant inputs (such as log files, HTML or plain text).
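To illustrate why compressing blocks of documents tends to give a better ratio than document-level compression, here is a minimal sketch. It uses java.util.zip.Deflater purely as a stand-in, because LZ4 is not part of the JDK. The class name, helper methods and sample documents are hypothetical and are not Lucene code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class BlockCompressionSketch {

    /** Returns the DEFLATE-compressed size of the input in bytes (stand-in for LZ4). */
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] out = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(out);
        }
        deflater.end();
        return total;
    }

    /** Builds n small, similar log-like documents (made-up sample data). */
    static byte[][] sampleDocs(int n) {
        byte[][] docs = new byte[n][];
        for (int i = 0; i < n; i++) {
            String doc = "2013-01-01 12:00:" + (i % 60)
                    + " INFO request served path=/index.html id=" + i;
            docs[i] = doc.getBytes(StandardCharsets.UTF_8);
        }
        return docs;
    }

    /** Total size when every document is compressed on its own. */
    static int perDocumentSize(byte[][] docs) {
        int total = 0;
        for (byte[] doc : docs) {
            total += compressedSize(doc);
        }
        return total;
    }

    /** Size when all documents are concatenated and compressed as one block. */
    static int blockSize(byte[][] docs) {
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        for (byte[] doc : docs) {
            block.write(doc, 0, doc.length);
        }
        return compressedSize(block.toByteArray());
    }

    public static void main(String[] args) {
        byte[][] docs = sampleDocs(200);
        System.out.println("per-document: " + perDocumentSize(docs) + " bytes");
        System.out.println("single block: " + blockSize(docs) + " bytes");
    }
}
```

The block variant lets the compressor reuse redundancy that spans documents, which is the effect this format relies on. Absolute numbers with LZ4 would differ from the Deflater figures.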
File formats
Stored fields are represented by two files:
A fields data file (extension .fdt). This file stores a compact representation of documents in compressed blocks of 16KB or more. When writing a segment, documents are appended to an in-memory byte[] buffer. When its size reaches 16KB or more, some metadata about the documents is flushed to disk, immediately followed by a compressed representation of the buffer using the LZ4 compression format.
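The buffering behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the class ChunkBufferSketch, its Chunk record and its method names are hypothetical, and the flush step merely records what a real writer would serialize (metadata followed by the LZ4-compressed buffer).

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

final class ChunkBufferSketch {
    /** 16KB flush threshold, taken from the format description. */
    static final int CHUNK_SIZE = 16 * 1024;

    /** Hypothetical summary of one flushed chunk. */
    static final class Chunk {
        final int docBase;   // ID of the first document in the chunk
        final int numDocs;   // number of documents in the chunk
        final int rawLength; // uncompressed size of the buffered documents

        Chunk(int docBase, int numDocs, int rawLength) {
            this.docBase = docBase;
            this.numDocs = numDocs;
            this.rawLength = rawLength;
        }
    }

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final List<Chunk> flushed = new ArrayList<>();
    private int docBase = 0;
    private int numBufferedDocs = 0;

    /** Appends one serialized document and flushes once the buffer holds 16KB or more. */
    void addDocument(byte[] serializedDoc) {
        buffer.write(serializedDoc, 0, serializedDoc.length);
        numBufferedDocs++;
        if (buffer.size() >= CHUNK_SIZE) {
            flush();
        }
    }

    /** Flushes any remaining documents at the end of the segment. */
    void finish() {
        if (numBufferedDocs > 0) {
            flush();
        }
    }

    private void flush() {
        // A real writer would emit the chunk metadata and the compressed buffer here.
        flushed.add(new Chunk(docBase, numBufferedDocs, buffer.size()));
        docBase += numBufferedDocs;
        numBufferedDocs = 0;
        buffer.reset();
    }

    List<Chunk> chunks() {
        return flushed;
    }
}
```

Because the size check happens after a document is appended, a chunk always holds whole documents and can exceed 16KB, which matches the "16KB or more" wording.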
Here is a more detailed description of the field data file format:
CodecHeader
PackedInts.VERSION_CURRENT as a VInt
VInt
VInt
VLong, whose 3 last bits are Type and other bits are FieldNum
String | BinaryValue | Int | Float | Long | Double depending on Type
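The VLong that combines FieldNum and Type can be illustrated with a small bit-packing sketch. The class and method names are hypothetical, and the type code used in the usage note is an arbitrary value for illustration, not a documented mapping.

```java
final class FieldNumAndTypeSketch {
    /** The 3 lowest bits carry the type, as stated in the format description. */
    static final int TYPE_BITS = 3;
    static final long TYPE_MASK = (1L << TYPE_BITS) - 1;

    /** Packs a field number and a type code into a single long. */
    static long pack(int fieldNum, int type) {
        if (fieldNum < 0) {
            throw new IllegalArgumentException("fieldNum must be non-negative");
        }
        if (type < 0 || type > TYPE_MASK) {
            throw new IllegalArgumentException("type must fit in " + TYPE_BITS + " bits");
        }
        return ((long) fieldNum << TYPE_BITS) | type;
    }

    /** Extracts the field number from the upper bits. */
    static int fieldNum(long packed) {
        return (int) (packed >>> TYPE_BITS);
    }

    /** Extracts the type code from the 3 lowest bits. */
    static int type(long packed) {
        return (int) (packed & TYPE_MASK);
    }
}
```

For example, pack(5, 2) yields 42 (5 shifted left by 3 is 40, plus the type code 2), and fieldNum(42) and type(42) recover 5 and 2.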
A fields index file (extension .fdx). The data stored in this file is read to load an in-memory data-structure that can be used to locate the start offset of a block containing any document in the fields data file.
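In order to have a compact in-memory representation, for every block of 1024 chunks, this stored fields index computes the average number of bytes per chunk and, for every chunk, only stores the difference between the chunk's actual start offset and the offset predicted from that average (chunk number times average chunk size).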
Data is written as follows:
CodecHeader
PackedInts.VERSION_CURRENT as a VInt
VInt, this marks the end of blocks since blocks are not allowed to start with 0
VInt which is the number of chunks encoded in the block
VInt
VInt
packed array of BlockChunks elements of BitsPerDocBaseDelta bits each, representing the deltas from the average doc base using ZigZag encoding
VLong
VLong
packed array of BlockChunks elements of BitsPerStartPointerDelta bits each, representing the deltas from the average start pointer using ZigZag encoding
Notes
For any block, the doc base of the n-th chunk can be restored with DocBase + AvgChunkDocs * n + DocBaseDeltas[n].
For any block, the start pointer of the n-th chunk can be restored with StartPointerBase + AvgChunkSize * n + StartPointerDeltas[n].
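The two reconstruction formulas can be sketched as below. This is a minimal illustration, assuming the deltas are held as ZigZag-encoded longs that are decoded before being added. The class and method names are hypothetical, and the sample values in the usage note are made up.

```java
final class ChunkIndexSketch {

    /** ZigZag-encodes a signed value so that small magnitudes map to small unsigned values. */
    static long zigZagEncode(long value) {
        return (value << 1) ^ (value >> 63);
    }

    /** Reverses zigZagEncode: 0, 1, 2, 3 decode to 0, -1, 1, -2. */
    static long zigZagDecode(long encoded) {
        return (encoded >>> 1) ^ -(encoded & 1);
    }

    /**
     * Restores the value for the n-th chunk of a block as base + avg * n + delta[n].
     * Works for doc bases (DocBase, AvgChunkDocs, DocBaseDeltas) and for start
     * pointers (StartPointerBase, AvgChunkSize, StartPointerDeltas) alike.
     */
    static long restore(long base, long avg, int n, long[] zigZagDeltas) {
        return base + avg * n + zigZagDecode(zigZagDeltas[n]);
    }
}
```

For instance, with a base of 100, an average of 10 and true doc bases 100, 110, 119 and 131, the signed deltas are 0, 0, -1 and 1, which ZigZag-encode to 0, 0, 1 and 2. Calling restore(100, 10, 2, deltas) then returns 119. Storing only these small deltas is what keeps the in-memory index compact.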
Known limitations
This StoredFieldsFormat does not support individual documents larger than (2^31 - 2^14) bytes. If this is a problem, use another format, such as Lucene40StoredFieldsFormat.
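As a quick check of the arithmetic, the limit works out to 2,147,467,264 bytes. The sketch below computes it and tests a document length against it. The class, constant and method names are hypothetical and not part of the Lucene API.

```java
final class DocumentSizeLimitSketch {
    /** (2^31 - 2^14) bytes, the per-document limit stated above. */
    static final long MAX_DOC_BYTES = (1L << 31) - (1L << 14);

    /** Returns true if a document of the given serialized length is within the limit. */
    static boolean fitsInFormat(long docLengthBytes) {
        return docLengthBytes >= 0 && docLengthBytes <= MAX_DOC_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("limit = " + MAX_DOC_BYTES + " bytes");
    }
}
```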
Constructor Summary | |
---|---|
Lucene41StoredFieldsFormat() | Sole constructor. |
Method Summary |
---|
Methods inherited from class org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat |
---|
fieldsReader, fieldsWriter, toString |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public Lucene41StoredFieldsFormat()