Package org.apache.lucene.codecs

Codecs API: API for customization of the encoding and structure of the index.

Class Summary
BlockTermsReader Handles a terms dict, but delegates all details of doc/freqs/positions reading to an instance of PostingsReaderBase.
BlockTermState Holds all state required for PostingsReaderBase to produce a DocsEnum without re-seeking the terms dict.
BlockTermsWriter Writes terms dict, block-encoding (column stride) each term's metadata for each set of terms between two index terms.
BlockTreeTermsReader A block-based terms index and dictionary that assigns terms to variable length blocks according to how they share prefixes.
BlockTreeTermsReader.Stats BlockTree statistics for a single field returned by BlockTreeTermsReader.FieldReader.computeStats().
BlockTreeTermsWriter Block-based terms index and dictionary writer.
Codec Encodes/decodes an inverted index segment.
CodecUtil Utility class for reading and writing versioned headers.
DocValuesArraySource DocValues.Source implementation backed by simple arrays.
DocValuesConsumer Abstract API that consumes IndexableFields.
DocValuesFormat Encodes/decodes DocValues
FieldInfosFormat Encodes/decodes FieldInfos
FieldInfosReader Codec API for reading FieldInfos.
FieldInfosWriter Codec API for writing FieldInfos.
FieldsConsumer Abstract API that consumes terms, doc, freq, prox, offset and payloads postings.
FieldsProducer Abstract API that produces terms, doc, freq, prox and payloads postings.
FixedGapTermsIndexReader TermsIndexReader for simple every-nth terms indexes.
FixedGapTermsIndexWriter Selects every Nth term as an index term, and holds term bytes (mostly) fully expanded in memory.
LiveDocsFormat Format for live/deleted documents
MappingMultiDocsAndPositionsEnum Exposes flex API, merged from flex API of sub-segments, remapping docIDs (this is used for segment merging).
MappingMultiDocsEnum Exposes flex API, merged from flex API of sub-segments, remapping docIDs (this is used for segment merging).
MultiLevelSkipListReader This abstract class reads skip lists with multiple levels.
MultiLevelSkipListWriter This abstract class writes skip lists with multiple levels.
NormsFormat Format for normalization factors.
PerDocConsumer Abstract API that consumes per document values.
PerDocProducer Abstract API that provides access to one or more per-document storage features.
PerDocProducerBase Abstract base class for PerDocProducer implementations
PostingsBaseFormat Provides a PostingsReaderBase and PostingsWriterBase.
PostingsConsumer Abstract API that consumes postings for an individual term.
PostingsFormat Encodes/decodes terms, postings, and proximity data.
PostingsReaderBase The core terms dictionaries (BlockTermsReader, BlockTreeTermsReader) interact with a single instance of this class to manage creation of DocsEnum and DocsAndPositionsEnum instances.
PostingsWriterBase Extension of PostingsConsumer to support pluggable term dictionaries.
SegmentInfoFormat Expert: Controls the format of the SegmentInfo (segment metadata file).
SegmentInfoReader Specifies an API for classes that can read SegmentInfo information.
SegmentInfoWriter Specifies an API for classes that can write out SegmentInfo data.
StoredFieldsFormat Controls the format of stored fields
StoredFieldsReader Codec API for reading stored fields. Implementations must provide StoredFieldsReader.visitDocument(int, StoredFieldVisitor) to read the stored fields for a document, StoredFieldsReader.clone() (creating clones of any IndexInputs used, etc.), and Closeable.close().
StoredFieldsWriter Codec API for writing stored fields.
TermsConsumer Abstract API that consumes terms for an individual field.
TermsIndexReaderBase BlockTermsReader interacts with an instance of this class to manage its terms index.
TermsIndexReaderBase.FieldIndexEnum Similar to TermsEnum, except that the only "metadata" it reports for a given indexed term is the long fileOffset into the main terms dictionary file.
TermsIndexWriterBase Base class for terms index implementations to plug into BlockTermsWriter.
TermStats Holder for per-term statistics.
TermVectorsFormat Controls the format of term vectors
TermVectorsReader Codec API for reading term vectors.
TermVectorsWriter Codec API for writing term vectors.
VariableGapTermsIndexReader See VariableGapTermsIndexWriter
VariableGapTermsIndexWriter Selects index terms according to the provided pluggable VariableGapTermsIndexWriter.IndexTermSelector, and stores them in a prefix trie that is represented as an FST and loaded entirely into RAM.
VariableGapTermsIndexWriter.EveryNOrDocFreqTermSelector Sets an index term when docFreq >= docFreqThresh, or every interval terms.
VariableGapTermsIndexWriter.EveryNTermSelector Same policy as FixedGapTermsIndexWriter
VariableGapTermsIndexWriter.IndexTermSelector Hook for selecting which terms should be placed in the terms index.
 

Package org.apache.lucene.codecs Description

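The Codec API allows you to customise the way the following pieces of index information are stored: terms, postings lists, and proximity data (see PostingsFormat); DocValues (see DocValuesFormat); stored fields (see StoredFieldsFormat); term vectors (see TermVectorsFormat); FieldInfos (see FieldInfosFormat); SegmentInfo (see SegmentInfoFormat); norms (see NormsFormat); and live documents (see LiveDocsFormat).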

Codecs are identified by name through the Java Service Provider Interface. To create your own codec, extend Codec and pass the new codec's name to the super() constructor:

import org.apache.lucene.codecs.Codec;

public class MyCodec extends Codec {

    public MyCodec() {
        // The name passed here is the name the codec is identified by.
        super("MyCodecName");
    }

    ...
}

You will need to register the Codec class so that the ServiceLoader can find it, by including a META-INF/services/org.apache.lucene.codecs.Codec file on your classpath that contains the fully qualified class name of your codec.
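
As an illustration of the registration step, the provider-configuration file might look like the fragment below. The package name com.example is an assumption made for this sketch; substitute the actual package of your codec class. Lines beginning with # are comments in the ServiceLoader file format.

```text
# File: META-INF/services/org.apache.lucene.codecs.Codec
# One fully qualified provider class name per line.
com.example.MyCodec
```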

If you just want to customise the PostingsFormat, or use different postings formats for different fields, then you can register your custom postings format in the same way (in META-INF/services/org.apache.lucene.codecs.PostingsFormat), and then extend the default Lucene40Codec and override Lucene40Codec.getPostingsFormatForField(String) to return your custom postings format.
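
The per-field override described above could be sketched as follows. This is a minimal illustration rather than a definitive implementation: the class name PerFieldCodec, the field name "id", and the format name "MyPostingsFormat" are all hypothetical, and the sketch assumes a custom postings format has already been registered under that name in META-INF/services/org.apache.lucene.codecs.PostingsFormat.

```java
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.lucene40.Lucene40Codec;

// Hypothetical codec that routes a single field to a custom postings format.
class PerFieldCodec extends Lucene40Codec {

    // Hypothetical field name chosen for this sketch.
    private static final String CUSTOM_FIELD = "id";

    // Hypothetical name; it must match the name under which the custom
    // PostingsFormat was registered, otherwise the lookup below fails.
    private static final String CUSTOM_FORMAT_NAME = "MyPostingsFormat";

    @Override
    public PostingsFormat getPostingsFormatForField(String field) {
        if (CUSTOM_FIELD.equals(field)) {
            // Resolve the custom format by its registered name.
            return PostingsFormat.forName(CUSTOM_FORMAT_NAME);
        }
        // Every other field keeps the default format of Lucene40Codec.
        return super.getPostingsFormatForField(field);
    }
}
```

An instance of such a codec would typically be handed to the index writer through IndexWriterConfig.setCodec(Codec). The lookup is done inside the method rather than in a field initializer so that constructing the codec does not fail when the custom format is absent from the classpath.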



Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.