org.apache.hadoop.hbase.regionserver
Interface KeyValueScanner

All Known Implementing Classes:
CollectionBackedScanner, KeyValueHeap, MemStore.MemStoreScanner, NonLazyKeyValueScanner, StoreFileScanner, StoreScanner

public interface KeyValueScanner

Scanner that returns the next KeyValue.


Method Summary
 void close()
          Close the KeyValue scanner.
 void enforceSeek()
          Does the real seek operation in case it was skipped by seekToRowCol(KeyValue, boolean) (TODO: What's this?).
 long getSequenceID()
          Get the sequence id associated with this KeyValueScanner.
 boolean isFileScanner()
           
 KeyValue next()
          Return the next KeyValue in this scanner, iterating the scanner.
 KeyValue peek()
          Look at the next KeyValue in this scanner, but do not iterate scanner.
 boolean realSeekDone()
          We optimize our store scanners by checking the most recent store file first, so we sometimes pretend we have done a seek but delay it until the store scanner bubbles up to the top of the key-value heap.
 boolean requestSeek(KeyValue kv, boolean forward, boolean useBloom)
          Similar to seek(org.apache.hadoop.hbase.KeyValue) (or reseek(org.apache.hadoop.hbase.KeyValue) if forward is true) but only does a seek operation after checking that it is really necessary for the row/column combination specified by the kv parameter.
 boolean reseek(KeyValue key)
          Reseek the scanner at or after the specified KeyValue.
 boolean seek(KeyValue key)
          Seek the scanner at or after the specified KeyValue.
 boolean shouldUseScanner(Scan scan, SortedSet<byte[]> columns, long oldestUnexpiredTS)
          Allows filtering out scanners (both StoreFile and memstore) that we don't want to use, based on criteria such as Bloom filters and timestamp ranges.
 

Method Detail

peek

KeyValue peek()
Look at the next KeyValue in this scanner, but do not iterate scanner.

Returns:
the next KeyValue

next

KeyValue next()
              throws IOException
Return the next KeyValue in this scanner, iterating the scanner.

Returns:
the next KeyValue
Throws:
IOException
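
The difference between peek() and next() can be sketched with a small list-backed scanner. This is an illustrative stand-in, not HBase code: the class name PeekNextSketch is hypothetical, String keys replace KeyValue, and returning null at the end of the scanner is an assumption the text above does not state.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical scanner over sorted String keys (stand-in for KeyValue). */
final class PeekNextSketch {
    private final Iterator<String> iter;
    // The element that peek() would return; null once the scanner is exhausted (assumption).
    private String current;

    PeekNextSketch(List<String> sortedKeys) {
        this.iter = sortedKeys.iterator();
        this.current = iter.hasNext() ? iter.next() : null;
    }

    /** Looks at the next key without moving the scanner. */
    String peek() {
        return current;
    }

    /** Returns the next key and moves the scanner forward by one. */
    String next() {
        String result = current;
        current = iter.hasNext() ? iter.next() : null;
        return result;
    }
}
```

Calling peek() repeatedly yields the same key each time, while each call to next() consumes one key.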

seek

boolean seek(KeyValue key)
             throws IOException
Seek the scanner at or after the specified KeyValue.

Parameters:
key - seek value
Returns:
true if scanner has values left, false if end of scanner
Throws:
IOException

reseek

boolean reseek(KeyValue key)
               throws IOException
Reseek the scanner at or after the specified KeyValue. This method is guaranteed to seek at or after the required key only if the key comes after the current position of the scanner. Should not be used to seek to a key which may come before the current position.

Parameters:
key - seek value (should be non-null)
Returns:
true if scanner has values left, false if end of scanner
Throws:
IOException
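
The contrast between seek and reseek can be illustrated on a sorted in-memory list. The sketch below is an assumption-laden stand-in, not the HBase implementation: SeekReseekSketch is a hypothetical class, String keys replace KeyValue, keys are assumed unique, and reseek is modeled as a forward-only linear scan, which is one way (not necessarily the real one) to get the "only guaranteed for later keys" behavior described above.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical scanner over a sorted list of unique String keys. */
final class SeekReseekSketch {
    private final List<String> keys; // sorted ascending
    private int pos = 0;

    SeekReseekSketch(List<String> sortedKeys) {
        this.keys = sortedKeys;
    }

    /** Random-access seek: positions at the first key >= the given key. */
    boolean seek(String key) {
        int idx = Collections.binarySearch(keys, key);
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        pos = idx >= 0 ? idx : -(idx + 1);
        return pos < keys.size();
    }

    /** Forward-only seek: never moves backwards from the current position. */
    boolean reseek(String key) {
        while (pos < keys.size() && keys.get(pos).compareTo(key) < 0) {
            pos++;
        }
        return pos < keys.size();
    }

    /** Current key, or null at end of scanner (assumption). */
    String peek() {
        return pos < keys.size() ? keys.get(pos) : null;
    }
}
```

In this sketch, reseek with a key before the current position leaves the scanner where it is, which is why such calls should be avoided.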

getSequenceID

long getSequenceID()
Get the sequence id associated with this KeyValueScanner. This is required for comparing multiple files to find out which one has the latest data. The default implementation would return 0. A file with a lower sequence id is considered the older one.
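
One way the sequence id could be used is as a tie-breaker when two scanners are positioned on the same key. The following is a minimal sketch under stated assumptions: SequenceIdSketch, Source, compare and first are hypothetical names, String keys replace KeyValue, and the ordering rule (same key, higher sequence id first) is inferred from the description above rather than taken from the HBase source.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical ordering of scanner-like sources by top key, then by recency. */
final class SequenceIdSketch {
    /** Stand-in for a scanner: its current top key and its sequence id. */
    static final class Source {
        final String name;
        final String topKey;
        final long sequenceId;

        Source(String name, String topKey, long sequenceId) {
            this.name = name;
            this.topKey = topKey;
            this.sequenceId = sequenceId;
        }
    }

    /** Smaller key first; on equal keys, the higher (newer) sequence id first. */
    static int compare(Source a, Source b) {
        int byKey = a.topKey.compareTo(b.topKey);
        if (byKey != 0) {
            return byKey;
        }
        return Long.compare(b.sequenceId, a.sequenceId);
    }

    /** Returns the source that should be read first under the ordering above. */
    static Source first(List<Source> sources) {
        return Collections.min(sources, SequenceIdSketch::compare);
    }
}
```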


close

void close()
Close the KeyValue scanner.


shouldUseScanner

boolean shouldUseScanner(Scan scan,
                         SortedSet<byte[]> columns,
                         long oldestUnexpiredTS)
Allows filtering out scanners (both StoreFile and memstore) that we don't want to use, based on criteria such as Bloom filters and timestamp ranges.

Parameters:
scan - the scan that we are selecting scanners for
columns - the set of columns in the current column family, or null if not specified by the scan
oldestUnexpiredTS - the oldest timestamp we are interested in for this query, based on TTL
Returns:
true if the scanner should be included in the query
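
The timestamp-range part of this check can be sketched as follows. This is only an illustration: ShouldUseScannerSketch and its fields are hypothetical, the Scan parameter is reduced to two long bounds, the scan range is assumed to be half-open [scanMinTs, scanMaxTs), and Bloom filter checks are left out entirely.

```java
/** Hypothetical source that knows the timestamp range of the data it holds. */
final class ShouldUseScannerSketch {
    final long minTimestamp; // oldest timestamp stored in this source
    final long maxTimestamp; // newest timestamp stored in this source

    ShouldUseScannerSketch(long minTimestamp, long maxTimestamp) {
        this.minTimestamp = minTimestamp;
        this.maxTimestamp = maxTimestamp;
    }

    /**
     * Returns true if this source may contain data the query cares about.
     * The scan range is treated as half-open: [scanMinTs, scanMaxTs) (assumption).
     */
    boolean shouldUse(long scanMinTs, long scanMaxTs, long oldestUnexpiredTS) {
        // Everything in this source is older than the TTL cutoff: skip it.
        if (maxTimestamp < oldestUnexpiredTS) {
            return false;
        }
        // Keep the source only if its range overlaps the requested range.
        return maxTimestamp >= scanMinTs && minTimestamp < scanMaxTs;
    }
}
```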

requestSeek

boolean requestSeek(KeyValue kv,
                    boolean forward,
                    boolean useBloom)
                    throws IOException
Similar to seek(org.apache.hadoop.hbase.KeyValue) (or reseek(org.apache.hadoop.hbase.KeyValue) if forward is true) but only does a seek operation after checking that it is really necessary for the row/column combination specified by the kv parameter. This function was added to avoid unnecessary disk seeks by checking row-column Bloom filters before a seek on multi-column get/scan queries, and to optimize by looking up more recent files first.

Parameters:
forward - do a forward-only "reseek" instead of a random-access seek
useBloom - whether to enable multi-column Bloom filter optimization
Throws:
IOException

realSeekDone

boolean realSeekDone()
We optimize our store scanners by checking the most recent store file first, so we sometimes pretend we have done a seek but delay it until the store scanner bubbles up to the top of the key-value heap. This method is then used to ensure the top store file scanner has done a seek operation.


enforceSeek

void enforceSeek()
                 throws IOException
Does the real seek operation in case it was skipped by seekToRowCol(KeyValue, boolean) (TODO: What's this?). Note that this function should never be called on scanners that always do real seek operations (i.e. most of the scanners). The easiest way to achieve this is to call realSeekDone() first.

Throws:
IOException
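
The interplay of requestSeek, realSeekDone and enforceSeek can be sketched as a deferred seek. This is a simplified model, not the HBase implementation: LazySeekSketch and the helper top are hypothetical, String keys replace KeyValue, requestSeek always defers here (the real method decides case by case and takes forward and useBloom flags), returning the requested key from peek() while a seek is pending is an assumption, and throwing IllegalStateException is just one way to flag a misuse of enforceSeek.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical scanner that defers seeks until they are really needed. */
final class LazySeekSketch {
    private final List<String> keys; // sorted ascending, unique
    private int pos = 0;
    private String pendingKey = null; // non-null while a seek is deferred
    int realSeeks = 0;                // counts the expensive seeks actually done

    LazySeekSketch(List<String> sortedKeys) {
        this.keys = sortedKeys;
    }

    /** Records the target key but postpones the actual seek. */
    boolean requestSeek(String key) {
        pendingKey = key;
        return true;
    }

    /** True if no deferred seek is outstanding. */
    boolean realSeekDone() {
        return pendingKey == null;
    }

    /** Performs the deferred seek; callers should check realSeekDone() first. */
    void enforceSeek() {
        if (pendingKey == null) {
            throw new IllegalStateException("no deferred seek to enforce");
        }
        int idx = Collections.binarySearch(keys, pendingKey);
        pos = idx >= 0 ? idx : -(idx + 1);
        pendingKey = null;
        realSeeks++;
    }

    /** While a seek is pending, reports the requested key as a placeholder. */
    String peek() {
        if (pendingKey != null) {
            return pendingKey;
        }
        return pos < keys.size() ? keys.get(pos) : null;
    }

    /** Caller-side pattern: make sure the seek is real before trusting peek(). */
    static String top(LazySeekSketch s) {
        if (!s.realSeekDone()) {
            s.enforceSeek();
        }
        return s.peek();
    }
}
```

The counter realSeeks shows the point of the optimization: a scanner that never reaches the top of the heap never pays for its seek.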

isFileScanner

boolean isFileScanner()
Returns:
true if this is a file scanner. Otherwise a memory scanner is assumed.


Copyright © 2012 The Apache Software Foundation. All Rights Reserved.