org.apache.hadoop.hbase.regionserver
Class KeyValueHeap

java.lang.Object
  extended by org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner
      extended by org.apache.hadoop.hbase.regionserver.NonReversedNonLazyKeyValueScanner
          extended by org.apache.hadoop.hbase.regionserver.KeyValueHeap
All Implemented Interfaces:
Closeable, InternalScanner, KeyValueScanner
Direct Known Subclasses:
ReversedKeyValueHeap

@InterfaceAudience.Private
public class KeyValueHeap
extends NonReversedNonLazyKeyValueScanner
implements KeyValueScanner, InternalScanner

Implements a heap merge across any number of KeyValueScanners.

Implements KeyValueScanner itself.

This class is used at the Region level to merge across Stores and at the Store level to merge across the memstore and StoreFiles.

In the Region case, we also need InternalScanner.next(List), so this class also implements InternalScanner. WARNING: As is, if you try to use this as an InternalScanner at the Store level, you will get runtime exceptions.
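
The heap merge described above can be illustrated with a minimal, self-contained sketch. It does not use any HBase classes: `PeekingScanner` and `HeapMergeSketch` are hypothetical stand-ins that merge sorted integers instead of KeyValues, and they only assume the general pattern of keeping the winning sub-scanner outside the priority queue and re-adding it after each read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a KeyValueScanner over sorted integers. */
class PeekingScanner {
    private final Iterator<Integer> it;
    private Integer head;

    PeekingScanner(List<Integer> sorted) {
        this.it = sorted.iterator();
        this.head = it.hasNext() ? it.next() : null;
    }

    /** Look at the next value without consuming it (null when exhausted). */
    Integer peek() { return head; }

    /** Return the next value and advance. */
    Integer next() {
        Integer result = head;
        head = it.hasNext() ? it.next() : null;
        return result;
    }
}

/** Illustrative heap merge; not the HBase implementation. */
class HeapMergeSketch {
    // Sub-scanners ordered by their next value. Only non-exhausted scanners are stored.
    private final PriorityQueue<PeekingScanner> heap =
        new PriorityQueue<PeekingScanner>((a, b) -> a.peek().compareTo(b.peek()));
    // The scanner holding the globally smallest next value; kept OUT of the heap.
    private PeekingScanner current;

    HeapMergeSketch(List<PeekingScanner> scanners) {
        for (PeekingScanner s : scanners) {
            if (s.peek() != null) heap.add(s);
        }
        current = heap.poll();
    }

    Integer peek() { return current == null ? null : current.peek(); }

    Integer next() {
        if (current == null) return null;
        Integer result = current.next();
        // Put the scanner back (if it still has data) and pull out the new winner.
        if (current.peek() != null) heap.add(current);
        current = heap.poll();
        return result;
    }

    static List<Integer> mergeAll(List<List<Integer>> inputs) {
        List<PeekingScanner> scanners = new ArrayList<>();
        for (List<Integer> in : inputs) scanners.add(new PeekingScanner(in));
        HeapMergeSketch merged = new HeapMergeSketch(scanners);
        List<Integer> out = new ArrayList<>();
        while (merged.peek() != null) out.add(merged.next());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(mergeAll(Arrays.asList(
            Arrays.asList(1, 4, 7), Arrays.asList(2, 5), Arrays.asList(3, 6, 8))));
        // prints [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```

Because the merged result exposes the same `peek()`/`next()` shape as its inputs, such a structure can itself be used as an input to another merge, which mirrors how this class is used at both the Store and Region levels.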


Nested Class Summary
protected static class KeyValueHeap.KVScannerComparator
           
 
Field Summary
protected  KeyValueHeap.KVScannerComparator comparator
           
protected  KeyValueScanner current
          The current sub-scanner, i.e. the one that contains the next key/value to return to the client.
protected  PriorityQueue<KeyValueScanner> heap
           
 
Constructor Summary
KeyValueHeap(List<? extends KeyValueScanner> scanners, KeyValue.KVComparator comparator)
          Constructor.
 
Method Summary
 void close()
          Close the KeyValue scanner.
 PriorityQueue<KeyValueScanner> getHeap()
           
 byte[] getNextIndexedKey()
           
 long getSequenceID()
          Get the sequence id associated with this KeyValueScanner.
 KeyValue next()
          Returns the next KeyValue in this scanner and advances the scanner.
 boolean next(List<Cell> result)
          Gets the next row of keys from the top-most scanner.
 boolean next(List<Cell> result, int limit)
          Gets the next row of keys from the top-most scanner.
 KeyValue peek()
          Look at the next KeyValue in this scanner, but do not iterate scanner.
protected  KeyValueScanner pollRealKV()
          Fetches the top sub-scanner from the priority queue, ensuring that a real seek has been done on it.
 boolean requestSeek(KeyValue key, boolean forward, boolean useBloom)
          Similar to KeyValueScanner.seek(org.apache.hadoop.hbase.KeyValue) (or KeyValueScanner.reseek(org.apache.hadoop.hbase.KeyValue) if forward is true) but only does a seek operation after checking that it is really necessary for the row/column combination specified by the key parameter.
 boolean reseek(KeyValue seekKey)
          This function is identical to the seek(KeyValue) function except that scanner.seek(seekKey) is changed to scanner.reseek(seekKey).
 boolean seek(KeyValue seekKey)
          Seeks all scanners at or below the specified seek key.
 
Methods inherited from class org.apache.hadoop.hbase.regionserver.NonReversedNonLazyKeyValueScanner
backwardSeek, seekToLastRow, seekToPreviousRow
 
Methods inherited from class org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner
doRealSeek, enforceSeek, isFileScanner, realSeekDone, shouldUseScanner
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.hbase.regionserver.KeyValueScanner
backwardSeek, enforceSeek, isFileScanner, realSeekDone, seekToLastRow, seekToPreviousRow, shouldUseScanner
 

Field Detail

heap

protected PriorityQueue<KeyValueScanner> heap

current

protected KeyValueScanner current
The current sub-scanner, i.e. the one that contains the next key/value to return to the client. This scanner is NOT included in heap (but we frequently add it back to the heap and pull the new winner out). We maintain an invariant that the current sub-scanner has already done a real seek, and that current.peek() is always a real key/value (or null) except for the fake last-key-on-row-column supplied by the multi-column Bloom filter optimization, which is OK to propagate to StoreScanner. In order to ensure that, always use pollRealKV() to update current.


comparator

protected KeyValueHeap.KVScannerComparator comparator
Constructor Detail

KeyValueHeap

public KeyValueHeap(List<? extends KeyValueScanner> scanners,
                    KeyValue.KVComparator comparator)
             throws IOException
Constructor. This KeyValueHeap will handle closing of passed in KeyValueScanners.

Parameters:
scanners -
comparator -
Throws:
IOException
Method Detail

peek

public KeyValue peek()
Description copied from interface: KeyValueScanner
Look at the next KeyValue in this scanner, but do not iterate scanner.

Specified by:
peek in interface KeyValueScanner
Returns:
the next KeyValue

next

public KeyValue next()
              throws IOException
Description copied from interface: KeyValueScanner
Returns the next KeyValue in this scanner and advances the scanner.

Specified by:
next in interface KeyValueScanner
Returns:
the next KeyValue
Throws:
IOException

next

public boolean next(List<Cell> result,
                    int limit)
             throws IOException
Gets the next row of keys from the top-most scanner.

This method takes care of updating the heap.

This can ONLY be called when you are using Scanners that implement InternalScanner as well as KeyValueScanner (a StoreScanner).

Specified by:
next in interface InternalScanner
Parameters:
result -
limit -
Returns:
true if there are more keys, false if all scanners are done
Throws:
IOException

next

public boolean next(List<Cell> result)
             throws IOException
Gets the next row of keys from the top-most scanner.

This method takes care of updating the heap.

This can ONLY be called when you are using Scanners that implement InternalScanner as well as KeyValueScanner (a StoreScanner).

Specified by:
next in interface InternalScanner
Parameters:
result -
Returns:
true if there are more keys, false if all scanners are done
Throws:
IOException

close

public void close()
Description copied from interface: KeyValueScanner
Close the KeyValue scanner.

Specified by:
close in interface Closeable
Specified by:
close in interface InternalScanner
Specified by:
close in interface KeyValueScanner

seek

public boolean seek(KeyValue seekKey)
             throws IOException
Seeks all scanners at or below the specified seek key. If we exited a row early, we may end up skipping values that have not been reached yet. Rather than iterating down to them, we want to give the scanners the opportunity to re-seek.

As individual scanners may run past their ends, those scanners are automatically closed and removed from the heap.

This function (and reseek(KeyValue)) does not do multi-column Bloom filter and lazy-seek optimizations. To enable those, call requestSeek(KeyValue, boolean, boolean).

Specified by:
seek in interface KeyValueScanner
Parameters:
seekKey - KeyValue to seek at or after
Returns:
true if KeyValues exist at or after specified key, false if not
Throws:
IOException
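
The seek behavior described above (advance every sub-scanner that is still before the key, and close and drop any that run past their end) can be sketched as follows. This is an illustrative model only: `SeekableScanner` and `SeekHeapSketch` are hypothetical classes over sorted integer arrays, and the loop shape is an assumption about one reasonable way to do it, not a copy of the HBase code.

```java
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical scanner over a sorted int array. */
class SeekableScanner {
    private final int[] keys;
    private int pos = 0;
    boolean closed = false;

    SeekableScanner(int... sortedKeys) { this.keys = sortedKeys; }

    Integer peek() { return pos < keys.length ? keys[pos] : null; }

    /** Position at the first key >= seekKey; returns false if none exists. */
    boolean seek(int seekKey) {
        while (pos < keys.length && keys[pos] < seekKey) pos++;
        return pos < keys.length;
    }

    void close() { closed = true; }
}

/** Illustrative seek over a heap of scanners; not the HBase implementation. */
class SeekHeapSketch {
    private final PriorityQueue<SeekableScanner> heap =
        new PriorityQueue<SeekableScanner>((a, b) -> Integer.compare(a.peek(), b.peek()));
    private SeekableScanner current;

    SeekHeapSketch(List<SeekableScanner> scanners) {
        for (SeekableScanner s : scanners) {
            if (s.peek() != null) heap.add(s); else s.close();
        }
        current = heap.poll();
    }

    Integer peek() { return current == null ? null : current.peek(); }

    /** Returns true if any value exists at or after seekKey. */
    boolean seek(int seekKey) {
        if (current == null) return false;
        heap.add(current);
        current = null;
        SeekableScanner top;
        while ((top = heap.poll()) != null) {
            if (top.peek() >= seekKey) {
                // The smallest head is at or after the key, so every scanner is.
                current = top;
                return true;
            }
            if (top.seek(seekKey)) {
                heap.add(top);   // still has data; re-insert under its new head
            } else {
                top.close();     // ran past its end; drop it from the heap
            }
        }
        return false;            // all scanners exhausted
    }
}
```

With scanners over `{1, 5, 9}`, `{2, 3}` and `{4, 10}`, calling `seek(4)` in this sketch leaves `peek()` at 4 and closes the second scanner, since it has no key at or after 4.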

reseek

public boolean reseek(KeyValue seekKey)
               throws IOException
This function is identical to the seek(KeyValue) function except that scanner.seek(seekKey) is changed to scanner.reseek(seekKey).

Specified by:
reseek in interface KeyValueScanner
Parameters:
seekKey - seek value (should be non-null)
Returns:
true if scanner has values left, false if end of scanner
Throws:
IOException

requestSeek

public boolean requestSeek(KeyValue key,
                           boolean forward,
                           boolean useBloom)
                    throws IOException
Similar to KeyValueScanner.seek(org.apache.hadoop.hbase.KeyValue) (or KeyValueScanner.reseek(org.apache.hadoop.hbase.KeyValue) if forward is true) but only does a seek operation after checking that it is really necessary for the row/column combination specified by the key parameter. This function was added to avoid unnecessary disk seeks by checking row-column Bloom filters before a seek on multi-column get/scan queries, and to optimize by looking up more recent files first.

Specified by:
requestSeek in interface KeyValueScanner
Overrides:
requestSeek in class NonLazyKeyValueScanner
Parameters:
key - the KeyValue to seek to
forward - do a forward-only "reseek" instead of a random-access seek
useBloom - whether to enable multi-column Bloom filter optimization
Throws:
IOException

pollRealKV

protected KeyValueScanner pollRealKV()
                              throws IOException
Fetches the top sub-scanner from the priority queue, ensuring that a real seek has been done on it. Works by fetching the top sub-scanner and, if it has not done a real seek, making it do so (which will modify its top KV), putting it back, and repeating this until success. Relies on the fact that on a lazy seek we set the current key of a StoreFileScanner to a KV that is not greater than the real next KV to be read from that file. As a result, the scanner that bubbles up to the top of the heap will have the global next KV in this scanner heap if (1) it has done a real seek and (2) its KV is the top among all top KVs (some of which are fake) in the scanner heap.

Throws:
IOException
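
The poll-until-real loop described above can be modelled in a few lines. In this sketch `LazyScanner` is a hypothetical class whose fake key is a lower bound on its real key, and `enforceSeek()` here simply swaps the fake key for the real one; the names echo the inherited methods listed on this page, but the bodies are assumptions made for illustration.

```java
import java.util.PriorityQueue;

/** Hypothetical lazily-seeked scanner: fakeKey is never greater than realKey. */
class LazyScanner {
    private final int fakeKey;
    private final int realKey;
    private boolean realSeekDone = false;

    LazyScanner(int fakeKey, int realKey) {
        if (fakeKey > realKey) throw new IllegalArgumentException("fake key must be a lower bound");
        this.fakeKey = fakeKey;
        this.realKey = realKey;
    }

    int peek() { return realSeekDone ? realKey : fakeKey; }

    boolean realSeekDone() { return realSeekDone; }

    /** Stand-in for the expensive seek; afterwards peek() returns the real key. */
    void enforceSeek() { realSeekDone = true; }
}

/** Illustrative version of the poll-until-real loop; not the HBase implementation. */
class PollRealSketch {
    static PriorityQueue<LazyScanner> newHeap() {
        return new PriorityQueue<LazyScanner>((a, b) -> Integer.compare(a.peek(), b.peek()));
    }

    static LazyScanner pollReal(PriorityQueue<LazyScanner> heap) {
        while (true) {
            LazyScanner top = heap.poll();
            if (top == null) return null;          // heap is empty
            if (top.realSeekDone()) return top;    // real key on top: global next value
            top.enforceSeek();                     // may move this scanner's key forward
            heap.add(top);                         // re-insert under its real key and retry
        }
    }
}
```

Given scanners with (fake, real) keys of (1, 10), (3, 4) and (5, 5), the loop in this sketch returns the second scanner with key 4 and never performs a real seek on the third. That skipped seek is the saving the lazy-seek optimization aims for.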

getHeap

public PriorityQueue<KeyValueScanner> getHeap()
Returns:
the current Heap

getSequenceID

public long getSequenceID()
Description copied from interface: KeyValueScanner
Get the sequence id associated with this KeyValueScanner. This is required for comparing multiple files to find out which one has the latest data. The default implementation for this would be to return 0. A file having lower sequence id will be considered to be the older one.

Specified by:
getSequenceID in interface KeyValueScanner
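
To illustrate why a sequence id is useful when merging files, the sketch below orders hypothetical scanners by their next key and, when keys are equal, puts the scanner with the higher (newer) sequence id first. `SeqScanner` and the comparator are assumptions made for this example; the page itself states only that a lower sequence id marks the older file.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical scanner summary: a name, its next key, and its sequence id. */
class SeqScanner {
    final String name;
    final int headKey;
    final long sequenceId;

    SeqScanner(String name, int headKey, long sequenceId) {
        this.name = name;
        this.headKey = headKey;
        this.sequenceId = sequenceId;
    }
}

/** Illustrative tie-breaking order; not the HBase comparator. */
class SequenceIdOrderSketch {
    static final Comparator<SeqScanner> BY_KEY_THEN_NEWEST = (a, b) -> {
        int byKey = Integer.compare(a.headKey, b.headKey);
        if (byKey != 0) return byKey;
        // Same key: the higher sequence id (newer data) sorts first.
        return Long.compare(b.sequenceId, a.sequenceId);
    };

    /** Name of the scanner that would be read first. */
    static String winner(SeqScanner... scanners) {
        PriorityQueue<SeqScanner> heap = new PriorityQueue<SeqScanner>(BY_KEY_THEN_NEWEST);
        for (SeqScanner s : scanners) heap.add(s);
        SeqScanner top = heap.poll();
        return top == null ? null : top.name;
    }
}
```

For example, `winner(new SeqScanner("old", 7, 1), new SeqScanner("new", 7, 5), new SeqScanner("other", 9, 9))` returns `"new"` in this sketch: both `old` and `new` hold key 7, and the higher sequence id wins the tie.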

getNextIndexedKey

public byte[] getNextIndexedKey()
Specified by:
getNextIndexedKey in interface KeyValueScanner
Overrides:
getNextIndexedKey in class NonLazyKeyValueScanner
Returns:
the next key in the index (the key to seek to the next block) if known, or null otherwise


Copyright © 2015 The Apache Software Foundation. All rights reserved.