org.apache.hadoop.hbase.regionserver
Class StoreFileScanner

java.lang.Object
  extended by org.apache.hadoop.hbase.regionserver.StoreFileScanner
All Implemented Interfaces:
KeyValueScanner

public class StoreFileScanner
extends Object
implements KeyValueScanner

A KeyValueScanner adaptor over a StoreFile.Reader. It also provides hooks into Bloom filter checks.


Constructor Summary
StoreFileScanner(StoreFile.Reader reader, HFileScanner hfs, boolean useMVCC)
          Implements a KeyValueScanner on top of the specified HFileScanner.
 
Method Summary
 void close()
          Close the KeyValue scanner.
 void enforceSeek()
          Does the real seek operation in case it was skipped by seekToRowCol(KeyValue, boolean) (TODO: What's this?).
static List<StoreFileScanner> getScannersForStoreFiles(Collection<StoreFile> files, boolean cacheBlocks, boolean usePread)
          Return a list of scanners corresponding to the given set of store files.
static List<StoreFileScanner> getScannersForStoreFiles(Collection<StoreFile> files, boolean cacheBlocks, boolean usePread, boolean isCompaction)
          Return a list of scanners corresponding to the given set of store files.
static List<StoreFileScanner> getScannersForStoreFiles(Collection<StoreFile> files, boolean cacheBlocks, boolean usePread, boolean isCompaction, ScanQueryMatcher matcher)
          Return a list of scanners corresponding to the given set of store files, and set the ScanQueryMatcher on each store file scanner for further optimization.
 long getSequenceID()
          Get the sequence id associated with this KeyValueScanner.
 boolean isFileScanner()
           
 KeyValue next()
          Return the next KeyValue in this scanner, iterating the scanner.
 KeyValue peek()
          Look at the next KeyValue in this scanner, but do not iterate scanner.
 boolean realSeekDone()
          We optimize our store scanners by checking the most recent store file first, so we sometimes pretend we have done a seek but delay it until the store scanner bubbles up to the top of the key-value heap.
 boolean requestSeek(KeyValue kv, boolean forward, boolean useBloom)
          Pretend we have done a seek but don't do it yet, if possible.
 boolean reseek(KeyValue key)
          Reseek the scanner at or after the specified KeyValue.
 boolean seek(KeyValue key)
          Seek the scanner at or after the specified KeyValue.
static boolean seekAtOrAfter(HFileScanner s, KeyValue k)
           
 void setScanQueryMatcher(ScanQueryMatcher matcher)
           
 boolean shouldUseScanner(Scan scan, SortedSet<byte[]> columns, long oldestUnexpiredTS)
          Allows filtering out scanners (both StoreFile and memstore) that we don't want to use, based on criteria such as Bloom filters and timestamp ranges.
protected  boolean skipKVsNewerThanReadpoint()
           
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

StoreFileScanner

public StoreFileScanner(StoreFile.Reader reader,
                        HFileScanner hfs,
                        boolean useMVCC)
Implements a KeyValueScanner on top of the specified HFileScanner.

Parameters:
hfs - HFile scanner
Method Detail

getScannersForStoreFiles

public static List<StoreFileScanner> getScannersForStoreFiles(Collection<StoreFile> files,
                                                              boolean cacheBlocks,
                                                              boolean usePread)
                                                       throws IOException
Return a list of scanners corresponding to the given set of store files.

Throws:
IOException

getScannersForStoreFiles

public static List<StoreFileScanner> getScannersForStoreFiles(Collection<StoreFile> files,
                                                              boolean cacheBlocks,
                                                              boolean usePread,
                                                              boolean isCompaction)
                                                       throws IOException
Return a list of scanners corresponding to the given set of store files.

Throws:
IOException

getScannersForStoreFiles

public static List<StoreFileScanner> getScannersForStoreFiles(Collection<StoreFile> files,
                                                              boolean cacheBlocks,
                                                              boolean usePread,
                                                              boolean isCompaction,
                                                              ScanQueryMatcher matcher)
                                                       throws IOException
Return a list of scanners corresponding to the given set of store files, and set the ScanQueryMatcher on each store file scanner for further optimization.

Throws:
IOException

toString

public String toString()
Overrides:
toString in class Object

peek

public KeyValue peek()
Description copied from interface: KeyValueScanner
Look at the next KeyValue in this scanner, but do not iterate scanner.

Specified by:
peek in interface KeyValueScanner
Returns:
the next KeyValue

next

public KeyValue next()
              throws IOException
Description copied from interface: KeyValueScanner
Return the next KeyValue in this scanner, iterating the scanner.

Specified by:
next in interface KeyValueScanner
Returns:
the next KeyValue
Throws:
IOException
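
The difference between peek() and next() can be illustrated with a minimal sketch. It does not use the HBase classes: PeekNextSketch is a hypothetical stand-in, plain strings play the role of KeyValues, and returning null at the end of the scanner is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a KeyValueScanner; strings stand in for KeyValues.
class PeekNextSketch {
    private final Iterator<String> it;
    private String current; // what peek() returns; null once the scanner is exhausted

    PeekNextSketch(List<String> sortedKeys) {
        this.it = sortedKeys.iterator();
        this.current = it.hasNext() ? it.next() : null;
    }

    // Look at the next key without advancing the scanner.
    String peek() {
        return current;
    }

    // Return the next key and advance the scanner past it.
    String next() {
        String result = current;
        current = it.hasNext() ? it.next() : null;
        return result;
    }

    public static void main(String[] args) {
        PeekNextSketch s = new PeekNextSketch(Arrays.asList("row1", "row2"));
        System.out.println(s.peek()); // row1
        System.out.println(s.peek()); // row1 (peek does not advance)
        System.out.println(s.next()); // row1
        System.out.println(s.peek()); // row2
    }
}
```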

seek

public boolean seek(KeyValue key)
             throws IOException
Description copied from interface: KeyValueScanner
Seek the scanner at or after the specified KeyValue.

Specified by:
seek in interface KeyValueScanner
Parameters:
key - seek value
Returns:
true if scanner has values left, false if end of scanner
Throws:
IOException

reseek

public boolean reseek(KeyValue key)
               throws IOException
Description copied from interface: KeyValueScanner
Reseek the scanner at or after the specified KeyValue. This method is guaranteed to seek at or after the required key only if the key comes after the current position of the scanner. Should not be used to seek to a key which may come before the current position.

Specified by:
reseek in interface KeyValueScanner
Parameters:
key - seek value (should be non-null)
Returns:
true if scanner has values left, false if end of scanner
Throws:
IOException
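
The contract above (seek may jump anywhere, reseek only moves forward) can be sketched with a toy scanner over a sorted list. SeekReseekSketch and its string keys are hypothetical and only illustrate the contract; they are not how StoreFileScanner is implemented.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy scanner over a sorted list; strings stand in for KeyValues.
class SeekReseekSketch {
    private final List<String> keys; // sorted ascending
    private int pos = 0;

    SeekReseekSketch(List<String> sortedKeys) {
        this.keys = sortedKeys;
    }

    // Random-access seek: restart from the beginning, stop at the first key >= target.
    boolean seek(String target) {
        pos = 0;
        return advanceTo(target);
    }

    // Forward-only reseek: starts from the current position and never moves backwards.
    boolean reseek(String target) {
        return advanceTo(target);
    }

    private boolean advanceTo(String target) {
        while (pos < keys.size() && keys.get(pos).compareTo(target) < 0) {
            pos++;
        }
        return pos < keys.size(); // true if the scanner has values left
    }

    String peek() {
        return pos < keys.size() ? keys.get(pos) : null;
    }

    public static void main(String[] args) {
        SeekReseekSketch s = new SeekReseekSketch(Arrays.asList("a", "c", "e"));
        s.seek("b");
        System.out.println(s.peek()); // c
        s.reseek("d");
        System.out.println(s.peek()); // e
        s.reseek("a");                // target is behind the current position
        System.out.println(s.peek()); // e (the scanner did not move back to "a")
        System.out.println(s.seek("f")); // false
    }
}
```

In this sketch a backward reseek silently leaves the scanner where it is, which is why reseek should only be used for keys known to come after the current position.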

skipKVsNewerThanReadpoint

protected boolean skipKVsNewerThanReadpoint()
                                     throws IOException
Throws:
IOException

close

public void close()
Description copied from interface: KeyValueScanner
Close the KeyValue scanner.

Specified by:
close in interface KeyValueScanner

seekAtOrAfter

public static boolean seekAtOrAfter(HFileScanner s,
                                    KeyValue k)
                             throws IOException
Parameters:
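s - the HFileScanner to position
k - the KeyValue to seek to
Returns:
true if the scanner is positioned at a KeyValue at or after k, false if there is no such KeyValue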
Throws:
IOException

getSequenceID

public long getSequenceID()
Description copied from interface: KeyValueScanner
Get the sequence id associated with this KeyValueScanner. This is required for comparing multiple files to find out which one has the latest data. The default implementation for this would be to return 0. A file having lower sequence id will be considered to be the older one.

Specified by:
getSequenceID in interface KeyValueScanner
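
As a rough illustration of how a sequence id can break ties between scanners positioned on the same key, the sketch below orders hypothetical scanner views by key and then by descending sequence id, so the newer file wins. ScannerView, HEAP_ORDER and top are invented names for this example, and the exact ordering used by HBase's key-value heap is assumed rather than quoted.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SequenceIdSketch {
    // Hypothetical view of a scanner: its current key and the sequence id of its file.
    static final class ScannerView {
        final String currentKey;
        final long sequenceId;

        ScannerView(String currentKey, long sequenceId) {
            this.currentKey = currentKey;
            this.sequenceId = sequenceId;
        }
    }

    // Order by key; for equal keys, the higher sequence id (newer file) sorts first.
    static final Comparator<ScannerView> HEAP_ORDER = (a, b) -> {
        int byKey = a.currentKey.compareTo(b.currentKey);
        if (byKey != 0) {
            return byKey;
        }
        return Long.compare(b.sequenceId, a.sequenceId);
    };

    // The scanner that would sit at the top of a heap using HEAP_ORDER.
    static ScannerView top(List<ScannerView> scanners) {
        return Collections.min(scanners, HEAP_ORDER);
    }

    public static void main(String[] args) {
        List<ScannerView> scanners = Arrays.asList(
                new ScannerView("row1", 3),
                new ScannerView("row1", 7),
                new ScannerView("row2", 9));
        System.out.println(top(scanners).sequenceId); // 7
    }
}
```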

requestSeek

public boolean requestSeek(KeyValue kv,
                           boolean forward,
                           boolean useBloom)
                    throws IOException
Pretend we have done a seek but don't do it yet, if possible. The hope is that we find requested columns in more recent files and won't have to seek in older files. Creates a fake key/value with the given row/column and the highest (most recent) possible timestamp we might get from this file. When users of such "lazy scanner" need to know the next KV precisely (e.g. when this scanner is at the top of the heap), they run enforceSeek().

Note that this function does guarantee that the current KV of this scanner will be advanced to at least the given KV. Because of this, it does have to do a real seek in cases when the seek timestamp is older than the highest timestamp of the file, e.g. when we are trying to seek to the next row/column and use OLDEST_TIMESTAMP in the seek key.

Specified by:
requestSeek in interface KeyValueScanner
Parameters:
forward - do a forward-only "reseek" instead of a random-access seek
useBloom - whether to enable multi-column Bloom filter optimization
Throws:
IOException
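
The lazy-seek idea described above can be sketched in a few lines. This is a simplified model, not the StoreFileScanner code: Cell, LazySeekSketch, the realSeeks counter and the linear search are all assumptions made for illustration, and columns and Bloom filters are left out. Cells sort by row ascending and then by timestamp descending, so newer cells come first.

```java
import java.util.Arrays;
import java.util.List;

class LazySeekSketch {
    // Hypothetical cell: row ascending, then timestamp descending (newest first).
    static final class Cell implements Comparable<Cell> {
        final String row;
        final long ts;

        Cell(String row, long ts) {
            this.row = row;
            this.ts = ts;
        }

        @Override
        public int compareTo(Cell o) {
            int c = row.compareTo(o.row);
            return c != 0 ? c : Long.compare(o.ts, ts);
        }

        @Override
        public String toString() {
            return row + "@" + ts;
        }
    }

    private final List<Cell> cells; // sorted in Cell order
    private final long maxTs;       // highest timestamp present in this "file"
    private Cell cur;
    private Cell delayedSeekKey;
    private boolean realSeekDone = true;
    int realSeeks = 0;              // counts real seeks, for demonstration only

    LazySeekSketch(List<Cell> sortedCells) {
        this.cells = sortedCells;
        long m = Long.MIN_VALUE;
        for (Cell c : sortedCells) {
            m = Math.max(m, c.ts);
        }
        this.maxTs = m;
    }

    // Real seek: position at the first cell at or after the key.
    boolean seek(Cell key) {
        realSeeks++;
        realSeekDone = true;
        cur = null;
        for (Cell c : cells) {
            if (c.compareTo(key) >= 0) {
                cur = c;
                break;
            }
        }
        return cur != null;
    }

    // Lazy seek: pretend to be positioned on a fake cell when that is safe.
    boolean requestSeek(Cell key) {
        if (key.ts < maxTs) {
            // Seek timestamp is older than the file's newest one: a fake key
            // would not be at or after the requested key, so seek for real.
            return seek(key);
        }
        delayedSeekKey = key;
        cur = new Cell(key.row, maxTs); // fake cell with the highest possible timestamp
        realSeekDone = false;
        return true;
    }

    boolean realSeekDone() {
        return realSeekDone;
    }

    // Perform the delayed seek, if there is one.
    void enforceSeek() {
        if (!realSeekDone) {
            seek(delayedSeekKey);
        }
    }

    Cell peek() {
        return cur;
    }

    public static void main(String[] args) {
        LazySeekSketch s = new LazySeekSketch(Arrays.asList(new Cell("a", 5), new Cell("b", 3)));
        s.requestSeek(new Cell("b", Long.MAX_VALUE));
        System.out.println(s.peek() + " realSeekDone=" + s.realSeekDone()); // b@5 realSeekDone=false
        s.enforceSeek();
        System.out.println(s.peek() + " realSeekDone=" + s.realSeekDone()); // b@3 realSeekDone=true
    }
}
```

The fake cell is a lower bound on where the real seek would land, so a heap of such scanners stays correctly ordered until the lazy scanner reaches the top and the real seek is enforced.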

realSeekDone

public boolean realSeekDone()
Description copied from interface: KeyValueScanner
We optimize our store scanners by checking the most recent store file first, so we sometimes pretend we have done a seek but delay it until the store scanner bubbles up to the top of the key-value heap. This method is then used to ensure the top store file scanner has done a seek operation.

Specified by:
realSeekDone in interface KeyValueScanner

enforceSeek

public void enforceSeek()
                 throws IOException
Description copied from interface: KeyValueScanner
Does the real seek operation in case it was skipped by seekToRowCol(KeyValue, boolean) (TODO: What's this?). Note that this function should never be called on scanners that always do real seek operations (i.e. most of the scanners). The easiest way to achieve this is to call KeyValueScanner.realSeekDone() first.

Specified by:
enforceSeek in interface KeyValueScanner
Throws:
IOException

setScanQueryMatcher

public void setScanQueryMatcher(ScanQueryMatcher matcher)

isFileScanner

public boolean isFileScanner()
Specified by:
isFileScanner in interface KeyValueScanner
Returns:
true if this is a file scanner. Otherwise a memory scanner is assumed.

shouldUseScanner

public boolean shouldUseScanner(Scan scan,
                                SortedSet<byte[]> columns,
                                long oldestUnexpiredTS)
Description copied from interface: KeyValueScanner
Allows filtering out scanners (both StoreFile and memstore) that we don't want to use, based on criteria such as Bloom filters and timestamp ranges.

Specified by:
shouldUseScanner in interface KeyValueScanner
Parameters:
scan - the scan that we are selecting scanners for
columns - the set of columns in the current column family, or null if not specified by the scan
oldestUnexpiredTS - the oldest timestamp we are interested in for this query, based on TTL
Returns:
true if the scanner should be included in the query
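
A minimal sketch of this kind of pre-filtering follows, under stated assumptions. FileMeta and shouldUse are hypothetical names, an exact set of rows stands in for a Bloom filter (a real Bloom filter can report false positives but not false negatives), and the time ranges are treated as inclusive for simplicity. The actual checks in StoreFileScanner may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ShouldUseScannerSketch {
    // Hypothetical per-file metadata.
    static final class FileMeta {
        final long minTs;
        final long maxTs;
        final Set<String> rows; // exact row set standing in for a row Bloom filter

        FileMeta(long minTs, long maxTs, Set<String> rows) {
            this.minTs = minTs;
            this.maxTs = maxTs;
            this.rows = rows;
        }
    }

    // True if the file could contain data relevant to a single-row query.
    static boolean shouldUse(FileMeta f, String row,
                             long scanMinTs, long scanMaxTs, long oldestUnexpiredTs) {
        // Timestamp check: the file's range must overlap the scan's range,
        // and the file must hold at least one cell that has not expired.
        boolean timeOk = f.maxTs >= scanMinTs
                && f.minTs <= scanMaxTs
                && f.maxTs >= oldestUnexpiredTs;
        // Bloom-style check: skip the file if it definitely lacks the row.
        boolean rowOk = f.rows.contains(row);
        return timeOk && rowOk;
    }

    public static void main(String[] args) {
        FileMeta f = new FileMeta(100, 200, new HashSet<>(Arrays.asList("r1", "r2")));
        System.out.println(shouldUse(f, "r1", 0, Long.MAX_VALUE, 150)); // true
        System.out.println(shouldUse(f, "r1", 0, Long.MAX_VALUE, 250)); // false (all cells expired)
        System.out.println(shouldUse(f, "r9", 0, Long.MAX_VALUE, 150)); // false (row not in file)
    }
}
```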


Copyright © 2012 The Apache Software Foundation. All Rights Reserved.