org.apache.hadoop.hbase.regionserver
Class StoreFile

java.lang.Object
  extended by org.apache.hadoop.hbase.regionserver.metrics.SchemaConfigured
      extended by org.apache.hadoop.hbase.regionserver.StoreFile
All Implemented Interfaces:
HeapSize, SchemaMetrics.SchemaAware

public class StoreFile
extends SchemaConfigured

A Store data file. Stores usually have one or more of these files. They are produced by flushing the memstore to disk. To create one, instantiate a writer using StoreFile#WriterBuilder and append data. Be sure to add any metadata before calling close on the Writer (use the appendMetadata convenience methods). Once the Writer is closed, the StoreFile exists in the filesystem. To refer to it, create a StoreFile instance, passing the filesystem and path. To read it, call createReader().

StoreFiles may also reference store files in another Store. Separate instances are used for writing and for reading because a file is written once but read many more times.


Nested Class Summary
static class StoreFile.BloomType
           
static class StoreFile.Reader
          Reader for a StoreFile.
static class StoreFile.Writer
          A StoreFile writer.
static class StoreFile.WriterBuilder
           
 
Field Summary
static byte[] BLOOM_FILTER_TYPE_KEY
          Bloom filter Type in FileInfo
static byte[] BULKLOAD_TASK_KEY
          Meta key set when store file is a result of a bulk load
static byte[] BULKLOAD_TIME_KEY
           
static int DEFAULT_BLOCKSIZE_SMALL
           
static byte[] DELETE_FAMILY_COUNT
          Delete Family Count in FileInfo
static byte[] EARLIEST_PUT_TS
          Key for timestamp of earliest-put in metadata
static byte[] EXCLUDE_FROM_MINOR_COMPACTION_KEY
          Flag in FileInfo marking the file as excluded from minor compactions
static String HFILE_NAME_REGEX
          A non-capture group, for hfiles, so that this can be embedded.
static byte[] MAJOR_COMPACTION_KEY
          Major compaction flag in FileInfo
static byte[] MAX_SEQ_ID_KEY
          Max Sequence ID in FileInfo
static byte[] TIMERANGE_KEY
          Key for Timerange information in metadata
 
Fields inherited from class org.apache.hadoop.hbase.regionserver.metrics.SchemaConfigured
SCHEMA_CONFIGURED_UNALIGNED_HEAP_SIZE
 
Constructor Summary
StoreFile(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path p, org.apache.hadoop.conf.Configuration conf, CacheConfig cacheConf, StoreFile.BloomType cfBloomType, HFileDataBlockEncoder dataBlockEncoder)
          Constructor; loads a reader and its indices, etc.
 
Method Summary
 void closeReader(boolean evictOnClose)
           
 StoreFile.Reader createReader()
           
 void deleteReader()
          Delete this file
 long getBulkLoadTimestamp()
          Return the timestamp at which this bulk load file was generated.
 HDFSBlocksDistribution getHDFSBlockDistribution()
           
 long getMaxMemstoreTS()
           
static long getMaxMemstoreTSInList(Collection<StoreFile> sfs)
          Return the largest memstoreTS found across all storefiles in the given list.
 long getMaxSequenceId()
           
static long getMaxSequenceIdInList(Collection<StoreFile> sfs)
          Return the highest sequence ID found across all storefiles in the given list.
 long getModificationTimeStamp()
           
 org.apache.hadoop.fs.Path getPath()
           
 StoreFile.Reader getReader()
           
static org.apache.hadoop.fs.Path getReferredToFile(org.apache.hadoop.fs.Path p)
           
static org.apache.hadoop.fs.Path getUniqueFile(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path dir)
           
static boolean isReference(org.apache.hadoop.fs.Path p)
           
static boolean isReference(String name)
           
static org.apache.hadoop.fs.Path rename(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path tgt)
          Utility to help with rename.
 void setMaxMemstoreTS(long maxMemstoreTS)
           
 String toString()
           
 String toStringDetailed()
           
static boolean validateStoreFileName(String fileName)
          Validate the store file name.
 
Methods inherited from class org.apache.hadoop.hbase.regionserver.metrics.SchemaConfigured
createUnknown, getColumnFamilyName, getSchemaMetrics, getTableName, heapSize, isSchemaConfigured, passSchemaMetricsTo, resetSchemaMetricsConf, schemaConfAsJSON, schemaConfigurationChanged
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

MAX_SEQ_ID_KEY

public static final byte[] MAX_SEQ_ID_KEY
Max Sequence ID in FileInfo


MAJOR_COMPACTION_KEY

public static final byte[] MAJOR_COMPACTION_KEY
Major compaction flag in FileInfo
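The FileInfo keys in this class are byte[] constants, and byte[] uses identity-based equals/hashCode, so a lookup table for them must compare key contents. The sketch below shows one way to store and read a boolean flag such as MAJOR_COMPACTION_KEY in a content-keyed map. It is a standalone illustration, not the HBase FileInfo implementation: the key bytes (UTF-8 of the constant's name) and the one-byte flag encoding are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

final class FileInfoSketch {
  // Assumed key encoding for illustration: the UTF-8 bytes of the constant's name.
  static final byte[] MAJOR_COMPACTION_KEY =
      "MAJOR_COMPACTION_KEY".getBytes(StandardCharsets.UTF_8);

  // byte[] inherits identity-based equals/hashCode, so a HashMap would only find the
  // exact array instance used as the key. Ordering keys by content avoids that.
  static Map<byte[], byte[]> newFileInfo() {
    return new TreeMap<byte[], byte[]>((a, b) -> Arrays.compare(a, b));
  }

  // Hypothetical one-byte encoding of the flag (1 = true, 0 = false).
  static void setMajorCompaction(Map<byte[], byte[]> fileInfo, boolean major) {
    fileInfo.put(MAJOR_COMPACTION_KEY, new byte[] {(byte) (major ? 1 : 0)});
  }

  static boolean isMajorCompaction(Map<byte[], byte[]> fileInfo) {
    // A freshly built key with the same contents still finds the entry.
    byte[] key = "MAJOR_COMPACTION_KEY".getBytes(StandardCharsets.UTF_8);
    byte[] value = fileInfo.get(key);
    return value != null && value.length == 1 && value[0] != 0;
  }

  public static void main(String[] args) {
    Map<byte[], byte[]> fileInfo = newFileInfo();
    System.out.println(isMajorCompaction(fileInfo)); // false: flag not set yet
    setMajorCompaction(fileInfo, true);
    System.out.println(isMajorCompaction(fileInfo)); // true
  }
}
```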


EXCLUDE_FROM_MINOR_COMPACTION_KEY

public static final byte[] EXCLUDE_FROM_MINOR_COMPACTION_KEY
Flag in FileInfo marking the file as excluded from minor compactions


BLOOM_FILTER_TYPE_KEY

public static final byte[] BLOOM_FILTER_TYPE_KEY
Bloom filter Type in FileInfo


DELETE_FAMILY_COUNT

public static final byte[] DELETE_FAMILY_COUNT
Delete Family Count in FileInfo


TIMERANGE_KEY

public static final byte[] TIMERANGE_KEY
Key for Timerange information in metadata


EARLIEST_PUT_TS

public static final byte[] EARLIEST_PUT_TS
Key for timestamp of earliest-put in metadata


DEFAULT_BLOCKSIZE_SMALL

public static final int DEFAULT_BLOCKSIZE_SMALL
See Also:
Constant Field Values

BULKLOAD_TASK_KEY

public static final byte[] BULKLOAD_TASK_KEY
Meta key set when store file is a result of a bulk load


BULKLOAD_TIME_KEY

public static final byte[] BULKLOAD_TIME_KEY

HFILE_NAME_REGEX

public static final String HFILE_NAME_REGEX
A non-capturing group for hfile names, so that it can be embedded in other patterns. HFile names are UUIDs ([0-9a-z]+); bulk-loaded hfiles have a (_SeqId_[0-9]+_) suffix.

See Also:
Constant Field Values
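The description of HFILE_NAME_REGEX above is enough to reconstruct a pattern with the same shape and to show why a non-capturing group matters: embedding it in a larger regex does not shift the larger regex's group numbers. The sketch below is a standalone Java illustration; the exact pattern string is an assumption built from the description, not the constant's actual value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HFileNameRegexSketch {
  // Assumed reconstruction from the description: a [0-9a-z]+ name, optionally
  // followed by a _SeqId_<digits>_ suffix, wrapped in a non-capturing group.
  static final String HFILE_NAME_REGEX = "(?:[0-9a-z]+(?:_SeqId_[0-9]+_)?)";

  private static final Pattern HFILE_NAME = Pattern.compile(HFILE_NAME_REGEX);

  // Embedding: since HFILE_NAME_REGEX captures nothing, group 1 below is the only group.
  private static final Pattern LAST_PATH_COMPONENT =
      Pattern.compile("^.*/(" + HFILE_NAME_REGEX + ")$");

  static boolean isHFileName(String name) {
    return HFILE_NAME.matcher(name).matches();
  }

  // Returns the hfile name at the end of a slash-separated path, or null if there is none.
  static String hfileNameAtEndOf(String path) {
    Matcher m = LAST_PATH_COMPONENT.matcher(path);
    return m.matches() ? m.group(1) : null;
  }

  public static void main(String[] args) {
    System.out.println(isHFileName("3f2a9c"));           // true
    System.out.println(isHFileName("3f2a9c_SeqId_42_")); // true
    System.out.println(isHFileName("3f2a9c.abcd"));      // false
    System.out.println(hfileNameAtEndOf("/hbase/t1/r1/cf/3f2a9c_SeqId_7_"));
  }
}
```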
Constructor Detail

StoreFile

public StoreFile(org.apache.hadoop.fs.FileSystem fs,
                 org.apache.hadoop.fs.Path p,
                 org.apache.hadoop.conf.Configuration conf,
                 CacheConfig cacheConf,
                 StoreFile.BloomType cfBloomType,
                 HFileDataBlockEncoder dataBlockEncoder)
          throws IOException
Constructor; loads a reader and its indices, etc. May allocate a substantial amount of RAM depending on the underlying files (perhaps 10-20 MB).

Parameters:
fs - The current file system to use.
p - The path of the file.
conf - The current configuration.
cacheConf - The cache configuration and block cache reference.
cfBloomType - The bloom type to use for this store file as specified by column family configuration. This may or may not be the same as the Bloom filter type actually present in the HFile, because column family configuration might change. If this is StoreFile.BloomType.NONE, the existing Bloom filter is ignored.
dataBlockEncoder - data block encoding algorithm.
Throws:
IOException - When opening the reader fails.
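The cfBloomType parameter can disagree with the Bloom filter actually stored in the HFile. A minimal sketch of one resolution rule is below: NONE from the column family disables the filter, as documented above; for any other setting it assumes the filter present in the file is used, which the text above does not state explicitly. The enum is a hypothetical stand-in for StoreFile.BloomType, and the method name is illustrative.

```java
final class BloomTypeResolutionSketch {
  // Hypothetical stand-in for StoreFile.BloomType.
  enum BloomType { NONE, ROW, ROWCOL }

  static BloomType effectiveBloomType(BloomType cfBloomType, BloomType hfileBloomType) {
    if (cfBloomType == BloomType.NONE) {
      // Documented: the column family turned Bloom filters off, so any existing
      // filter in the file is ignored.
      return BloomType.NONE;
    }
    // Assumption: otherwise use the filter the file was actually written with,
    // since the column family setting may have changed after the file was written.
    return hfileBloomType;
  }

  public static void main(String[] args) {
    System.out.println(effectiveBloomType(BloomType.NONE, BloomType.ROW));   // NONE
    System.out.println(effectiveBloomType(BloomType.ROW, BloomType.ROWCOL)); // ROWCOL
  }
}
```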
Method Detail

getMaxMemstoreTS

public long getMaxMemstoreTS()

setMaxMemstoreTS

public void setMaxMemstoreTS(long maxMemstoreTS)

getPath

public org.apache.hadoop.fs.Path getPath()
Returns:
Path or null if this StoreFile was made with a Stream.

isReference

public static boolean isReference(org.apache.hadoop.fs.Path p)
Parameters:
p - Path to check.
Returns:
True if the path has the format of an HStoreFile reference.

isReference

public static boolean isReference(String name)
Parameters:
name - file name to check.
Returns:
True if the name has the format of an HStoreFile reference.
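The two isReference overloads decide from the file name alone whether a file is a reference to a store file in another Store. The page does not spell out the naming format, so the sketch below assumes a convention of "<hfile name>.<referenced region name>", reusing the hfile-name shape described for HFILE_NAME_REGEX; both the convention and the pattern are assumptions for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReferenceNameSketch {
  // Assumed convention: "<hfile name>" for a plain file, "<hfile name>.<region>" for a reference.
  private static final Pattern REF_NAME =
      Pattern.compile("^([0-9a-z]+(?:_SeqId_[0-9]+_)?)(?:\\.(.+))?$");

  // Counterpart of isReference(String): a reference must carry the ".<region>" part.
  static boolean isReference(String name) {
    Matcher m = REF_NAME.matcher(name);
    return m.matches() && m.group(2) != null;
  }

  // Counterpart of isReference(Path): only the last path component is examined.
  static boolean isReferencePath(String path) {
    int slash = path.lastIndexOf('/');
    return isReference(path.substring(slash + 1));
  }

  public static void main(String[] args) {
    System.out.println(isReference("3f2a9c"));                         // false
    System.out.println(isReference("3f2a9c.1028785192"));              // true
    System.out.println(isReferencePath("/hbase/t1/r1/cf/3f2a9c.abcd")); // true
  }
}
```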

getReferredToFile

public static org.apache.hadoop.fs.Path getReferredToFile(org.apache.hadoop.fs.Path p)

getMaxSequenceId

public long getMaxSequenceId()
Returns:
This file's maximum edit sequence ID.

getModificationTimeStamp

public long getModificationTimeStamp()

getMaxMemstoreTSInList

public static long getMaxMemstoreTSInList(Collection<StoreFile> sfs)
Return the largest memstoreTS found across all storefiles in the given list. Store files that were created by a mapreduce bulk load are ignored, as they do not correspond to any specific put operation, and thus do not have a memstoreTS associated with them.

Returns:
0 if no non-bulk-load files are provided, or if this is a Store that does not yet have any store files.

getMaxSequenceIdInList

public static long getMaxSequenceIdInList(Collection<StoreFile> sfs)
Return the highest sequence ID found across all storefiles in the given list. Store files that were created by a mapreduce bulk load are ignored, as they do not correspond to any edit log items.

Returns:
0 if no non-bulk-load files are provided, or if this is a Store that does not yet have any store files.
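getMaxMemstoreTSInList and getMaxSequenceIdInList follow the same rule: take the maximum over the list, skip files produced by a bulk load, and return 0 when nothing eligible remains. The sketch below shows that rule on a hypothetical FileMeta class standing in for StoreFile; it is an illustration of the documented behavior, not the HBase source.

```java
import java.util.Collection;
import java.util.List;

final class MaxInListSketch {
  // Hypothetical per-file summary standing in for a StoreFile.
  static final class FileMeta {
    final boolean bulkLoaded;
    final long maxSequenceId;
    final long maxMemstoreTS;

    FileMeta(boolean bulkLoaded, long maxSequenceId, long maxMemstoreTS) {
      this.bulkLoaded = bulkLoaded;
      this.maxSequenceId = maxSequenceId;
      this.maxMemstoreTS = maxMemstoreTS;
    }
  }

  // Highest sequence ID among non-bulk-load files; 0 if there are none.
  static long maxSequenceIdInList(Collection<FileMeta> files) {
    long max = 0;
    for (FileMeta f : files) {
      if (!f.bulkLoaded) { // bulk loads do not correspond to edit-log items
        max = Math.max(max, f.maxSequenceId);
      }
    }
    return max;
  }

  // Largest memstoreTS among non-bulk-load files; 0 if there are none.
  static long maxMemstoreTSInList(Collection<FileMeta> files) {
    long max = 0;
    for (FileMeta f : files) {
      if (!f.bulkLoaded) { // bulk loads do not come from a specific put
        max = Math.max(max, f.maxMemstoreTS);
      }
    }
    return max;
  }

  public static void main(String[] args) {
    List<FileMeta> files = List.of(
        new FileMeta(false, 120, 7),
        new FileMeta(true, 999, 50), // bulk-loaded: ignored by both helpers
        new FileMeta(false, 95, 12));
    System.out.println(maxSequenceIdInList(files)); // 120
    System.out.println(maxMemstoreTSInList(files)); // 12
  }
}
```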

getBulkLoadTimestamp

public long getBulkLoadTimestamp()
Return the timestamp at which this bulk load file was generated.


getHDFSBlockDistribution

public HDFSBlocksDistribution getHDFSBlockDistribution()
Returns:
the cached value of the HDFS blocks distribution. The cached value is calculated when the store file is opened.

createReader

public StoreFile.Reader createReader()
                              throws IOException
Returns:
The Reader for this StoreFile, created if necessary.
Throws:
IOException

getReader

public StoreFile.Reader getReader()
Returns:
The current reader, or null if createReader() has not been called yet.
See Also:
createReader()

closeReader

public void closeReader(boolean evictOnClose)
                 throws IOException
Parameters:
evictOnClose - whether to evict blocks belonging to this file
Throws:
IOException
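createReader, getReader, and closeReader describe a lazily created, reusable reader: getReader returns null until createReader has run, and createReader only builds a reader when one does not already exist. The sketch below models that lifecycle with a hypothetical Reader class. The assumption that closeReader also drops the reference, so that a later createReader opens a fresh one, is not stated on this page.

```java
final class ReaderLifecycleSketch {
  // Hypothetical stand-in for StoreFile.Reader.
  static final class Reader {
    boolean closed;
    boolean evicted;

    void close(boolean evictOnClose) {
      closed = true;
      evicted = evictOnClose; // records whether cached blocks would be evicted
    }
  }

  private Reader reader; // null until createReader() is called

  // Creates the reader on first use and returns the same instance afterwards.
  synchronized Reader createReader() {
    if (reader == null) {
      reader = new Reader();
    }
    return reader;
  }

  // Returns the current reader, or null if createReader() has not been called.
  synchronized Reader getReader() {
    return reader;
  }

  // Closes the reader and (assumed) forgets it so it can be reopened later.
  synchronized void closeReader(boolean evictOnClose) {
    if (reader != null) {
      reader.close(evictOnClose);
      reader = null;
    }
  }

  public static void main(String[] args) {
    ReaderLifecycleSketch sf = new ReaderLifecycleSketch();
    System.out.println(sf.getReader() == null);             // true
    Reader r = sf.createReader();
    System.out.println(sf.getReader() == r);                // true
    sf.closeReader(true);
    System.out.println(r.closed && sf.getReader() == null); // true
  }
}
```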

deleteReader

public void deleteReader()
                  throws IOException
Delete this file

Throws:
IOException

toString

public String toString()
Overrides:
toString in class Object

toStringDetailed

public String toStringDetailed()
Returns:
a lengthy description of this StoreFile, suitable for debug output

rename

public static org.apache.hadoop.fs.Path rename(org.apache.hadoop.fs.FileSystem fs,
                                               org.apache.hadoop.fs.Path src,
                                               org.apache.hadoop.fs.Path tgt)
                                        throws IOException
Utility to help with rename.

Parameters:
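fs - the file system on which to perform the rename
src - the path to rename
tgt - the new path
Returns:
The target path if the rename succeeded.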
Throws:
IOException
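The rename helper wraps a filesystem rename so that failures surface as exceptions and the caller gets the new path back. The sketch below shows that shape on the local filesystem with java.nio.file; the real method takes a Hadoop FileSystem, and the specific checks (missing source, existing target) are assumptions chosen for illustration.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class RenameSketch {
  // Local-filesystem analogue of StoreFile.rename(fs, src, tgt).
  static Path rename(Path src, Path tgt) throws IOException {
    if (!Files.exists(src)) {
      throw new FileNotFoundException(src.toString());
    }
    // Without REPLACE_EXISTING, Files.move throws FileAlreadyExistsException if tgt exists,
    // so a failed rename is reported as an exception rather than a boolean.
    Files.move(src, tgt);
    return tgt;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("storefile-rename");
    Path src = Files.createFile(dir.resolve("tmpfile"));
    Path tgt = dir.resolve("finalfile");
    System.out.println(rename(src, tgt).equals(tgt)); // true
  }
}
```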

getUniqueFile

public static org.apache.hadoop.fs.Path getUniqueFile(org.apache.hadoop.fs.FileSystem fs,
                                                      org.apache.hadoop.fs.Path dir)
                                               throws IOException
Parameters:
fs - the file system to use
dir - Directory to create file in.
Returns:
A random file name inside the passed directory.
Throws:
IOException
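getUniqueFile returns a random file name inside a directory. The sketch below uses java.nio.file and assumes the name is a random UUID with its dashes removed, which yields 32 lowercase hex digits and fits the [0-9a-z]+ hfile-name shape described for HFILE_NAME_REGEX. The naming scheme, the directory check, and the choice to return the path without creating the file are all assumptions; the real method works on a Hadoop FileSystem.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NotDirectoryException;
import java.nio.file.Path;
import java.util.UUID;

final class UniqueFileSketch {
  // Returns <dir>/<32 lowercase hex digits>; the file itself is not created here.
  static Path getUniqueFile(Path dir) throws IOException {
    if (!Files.isDirectory(dir)) {
      throw new NotDirectoryException(dir.toString());
    }
    // UUID.toString() is lowercase hex with dashes; stripping them leaves 32 hex digits.
    return dir.resolve(UUID.randomUUID().toString().replace("-", ""));
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("storefile-unique");
    System.out.println(getUniqueFile(dir)); // a path ending in 32 hex digits
  }
}
```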

validateStoreFileName

public static boolean validateStoreFileName(String fileName)
Validate the store file name.

Parameters:
fileName - name of the file to validate
Returns:
true if the file could be a valid store file, false otherwise
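validateStoreFileName answers "could this be a store file?", which suggests a cheap necessary-but-not-sufficient check rather than a full parse. The sketch below is one hypothetical such check, not the library's actual rule: it rejects empty names, hidden or temporary-looking names starting with '.', names containing a path separator, and names containing '-', on the assumption that generated names are dash-free hex strings.

```java
final class StoreFileNameCheckSketch {
  // Hypothetical rule for illustration; passing it does not prove the file is a store file.
  static boolean validateStoreFileName(String fileName) {
    return !fileName.isEmpty()
        && !fileName.startsWith(".")   // hidden or temporary files
        && fileName.indexOf('/') < 0   // must be a single path component
        && fileName.indexOf('-') < 0;  // assumed: generated names contain no dashes
  }

  public static void main(String[] args) {
    System.out.println(validateStoreFileName("3f2a9c"));        // true
    System.out.println(validateStoreFileName("3f2a9c.region")); // true
    System.out.println(validateStoreFileName("a-b"));           // false
  }
}
```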


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.