org.apache.hadoop.hbase.io.hfile
Class HFile

java.lang.Object
  extended by org.apache.hadoop.hbase.io.hfile.HFile

@InterfaceAudience.Private
public class HFile
extends Object

File format for HBase. A file of sorted key/value pairs. Both keys and values are byte arrays.

The memory footprint of an HFile includes the items described in the TFile documentation; those notes, and the TFile suggestions on performance optimization, apply to HFile as well. For more on the background behind HFile, see HBASE-61.

The file is made of data blocks followed by meta data blocks (if any), a file info block, a data block index, a meta data block index, and a fixed-size trailer that records the offsets at which the file changes content type.

<data blocks><meta blocks><fileinfo><data index><meta index><trailer>
Each block has a bit of magic at its start. Blocks are made up of key/value pairs. In data blocks, both keys and values are byte arrays. Meta data blocks have a String key and a byte array value. An empty file looks like this:
<fileinfo><trailer>
That is, neither data blocks nor meta blocks are present.

TODO: Do scanners need to be able to take a start and end row? TODO: Should BlockIndex know the name of its file? Should it have a Path that points at its file, say, for the case where an index lives apart from an HFile instance?
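
For orientation, below is a minimal sketch of opening and scanning an existing HFile via createReader; the HFileScanner methods used (getScanner, seekTo, next, getKeyString, getValueString) are assumed from this package and may differ between HBase versions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileScanner;

public class HFileReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(args[0]);                 // path to an existing HFile

    HFile.Reader reader = HFile.createReader(fs, path, new CacheConfig(conf), conf);
    try {
      // cacheBlocks=false, pread=false: a plain sequential scan
      HFileScanner scanner = reader.getScanner(false, false);
      if (scanner.seekTo()) {                      // position at the first key/value
        do {
          System.out.println(scanner.getKeyString() + " -> " + scanner.getValueString());
        } while (scanner.next());
      }
    } finally {
      reader.close();
    }
  }
}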


Nested Class Summary
static interface HFile.CachingBlockReader
          An abstraction used by the block index
static class HFile.FileInfo
          Metadata for this file.
static interface HFile.Reader
          An interface used by clients to open and iterate an HFile.
static interface HFile.Writer
          API required to write an HFile
static class HFile.WriterFactory
          Factory-style construction of writers is used throughout the code so that writer implementations can be swapped.
 
Field Summary
static String BLOOM_FILTER_DATA_KEY
          Meta data block name for bloom filter bits.
static AtomicLong dataBlockReadCnt
           
static int DEFAULT_BYTES_PER_CHECKSUM
          The number of bytes per checksum.
static ChecksumType DEFAULT_CHECKSUM_TYPE
           
static String DEFAULT_COMPRESSION
          Default compression name: none.
static Compression.Algorithm DEFAULT_COMPRESSION_ALGORITHM
          Default compression: none.
static String FORMAT_VERSION_KEY
          The configuration key for HFile version to use for new files
static int MAX_FORMAT_VERSION
          Maximum supported HFile format version
static int MAXIMUM_KEY_LENGTH
          Maximum length of key in HFile.
static int MIN_FORMAT_VERSION
          Minimum supported HFile format version
static int MIN_FORMAT_VERSION_WITH_TAGS
          Minimum HFile format version with support for persisting cell tags
static int MIN_NUM_HFILE_PATH_LEVELS
          We assume that HFile path ends with ROOT_DIR/TABLE_NAME/REGION_NAME/CF_NAME/HFILE, so it has at least this many levels of nesting.
 
Constructor Summary
HFile()
           
 
Method Summary
static void checkFormatVersion(int version)
          Checks the given HFile format version, and throws an exception if invalid.
static HFile.Reader createReader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, CacheConfig cacheConf, org.apache.hadoop.conf.Configuration conf)
           
static HFile.Reader createReader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, FSDataInputStreamWrapper fsdis, long size, CacheConfig cacheConf, org.apache.hadoop.conf.Configuration conf)
           
static long getChecksumFailuresCount()
          Number of checksum verification failures.
static int getFormatVersion(org.apache.hadoop.conf.Configuration conf)
           
static String[] getSupportedCompressionAlgorithms()
          Get names of supported compression algorithms.
static HFile.WriterFactory getWriterFactory(org.apache.hadoop.conf.Configuration conf, CacheConfig cacheConf)
          Returns the factory to be used to create HFile writers
static HFile.WriterFactory getWriterFactoryNoCache(org.apache.hadoop.conf.Configuration conf)
          Returns the factory to be used to create HFile writers.
static boolean isHFileFormat(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.FileStatus fileStatus)
          Returns true if the specified file has a valid HFile Trailer.
static boolean isHFileFormat(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path)
          Returns true if the specified file has a valid HFile Trailer.
static boolean isReservedFileInfoKey(byte[] key)
          Return true if the given file info key is reserved for internal use.
static void main(String[] args)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MAXIMUM_KEY_LENGTH

public static final int MAXIMUM_KEY_LENGTH
Maximum length of key in HFile.

See Also:
Constant Field Values

DEFAULT_COMPRESSION_ALGORITHM

public static final Compression.Algorithm DEFAULT_COMPRESSION_ALGORITHM
Default compression: none.


MIN_FORMAT_VERSION

public static final int MIN_FORMAT_VERSION
Minimum supported HFile format version

See Also:
Constant Field Values

MAX_FORMAT_VERSION

public static final int MAX_FORMAT_VERSION
Maximum supported HFile format version

See Also:
Constant Field Values

MIN_FORMAT_VERSION_WITH_TAGS

public static final int MIN_FORMAT_VERSION_WITH_TAGS
Minimum HFile format version with support for persisting cell tags

See Also:
Constant Field Values

DEFAULT_COMPRESSION

public static final String DEFAULT_COMPRESSION
Default compression name: none.


BLOOM_FILTER_DATA_KEY

public static final String BLOOM_FILTER_DATA_KEY
Meta data block name for bloom filter bits.

See Also:
Constant Field Values

MIN_NUM_HFILE_PATH_LEVELS

public static final int MIN_NUM_HFILE_PATH_LEVELS
We assume that HFile path ends with ROOT_DIR/TABLE_NAME/REGION_NAME/CF_NAME/HFILE, so it has at least this many levels of nesting. This is needed for identifying table and CF name from an HFile path.

See Also:
Constant Field Values

DEFAULT_BYTES_PER_CHECKSUM

public static final int DEFAULT_BYTES_PER_CHECKSUM
The number of bytes per checksum.

See Also:
Constant Field Values

DEFAULT_CHECKSUM_TYPE

public static final ChecksumType DEFAULT_CHECKSUM_TYPE

dataBlockReadCnt

public static final AtomicLong dataBlockReadCnt

FORMAT_VERSION_KEY

public static final String FORMAT_VERSION_KEY
The configuration key for HFile version to use for new files

See Also:
Constant Field Values
Constructor Detail

HFile

public HFile()
Method Detail

getChecksumFailuresCount

public static final long getChecksumFailuresCount()
Returns the number of checksum verification failures. Calling this method also clears the counter.


getFormatVersion

public static int getFormatVersion(org.apache.hadoop.conf.Configuration conf)

getWriterFactoryNoCache

public static final HFile.WriterFactory getWriterFactoryNoCache(org.apache.hadoop.conf.Configuration conf)
Returns the factory to be used to create HFile writers. Disables block cache access for all writers created through the returned factory.


getWriterFactory

public static final HFile.WriterFactory getWriterFactory(org.apache.hadoop.conf.Configuration conf,
                                                         CacheConfig cacheConf)
Returns the factory to be used to create HFile writers
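
A minimal sketch of writing a small HFile through the writer factory; the HFileContextBuilder usage and the builder methods chained on HFile.WriterFactory (withPath, withFileContext, create) are assumptions that may vary between HBase versions. Keys must be appended in sorted order.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileContext;
import org.apache.hadoop.hbase.io.hfile.HFileContextBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class HFileWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(args[0]);                 // destination for the new HFile

    HFileContext context = new HFileContextBuilder()
        .withBlockSize(64 * 1024)                  // 64 KB data blocks (illustrative choice)
        .build();

    HFile.Writer writer = HFile.getWriterFactory(conf, new CacheConfig(conf))
        .withPath(fs, path)
        .withFileContext(context)
        .create();
    try {
      // Keys must be written in sorted order.
      writer.append(new KeyValue(Bytes.toBytes("row1"), Bytes.toBytes("f"),
          Bytes.toBytes("q"), Bytes.toBytes("value1")));
      writer.append(new KeyValue(Bytes.toBytes("row2"), Bytes.toBytes("f"),
          Bytes.toBytes("q"), Bytes.toBytes("value2")));
    } finally {
      writer.close();
    }
  }
}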


createReader

public static HFile.Reader createReader(org.apache.hadoop.fs.FileSystem fs,
                                        org.apache.hadoop.fs.Path path,
                                        FSDataInputStreamWrapper fsdis,
                                        long size,
                                        CacheConfig cacheConf,
                                        org.apache.hadoop.conf.Configuration conf)
                                 throws IOException
Parameters:
fs - A file system
path - Path to HFile
fsdis - an open stream over the file at path
size - max size of the trailer.
cacheConf - cache configuration for the HFile's contents
conf - Configuration
Returns:
A version-specific HFile.Reader
Throws:
IOException - if the file is invalid; a CorruptHFileException-flavored IOException will be thrown

createReader

public static HFile.Reader createReader(org.apache.hadoop.fs.FileSystem fs,
                                        org.apache.hadoop.fs.Path path,
                                        CacheConfig cacheConf,
                                        org.apache.hadoop.conf.Configuration conf)
                                 throws IOException
Parameters:
fs - filesystem
path - Path to file to read
cacheConf - cache configuration; must not be null (see CacheConfig(Configuration))
conf - Configuration
Returns:
an active Reader instance
Throws:
IOException - a CorruptHFileException (a DoNotRetryIOException subtype) will be thrown if the HFile is corrupt or invalid.

isHFileFormat

public static boolean isHFileFormat(org.apache.hadoop.fs.FileSystem fs,
                                    org.apache.hadoop.fs.Path path)
                             throws IOException
Returns true if the specified file has a valid HFile Trailer.

Parameters:
fs - filesystem
path - Path to file to verify
Returns:
true if the file has a valid HFile Trailer, otherwise false
Throws:
IOException - if failed to read from the underlying stream

isHFileFormat

public static boolean isHFileFormat(org.apache.hadoop.fs.FileSystem fs,
                                    org.apache.hadoop.fs.FileStatus fileStatus)
                             throws IOException
Returns true if the specified file has a valid HFile Trailer.

Parameters:
fs - filesystem
fileStatus - the file to verify
Returns:
true if the file has a valid HFile Trailer, otherwise false
Throws:
IOException - if failed to read from the underlying stream

isReservedFileInfoKey

public static boolean isReservedFileInfoKey(byte[] key)
Return true if the given file info key is reserved for internal use.


getSupportedCompressionAlgorithms

public static String[] getSupportedCompressionAlgorithms()
Get names of supported compression algorithms. The names are accepted by HFile.Writer.

Returns:
Array of strings, each representing a supported compression algorithm. Currently, the following compression algorithms are supported:
  • "none" - No compression.
  • "gz" - GZIP compression.

checkFormatVersion

public static void checkFormatVersion(int version)
                               throws IllegalArgumentException
Checks the given HFile format version and throws an exception if it is invalid. Note that if the version number comes from an input file and has not been verified, the caller needs to re-throw the resulting exception as an IOException to indicate corrupted input rather than a software error.

Parameters:
version - an HFile version
Throws:
IllegalArgumentException - if the version is invalid
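
A brief sketch of the re-throw pattern described above; the wrapper method and its parameters are illustrative, not part of the HFile API.

import java.io.IOException;
import org.apache.hadoop.hbase.io.hfile.HFile;

public class FormatVersionCheck {
  // Validate a version number that was read from an input file.
  static void validateVersionFromFile(int version, String fileName) throws IOException {
    try {
      HFile.checkFormatVersion(version);
    } catch (IllegalArgumentException e) {
      // Re-throw as an IOException to signal corrupted input rather than a software error.
      throw new IOException("Invalid HFile version " + version + " in " + fileName, e);
    }
  }
}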

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception


Copyright © 2015 The Apache Software Foundation. All rights reserved.