This section is all about Regions.
Regions are made up of a Store per Column Family.
Region size is one of those tricky things; there are a few factors to consider:

1. Regions are the basic element of availability and distribution.
2. HBase scales by having regions across many servers. Thus if you have only 2 regions for 16GB of data on a 20 node cluster, you are at a net loss there: at most 2 nodes can serve the data while the other 18 sit idle.
3. High region count has been known to make things slow; this is getting better, but it is probably better to have 700 regions than 3000 for the same amount of data.
4. Low region count prevents the parallel scalability described in point #2. This really can't be stressed enough, since a common problem is loading 200MB of data into HBase and then wondering why your awesome 10 node cluster is mostly idle.
5. There is not much difference in memory footprint between 1 region and 10 in terms of indexes, etc., held by the RegionServer.

It's probably best to stick to the default, perhaps going smaller for hot tables (or manually splitting hot regions to spread the load over the cluster), or going with a 1GB region size if your cell sizes tend to be largish (100k and up).
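The cluster-wide maximum region size is governed by the hbase.hregion.max.filesize property; it can also be overridden per table via the table descriptor. A minimal sketch using the classic Java client API (the table and family names are illustrative):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  Configuration conf = HBaseConfiguration.create();
  HBaseAdmin admin = new HBaseAdmin(conf);
  HTableDescriptor desc = new HTableDescriptor("myTable");
  desc.setMaxFileSize(1024L * 1024 * 1024); // aim for ~1GB regions for this table
  desc.addFamily(new HColumnDescriptor("cf"));
  admin.createTable(desc);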
Splits run unaided on the RegionServer; i.e., the Master does not participate. The RegionServer splits a region, offlines the split region, adds the daughter regions to META, opens the daughters on the parent's hosting RegionServer, and then reports the split to the Master. See Section 2.8.2.7, "Managed Splitting" for how to manually manage splits (and for why you might do this).
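Splits can also be requested explicitly through HBaseAdmin, or a table can be pre-split at creation time. A sketch, reusing admin and desc from the sketch above (the split keys are illustrative):

  import org.apache.hadoop.hbase.util.Bytes;

  // Ask the RegionServer(s) to split every region of an existing table:
  admin.split("myTable");

  // Or pre-split a new table into four regions at creation time:
  byte[][] splitKeys = new byte[][] {
      Bytes.toBytes("g"), Bytes.toBytes("n"), Bytes.toBytes("t") };
  admin.createTable(desc, splitKeys);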
Regions can be periodically moved by the LoadBalancer (see Section 8.4.3.1, "LoadBalancer").
A Store hosts a MemStore and 0 or more StoreFiles (HFiles). A Store corresponds to a column family for a table for a given region.
The MemStore holds in-memory modifications to the Store; modifications are KeyValues. When asked to flush, the current MemStore is moved to a snapshot and is cleared. HBase continues to serve edits out of the new MemStore and the backing snapshot until the flusher reports that the flush succeeded; at that point the snapshot is discarded.
The hfile file format is based on the SSTable file described in the BigTable [2006] paper and on Hadoop's tfile (The unit test suite and the compression harness were taken directly from tfile). Schubert Zhang's blog post on HFile: A Block-Indexed File Format to Store Sorted Key-Value Pairs makes for a thorough introduction to HBase's hfile. Matteo Bertozzi has also put up a helpful description, HBase I/O: HFile.
For more information, see the HFile source code.
To view a textualized version of hfile content, you can use the org.apache.hadoop.hbase.io.hfile.HFile tool. Type the following to see usage:

$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile

For example, to view the content of the file hdfs://10.81.47.41:8020/hbase/TEST/1418428042/DSMP/4759508618286845475, type the following:

$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -v -f hdfs://10.81.47.41:8020/hbase/TEST/1418428042/DSMP/4759508618286845475

If you leave off the -v option you will see just a summary of the hfile. See the usage output for other things the HFile tool can do.
For more information of what StoreFiles look like on HDFS with respect to the directory structure, see Section 11.5.2, “Browsing HDFS for HBase Objects”.
StoreFiles are composed of blocks. The blocksize is configured on a per-ColumnFamily basis.
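For example, the blocksize can be set on a column family as follows (a sketch using the classic Java client API; the 64KB value shown is the default):

  import org.apache.hadoop.hbase.HColumnDescriptor;

  HColumnDescriptor cf = new HColumnDescriptor("cf");
  cf.setBlocksize(64 * 1024); // per-ColumnFamily HFile block size, in bytes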
For more information, see the HFileBlock source code.
The KeyValue class is the heart of data storage in HBase. A KeyValue wraps a byte array, together with an offset and length into that array that indicate where to start interpreting the content as a KeyValue.
The KeyValue format inside a byte array is:

- keylength
- valuelength
- key
- value

The Key is further decomposed as:

- rowlength
- row (i.e., the rowkey)
- columnfamilylength
- columnfamily
- columnqualifier
- timestamp
- keytype (e.g., Put, Delete, DeleteColumn, DeleteFamily)
For more information, see the KeyValue source code.
To emphasize the points above, examine what happens with two Puts for two different columns for the same row:
rowkey=row1, cf:attr1=value1
rowkey=row1, cf:attr2=value2
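In client code, these two Puts might look as follows (a sketch using the classic Java client API; the table name is illustrative):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  HTable table = new HTable(HBaseConfiguration.create(), "myTable");

  Put put1 = new Put(Bytes.toBytes("row1"));
  put1.add(Bytes.toBytes("cf"), Bytes.toBytes("attr1"), Bytes.toBytes("value1"));
  table.put(put1);

  Put put2 = new Put(Bytes.toBytes("row1"));
  put2.add(Bytes.toBytes("cf"), Bytes.toBytes("attr2"), Bytes.toBytes("value2"));
  table.put(put2);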
Even though these are for the same row, a KeyValue is created for each column:
Key portion for Put #1:

rowlength ------------> 4
row -----------------> row1
columnfamilylength ---> 2
columnfamily --------> cf
columnqualifier ------> attr1
timestamp -----------> server time of Put
keytype -------------> Put

Key portion for Put #2:

rowlength ------------> 4
row -----------------> row1
columnfamilylength ---> 2
columnfamily --------> cf
columnqualifier ------> attr2
timestamp -----------> server time of Put
keytype -------------> Put
It is critical to understand that the rowkey, ColumnFamily, and column (aka columnqualifier) are embedded within the KeyValue instance. The longer these identifiers are, the bigger the KeyValue is.
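A quick way to observe this effect is to build a KeyValue and look at its serialized length. A sketch, assuming the KeyValue constructor taking row, family, qualifier, timestamp, and value (the values shown are illustrative):

  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.util.Bytes;

  KeyValue kv = new KeyValue(Bytes.toBytes("row1"), Bytes.toBytes("cf"),
      Bytes.toBytes("attr1"), System.currentTimeMillis(), Bytes.toBytes("value1"));
  // The serialized length includes the rowkey, family, and qualifier bytes,
  // so longer identifiers make every cell bigger:
  System.out.println(kv.getLength());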
There are two types of compactions: minor and major. Minor compactions will usually pick up a couple of the smaller adjacent files and rewrite them as one. Minors do not drop deletes or expired cells, only major compactions do this. Sometimes a minor compaction will pick up all the files in the store and in this case it actually promotes itself to being a major compaction. For a description of how a minor compaction picks files to compact, see the ascii diagram in the Store source code.
After a major compaction runs there will be a single StoreFile per Store, and this usually helps performance. Caution: major compactions rewrite all of the Store's data, and on a loaded system this may not be tenable; major compactions will usually have to be done manually on large systems. See Section 2.8.2.8, "Managed Compactions".
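For example, a major compaction of a table can be requested through HBaseAdmin (a sketch; the table name is illustrative, and admin is an HBaseAdmin as in the earlier sketches):

  // Request a major compaction of every region of the table:
  admin.majorCompact("myTable");

The same can be done from the HBase shell with major_compact 'myTable'.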
Bloom filters were developed over in HBase-1200 Add bloomfilters.[22][23]
See also Section 10.5.4, “Bloom Filters” and Section 2.9, “Bloom Filter Configuration”.
Bloom filters add an entry to the StoreFile general FileInfo data structure and then two extra entries to the StoreFile metadata section.

FileInfo has a BLOOM_FILTER_TYPE entry which is set to NONE, ROW, or ROWCOL.

BLOOM_FILTER_META holds the Bloom size, the hash function used, etc. It is small and is cached on StoreFile.Reader load.

BLOOM_FILTER_DATA is the actual bloom filter data. It is obtained on-demand and stored in the LRU cache, if the cache is enabled (it is enabled by default).
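Bloom filters are enabled per column family. A minimal sketch, assuming the classic client API where the bloom type enum lives on StoreFile (its location has moved between HBase versions):

  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.regionserver.StoreFile;

  HColumnDescriptor cf = new HColumnDescriptor("cf");
  cf.setBloomFilterType(StoreFile.BloomType.ROW); // NONE, ROW, or ROWCOL

ROW blooms test the row key only; ROWCOL blooms test the row key plus column qualifier.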
[22] For a description of the development process -- why static blooms rather than dynamic -- and for an overview of the unique properties that pertain to blooms in HBase, as well as possible future directions, see the Development Process section of the document BloomFilters in HBase attached to HBase-1200.
[23] The bloom filters described here are actually version two of blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom option based on work done by the European Commission One-Lab Project 034819. The core of the HBase bloom work was later pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile. Version 1 of HBase blooms never worked that well. Version 2 is a rewrite from scratch, though again it starts with the One-Lab work.