Bloom filters were developed in HBase-1200, "Add bloomfilters".[21][22]
Blooms are enabled by specifying options on a column family, either in the HBase shell or in Java code on org.apache.hadoop.hbase.HColumnDescriptor.
Use HColumnDescriptor.setBloomFilterType(NONE | ROW | ROWCOL) to enable blooms per column family. The default is NONE, for no bloom filters. If ROW, the hash of the row is added to the bloom on each insert. If ROWCOL, the hash of the row + column family + column qualifier is added to the bloom on each key insert.
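To make the difference between ROW and ROWCOL concrete, the following is a minimal, self-contained sketch of which bytes would feed the bloom hash under each setting. It does not use the HBase API: the `BloomType` enum and the `bloomKey` helper are hypothetical stand-ins, and plain string concatenation is an assumption made for illustration, not HBase's actual key encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BloomKeySketch {
    // Local stand-in that mirrors the option names in the text; not the HBase type.
    enum BloomType { NONE, ROW, ROWCOL }

    /** Returns the bytes that would be hashed into the bloom, or null when blooms are off. */
    static byte[] bloomKey(BloomType type, String row, String family, String qualifier) {
        switch (type) {
            case ROW:
                // Only the row contributes, so every cell of a row shares one bloom entry.
                return row.getBytes(StandardCharsets.UTF_8);
            case ROWCOL:
                // Row + column family + qualifier: one bloom entry per distinct column.
                // Plain concatenation is an illustrative assumption.
                return (row + family + qualifier).getBytes(StandardCharsets.UTF_8);
            default:
                return null; // NONE: nothing is added to a bloom
        }
    }

    public static void main(String[] args) {
        byte[] rowA = bloomKey(BloomType.ROW, "row1", "cf", "q1");
        byte[] rowB = bloomKey(BloomType.ROW, "row1", "cf", "q2");
        byte[] colA = bloomKey(BloomType.ROWCOL, "row1", "cf", "q1");
        byte[] colB = bloomKey(BloomType.ROWCOL, "row1", "cf", "q2");

        System.out.println("ROW keys equal:    " + Arrays.equals(rowA, rowB));
        System.out.println("ROWCOL keys equal: " + Arrays.equals(colA, colB));
    }
}
```

Under these assumptions, a ROW bloom can only answer "might this file contain the row?", while a ROWCOL bloom can rule out a specific column at the cost of more entries.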
io.hfile.bloom.enabled in Configuration serves as the kill switch in case something goes wrong. Default = true.
io.hfile.bloom.error.rate = average false positive rate. Default = 1%. Halving the rate (e.g. to 0.5%) costs roughly one additional bit per bloom entry.
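The trade-off between error rate and size can be checked with a short calculation. The sketch below assumes the textbook sizing for an optimally configured bloom filter, bits per entry = -ln(p) / (ln 2)^2. Whether HBase sizes its blooms exactly this way is an assumption here, and the `BloomSizing` class is hypothetical.

```java
public class BloomSizing {
    /** Bits per entry for false positive rate p, under the textbook optimum -ln(p) / (ln 2)^2. */
    static double bitsPerEntry(double errorRate) {
        double ln2 = Math.log(2);
        return -Math.log(errorRate) / (ln2 * ln2);
    }

    public static void main(String[] args) {
        double atDefault = bitsPerEntry(0.01);   // the 1% default from the text
        double atHalf = bitsPerEntry(0.005);     // the halved rate from the text

        System.out.println("bits/entry at 1%:   " + atDefault);
        System.out.println("bits/entry at 0.5%: " + atHalf);
        System.out.println("extra bits/entry:   " + (atHalf - atDefault));
    }
}
```

Under this formula each halving of the error rate adds a constant 1 / ln 2 (about 1.44) bits per entry, whatever the starting rate. That is the same order as the "one extra bit" rule of thumb in the text.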
io.hfile.bloom.max.fold = guaranteed minimum fold rate. Most people should leave this alone. Default = 7, meaning the bloom can collapse to at least 1/128th of its original size. See the Development Process section of the document BloomFilters in HBase for more on what this option means.
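The 1/128 figure follows from 2^7 = 128, that is, seven successive halvings. The sketch below illustrates one common way to fold a bloom filter: OR the upper half of the bit array onto the lower half, and look keys up modulo the new size. It is an assumption that HBase folds exactly this way. `BloomFoldSketch`, `foldOnce`, and `minFoldedSize` are hypothetical names, and the hash value is made up.

```java
import java.util.BitSet;

public class BloomFoldSketch {
    /** Halves a filter of sizeBits bits by OR-ing the upper half onto the lower half. */
    static BitSet foldOnce(BitSet bits, int sizeBits) {
        int half = sizeBits / 2;
        BitSet folded = bits.get(0, half);      // copy of the lower half
        folded.or(bits.get(half, sizeBits));    // upper half, re-based to index 0
        return folded;
    }

    /** Size in bits after maxFold halvings of an original size of sizeBits. */
    static int minFoldedSize(int sizeBits, int maxFold) {
        return sizeBits >> maxFold;
    }

    public static void main(String[] args) {
        int size = 1024;
        BitSet bloom = new BitSet(size);
        int hash = 900;                          // made-up hash value of some key
        bloom.set(hash % size);                  // insert into the full-size filter

        BitSet folded = foldOnce(bloom, size);
        // The key is still found: hash % 512 equals (hash % 1024) % 512.
        System.out.println("found after fold: " + folded.get(hash % (size / 2)));

        // With the default max fold of 7, a 2^20-bit filter can shrink to 2^13 bits.
        System.out.println("min size in bits: " + minFoldedSize(1 << 20, 7));
    }
}
```

Folding never loses a key, because every set bit maps to a set bit in the smaller array. It does raise the false positive rate, so it only makes sense when the filter was sized for far more keys than it actually received.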
[21] For a description of the development process -- why static blooms rather than dynamic -- and for an overview of the unique properties that pertain to blooms in HBase, as well as possible future directions, see the Development Process section of the document BloomFilters in HBase attached to HBase-1200.
[22] The bloom filters described here are actually version two of blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom option based on work done by the European Commission One-Lab Project 034819. The core of the HBase bloom work was later pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile. Version 1 of HBase blooms never worked that well. Version 2 is a rewrite from scratch, though again it starts with the One-Lab work.