Chapter 14. Bloom Filters

Table of Contents

14.1. Configurations
14.1.1. HColumnDescriptor option
14.1.2. io.hfile.bloom.enabled global kill switch
14.1.3. io.hfile.bloom.error.rate
14.1.4. io.hfile.bloom.max.fold
14.2. Bloom StoreFile footprint
14.2.1. BloomFilter in the StoreFile FileInfo data structure
14.2.2. BloomFilter entries in StoreFile metadata

Bloom filters were developed over in HBase-1200 Add bloomfilters.[21][22]

14.1. Configurations

Blooms are enabled by specifying options on a column family in the HBase shell or in java code as specification on org.apache.hadoop.hbase.HColumnDescriptor.

14.1.1. HColumnDescriptor option

Use HColumnDescriptor.setBloomFilterType(NONE | ROW | ROWCOL) to enable blooms per Column Family. Default = NONE for no bloom filters. If ROW, the hash of the row will be added to the bloom on each insert. If ROWCOL, the hash of the row + column family + column family qualifier will be added to the bloom on each key insert.

14.1.2. io.hfile.bloom.enabled global kill switch

io.hfile.bloom.enabled in Configuration serves as the kill switch in case something goes wrong. Default = true.

14.1.3. io.hfile.bloom.error.rate

io.hfile.bloom.error.rate = average false positive rate. Default = 1%. Decrease rate by ½ (e.g. to .5%) == +1 bit per bloom entry.

14.1.4. io.hfile.bloom.max.fold

io.hfile.bloom.max.fold = guaranteed minimum fold rate. Most people should leave this alone. Default = 7, or can collapse to at least 1/128th of original size. See the Development Process section of the document BloomFilters in HBase for more on what this option means.



[21] For description of the development process -- why static blooms rather than dynamic -- and for an overview of the unique properties that pertain to blooms in HBase, as well as possible future directions, see the Development Process section of the document BloomFilters in HBase attached to HBase-1200.

[22] The bloom filters described here are actually version two of blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom option based on work done by the European Commission One-Lab Project 034819. The core of the HBase bloom work was later pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile. Version 1 of HBase blooms never worked that well. Version 2 is a rewrite from scratch though again it starts with the one-lab work.