org.apache.hadoop.hbase.util
Class DynamicByteBloomFilter

java.lang.Object
  extended by org.apache.hadoop.hbase.util.DynamicByteBloomFilter
All Implemented Interfaces:
BloomFilter

public class DynamicByteBloomFilter
extends Object
implements BloomFilter

Implements a dynamic Bloom filter, as defined in the INFOCOM 2006 paper.

A dynamic Bloom filter (DBF) makes use of a s * m bit matrix but each of the s rows is a standard Bloom filter. The creation process of a DBF is iterative. At the start, the DBF is a 1 * m bit matrix, i.e., it is composed of a single standard Bloom filter. It assumes that nr elements are recorded in the initial bit vector, where nr <= n (n is the cardinality of the set A to record in the filter).

As the size of A grows during the execution of the application, several keys must be inserted in the DBF. When inserting a key into the DBF, one must first get an active Bloom filter in the matrix. A Bloom filter is active when the number of recorded keys, nr, is strictly less than the current cardinality of A, n. If an active Bloom filter is found, the key is inserted and nr is incremented by one. On the other hand, if there is no active Bloom filter, a new one is created (i.e., a new row is added to the matrix) according to the current size of A and the element is added in this new Bloom filter and the nr value of this new Bloom filter is set to one. A given key is said to belong to the DBF if the k positions are set to one in one of the matrix rows.

Originally created by European Commission One-Lab Project 034819.

See Also:
A Bloom filter, Theory and Network Applications of Dynamic Bloom Filters

Field Summary
protected  int curKeys
          The number of keys recorded in the current Bloom filter.
protected  float errorRate
          The maximum false positive rate per bloom
protected  int hashType
          Hash type
protected  int keyInterval
          Maximum number of keys in a dynamic Bloom filter row.
protected  ByteBloomFilter[] matrix
          The matrix of Bloom filters (contains bloom data only during writes).
protected  int readMatrixSize
          expected size of bloom filter matrix (used during reads)
static int VERSION
          Current file format version
 
Constructor Summary
DynamicByteBloomFilter(ByteBuffer meta)
          Normal read constructor.
DynamicByteBloomFilter(int keyInterval, float errorRate, int hashType)
          Normal write constructor.
 
Method Summary
 void add(byte[] buf)
          Add the specified binary to the bloom filter.
 void add(byte[] buf, int offset, int len)
          Add the specified binary to the bloom filter.
 void allocBloom()
          Allocate memory for the bloom filter data.
 void compactBloom()
          Compact the bloom before writing metadata & data to disk
 boolean contains(byte[] buf, ByteBuffer theBloom)
          Check if the specified key is contained in the bloom filter.
 boolean contains(byte[] buf, int offset, int length, ByteBuffer theBloom)
          Check if the specified key is contained in the bloom filter.
 int getByteSize()
           
 org.apache.hadoop.io.Writable getDataWriter()
          Get a writable interface into bloom filter data (actual bloom).
 int getKeyCount()
           
 int getMaxKeys()
           
 org.apache.hadoop.io.Writable getMetaWriter()
          Get a writable interface into bloom filter meta data.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

VERSION

public static final int VERSION
Current file format version

See Also:
Constant Field Values

keyInterval

protected final int keyInterval
Maximum number of keys in a dynamic Bloom filter row.


errorRate

protected final float errorRate
The maximum false positive rate per bloom


hashType

protected final int hashType
Hash type


curKeys

protected int curKeys
The number of keys recorded in the current Bloom filter.


readMatrixSize

protected int readMatrixSize
expected size of bloom filter matrix (used during reads)


matrix

protected ByteBloomFilter[] matrix
The matrix of Bloom filters (contains bloom data only during writes).

Constructor Detail

DynamicByteBloomFilter

public DynamicByteBloomFilter(ByteBuffer meta)
                       throws IllegalArgumentException
Normal read constructor. Loads bloom filter meta data.

Parameters:
meta - stored bloom meta data
Throws:
IllegalArgumentException - meta data is invalid

DynamicByteBloomFilter

public DynamicByteBloomFilter(int keyInterval,
                              float errorRate,
                              int hashType)
                       throws IllegalArgumentException
Normal write constructor. Note that this doesn't allocate bloom data by default. Instead, call allocBloom() before adding entries.

Parameters:
errorRate -
hashType - type of the hashing function (see org.apache.hadoop.util.hash.Hash).
keyInterval - Maximum number of keys to record per Bloom filter row.
Throws:
IllegalArgumentException - The input parameters were invalid
Method Detail

allocBloom

public void allocBloom()
Description copied from interface: BloomFilter
Allocate memory for the bloom filter data. Note that bloom data isn't allocated by default because it can grow large & reads would be better managed by the LRU cache.

Specified by:
allocBloom in interface BloomFilter

add

public void add(byte[] buf,
                int offset,
                int len)
Description copied from interface: BloomFilter
Add the specified binary to the bloom filter.

Specified by:
add in interface BloomFilter
Parameters:
buf - data to be added to the bloom
offset - offset into the data to be added
len - length of the data to be added

add

public void add(byte[] buf)
Description copied from interface: BloomFilter
Add the specified binary to the bloom filter.

Specified by:
add in interface BloomFilter
Parameters:
buf - data to be added to the bloom

contains

public boolean contains(byte[] buf,
                        ByteBuffer theBloom)
Description copied from interface: BloomFilter
Check if the specified key is contained in the bloom filter.

Specified by:
contains in interface BloomFilter
Parameters:
buf - data to check for existence of
theBloom - bloom filter data to search
Returns:
true if matched by bloom, false if not

contains

public boolean contains(byte[] buf,
                        int offset,
                        int length,
                        ByteBuffer theBloom)
Description copied from interface: BloomFilter
Check if the specified key is contained in the bloom filter.

Specified by:
contains in interface BloomFilter
Parameters:
buf - data to check for existence of
offset - offset into the data
length - length of the data
theBloom - bloom filter data to search
Returns:
true if matched by bloom, false if not

getKeyCount

public int getKeyCount()
Specified by:
getKeyCount in interface BloomFilter
Returns:
The number of keys added to the bloom

getMaxKeys

public int getMaxKeys()
Specified by:
getMaxKeys in interface BloomFilter
Returns:
The max number of keys that can be inserted to maintain the desired error rate

getByteSize

public int getByteSize()
Specified by:
getByteSize in interface BloomFilter
Returns:
Size of the bloom, in bytes

compactBloom

public void compactBloom()
Description copied from interface: BloomFilter
Compact the bloom before writing metadata & data to disk

Specified by:
compactBloom in interface BloomFilter

getMetaWriter

public org.apache.hadoop.io.Writable getMetaWriter()
Description copied from interface: BloomFilter
Get a writable interface into bloom filter meta data.

Specified by:
getMetaWriter in interface BloomFilter
Returns:
writable class

getDataWriter

public org.apache.hadoop.io.Writable getDataWriter()
Description copied from interface: BloomFilter
Get a writable interface into bloom filter data (actual bloom).

Specified by:
getDataWriter in interface BloomFilter
Returns:
writable class


Copyright © 2011 The Apache Software Foundation. All Rights Reserved.