org.apache.hadoop.hbase.util
Class ByteBloomFilter

java.lang.Object
  extended by org.apache.hadoop.hbase.util.ByteBloomFilter
All Implemented Interfaces:
BloomFilter

public class ByteBloomFilter
extends Object
implements BloomFilter

Implements a Bloom filter, as defined by Bloom in 1970.

The Bloom filter is a data structure that was introduced in 1970 and that has been adopted by the networking research community in the past decade thanks to the bandwidth efficiencies that it offers for the transmission of set membership information between networked hosts. A sender encodes the information into a bit vector, the Bloom filter, that is more compact than a conventional representation. Computation and space costs for construction are linear in the number of elements. The receiver uses the filter to test whether various elements are members of the set. Though the filter will occasionally return a false positive, it will never return a false negative. When creating the filter, the sender can choose its desired point in a trade-off between the false positive rate and the size.

Originally inspired by European Commission One-Lab Project 034819.

See Also:
The general behavior of a filter, Space/Time Trade-Offs in Hash Coding with Allowable Errors

Field Summary
protected  ByteBuffer bloom
          Bloom bits
protected  long byteSize
          Bytes (B) in the array
protected  Hash hash
          Hash Function
protected  int hashCount
          Number of hash functions
protected  int hashType
          Hash type
protected  int keyCount
          Keys currently in the bloom
protected  int maxKeys
          Max Keys expected for the bloom
static int VERSION
          Current file format version
 
Constructor Summary
ByteBloomFilter(ByteBuffer meta)
          Loads bloom filter meta data from file input.
ByteBloomFilter(int maxKeys, float errorRate, int hashType, int foldFactor)
          Determines & initializes bloom filter meta data from user config.
 
Method Summary
 void add(byte[] buf)
          Add the specified binary to the bloom filter.
 void add(byte[] buf, int offset, int len)
          Add the specified binary to the bloom filter.
 void allocBloom()
          Allocate memory for the bloom filter data.
 void compactBloom()
          Compact the bloom before writing metadata & data to disk
 boolean contains(byte[] buf, ByteBuffer theBloom)
          Check if the specified key is contained in the bloom filter.
 boolean contains(byte[] buf, int offset, int length, ByteBuffer theBloom)
          Check if the specified key is contained in the bloom filter.
 int getByteSize()
           
 org.apache.hadoop.io.Writable getDataWriter()
          Get a writable interface into bloom filter data (actual bloom).
 int getKeyCount()
           
 int getMaxKeys()
           
 org.apache.hadoop.io.Writable getMetaWriter()
          Get a writable interface into bloom filter meta data.
 void writeBloom(DataOutput out)
          Writes just the bloom filter to the output array
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

VERSION

public static final int VERSION
Current file format version

See Also:
Constant Field Values

byteSize

protected long byteSize
Bytes (B) in the array


hashCount

protected final int hashCount
Number of hash functions


hashType

protected final int hashType
Hash type


hash

protected final Hash hash
Hash Function


keyCount

protected int keyCount
Keys currently in the bloom


maxKeys

protected int maxKeys
Max Keys expected for the bloom


bloom

protected ByteBuffer bloom
Bloom bits

Constructor Detail

ByteBloomFilter

public ByteBloomFilter(ByteBuffer meta)
                throws IllegalArgumentException
Loads bloom filter meta data from file input.

Parameters:
meta - stored bloom meta data
Throws:
IllegalArgumentException - meta data is invalid

ByteBloomFilter

public ByteBloomFilter(int maxKeys,
                       float errorRate,
                       int hashType,
                       int foldFactor)
                throws IllegalArgumentException
Determines & initializes bloom filter meta data from user config. Call allocBloom() to allocate bloom filter data.

Parameters:
maxKeys - Maximum expected number of keys that will be stored in this bloom
errorRate - Desired false positive error rate. Lower rate = more storage required
hashType - Type of hash function to use
foldFactor - When finished adding entries, you may be able to 'fold' this bloom to save space. Tradeoff potentially excess bytes in bloom for ability to fold if keyCount is exponentially greater than maxKeys.
Throws:
IllegalArgumentException
Method Detail

allocBloom

public void allocBloom()
Description copied from interface: BloomFilter
Allocate memory for the bloom filter data. Note that bloom data isn't allocated by default because it can grow large & reads would be better managed by the LRU cache.

Specified by:
allocBloom in interface BloomFilter

add

public void add(byte[] buf)
Description copied from interface: BloomFilter
Add the specified binary to the bloom filter.

Specified by:
add in interface BloomFilter
Parameters:
buf - data to be added to the bloom

add

public void add(byte[] buf,
                int offset,
                int len)
Description copied from interface: BloomFilter
Add the specified binary to the bloom filter.

Specified by:
add in interface BloomFilter
Parameters:
buf - data to be added to the bloom
offset - offset into the data to be added
len - length of the data to be added

contains

public boolean contains(byte[] buf,
                        ByteBuffer theBloom)
Description copied from interface: BloomFilter
Check if the specified key is contained in the bloom filter.

Specified by:
contains in interface BloomFilter
Parameters:
buf - data to check for existence of
theBloom - bloom filter data to search
Returns:
true if matched by bloom, false if not

contains

public boolean contains(byte[] buf,
                        int offset,
                        int length,
                        ByteBuffer theBloom)
Description copied from interface: BloomFilter
Check if the specified key is contained in the bloom filter.

Specified by:
contains in interface BloomFilter
Parameters:
buf - data to check for existence of
offset - offset into the data
length - length of the data
theBloom - bloom filter data to search
Returns:
true if matched by bloom, false if not

getKeyCount

public int getKeyCount()
Specified by:
getKeyCount in interface BloomFilter
Returns:
The number of keys added to the bloom

getMaxKeys

public int getMaxKeys()
Specified by:
getMaxKeys in interface BloomFilter
Returns:
The max number of keys that can be inserted to maintain the desired error rate

getByteSize

public int getByteSize()
Specified by:
getByteSize in interface BloomFilter
Returns:
Size of the bloom, in bytes

compactBloom

public void compactBloom()
Description copied from interface: BloomFilter
Compact the bloom before writing metadata & data to disk

Specified by:
compactBloom in interface BloomFilter

writeBloom

public void writeBloom(DataOutput out)
                throws IOException
Writes just the bloom filter to the output array

Parameters:
out - OutputStream to place bloom
Throws:
IOException - Error writing bloom array

getMetaWriter

public org.apache.hadoop.io.Writable getMetaWriter()
Description copied from interface: BloomFilter
Get a writable interface into bloom filter meta data.

Specified by:
getMetaWriter in interface BloomFilter
Returns:
writable class

getDataWriter

public org.apache.hadoop.io.Writable getDataWriter()
Description copied from interface: BloomFilter
Get a writable interface into bloom filter data (actual bloom).

Specified by:
getDataWriter in interface BloomFilter
Returns:
writable class


Copyright © 2011 The Apache Software Foundation. All Rights Reserved.