org.apache.hadoop.hbase
Class BloomFilterDescriptor

java.lang.Object
  extended by org.apache.hadoop.hbase.BloomFilterDescriptor
All Implemented Interfaces:
Comparable, org.apache.hadoop.io.Writable, org.apache.hadoop.io.WritableComparable

public class BloomFilterDescriptor
extends Object
implements org.apache.hadoop.io.WritableComparable

Supplied as a parameter to HColumnDescriptor to specify which kind of bloom filter to use for a column, together with its configuration parameters. There is no way to determine the vector size and the number of hash functions automatically; in particular, bloom filters are very sensitive to the number of elements inserted into them. For HBase, the number of entries depends on the size of the data stored in the column. The default region size is currently 64MB, so the number of entries is approximately 64MB / (average value size for column). If m denotes the number of bits in the bloom filter (vectorSize), n the number of elements inserted into the bloom filter, and k the number of hash functions used (nbHash), then according to Broder and Mitzenmacher (http://www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/BloomFilterSurvey.pdf) the probability of false positives is minimized when k is approximately (m / n) * ln(2).
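The sizing rule above can be worked through with a small standalone sketch. The class and method names here are hypothetical and not part of HBase; the 1KB average value size and the budget of 10 bits per entry are assumptions chosen only for illustration, and 64MB is taken to mean 64 * 1024 * 1024 bytes.

```java
/** Hypothetical helper, not part of HBase: estimates n and a suitable k for a given m. */
final class BloomFilterSizingSketch {

    /** n is roughly regionSizeBytes / averageValueSizeBytes, as described above. */
    static long estimateEntries(long regionSizeBytes, long averageValueSizeBytes) {
        if (averageValueSizeBytes <= 0) {
            throw new IllegalArgumentException("averageValueSizeBytes must be positive");
        }
        return regionSizeBytes / averageValueSizeBytes;
    }

    /** k is roughly (m / n) * ln(2), rounded to the nearest integer and never below 1. */
    static int optimalHashCount(long vectorSizeBits, long entries) {
        if (entries <= 0) {
            throw new IllegalArgumentException("entries must be positive");
        }
        double k = ((double) vectorSizeBits / entries) * Math.log(2);
        return (int) Math.max(1L, Math.round(k));
    }

    public static void main(String[] args) {
        // Assumed inputs: a 64MB region and a 1KB average value size.
        long n = estimateEntries(64L * 1024 * 1024, 1024);
        // Assumed budget of 10 bits per entry.
        long m = 10 * n;
        System.out.println("n = " + n + ", m = " + m + ", k = " + optimalHashCount(m, n));
    }
}
```

With these assumed inputs the sketch prints n = 65536, m = 655360, k = 7, since 10 * ln(2) is about 6.93.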


Nested Class Summary
static class BloomFilterDescriptor.BloomFilterType
          The type of bloom filter
 
Constructor Summary
BloomFilterDescriptor()
          Default constructor - used in conjunction with Writable
BloomFilterDescriptor(BloomFilterDescriptor.BloomFilterType type, int numberOfEntries)
          Creates a BloomFilterDescriptor for the specified type of filter, fixes the number of hash functions to 4, and computes the vector size as vectorSize = ceil((4 * n) / ln(2)), where n is numberOfEntries
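BloomFilterDescriptor(BloomFilterDescriptor.BloomFilterType type, int vectorSize, int nbHash)
          Creates a BloomFilterDescriptor for the specified type of filter with the given vector size and number of hash functions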
 
Method Summary
 int compareTo(Object o)
          
 boolean equals(Object obj)
          
 int getNbHash()
           
 BloomFilterDescriptor.BloomFilterType getType()
           
 int getVectorSize()
           
 int hashCode()
          
 void readFields(DataInput in)
          
 String toString()
          
 void write(DataOutput out)
          
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

BloomFilterDescriptor

public BloomFilterDescriptor()
Default constructor - used in conjunction with Writable


BloomFilterDescriptor

public BloomFilterDescriptor(BloomFilterDescriptor.BloomFilterType type,
                             int numberOfEntries)
Creates a BloomFilterDescriptor for the specified type of filter, fixes the number of hash functions to 4, and computes the vector size as vectorSize = ceil((4 * n) / ln(2)), where n is numberOfEntries.

Parameters:
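type - The kind of bloom filter to use.
numberOfEntries - The expected number of elements to be inserted into the filter (n in the formula above).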
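As a rough illustration of the formula used by this constructor, the following standalone sketch evaluates vectorSize = ceil((4 * n) / ln(2)). It only reproduces the arithmetic stated above; the class and method names are hypothetical, and it is not the constructor's actual source.

```java
/** Hypothetical sketch of the documented formula; not the HBase implementation. */
final class VectorSizeSketch {

    /** The number of hash functions this constructor is documented to use. */
    static final int NB_HASH = 4;

    /** Evaluates ceil((4 * n) / ln(2)) for n = numberOfEntries. */
    static int vectorSize(int numberOfEntries) {
        if (numberOfEntries < 0) {
            throw new IllegalArgumentException("numberOfEntries must not be negative");
        }
        // Computed in double to avoid int overflow in 4 * n; the final cast
        // saturates at Integer.MAX_VALUE for very large n.
        return (int) Math.ceil((NB_HASH * (double) numberOfEntries) / Math.log(2));
    }

    public static void main(String[] args) {
        System.out.println(vectorSize(1000));
    }
}
```

For 1000 entries this yields 5771 bits. This is the relation k = (m / n) * ln(2) solved for m with k fixed at 4.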

BloomFilterDescriptor

public BloomFilterDescriptor(BloomFilterDescriptor.BloomFilterType type,
                             int vectorSize,
                             int nbHash)
Parameters:
type - The kind of bloom filter to use.
vectorSize - The vector size of this filter.
nbHash - The number of hash functions to consider.
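When choosing vectorSize and nbHash by hand, it can help to estimate the resulting false-positive rate. The sketch below uses the standard approximation (1 - e^(-k * n / m))^k, which is not stated on this page and is included here as an assumption. The class and method names are hypothetical.

```java
/** Hypothetical helper, not part of HBase: approximate bloom filter false-positive rate. */
final class FalsePositiveSketch {

    /**
     * Approximates the false-positive probability as (1 - e^(-k * n / m))^k,
     * with m = vectorSize, k = nbHash and n = entries.
     */
    static double falsePositiveRate(int vectorSize, int nbHash, long entries) {
        if (vectorSize <= 0 || nbHash <= 0 || entries < 0) {
            throw new IllegalArgumentException("vectorSize and nbHash must be positive, entries non-negative");
        }
        // Expected fraction of bits still zero after all insertions.
        double fractionZero = Math.exp(-(double) nbHash * entries / vectorSize);
        return Math.pow(1.0 - fractionZero, nbHash);
    }

    public static void main(String[] args) {
        // 5771 bits and 4 hash functions for 1000 entries (illustrative values).
        System.out.println(falsePositiveRate(5771, 4, 1000));
    }
}
```

With 5771 bits, 4 hash functions and 1000 entries, the estimate comes out close to 0.0625, that is (1/2)^4.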

Method Detail

toString

public String toString()

Overrides:
toString in class Object

getType

public BloomFilterDescriptor.BloomFilterType getType()

getVectorSize

public int getVectorSize()

getNbHash

public int getNbHash()

equals

public boolean equals(Object obj)

Overrides:
equals in class Object

hashCode

public int hashCode()

Overrides:
hashCode in class Object

readFields

public void readFields(DataInput in)
                throws IOException

Specified by:
readFields in interface org.apache.hadoop.io.Writable
Throws:
IOException

write

public void write(DataOutput out)
           throws IOException

Specified by:
write in interface org.apache.hadoop.io.Writable
Throws:
IOException

compareTo

public int compareTo(Object o)

Specified by:
compareTo in interface Comparable


Copyright © 2008 The Apache Software Foundation