org.apache.cassandra.service
Class AntiEntropyService

java.lang.Object
  extended by org.apache.cassandra.service.AntiEntropyService

public class AntiEntropyService
extends java.lang.Object

AntiEntropyService encapsulates "validating" (hashing) individual column families, exchanging MerkleTrees with remote nodes via a TreeRequest/Response conversation, and then triggering repairs for disagreeing ranges.

Every Tree conversation has an 'initiator', where valid trees are sent after generation and where the local and remote tree will rendezvous in register(cf, endpoint, tree). Once the trees rendezvous, a Differencer is executed and the service can trigger repairs for disagreeing ranges.

Tree comparison and repair triggering occur in the single threaded AE_SERVICE_STAGE.

The steps taken to enact a repair are as follows:

1. A major compaction is triggered either via nodeprobe, or automatically:
   * Nodeprobe sends TreeRequest messages to all neighbors of the target node: when a node receives a TreeRequest, it will perform a readonly compaction to immediately validate the column family.
   * Automatic compactions will also validate a column family and broadcast TreeResponses, but since TreeRequest messages are not sent to neighboring nodes, repairs will only occur if two nodes happen to perform automatic compactions within TREE_CACHE_LIFETIME of one another.
2. The compaction process validates the column family by:
   * Calling getValidator(), which can return a NoopValidator if validation should not be performed,
   * Calling IValidator.prepare(), which samples the column family to determine key distribution,
   * Calling IValidator.add() in order for every row in the column family,
   * Calling IValidator.complete() to indicate that all rows have been added.
   * If getValidator decided that the column family should be validated, calling complete() indicates that a valid MerkleTree has been created for the column family.
   * The valid tree is broadcast to neighboring nodes via TreeResponse, and stored locally.
3. When a node receives a TreeResponse, it passes the tree to register(), which checks for trees to rendezvous with / compare to:
   * If the tree is local, it is cached, and compared to any trees that were received from neighbors.
   * If the tree is remote, it is immediately compared to a local tree if one is cached. Otherwise, the remote tree is stored until a local tree can be generated.
   * A Differencer object is enqueued for each comparison.
4. Differencers are executed in AE_SERVICE_STAGE, to compare the two trees.
   * Based on the fraction of disagreement between the trees, the differencer will either perform repair via the io.Streaming api, or via RangeCommand read repairs.
5. TODO: Because a local tree is stored for TREE_CACHE_LIFETIME, it is possible to perform redundant repairs when repairs are triggered manually. Because of the SSTable architecture, this doesn't cause any problems except excess data transfer, but:
   * One possible solution is to maintain the local tree in memory by invalidating ranges when they change, and only performing partial compactions/validations.
   * Another would be to only communicate with one neighbor at a time, meaning that an additional compaction is required for every neighbor.
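
The rendezvous described in step 3 can be illustrated with a minimal sketch. This is not the AntiEntropyService implementation: RendezvousSketch, Tree, Comparison and pending() are hypothetical names, a tree is reduced to a single root hash, endpoints are plain strings, and cache expiry after TREE_CACHE_LIFETIME is left out. The sketch also assumes that a stored remote tree is discarded once it has been paired with a local tree.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the rendezvous step; not the real service code. */
class RendezvousSketch {
    /** Assumed placeholder for a MerkleTree: only a root hash is modeled. */
    static final class Tree {
        final String rootHash;
        Tree(String rootHash) { this.rootHash = rootHash; }
    }

    /** Stands in for an enqueued Differencer: one local/remote pair to compare. */
    static final class Comparison {
        final String cf;
        final String remoteEndpoint;
        final Tree local;
        final Tree remote;
        Comparison(String cf, String remoteEndpoint, Tree local, Tree remote) {
            this.cf = cf;
            this.remoteEndpoint = remoteEndpoint;
            this.local = local;
            this.remote = remote;
        }
    }

    private final String localEndpoint;
    // column family -> cached local tree
    private final Map<String, Tree> localTrees = new HashMap<>();
    // column family -> (remote endpoint -> tree waiting for a local tree)
    private final Map<String, Map<String, Tree>> waitingRemoteTrees = new HashMap<>();
    private final List<Comparison> queue = new ArrayList<>();

    RendezvousSketch(String localEndpoint) {
        this.localEndpoint = localEndpoint;
    }

    /** Registers a tree and enqueues a comparison for every pair that can now rendezvous. */
    void register(String cf, String endpoint, Tree tree) {
        if (endpoint.equals(localEndpoint)) {
            // Local tree: cache it and compare against remote trees already received.
            localTrees.put(cf, tree);
            Map<String, Tree> waiting = waitingRemoteTrees.remove(cf);
            if (waiting != null) {
                for (Map.Entry<String, Tree> e : waiting.entrySet()) {
                    queue.add(new Comparison(cf, e.getKey(), tree, e.getValue()));
                }
            }
        } else {
            Tree local = localTrees.get(cf);
            if (local != null) {
                // Remote tree with a cached local tree: compare immediately.
                queue.add(new Comparison(cf, endpoint, local, tree));
            } else {
                // No local tree yet: store the remote tree until one is generated.
                waitingRemoteTrees.computeIfAbsent(cf, k -> new HashMap<>()).put(endpoint, tree);
            }
        }
    }

    /** Comparisons enqueued so far, in the order they were created. */
    List<Comparison> pending() {
        return queue;
    }

    public static void main(String[] args) {
        RendezvousSketch node = new RendezvousSketch("10.0.0.1");
        node.register("Standard1", "10.0.0.2", new Tree("a1")); // remote arrives first, waits
        node.register("Standard1", "10.0.0.1", new Tree("b2")); // local arrives, pairs with it
        node.register("Standard1", "10.0.0.3", new Tree("b2")); // remote pairs immediately
        for (Comparison c : node.pending()) {
            boolean same = c.local.rootHash.equals(c.remote.rootHash);
            System.out.println(c.cf + " vs " + c.remoteEndpoint + ": " + (same ? "in sync" : "differs"));
        }
    }
}
```

In this sketch the queue is a plain list. In the service described above, the equivalent work is handed to the single threaded AE_SERVICE_STAGE, so comparisons never run concurrently.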


Nested Class Summary
static class AntiEntropyService.Differencer
          Compares two trees, and launches repairs for disagreeing ranges.
static interface AntiEntropyService.IValidator
          A Strategy to handle building and validating a merkle tree for a column family.
static class AntiEntropyService.NoopValidator
          The IValidator to be used before a cluster has stabilized, or when repairs are disabled.
static class AntiEntropyService.TreeRequestVerbHandler
          Handler for requests from remote nodes to generate a valid tree.
static class AntiEntropyService.TreeResponseVerbHandler
          Handler for responses from remote nodes that contain a valid tree.
static class AntiEntropyService.Validator
          The IValidator to be used in normal operation.
 
Field Summary
static java.lang.String AE_SERVICE_STAGE
           
static long TREE_CACHE_LIFETIME
           
static java.lang.String TREE_REQUEST_VERB
           
static java.lang.String TREE_RESPONSE_VERB
           
 
Method Summary
 AntiEntropyService.IValidator getValidator(java.lang.String table, java.lang.String cf, java.net.InetAddress initiator)
          Return a Validator object which can be used to collect hashes for a column family.
static AntiEntropyService instance()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

AE_SERVICE_STAGE

public static final java.lang.String AE_SERVICE_STAGE
See Also:
Constant Field Values

TREE_REQUEST_VERB

public static final java.lang.String TREE_REQUEST_VERB
See Also:
Constant Field Values

TREE_RESPONSE_VERB

public static final java.lang.String TREE_RESPONSE_VERB
See Also:
Constant Field Values

TREE_CACHE_LIFETIME

public static final long TREE_CACHE_LIFETIME
See Also:
Constant Field Values
Method Detail

instance

public static AntiEntropyService instance()

getValidator

public AntiEntropyService.IValidator getValidator(java.lang.String table,
                                                  java.lang.String cf,
                                                  java.net.InetAddress initiator)
Return a Validator object which can be used to collect hashes for a column family. prepare() must be called on a Validator before use, and complete() afterward.

Parameters:
table - The table name containing the column family.
cf - The column family name.
initiator - Endpoint that initially triggered this validation, or null if the validation will not see all of the data contained in the column family.
Returns:
A Validator.
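
The prepare/add/complete lifecycle expected of the returned object might be exercised roughly as follows. This is a sketch under stated assumptions: SketchValidator is a hypothetical stand-in for AntiEntropyService.IValidator, whose real method signatures are not shown on this page, and the hashing variant folds all rows into one SHA-256 digest instead of building a MerkleTree.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical illustration of the validator lifecycle; not the real Cassandra types. */
class ValidatorLifecycleSketch {
    /** Assumed shape of the lifecycle: prepare once, add rows in order, complete once. */
    interface SketchValidator {
        void prepare();
        void add(String rowKey, byte[] rowData);
        void complete();
    }

    /** Ignores every call, mirroring the role described for NoopValidator. */
    static final class NoopSketchValidator implements SketchValidator {
        public void prepare() { }
        public void add(String rowKey, byte[] rowData) { }
        public void complete() { }
    }

    /** Folds rows into one digest; a real validator would build a MerkleTree instead. */
    static final class HashingSketchValidator implements SketchValidator {
        private MessageDigest digest;
        private byte[] result;

        public void prepare() {
            try {
                digest = MessageDigest.getInstance("SHA-256");
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException(e);
            }
        }

        public void add(String rowKey, byte[] rowData) {
            if (digest == null) {
                throw new IllegalStateException("prepare() must be called before add()");
            }
            digest.update(rowKey.getBytes(StandardCharsets.UTF_8));
            digest.update(rowData);
        }

        public void complete() {
            if (digest == null) {
                throw new IllegalStateException("prepare() must be called before complete()");
            }
            result = digest.digest();
        }

        /** The digest of all added rows, or null until complete() has been called. */
        byte[] result() {
            return result;
        }
    }

    /** Drives a validator through its lifecycle; the SortedMap supplies rows in key order. */
    static void validate(SketchValidator validator, SortedMap<String, byte[]> rows) {
        validator.prepare();
        for (Map.Entry<String, byte[]> row : rows.entrySet()) {
            validator.add(row.getKey(), row.getValue());
        }
        validator.complete();
    }

    public static void main(String[] args) {
        SortedMap<String, byte[]> rows = new TreeMap<>();
        rows.put("key1", new byte[] {1, 2, 3});
        rows.put("key2", new byte[] {4, 5, 6});

        HashingSketchValidator validator = new HashingSketchValidator();
        validate(validator, rows);
        System.out.println("digest length: " + validator.result().length);
    }
}
```

Because the caller drives every validator through the same three calls, a no-op variant can be substituted without changing the calling code, which is the role the summary above gives to NoopValidator.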


Copyright © 2009 The Apache Software Foundation