Package org.apache.hadoop.hbase.replication

Multi Cluster Replication

Class Summary
ReplicationZookeeperWrapper: This class serves as a helper for all things related to Zookeeper in replication.
 

Package org.apache.hadoop.hbase.replication Description

Multi Cluster Replication

This package provides replication between HBase clusters.

Table Of Contents

  1. Status
  2. Requirements
  3. Deployment

Status

This package is experimental-quality software and is only meant to serve as a base for future development. The current implementation offers the following features:

  1. Master/Slave replication limited to 1 slave cluster.
  2. Replication of scoped families in user tables.
  3. Start/stop replication stream.
  4. Supports clusters of different sizes.
  5. Handling of partitions longer than 10 minutes.

Please report any bugs you find on the project's Jira.

Requirements

Before trying out replication, make sure to review the following requirements:

  1. You should manage Zookeeper yourself rather than letting HBase manage it, and it should always be available during the deployment.
  2. Every machine in both clusters should be able to reach every other machine, since any region server on the master cluster may replicate to any region server on the slave cluster. This also applies to the Zookeeper clusters.
  3. Both clusters should have the same HBase and Hadoop major revision. For example, having 0.90.1 on the master and 0.90.0 on the slave is acceptable, but 0.90.1 and 0.89.20100725 is not.
  4. Every table that contains families scoped for replication should exist on every cluster with exactly the same name, and the replicated families should have the same names as well.
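
Requirement 3 can be checked mechanically before deployment. The following is a minimal sketch, not part of this package: the class RevisionCheck and its methods are hypothetical, and it assumes that "major revision" means the first two dot-separated components of the version string, which is what the 0.90.1 / 0.90.0 / 0.89.20100725 example suggests.

```java
final class RevisionCheck {

    // Assumption: the "major revision" is the first two dot-separated
    // components, e.g. "0.90" for "0.90.1". This is inferred from the
    // example in requirement 3, not from any HBase API.
    static String majorRevision(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected version: " + version);
        }
        return parts[0] + "." + parts[1];
    }

    // True when both version strings share the same major revision.
    static boolean sameMajorRevision(String masterVersion, String slaveVersion) {
        return majorRevision(masterVersion).equals(majorRevision(slaveVersion));
    }

    public static void main(String[] args) {
        System.out.println(sameMajorRevision("0.90.1", "0.90.0"));        // true
        System.out.println(sameMajorRevision("0.90.1", "0.89.20100725")); // false
    }
}
```

The same comparison would have to be repeated for the Hadoop version on each cluster.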

Deployment

The following steps describe how to enable replication from one cluster to another. This must be done with both clusters offline.

  1. Edit ${HBASE_HOME}/conf/hbase-site.xml on both clusters to add the following configuration:
    <property>
      <name>hbase.replication</name>
      <value>true</value>
    </property>
  2. Run the following command on any cluster:
    $HBASE_HOME/bin/hbase org.jruby.Main $HBASE_HOME/bin/replication/add_peer.rb
    This will show you the help needed to set up the replication stream between both clusters. If both clusters use the same Zookeeper cluster, you have to use a different zookeeper.znode.parent for each, since they can't write to the same znode.
  3. You can now start and stop the clusters with your preferred method.
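
Before restarting the clusters, the property from step 1 can be double-checked on each cluster. The sketch below is illustrative only: the class ReplicationConfigCheck is hypothetical, it uses the JDK's XML parser rather than any HBase or Hadoop API, and it assumes the usual hbase-site.xml layout of property elements under a configuration root. It returns the first matching property it finds.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class ReplicationConfigCheck {

    // Returns the value of the first <property> whose <name> matches,
    // or null when the property is absent.
    static String propertyValue(InputStream xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(xml);
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                NodeList names = prop.getElementsByTagName("name");
                NodeList values = prop.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0
                        && name.equals(names.item(0).getTextContent().trim())) {
                    return values.item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not read configuration", e);
        }
    }

    // Convenience overload for XML held in a string.
    static boolean replicationEnabled(String xml) {
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        return "true".equalsIgnoreCase(propertyValue(in, "hbase.replication"));
    }

    public static void main(String[] args) throws IOException {
        // Pass the path to hbase-site.xml as the only argument.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String value = propertyValue(in, "hbase.replication");
            System.out.println("hbase.replication = " + value);
        }
    }
}
```

Run it once per cluster against ${HBASE_HOME}/conf/hbase-site.xml; a value other than "true" means step 1 was not applied there.
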
You can confirm that your setup works by opening any region server's log on the master cluster and looking for the following lines:
Considering 1 rs, with ratio 0.1
Getting 1 rs from peer cluster # 0
Choosing peer 10.10.1.49:62020
In this case, the output indicates that one region server from the slave cluster was chosen for replication.
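
Looking for these lines can be automated. The following sketch extracts the chosen peers from a list of log lines; the class PeerLogScan is hypothetical, and its pattern is based solely on the sample line "Choosing peer 10.10.1.49:62020" shown above, so it may need adjusting if the actual log format differs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PeerLogScan {

    // Assumption: the peer address is the first whitespace-free token
    // after "Choosing peer", as in the sample log line.
    private static final Pattern CHOOSING_PEER =
            Pattern.compile("Choosing peer (\\S+)");

    // Collects the address from every "Choosing peer" line, in order.
    static List<String> chosenPeers(List<String> logLines) {
        List<String> peers = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = CHOOSING_PEER.matcher(line);
            if (m.find()) {
                peers.add(m.group(1));
            }
        }
        return peers;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "Considering 1 rs, with ratio 0.1",
                "Getting 1 rs from peer cluster # 0",
                "Choosing peer 10.10.1.49:62020");
        System.out.println(chosenPeers(sample)); // [10.10.1.49:62020]
    }
}
```

An empty result after startup would suggest that no slave region server was selected and that the peer setup should be reviewed.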

Should you want to stop the replication while the clusters are running, open the shell on the master cluster and issue this command:
hbase(main):001:0> zk 'set /zookeeper.znode.parent/replication/state false'
Replace the znode parent with the one configured on your master cluster. Edits that are already queued will still be replicated after you issue that command, but new entries will not be. To resume replication, replace "false" with "true" in the command.
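
To make the substitution concrete, the sketch below builds the shell command for a given znode parent. The class ReplicationStateCommand is hypothetical and only formats a string; it does not talk to Zookeeper. The value "/hbase" used in the example is an assumption for illustration, so use whatever zookeeper.znode.parent is set to on your master cluster.

```java
final class ReplicationStateCommand {

    // Builds the path of the replication state znode under the given parent.
    static String stateZnode(String znodeParent) {
        String parent = znodeParent.endsWith("/")
                ? znodeParent.substring(0, znodeParent.length() - 1)
                : znodeParent;
        return parent + "/replication/state";
    }

    // Formats the command to type at the HBase shell prompt.
    static String shellCommand(String znodeParent, boolean enabled) {
        return "zk 'set " + stateZnode(znodeParent) + " " + enabled + "'";
    }

    public static void main(String[] args) {
        // "/hbase" is an example znode parent, not necessarily yours.
        System.out.println(shellCommand("/hbase", false));
        // prints: zk 'set /hbase/replication/state false'
        System.out.println(shellCommand("/hbase", true));
        // prints: zk 'set /hbase/replication/state true'
    }
}
```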



Copyright © 2010 Apache Software Foundation. All Rights Reserved.