Chapter 12. Architecture

Table of Contents

12.1. Client
12.1.1. Connections
12.1.2. WriteBuffer and Batch Methods
12.1.3. Filters
12.2. Daemons
12.2.1. Master
12.2.2. RegionServer
12.3. Regions
12.3.1. Region Size
12.3.2. Region Splits
12.3.3. Region Load Balancer
12.3.4. Store
12.4. Write Ahead Log (WAL)
12.4.1. Purpose
12.4.2. WAL Flushing
12.4.3. WAL Splitting

12.1. Client

The HBase client HTable is responsible for finding RegionServers that are serving the particular row range of interest. It does this by querying the .META. and -ROOT catalog tables (TODO: Explain). After locating the required region(s), the client directly contacts the RegionServer serving that region (i.e., it does not go through the master) and issues the read or write request. This information is cached in the client so that subsequent requests need not go through the lookup process. Should a region be reassigned either by the master load balancer or because a RegionServer has died, the client will requery the catalog tables to determine the new location of the user region.

Administrative functions are handled through HBaseAdmin

12.1.1. Connections

For connection configuration information, see Section 3.5, “Client configuration and dependencies connecting to an HBase cluster”.

HTable instances are not thread-safe. When creating HTable instances, it is advisable to use the same HBaseConfiguration instance. This will ensure sharing of ZooKeeper and socket instances to the RegionServers which is usually what you want. For example, this is preferred:

HBaseConfiguration conf = HBaseConfiguration.create();
HTable table1 = new HTable(conf, "myTable");
HTable table2 = new HTable(conf, "myTable");

a s opposed to this:

HBaseConfiguration conf1 = HBaseConfiguration.create();
HTable table1 = new HTable(conf1, "myTable");
HBaseConfiguration conf2 = HBaseConfiguration.create();
HTable table2 = new HTable(conf2, "myTable");

For more information about how connections are handled in the HBase client, see HConnectionManager.

12.1.2. WriteBuffer and Batch Methods

If Section 13.6.1, “AutoFlush” is turned off on HTable, Puts are sent to RegionServers when the writebuffer is filled. The writebuffer is 2MB by default. Before an HTable instance is discarded, either close() or flushCommits() should be invoked so Puts will not be lost.

For fine-grained control of batching of Puts or Deletes, see the batch methods on HTable.

12.1.3. Filters

Get and Scan instances can be optionally configured with filters which are applied on the RegionServer.