Chapter 13. Performance Tuning

Table of Contents

13.1. Java
13.1.1. The Garbage Collector and HBase
13.2. Configurations
13.2.1. Number of Regions
13.2.2. Managing Compactions
13.2.3. Compression
13.3. Number of Column Families
13.4. Data Clumping
13.5. Batch Loading
13.6. HBase Client
13.6.1. AutoFlush
13.6.2. Scan Caching
13.6.3. Close ResultScanners
13.6.4. Block Cache

Start with the wiki Performance Tuning page. It has a general discussion of the main factors involved: RAM, compression, JVM settings, etc. Afterward, come back here for more pointers.

13.1. Java

13.1.1. The Garbage Collector and HBase

13.1.1.1. Long GC pauses

In his presentation, Avoiding Full GCs with MemStore-Local Allocation Buffers, Todd Lipcon describes two cases of stop-the-world garbage collections common in HBase, especially during loading: CMS failure modes and old generation heap fragmentation. To address the first, start CMS earlier than the default by adding -XX:CMSInitiatingOccupancyFraction and setting it below its default value. Start at 60 or 70 percent (the lower the threshold, the more GC work is done and the more CPU is used). To address the second, fragmentation, Todd added an experimental facility that must be explicitly enabled in HBase 0.90.x (it is on by default in HBase 0.92.x). Set hbase.hregion.memstore.mslab.enabled to true in your Configuration. See the cited slides for background and detail.
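
As a rough illustration of the CMS threshold, the flag could be appended to HBASE_OPTS in conf/hbase-env.sh. This is a minimal sketch, not a recommended production setting: it assumes the CMS collector is the one in use (hence -XX:+UseConcMarkSweepGC, which is not mentioned above), and the value 70 is simply taken from the suggested starting range.

```shell
# conf/hbase-env.sh (sketch; assumes the CMS collector is in use)
# Keep any options already set, then ask CMS to start a cycle once the
# old generation is about 70% full. 70 is a starting point; tune it.
export HBASE_OPTS="${HBASE_OPTS:-} -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
```

The MSLAB setting is separate: hbase.hregion.memstore.mslab.enabled is an HBase configuration property, so it would normally be set on the region servers (for example in hbase-site.xml) rather than through a JVM flag.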