Chapter 1. Performance Tuning

Table of Contents

1.1. Operating System
1.1.1. Memory
1.1.2. 64-bit
1.1.3. Swapping
1.2. Network
1.2.1. Single Switch
1.2.2. Multiple Switches
1.2.3. Multiple Racks
1.3. Java
1.3.1. The Garbage Collector and HBase
1.4. HBase Configurations
1.4.1. Number of Regions
1.4.2. Managing Compactions
1.4.3. hbase.regionserver.handler.count
1.4.4. hfile.block.cache.size
1.4.5. hbase.regionserver.global.memstore.upperLimit
1.4.6. hbase.regionserver.global.memstore.lowerLimit
1.4.7. hbase.hstore.blockingStoreFiles
1.4.8. hbase.hregion.memstore.block.multiplier
1.5. ZooKeeper
1.6. Schema Design
1.6.1. Number of Column Families
1.6.2. Key and Attribute Lengths
1.6.3. Table RegionSize
1.6.4. Bloom Filters
1.6.5. ColumnFamily BlockSize
1.6.6. In-Memory ColumnFamilies
1.6.7. Compression
1.7. Writing to HBase
1.7.1. Batch Loading
1.7.2. Table Creation: Pre-Creating Regions
1.7.3. Table Creation: Deferred Log Flush
1.7.4. HBase Client: AutoFlush
1.7.5. HBase Client: Turn off WAL on Puts
1.7.6. HBase Client: Group Puts by RegionServer
1.7.7. MapReduce: Skip The Reducer
1.7.8. Anti-Pattern: One Hot Region
1.8. Reading from HBase
1.8.1. Scan Caching
1.8.2. Scan Attribute Selection
1.8.3. Close ResultScanners
1.8.4. Block Cache
1.8.5. Optimal Loading of Row Keys
1.8.6. Concurrency: Monitor Data Spread
1.9. Deleting from HBase
1.9.1. Using HBase Tables as Queues
1.9.2. Delete RPC Behavior
1.10. HDFS
1.10.1. Current Issues With Low-Latency Reads
1.10.2. Performance Comparisons of HBase vs. HDFS
1.11. Amazon EC2

1.1. Operating System

1.1.1. Memory

RAM, RAM, RAM. Don't starve HBase.

1.1.2. 64-bit

Use a 64-bit platform (and 64-bit JVM).

1.1.3. Swapping

Watch out for swapping. Set swappiness (the Linux vm.swappiness kernel parameter) to 0.
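
As a minimal sketch, assuming a Linux host, the setting can be inspected and changed roughly as follows. The helper name swappiness_status is hypothetical and exists only for this illustration; the commands that modify the system are left as comments because they require root.

```shell
#!/bin/sh
# Hypothetical helper (not part of HBase): classify a swappiness value
# against the recommendation of 0 given above.
swappiness_status() {
    if [ "$1" -eq 0 ]; then
        echo "ok"
    else
        echo "high"
    fi
}

# Report the live value where the Linux proc interface is available.
if [ -r /proc/sys/vm/swappiness ]; then
    current=$(cat /proc/sys/vm/swappiness)
    echo "vm.swappiness=$current ($(swappiness_status "$current"))"
fi

# To change the value (requires root), something like:
#   sysctl -w vm.swappiness=0                          # running kernel only
#   echo 'vm.swappiness = 0' >> /etc/sysctl.conf       # persist across reboots
```

The location of the persistent setting is an assumption; some distributions prefer a file under /etc/sysctl.d/ instead of /etc/sysctl.conf.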