The Apache HBase Book

Revision History
Revision 0.90.4  
Adding first cuts at Configuration, Getting Started, Data Model
Revision 0.89.20100924 5 October 2010 stack
Initial layout

Abstract

This is the official book of Apache HBase, a distributed, versioned, column-oriented database built on top of Apache Hadoop and Apache ZooKeeper.


Table of Contents

Preface
1. Getting Started
1.1. Introduction
1.2. Quick Start
1.2.1. Download and unpack the latest stable release.
1.2.2. Start HBase
1.2.3. Shell Exercises
1.2.4. Stopping HBase
1.2.5. Where to go next
1.3. Not-so-quick Start Guide
1.3.1. Requirements
1.3.2. HBase run modes: Standalone and Distributed
1.3.3. Example Configurations
2. Upgrading
2.1. Upgrading to HBase 0.90.x from 0.20.x or 0.89.x
3. Configuration
3.1. hbase-site.xml and hbase-default.xml
3.1.1. HBase Default Configuration
3.2. hbase-env.sh
3.3. log4j.properties
3.4. The Important Configurations
3.4.1. Required Configurations
3.4.2. Recommended Configurations
3.5. Client configuration and dependencies connecting to an HBase cluster
3.5.1. Java client configuration
4. The HBase Shell
4.1. Scripting
4.2. Shell Tricks
4.2.1. irbrc
4.2.2. LOG data to timestamp
4.2.3. Debug
5. Building HBase
5.1. Adding an HBase release to Apache's Maven Repository
6. Developers
6.1. IDEs
6.1.1. Eclipse
6.2. Unit Tests
6.2.1. Mockito
7. HBase and MapReduce
7.1. The default HBase MapReduce Splitter
7.2. HBase Input MapReduce Example
7.3. Accessing Other HBase Tables in a MapReduce Job
7.4. Speculative Execution
8. HBase and Schema Design
8.1. Schema Creation
8.2. On the number of column families
8.3. Monotonically Increasing Row Keys/Timeseries Data
8.4. Try to minimize row and column sizes
8.5. Number of Versions
9. Metrics
9.1. Metric Setup
9.2. RegionServer Metrics
9.2.1. hbase.regionserver.blockCacheCount
9.2.2. hbase.regionserver.blockCacheFree
9.2.3. hbase.regionserver.blockCacheHitRatio
9.2.4. hbase.regionserver.blockCacheSize
9.2.5. hbase.regionserver.compactionQueueSize
9.2.6. hbase.regionserver.fsReadLatency_avg_time
9.2.7. hbase.regionserver.fsReadLatency_num_ops
9.2.8. hbase.regionserver.fsSyncLatency_avg_time
9.2.9. hbase.regionserver.fsSyncLatency_num_ops
9.2.10. hbase.regionserver.fsWriteLatency_avg_time
9.2.11. hbase.regionserver.fsWriteLatency_num_ops
9.2.12. hbase.regionserver.memstoreSizeMB
9.2.13. hbase.regionserver.regions
9.2.14. hbase.regionserver.requests
9.2.15. hbase.regionserver.storeFileIndexSizeMB
9.2.16. hbase.regionserver.stores
9.2.17. hbase.regionserver.storeFiles
10. Cluster Replication
11. Data Model
11.1. Conceptual View
11.2. Physical View
11.3. Table
11.4. Row
11.5. Column Family
11.6. Cells
11.7. Versions
11.7.1. Versions and HBase Operations
11.7.2. Current Limitations
12. Architecture
12.1. Client
12.1.1. Connections
12.1.2. WriteBuffer and Batch Methods
12.1.3. Filters
12.2. Daemons
12.2.1. Master
12.2.2. RegionServer
12.3. Regions
12.3.1. Region Size
12.3.2. Region Splits
12.3.3. Region Load Balancer
12.3.4. Store
12.4. Write Ahead Log (WAL)
12.4.1. Purpose
12.4.2. WAL Flushing
12.4.3. WAL Splitting
13. Performance Tuning
13.1. Java
13.1.1. The Garbage Collector and HBase
13.2. Configurations
13.2.1. Number of Regions
13.2.2. Managing Compactions
13.2.3. Compression
13.2.4. hbase.regionserver.handler.count
13.2.5. hfile.block.cache.size
13.2.6. hbase.regionserver.global.memstore.upperLimit
13.2.7. hbase.regionserver.global.memstore.lowerLimit
13.2.8. hbase.hstore.blockingStoreFiles
13.2.9. hbase.hregion.memstore.block.multiplier
13.3. Number of Column Families
13.4. Data Clumping
13.5. Batch Loading
13.5.1. Table Creation: Pre-Creating Regions
13.6. HBase Client
13.6.1. AutoFlush
13.6.2. Scan Caching
13.6.3. Scan Attribute Selection
13.6.4. Close ResultScanners
13.6.5. Block Cache
13.6.6. Optimal Loading of Row Keys
14. Bloom Filters
14.1. Configurations
14.1.1. HColumnDescriptor option
14.1.2. io.hfile.bloom.enabled global kill switch
14.1.3. io.hfile.bloom.error.rate
14.1.4. io.hfile.bloom.max.fold
14.2. Bloom StoreFile footprint
14.2.1. BloomFilter in the StoreFile FileInfo data structure
14.2.2. BloomFilter entries in StoreFile metadata
15. Troubleshooting and Debugging HBase
15.1. General Guidelines
15.2. Logs
15.2.1. Log Locations
15.3. Tools
15.3.1. search-hadoop.com
15.3.2. tail
15.3.3. top
15.3.4. jps
15.3.5. jstack
15.3.6. OpenTSDB
15.3.7. clusterssh+top
15.4. Client
15.4.1. ScannerTimeoutException
15.5. RegionServer
15.5.1. Startup Errors
15.5.2. Runtime Errors
15.5.3. Shutdown Errors
15.6. Master
15.6.1. Startup Errors
15.6.2. Shutdown Errors
A. Tools
A.1. HBase hbck
A.2. HFile Tool
A.3. WAL Tools
A.3.1. HLog tool
A.4. Compression Tool
A.5. Node Decommission
A.5.1. Rolling Restart
B. Compression In HBase
B.1. CompressionTest Tool
B.2. hbase.regionserver.codecs
B.3. LZO
B.4. GZIP
C. FAQ
D. YCSB: The Yahoo! Cloud Serving Benchmark and HBase
Index

List of Tables

11.1. Table webtable
11.2. ColumnFamily anchor
11.3. ColumnFamily contents