The Apache HBase Book

Revision History
Revision 0.90.2
Adding first cuts at Configuration, Getting Started, Data Model
Revision 0.89.20100924 5 October 2010 stack
Initial layout

Abstract

This is the official book of Apache HBase, a distributed, versioned, column-oriented database built on top of Apache Hadoop and Apache ZooKeeper.


Table of Contents

Preface
1. Getting Started
1.1. Introduction
1.2. Quick Start
1.2.1. Download and unpack the latest stable release.
1.2.2. Start HBase
1.2.3. Shell Exercises
1.2.4. Stopping HBase
1.2.5. Where to go next
1.3. Not-so-quick Start Guide
1.3.1. Requirements
1.3.2. HBase run modes: Standalone and Distributed
1.3.3. Example Configurations
2. Upgrading
2.1. Upgrading to HBase 0.90.x from 0.20.x or 0.89.x
3. Configuration
3.1. hbase-site.xml and hbase-default.xml
3.1.1. HBase Default Configuration
3.2. hbase-env.sh
3.3. log4j.properties
3.4. The Important Configurations
3.4.1. Required Configurations
3.4.2. Recommended Configurations
3.5. Client configuration and dependencies connecting to an HBase cluster
3.5.1. Java client configuration
4. The HBase Shell
4.1. Scripting
4.2. Shell Tricks
4.2.1. irbrc
4.2.2. LOG data to timestamp
4.2.3. Debug
5. Building HBase
5.1. Adding an HBase release to Apache's Maven Repository
6. HBase and MapReduce
6.1. The default HBase MapReduce Splitter
6.2. HBase Input MapReduce Example
6.3. Accessing Other HBase Tables in a MapReduce Job
7. HBase and Schema Design
7.1. Schema Creation
7.2. On the number of column families
7.3. Monotonically Increasing Row Keys/Timeseries Data
7.4. Try to minimize row and column sizes
7.5. Table Creation: Pre-Creating Regions
8. Metrics
8.1. Metric Setup
8.2. Region Server Metrics
8.2.1. hbase.regionserver.blockCacheCount
8.2.2. hbase.regionserver.blockCacheFree
8.2.3. hbase.regionserver.blockCacheHitRatio
8.2.4. hbase.regionserver.blockCacheSize
8.2.5. hbase.regionserver.fsReadLatency_avg_time
8.2.6. hbase.regionserver.fsReadLatency_num_ops
8.2.7. hbase.regionserver.fsSyncLatency_avg_time
8.2.8. hbase.regionserver.fsSyncLatency_num_ops
8.2.9. hbase.regionserver.fsWriteLatency_avg_time
8.2.10. hbase.regionserver.fsWriteLatency_num_ops
8.2.11. hbase.regionserver.memstoreSizeMB
8.2.12. hbase.regionserver.regions
8.2.13. hbase.regionserver.requests
8.2.14. hbase.regionserver.storeFileIndexSizeMB
8.2.15. hbase.regionserver.stores
9. Cluster Replication
10. Data Model
10.1. Table
10.2. Row
10.3. Column Family
10.4. Cells
10.5. Versions
10.5.1. Versions and HBase Operations
10.5.2. Current Limitations
11. Architecture
11.1. Daemons
11.1.1. Master
11.1.2. RegionServer
11.2. Regions
11.2.1. Region Size
11.2.2. Region Splits
11.2.3. Region Load Balancer
11.2.4. Store
12. The WAL
12.1. What is the purpose of the HBase WAL
12.2. WAL splitting
12.2.1. hbase.hlog.split.skip.errors
12.2.2. How EOFExceptions are treated when splitting a crashed RegionServer's WALs
13. Performance Tuning
13.1. Java
13.1.1. The Garbage Collector and HBase
13.2. Configurations
13.2.1. Number of Regions
13.2.2. Managing Compactions
13.2.3. Compression
13.3. Number of Column Families
13.4. Data Clumping
13.5. Batch Loading
13.6. HBase Client
13.6.1. AutoFlush
13.6.2. Scan Caching
13.6.3. Close ResultScanners
13.6.4. Block Cache
14. Bloom Filters
14.1. Configurations
14.1.1. HColumnDescriptor option
14.1.2. io.hfile.bloom.enabled global kill switch
14.1.3. io.hfile.bloom.error.rate
14.1.4. io.hfile.bloom.max.fold
14.2. Bloom StoreFile footprint
14.2.1. BloomFilter in the StoreFile FileInfo data structure
14.2.2. BloomFilter entries in StoreFile metadata
A. Tools
A.1. HBase hbck
A.2. HFile Tool
A.3. WAL Tools
A.3.1. HLog tool
A.4. Compression Tool
B. Compression In HBase
B.1. CompressionTest Tool
B.2. hbase.regionserver.codecs
B.3. LZO
B.4. GZIP
C. FAQ
D. YCSB: The Yahoo! Cloud Serving Benchmark and HBase
Index