Apache HBase Reference Guide

Revision History
Revision 0.94.2 2012-10-07T19:31

Abstract

This is the official reference guide for Apache HBase, a distributed, versioned, column-oriented database built on top of Apache Hadoop and Apache ZooKeeper.


Table of Contents

Preface
1. Getting Started
1.1. Introduction
1.2. Quick Start
2. Configuration
2.1. Java
2.2. Operating System
2.3. Hadoop
2.4. HBase run modes: Standalone and Distributed
2.5. ZooKeeper
2.6. Configuration Files
2.7. Example Configurations
2.8. The Important Configurations
2.9. Bloom Filter Configuration
3. Upgrading
3.1. Upgrading to HBase 0.90.x from 0.20.x or 0.89.x
3.2. Upgrading from 0.90.x to 0.92.x
4. The HBase Shell
4.1. Scripting
4.2. Shell Tricks
5. Data Model
5.1. Conceptual View
5.2. Physical View
5.3. Table
5.4. Row
5.5. Column Family
5.6. Cells
5.7. Data Model Operations
5.8. Versions
5.9. Sort Order
5.10. Column Metadata
5.11. Joins
6. HBase and Schema Design
6.1. Schema Creation
6.2. On the number of column families
6.3. Rowkey Design
6.4. Number of Versions
6.5. Supported Datatypes
6.6. Joins
6.7. Time To Live (TTL)
6.8. Keeping Deleted Cells
6.9. Secondary Indexes and Alternate Query Paths
6.10. Schema Design Smackdown
6.11. Operational and Performance Configuration Options
6.12. Constraints
7. HBase and MapReduce
7.1. Map-Task Splitting
7.2. HBase MapReduce Examples
7.3. Accessing Other HBase Tables in a MapReduce Job
7.4. Speculative Execution
8. Architecture
8.1. Overview
8.2. Catalog Tables
8.3. Client
8.4. Client Request Filters
8.5. Master
8.6. RegionServer
8.7. Regions
8.8. HDFS
9. External APIs
9.1. Non-Java Languages Talking to the JVM
9.2. REST
9.3. Thrift
10. Performance Tuning
10.1. Operating System
10.2. Network
10.3. Java
10.4. HBase Configurations
10.5. ZooKeeper
10.6. Schema Design
10.7. Writing to HBase
10.8. Reading from HBase
10.9. Deleting from HBase
10.10. HDFS
10.11. Amazon EC2
11. Troubleshooting and Debugging HBase
11.1. General Guidelines
11.2. Logs
11.3. Resources
11.4. Tools
11.5. Client
11.6. MapReduce
11.7. NameNode
11.8. Network
11.9. RegionServer
11.10. Master
11.11. ZooKeeper
11.12. Amazon EC2
11.13. HBase and Hadoop version issues
12. HBase Operational Management
12.1. HBase Tools and Utilities
12.2. Region Management
12.3. Node Management
12.4. Metrics
12.5. HBase Monitoring
12.6. Cluster Replication
12.7. HBase Backup
12.8. Capacity Planning
13. Building and Developing HBase
13.1. HBase Repositories
13.2. IDEs
13.3. Building HBase
13.4. Tests
13.5. Maven Build Commands
13.6. Getting Involved
13.7. Developing
13.8. Submitting Patches
A. FAQ
B. Compression In HBase
B.1. CompressionTest Tool
B.2. hbase.regionserver.codecs
B.3. LZO
B.4. GZIP
B.5. SNAPPY
C. YCSB: The Yahoo! Cloud Serving Benchmark and HBase
D. HFile format version 2
D.1. Motivation
D.2. HFile format version 1 overview
D.3. HBase file format with inline blocks (version 2)
E. Other Information About HBase
E.1. HBase Videos
E.2. HBase Presentations (Slides)
E.3. HBase Papers
E.4. HBase Sites
E.5. HBase Books
E.6. Hadoop Books
F. HBase and the Apache Software Foundation
F.1. ASF Development Process
F.2. ASF Board Reporting
Index

List of Tables

5.1. Table webtable
5.2. ColumnFamily anchor
5.3. ColumnFamily contents