The Apache HBase™ Reference Guide

Revision History
Revision 0.98.3-hadoop1 2014-05-31T19:27

Abstract

This is the official reference guide for Apache HBase™, a distributed, versioned, big data store built on top of Apache Hadoop™ and Apache ZooKeeper™.


Table of Contents

Preface
1. Getting Started
1.1. Introduction
1.2. Quick Start
2. Apache HBase Configuration
2.1. Basic Prerequisites
2.2. HBase run modes: Standalone and Distributed
2.3. Configuration Files
2.4. Example Configurations
2.5. The Important Configurations
3. Upgrading
3.1. HBase version numbers
3.2. Upgrading from 0.96.x to 0.98.x
3.3. Upgrading from 0.94.x to 0.98.x
3.4. Upgrading from 0.94.x to 0.96.x
3.5. Upgrading from 0.92.x to 0.94.x
3.6. Upgrading from 0.90.x to 0.92.x
3.7. Upgrading to HBase 0.90.x from 0.20.x or 0.89.x
4. The Apache HBase Shell
4.1. Scripting
4.2. Shell Tricks
5. Data Model
5.1. Conceptual View
5.2. Physical View
5.3. Namespace
5.4. Table
5.5. Row
5.6. Column Family
5.7. Cells
5.8. Data Model Operations
5.9. Versions
5.10. Sort Order
5.11. Column Metadata
5.12. Joins
5.13. ACID
6. HBase and Schema Design
6.1. Schema Creation
6.2. On the number of column families
6.3. Rowkey Design
6.4. Number of Versions
6.5. Supported Datatypes
6.6. Joins
6.7. Time To Live (TTL)
6.8. Keeping Deleted Cells
6.9. Secondary Indexes and Alternate Query Paths
6.10. Constraints
6.11. Schema Design Case Studies
6.12. Operational and Performance Configuration Options
7. HBase and MapReduce
7.1. HBase, MapReduce, and the CLASSPATH
7.2. Bundled HBase MapReduce Jobs
7.3. HBase as a MapReduce Job Data Source and Data Sink
7.4. Writing HFiles Directly During Bulk Import
7.5. RowCounter Example
7.6. Map-Task Splitting
7.7. HBase MapReduce Examples
7.8. Accessing Other HBase Tables in a MapReduce Job
7.9. Speculative Execution
8. Secure Apache HBase
8.1. Secure Client Access to Apache HBase
8.2. Simple User Access to Apache HBase
8.3. Tags
8.4. Access Control
8.5. Secure Bulk Load
8.6. Visibility Labels
8.7. Transparent Server Side Encryption
9. Architecture
9.1. Overview
9.2. Catalog Tables
9.3. Client
9.4. Client Request Filters
9.5. Master
9.6. RegionServer
9.7. Regions
9.8. Bulk Loading
9.9. HDFS
10. Apache HBase External APIs
10.1. Non-Java Languages Talking to the JVM
10.2. REST
10.3. Thrift
10.4. C/C++ Apache HBase Client
11. Thrift API and Filter Language
11.1. Filter Language
12. Apache HBase Coprocessors
13. Apache HBase Performance Tuning
13.1. Operating System
13.2. Network
13.3. Java
13.4. HBase Configurations
13.5. ZooKeeper
13.6. Schema Design
13.7. HBase General Patterns
13.8. Writing to HBase
13.9. Reading from HBase
13.10. Deleting from HBase
13.11. HDFS
13.12. Amazon EC2
13.13. Collocating HBase and MapReduce
13.14. Case Studies
14. Troubleshooting and Debugging Apache HBase
14.1. General Guidelines
14.2. Logs
14.3. Resources
14.4. Tools
14.5. Client
14.6. MapReduce
14.7. NameNode
14.8. Network
14.9. RegionServer
14.10. Master
14.11. ZooKeeper
14.12. Amazon EC2
14.13. HBase and Hadoop version issues
14.14. Running unit or integration tests
14.15. Case Studies
14.16. Cryptographic Features
15. Apache HBase Case Studies
15.1. Overview
15.2. Schema Design
15.3. Performance/Troubleshooting
16. Apache HBase Operational Management
16.1. HBase Tools and Utilities
16.2. Region Management
16.3. Node Management
16.4. HBase Metrics
16.5. HBase Monitoring
16.6. Cluster Replication
16.7. HBase Backup
16.8. HBase Snapshots
16.9. Capacity Planning and Region Sizing
16.10. Table Rename
17. Building and Developing Apache HBase
17.1. Apache HBase Repositories
17.2. IDEs
17.3. Building Apache HBase
17.4. Releasing Apache HBase
17.5. Generating the HBase Reference Guide
17.6. Updating hbase.apache.org
17.7. Tests
17.8. Maven Build Commands
17.9. Getting Involved
17.10. Developing
17.11. Submitting Patches
18. ZooKeeper
18.1. Using existing ZooKeeper ensemble
18.2. SASL Authentication with ZooKeeper
19. Community
19.1. Decisions
19.2. Community Roles
19.3. Commit Message format
A. FAQ
B. hbck In Depth
B.1. Running hbck to identify inconsistencies
B.2. Inconsistencies
B.3. Localized repairs
B.4. Region Overlap Repairs
C. Compression In HBase
C.1. CompressionTest Tool
C.2. hbase.regionserver.codecs
C.3. LZO
C.4. GZIP
C.5. SNAPPY
C.6. Changing Compression Schemes
D. YCSB: The Yahoo! Cloud Serving Benchmark and HBase
E. HFile format version 2
E.1. Motivation
E.2. HFile format version 1 overview
E.3. HBase file format with inline blocks (version 2)
F. Other Information About HBase
F.1. HBase Videos
F.2. HBase Presentations (Slides)
F.3. HBase Papers
F.4. HBase Sites
F.5. HBase Books
F.6. Hadoop Books
G. HBase History
H. HBase and the Apache Software Foundation
H.1. ASF Development Process
H.2. ASF Board Reporting
I. Enabling Dapper-like Tracing in HBase
I.1. SpanReceivers
I.2. Client Modifications
I.3. Tracing from HBase Shell
J. 0.95 RPC Specification
J.1. Goals
J.2. TODO
J.3. RPC
J.4. Notes
Index

List of Tables

2.1. Hadoop version support matrix
5.1. Table webtable
5.2. ColumnFamily anchor
5.3. ColumnFamily contents
8.1. Operation To Permission Mapping
9.1. Parameters Used by Compaction Algorithm

List of Examples

5.1. Examples
5.2. Examples
8.1. Grant
8.2. Revoke
8.3. Alter
8.4. User Permission
9.1. Pre-Creating an HConnection
11.1. Compound Operators
11.2. Precedence Example
11.3. Example 1
11.4. Example 2
11.5. Example 3
11.6. Example 4