Appendix B. FAQ

B.1. General
When should I use HBase?
Are there other HBase FAQs?
Does HBase support SQL?
How does HBase work on top of HDFS?
Can I change a table's rowkeys?
B.2. Amazon EC2
I am running HBase on Amazon EC2 and...
B.3. Building HBase
When I build, why do I always get Unable to find resource 'VM_global_library.vm'?
B.4. Runtime
I'm having problems with my HBase cluster, how can I troubleshoot it?
How can I improve HBase cluster performance?
Why are logs flooded with '2011-01-10 12:40:48,407 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor' messages?
B.5. How do I...?
Secondary Indexes in HBase?
Store (fill in the blank) in HBase?
Back up my HBase Cluster?
Get a column 'slice': i.e. I have a million columns in my row but I only want to look at columns bbbb-bbbd?

B.1. General

When should I use HBase?

Anybody can download and give HBase a spin, even on a laptop. This answer addresses when it is best to use HBase in a real deployment.

First, make sure you have enough hardware. Even HDFS doesn't do well with fewer than 5 DataNodes (because of things such as HDFS block replication, which defaults to 3 copies), plus a NameNode. Second, make sure you have enough data. HBase isn't suitable for every problem. If you have hundreds of millions or billions of rows, HBase is a good candidate. If you only have a few thousand or a few million rows, a traditional RDBMS might be a better choice, because all of your data might wind up on a single node (or two) while the rest of the cluster sits idle.
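The "single node (or two)" point is arithmetic: HBase distributes a table across RegionServers in units of regions, and a table only splits into more regions as it grows. The sketch below estimates how many regions a table occupies and what share of a cluster would serve it. The row counts, average row size, 256 MB split threshold, and 10-server cluster are all hypothetical inputs; the real split size is governed by the hbase.hregion.max.filesize setting, whose default has changed between releases, so substitute your own values.

```java
/**
 * Back-of-the-envelope estimate of how many regions a table occupies.
 * Every input is a caller-supplied assumption; nothing here talks to HBase.
 */
public class RegionEstimate {

    /** Regions needed to hold the data, rounded up, and never fewer than one. */
    public static long estimateRegions(long rowCount, long avgRowBytes, long maxRegionBytes) {
        long totalBytes = rowCount * avgRowBytes;
        // Ceiling division: a partly filled region is still a region.
        long regions = (totalBytes + maxRegionBytes - 1) / maxRegionBytes;
        return Math.max(1, regions);
    }

    /**
     * Fraction of RegionServers hosting at least one region of the table,
     * assuming regions are spread evenly (one per server until every server has one).
     */
    public static double busyFraction(long regions, int regionServers) {
        return Math.min(regions, regionServers) / (double) regionServers;
    }

    public static void main(String[] args) {
        long maxRegionBytes = 256L * 1024 * 1024; // hypothetical split threshold (256 MB)
        int regionServers = 10;                   // hypothetical cluster size

        long small = estimateRegions(2_000_000L, 100, maxRegionBytes);     // ~200 MB of data
        long large = estimateRegions(1_000_000_000L, 100, maxRegionBytes); // ~100 GB of data

        System.out.printf("small: %d region(s), %.0f%% of servers busy%n",
                small, 100 * busyFraction(small, regionServers));
        System.out.printf("large: %d region(s), %.0f%% of servers busy%n",
                large, 100 * busyFraction(large, regionServers));
    }
}
```

With these hypothetical numbers the two-million-row table fits in 1 region, so one server out of ten does all the work, while the billion-row table needs 373 regions and keeps every server busy.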

Are there other HBase FAQs?

See the FAQ on the wiki: HBase Wiki FAQ.

Does HBase support SQL?

Not really. SQL-ish support for HBase via Hive is in development; however, Hive is based on MapReduce, which is not generally suitable for low-latency requests. See Chapter 5, Data Model, for examples of using the HBase client.

How does HBase work on top of HDFS?

HDFS is a distributed file system that is well suited for storing large files. Its documentation states, however, that it is not a general-purpose file system and does not provide fast individual record lookups in files. HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. This can sometimes be a point of conceptual confusion. See Chapter 5, Data Model, and Chapter 8, Architecture, for more information on how HBase achieves its goals.
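Part of the answer lies in HBase's logical data model, which is commonly described as a sorted, multi-dimensional map: row key, then column family, then column qualifier, then timestamp, then value. Because rows are kept sorted by key, a point lookup or a row-range scan is a search in an ordered structure rather than a pass over a whole file. The sketch below is an in-memory analogy built from Java TreeMaps, assuming string keys and values for readability; it is not how HBase stores data (HBase buffers writes in a MemStore and persists sorted HFiles on HDFS), and the class and method names are hypothetical.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;

/** Toy in-memory model of HBase's logical layout; an analogy, not HBase's storage engine. */
public class SortedMapModel {

    // row key -> column family -> qualifier -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> rows =
            new TreeMap<>();

    public void put(String row, String family, String qualifier, long ts, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .computeIfAbsent(qualifier, q -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(ts, value);
    }

    /** Newest version of one cell, or null; each level is a sorted-map lookup, not a file scan. */
    public String get(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> families = rows.get(row);
        if (families == null) return null;
        NavigableMap<String, NavigableMap<Long, String>> qualifiers = families.get(family);
        if (qualifiers == null) return null;
        NavigableMap<Long, String> versions = qualifiers.get(qualifier);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    /** Row keys in [startRow, stopRow), in sorted order, mirroring a scan's start and stop rows. */
    public NavigableSet<String> scanRows(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false).navigableKeySet();
    }

    public static void main(String[] args) {
        SortedMapModel table = new SortedMapModel();
        table.put("row2", "cf", "a", 1L, "old");
        table.put("row2", "cf", "a", 2L, "new");
        table.put("row1", "cf", "a", 1L, "x");
        table.put("row3", "cf", "a", 1L, "y");
        System.out.println(table.get("row2", "cf", "a"));   // new
        System.out.println(table.scanRows("row1", "row3")); // [row1, row2]
    }
}
```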

Can I change a table's rowkeys?

No. See Section 6.3.5, “Immutability of Rowkeys”.
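Since a rowkey cannot be modified in place, the usual workaround is to write the row's cells under the new key and then delete the old row. The sketch below shows that copy-then-delete sequence on a plain TreeMap standing in for a table; the keys and column names are hypothetical and this is not HBase client code. Against a real cluster the equivalent steps would be a Get, a Put, and a Delete. HBase guarantees atomicity per row only, so the Put to the new row and the Delete of the old row are not atomic together: a reader can briefly see both rows, and a client failure in between leaves both in place.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Illustrates "renaming" a row by copy-then-delete on a toy sorted map (not HBase code). */
public class RekeyRow {

    /**
     * Moves all cells of oldKey to newKey. Returns false and changes nothing
     * if oldKey is absent or newKey already exists.
     */
    public static boolean rekey(NavigableMap<String, Map<String, String>> table,
                                String oldKey, String newKey) {
        Map<String, String> cells = table.get(oldKey);
        if (cells == null || table.containsKey(newKey)) {
            return false;
        }
        // Step 1: write a copy of the row under the new key (a Put in HBase terms).
        table.put(newKey, new TreeMap<>(cells));
        // Step 2: remove the old row (a Delete in HBase terms).
        table.remove(oldKey);
        return true;
    }

    public static void main(String[] args) {
        NavigableMap<String, Map<String, String>> table = new TreeMap<>();
        table.put("user#0042", new TreeMap<>(Map.of("info:name", "Ada")));
        rekey(table, "user#0042", "user#1042");
        System.out.println(table); // {user#1042={info:name=Ada}}
    }
}
```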

B.2. Amazon EC2

I am running HBase on Amazon EC2 and...

See the Amazon EC2 sections under Troubleshooting (Section 11.9, “Amazon EC2”) and Performance (Section 10.10, “Amazon EC2”).

B.3. Building HBase

When I build, why do I always get Unable to find resource 'VM_global_library.vm'?

Ignore it. It's not an error. It is officially ugly, though.

B.4. Runtime

I'm having problems with my HBase cluster, how can I troubleshoot it?

See Chapter 11, Troubleshooting and Debugging HBase.

How can I improve HBase cluster performance?

See Chapter 10, Performance Tuning.

Why are logs flooded with '2011-01-10 12:40:48,407 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor' messages?

Because we are not using the native versions of the compression libraries. See HBASE-1900, “Put back native support when hadoop 0.21 is released”. Copy the native libraries from Hadoop into the HBase lib directory, or symlink them into place, and the message should go away.

B.5. How do I...?

Secondary Indexes in HBase?

See Section 6.8, “Secondary Indexes and Alternate Query Paths”.
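One common pattern is a dual-write secondary index: each write to the data table is paired with a write to an index table keyed by the indexed value, so a lookup by that value becomes a short range scan instead of a full table scan. A minimal sketch, assuming two in-memory TreeMaps stand in for the two tables and that the index key is the indexed value, a NUL separator, then the data row key (a hypothetical layout that lets several rows share one value). In HBase the two writes go to different tables and are not atomic together, so a client failure between them can leave the index stale.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy dual-write secondary index built from two in-memory sorted maps (not HBase code). */
public class DualWriteIndex {

    // Hypothetical separator between the indexed value and the data row key.
    private static final char SEP = '\0';

    // "Data table": row key -> email (the indexed attribute).
    private final NavigableMap<String, String> data = new TreeMap<>();
    // "Index table": email + SEP + row key -> row key, so many rows may share one email.
    private final NavigableMap<String, String> index = new TreeMap<>();

    /** Writes the data row, drops any stale index entry, then writes the new index entry. */
    public void put(String rowKey, String email) {
        String previous = data.put(rowKey, email);
        if (previous != null) {
            index.remove(previous + SEP + rowKey);
        }
        index.put(email + SEP + rowKey, rowKey);
    }

    /** Finds row keys by email with a prefix range over the index, not a scan of all data. */
    public List<String> findByEmail(String email) {
        String start = email + SEP;               // smallest key with this prefix
        String stop = email + (char) (SEP + 1);   // first key past the prefix
        return new ArrayList<>(index.subMap(start, true, stop, false).values());
    }

    public static void main(String[] args) {
        DualWriteIndex idx = new DualWriteIndex();
        idx.put("user1", "a@example.com");
        idx.put("user2", "b@example.com");
        idx.put("user3", "a@example.com");
        System.out.println(idx.findByEmail("a@example.com")); // [user1, user3]
        idx.put("user1", "c@example.com");                    // update moves the index entry
        System.out.println(idx.findByEmail("a@example.com")); // [user3]
    }
}
```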

Store (fill in the blank) in HBase?

See Section 6.5, “Supported Datatypes”.
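HBase stores row keys and cell values as uninterpreted byte arrays, so storing a given type comes down to choosing an encoding for it. HBase ships a helper class, org.apache.hadoop.hbase.util.Bytes, whose Bytes.toBytes(long) produces an 8-byte big-endian encoding. The sketch below does the same round trip with java.nio.ByteBuffer only, so it runs without HBase on the classpath; the class and method names are hypothetical. It also shows a caveat for row keys: byte arrays compare as unsigned bytes, so big-endian two's-complement longs sort correctly only when non-negative, and negative values sort after positive ones.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Encode and decode values as byte arrays, the only type HBase stores (sketch; no HBase dependency). */
public class ByteEncoding {

    /** 8-byte big-endian encoding of a long (ByteBuffer's default byte order is big-endian). */
    public static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    public static long decodeLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    public static byte[] encodeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static String decodeString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeLong(encodeLong(42L)));         // 42
        System.out.println(decodeString(encodeString("hello"))); // hello
        // Unsigned byte-wise comparison puts -1 (0xFF...) after 1 (0x00...01).
        System.out.println(Arrays.compareUnsigned(encodeLong(-1L), encodeLong(1L)) > 0); // true
    }
}
```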

Back up my HBase Cluster?

See Section 12.6, “HBase Backup”.

Get a column 'slice': i.e. I have a million columns in my row but I only want to look at columns bbbb-bbbd?

See org.apache.hadoop.hbase.filter.ColumnRangeFilter.
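ColumnRangeFilter returns only the columns whose qualifiers fall between a minimum and a maximum, each bound inclusive or exclusive. In the HBase client it is constructed as new ColumnRangeFilter(minColumn, minColumnInclusive, maxColumn, maxColumnInclusive), with the bounds as byte arrays, and attached to a Scan or Get with setFilter. Because qualifiers within a row are stored sorted, the wanted columns form one contiguous range. The sketch below reproduces only that selection semantics with NavigableMap.subMap over a toy row and does not use HBase; it compares qualifiers as Strings, which matches HBase's byte order for the ASCII qualifiers used here.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Mimics ColumnRangeFilter's selection on one row's sorted qualifiers (toy model, not HBase code). */
public class ColumnSlice {

    public static NavigableMap<String, String> slice(NavigableMap<String, String> row,
                                                     String minColumn, boolean minInclusive,
                                                     String maxColumn, boolean maxInclusive) {
        // Qualifiers are sorted, so the slice is one contiguous sub-range of the row.
        return row.subMap(minColumn, minInclusive, maxColumn, maxInclusive);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (String q : new String[] {"aaaa", "bbba", "bbbb", "bbbc", "bbbd", "bbbe", "cccc"}) {
            row.put(q, "v-" + q);
        }
        System.out.println(slice(row, "bbbb", true, "bbbd", true).keySet()); // [bbbb, bbbc, bbbd]
    }
}
```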