Chapter 12. HBase Operational Management

Table of Contents

12.1. HBase Tools and Utilities
12.1.1. HBase hbck
12.1.2. HFile Tool
12.1.3. WAL Tools
12.1.4. Compression Tool
12.1.5. CopyTable
12.1.6. Export
12.1.7. Import
12.1.8. WALPlayer
12.1.9. RowCounter
12.2. Region Management
12.2.1. Major Compaction
12.2.2. Merge
12.3. Node Management
12.3.1. Node Decommission
12.3.2. Rolling Restart
12.4. Metrics
12.4.1. Metric Setup
12.4.2. RegionServer Metrics
12.5. HBase Monitoring
12.5.1. Slow Query Log
12.6. Cluster Replication
12.7. HBase Backup
12.7.1. Full Shutdown Backup
12.7.2. Live Cluster Backup - Replication
12.7.3. Live Cluster Backup - CopyTable
12.7.4. Live Cluster Backup - Export
12.8. Capacity Planning
12.8.1. Storage
12.8.2. Regions
This chapter covers the operational tools and practices required to run an HBase cluster. The subject of operations is related to the topics of Chapter 11, Troubleshooting and Debugging HBase, Chapter 10, Performance Tuning, and Chapter 2, Configuration, but is a distinct topic in itself.

12.1. HBase Tools and Utilities

Here we list HBase tools for administration, analysis, fixup, and debugging.

12.1.1. HBase hbck

An fsck for your HBase install

To run hbck against your HBase cluster, run:

$ ./bin/hbase hbck

At the end of the command's output it prints OK or INCONSISTENCY. If your cluster reports inconsistencies, pass -details to see more detail emitted. If inconsistencies are reported, run hbck a few times, because an inconsistency may be transient (e.g. the cluster is starting up or a region is splitting). Passing -fix may correct the inconsistency (this is an experimental feature).
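
For example, to re-check with extra detail and then attempt a repair (a minimal sketch; since -fix is experimental, try it on a test cluster first):

$ ./bin/hbase hbck -details
$ ./bin/hbase hbck -fix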

12.1.2. HFile Tool

See Section 8.7.5.2.2, “HFile Tool”.

12.1.3. WAL Tools

12.1.3.1. HLog tool

The main method on HLog offers manual split and dump facilities. Pass it WALs or the product of a split (the content of the recovered.edits directory).

You can get a textual dump of a WAL file content by doing the following:

 $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012 

The return code will be non-zero if there are issues with the file, so you can test the health of a file by redirecting STDOUT to /dev/null and checking the program's return code.
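
A minimal sketch of such a check, reusing the example WAL path from above:

 $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump \
     hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012 \
     > /dev/null && echo "WAL OK" || echo "WAL has issues"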

Similarly you can force a split of a log file directory by doing:

 $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --split hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/

12.1.4. Compression Tool

See the “CompressionTest Tool” section of the compression appendix.
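
As a quick sketch, the CompressionTest utility can be invoked directly to verify that a codec is usable on your cluster; the path and codec below are illustrative (the tool writes a small test file at the given path):

$ ./bin/hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://example.org:8020/tmp/compressiontest snappy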

12.1.5. CopyTable

CopyTable is a utility that can copy part or all of a table, either to the same cluster or to another cluster. The usage is as follows:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] tablename

Options:

  • starttime Beginning of the time range. Without endtime, the range extends from starttime to forever.
  • endtime End of the time range.
  • versions Number of cell versions to copy.
  • new.name New table's name.
  • peer.adr Address of the peer cluster given in the format hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
  • families Comma-separated list of ColumnFamilies to copy.
  • all.cells Also copy delete markers and uncollected deleted cells (advanced option).

Args:

  • tablename Name of table to copy.

Example of copying a one-hour window of 'TestTable' to a peer cluster:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --starttime=1265875194289 --endtime=1265878794289 \
  --peer.adr=server1,server2,server3:2181:/hbase TestTable
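
Another sketch, copying only selected column families into a differently named table on the same cluster (the table and family names here are hypothetical, and the target table must already exist with those families):

$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --new.name=TestTableCopy --families=cf1,cf2 TestTable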

Note: caching for the input Scan is configured via hbase.client.scanner.caching in the job configuration.

12.1.6. Export

Export is a utility that will dump the contents of a table to HDFS in a sequence file. Invoke via:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]

Note: caching for the input Scan is configured via hbase.client.scanner.caching in the job configuration.
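
For example, to export one version of each cell of 'TestTable' (a sketch; the table name and output directory are hypothetical, and the -D property is picked up through Hadoop's generic options parsing):

$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export \
  -D hbase.client.scanner.caching=100 TestTable /user/hbase/export/TestTable 1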

12.1.7. Import

Import is a utility that will load data that has been exported back into HBase. Invoke via:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
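
A minimal sketch pairing with the Export example above (names are hypothetical; the target table must already exist before the job is run):

$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import TestTable /user/hbase/export/TestTable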

12.1.8. WALPlayer

WALPlayer is a utility to replay WAL files into HBase.

The WAL can be replayed for a set of tables or all tables, and a time range can be provided (in milliseconds); the WAL is filtered to this set of tables. The output can optionally be mapped to another set of tables.

WALPlayer can also generate HFiles for later bulk importing; in that case only a single table can be specified, and no mapping is allowed.

Invoke via:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.WALPlayer [options] <wal inputdir> <tables> [<tableMappings>]

For example:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.WALPlayer /backuplogdir oldTable1,oldTable2 newTable1,newTable2
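
If <tableMappings> is omitted, edits should be replayed into the tables they originally belonged to; a sketch replaying a single table in place:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.WALPlayer /backuplogdir oldTable1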

12.1.9. RowCounter

RowCounter is a utility that will count all the rows of a table. It is a good sanity check to ensure that HBase can read all the blocks of a table if there are any concerns about metadata inconsistency.

$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename> [<column1> <column2>...]

Note: caching for the input Scan is configured via hbase.client.scanner.caching in the job configuration.
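
For example, to count all rows of 'TestTable', or only rows with data in one column (a sketch; the names are hypothetical, and columns are given as family:qualifier):

$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter TestTable
$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter TestTable cf1:col1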