HBase has two run modes: Section 2.2.1, “Standalone HBase” and Section 2.2.2, “Distributed”. Out of the box, HBase runs in standalone mode. Whatever your mode, you will need to configure HBase by editing files in the HBase conf directory. At a minimum, you must edit conf/hbase-env.sh to tell HBase which Java to use. In this file you set HBase environment variables such as the heap size and other options for the JVM, the preferred location for log files, and so on. Set JAVA_HOME to point at the root of your Java install.
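As a rough sketch, the relevant lines of conf/hbase-env.sh might look like the following. The JDK path is a placeholder chosen for illustration (an assumption, not something this guide prescribes), and the heap size and log directory values are examples only. The is_java_home function is a hypothetical helper for sanity-checking the setting; it is not part of HBase.

```shell
# conf/hbase-env.sh (excerpt): illustrative values only.

# Root of the Java install. The fallback path is a placeholder; use your own.
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/default-java}"

# Optional: JVM heap size (in MB) and the preferred location for log files.
export HBASE_HEAPSIZE=1000
export HBASE_LOG_DIR="${HBASE_HOME:-.}/logs"

# Hypothetical helper (not part of HBase): succeeds if the given directory
# looks like the root of a Java install, i.e. it contains an executable bin/java.
is_java_home() {
  [ -n "$1" ] && [ -x "$1/bin/java" ]
}
```

For example, running is_java_home "$JAVA_HOME" || echo "JAVA_HOME looks wrong" before starting HBase can catch a mistyped path early.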
Standalone is the default mode, and it is the mode described in Section 1.2, “Quick Start”. In standalone mode, HBase does not use HDFS; it uses the local filesystem instead, and it runs all HBase daemons and a local ZooKeeper in the same JVM. ZooKeeper binds to a well-known port so clients may talk to HBase.
Distributed mode can be subdivided into pseudo-distributed, where all daemons run on a single node, and fully-distributed, where the daemons are spread across all nodes in the cluster [10].
Pseudo-distributed mode can run against the local filesystem or it can run against an instance of the Hadoop Distributed File System (HDFS). Fully-distributed mode can ONLY run on HDFS. See the Hadoop requirements and instructions for how to set up HDFS.
Below we describe the different distributed setups. Starting, verifying and exploring your install, whether pseudo-distributed or fully-distributed, is described in a later section, Section 2.2.3, “Running and Confirming Your Installation”. The same verification script applies to both deploy types.
Pseudo-distributed mode is simply fully-distributed mode run on a single host. Use this configuration for testing and prototyping on HBase. Do not use it for production or for evaluating HBase performance.
First, if you want to run on HDFS rather than on the local filesystem, set up your HDFS. You can also set up HDFS in pseudo-distributed mode. Ensure you have a working HDFS before proceeding.
Next, configure HBase. Edit conf/hbase-site.xml. This is the file into which you add local customizations and overrides. At a minimum, you must tell HBase to run in (pseudo-)distributed mode rather than in the default standalone mode. To do this, set the hbase.cluster.distributed property to true (its default is false). The absolute bare-minimum hbase-site.xml is therefore as follows:
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
With this configuration, HBase will start up an HBase Master process, a ZooKeeper server, and a RegionServer process running against the local filesystem, writing to wherever your operating system stores temporary files, into a directory named hbase-YOUR_USER_NAME.
Such a setup, using the local filesystem and writing to the operating system's temporary directory, is ephemeral. In versions of HBase before 0.98.4 and 1.0.0, the Hadoop local filesystem, which is what HBase uses when it is writing to the local filesystem, would lose data unless the system was shut down properly (see HBASE-11218, Data loss in HBase standalone mode). Writing to the operating system's temporary directory can also cause data loss when the machine is restarted, as this directory is usually cleared on reboot. For a more permanent setup, see the next example, which uses an instance of HDFS; HBase data will be written to the Hadoop distributed filesystem rather than to the local filesystem's tmp directory.
In this conf/hbase-site.xml example, the hbase.rootdir property points to the local HDFS instance homed on the node h-24-30.example.com.
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://h-24-30.example.com:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
Let HBase create the hbase.rootdir directory. If you don't, you'll get a warning saying HBase needs a migration run because the directory is missing files expected by HBase (it'll create them if you let it).
Now skip to Section 2.2.3, “Running and Confirming Your Installation” for how to start and verify your pseudo-distributed install. [11]
To start up the initial HBase cluster...
% bin/start-hbase.sh
To start up extra backup master(s) on the same server, run...
% bin/local-master-backup.sh start 1
... the '1' means use ports 16001 & 16011, and this backup master's logfile will be at logs/hbase-${USER}-1-master-${HOSTNAME}.log.
To start up multiple backup masters, run...
% bin/local-master-backup.sh start 2 3
You can start up to 9 backup masters (10 total).
To start up more regionservers...
% bin/local-regionservers.sh start 1
... where '1' means use ports 16201 & 16301, and its logfile will be at logs/hbase-${USER}-1-regionserver-${HOSTNAME}.log.
To add 4 more regionservers in addition to the one you just started, run...
% bin/local-regionservers.sh start 2 3 4 5
This supports up to 99 extra regionservers (100 total).
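The offset arguments above map onto ports in a regular way. The following is a minimal sketch of that arithmetic, with base ports inferred from the two examples given (offset 1 yields 16001 & 16011 for a backup master and 16201 & 16301 for a regionserver). The function names are hypothetical helpers for illustration, not HBase commands.

```shell
# Sketch: derive the ports used by an "extra" daemon from its numeric offset.
# Base ports are inferred from the examples in the text above.

backup_master_ports() {
  # $1: offset passed to bin/local-master-backup.sh
  echo "$((16000 + $1)) $((16010 + $1))"
}

extra_regionserver_ports() {
  # $1: offset passed to bin/local-regionservers.sh
  echo "$((16200 + $1)) $((16300 + $1))"
}

backup_master_ports 1        # prints: 16001 16011
extra_regionserver_ports 1   # prints: 16201 16301
```

This can be handy for checking that the ports an extra daemon would claim are free before you start it.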
For running a fully-distributed operation on more than one host, make the following configurations. In hbase-site.xml, add the property hbase.cluster.distributed and set it to true, and point the HBase hbase.rootdir at the appropriate HDFS NameNode and location in HDFS where you would like HBase to write data. For example, if your namenode were running at namenode.example.org on port 8020 and you wanted to home your HBase in HDFS at /hbase, make the following configuration.
<configuration>
  ...
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.org:8020/hbase</value>
    <description>The directory shared by RegionServers.
    </description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
  </property>
  ...
</configuration>
In addition, fully-distributed mode requires that you modify conf/regionservers. The regionservers file (see Section 2.4.1.2, “regionservers”) lists all hosts that you would have running HRegionServers, one host per line (this file in HBase is like the Hadoop slaves file). All servers listed in this file will be started and stopped when HBase cluster start or stop is run.
See Chapter 18, ZooKeeper for ZooKeeper setup for HBase.
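As an illustration of the one-host-per-line layout, the sketch below writes a regionservers file from a list of hostnames. The write_regionservers function and the hostnames in the usage line are hypothetical; only the file name and its format come from the text above.

```shell
# Sketch: write a regionservers file with one host per line.
# Hypothetical helper; $1 = HBase conf directory, remaining args = hostnames.
write_regionservers() {
  conf_dir="$1"
  shift
  mkdir -p "$conf_dir" || return 1
  # printf repeats the format once per argument, giving one host per line.
  printf '%s\n' "$@" > "$conf_dir/regionservers"
}

# Usage (placeholder hostnames):
#   write_regionservers "$HBASE_HOME/conf" rs1.example.org rs2.example.org rs3.example.org
# The resulting file would contain:
#   rs1.example.org
#   rs2.example.org
#   rs3.example.org
```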
Of note, if you have made HDFS client configuration on your Hadoop cluster -- i.e. configuration you want HDFS clients to use as opposed to server-side configurations -- HBase will not see this configuration unless you do one of the following:
Add a pointer to your HADOOP_CONF_DIR to the HBASE_CLASSPATH environment variable in hbase-env.sh.
Add a copy of hdfs-site.xml (or hadoop-site.xml) or, better, symlinks, under ${HBASE_HOME}/conf.
If you have only a small set of HDFS client configurations, add them to hbase-site.xml.
An example of such an HDFS client configuration is dfs.replication. If, for example, you want to run with a replication factor of 5, HBase will create files with the default of 3 unless you do the above to make the configuration available to HBase.
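The first two options for exposing HDFS client configuration to HBase could be sketched as follows. The fallback value /etc/hadoop/conf is only an assumed location for the Hadoop configuration directory, and link_hdfs_site is a hypothetical helper, not an HBase or Hadoop command.

```shell
# Option 1 (lines for conf/hbase-env.sh): put the Hadoop configuration
# directory on HBase's classpath. The fallback path is an assumption.
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
# Append to any existing HBASE_CLASSPATH rather than overwriting it.
export HBASE_CLASSPATH="${HBASE_CLASSPATH:+${HBASE_CLASSPATH}:}${HADOOP_CONF_DIR}"

# Option 2: symlink hdfs-site.xml into the HBase conf directory.
# Hypothetical helper; $1 = Hadoop conf directory, $2 = HBASE_HOME.
link_hdfs_site() {
  ln -sf "$1/hdfs-site.xml" "$2/conf/hdfs-site.xml"
}

# Usage: link_hdfs_site "$HADOOP_CONF_DIR" "$HBASE_HOME"
```

A symlink keeps HBase in step with later edits to the Hadoop-side file, which is presumably why it is preferred over a copy.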
Make sure HDFS is running first. Start the Hadoop HDFS daemons by running bin/start-dfs.sh in the HADOOP_HOME directory (bin/stop-dfs.sh stops them). You can ensure HDFS started properly by testing the put and get of files into the Hadoop filesystem. HBase does not normally use the MapReduce daemons, so these do not need to be started.
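One way to test the put and get of a file is a small round-trip check like the sketch below. It assumes the hadoop command is on your PATH and uses the standard hadoop fs -put, -get and -rm operations. The function name and the HDFS path in the usage line are hypothetical.

```shell
# Sketch: round-trip a small file through HDFS to confirm it is up.
# Hypothetical helper; $1 = an HDFS path you are allowed to write to.
hdfs_smoke_test() {
  remote="$1"
  work=$(mktemp -d) || return 1
  echo "hbase-hdfs-check" > "$work/in.txt"

  # Put the file, get it back, and compare the two local copies.
  hadoop fs -put "$work/in.txt" "$remote" &&
    hadoop fs -get "$remote" "$work/out.txt" &&
    cmp -s "$work/in.txt" "$work/out.txt"
  status=$?

  # Clean up the remote file and the local scratch directory.
  hadoop fs -rm "$remote" >/dev/null 2>&1
  rm -rf "$work"
  return $status
}

# Usage: hdfs_smoke_test /tmp/hbase-smoke-test.txt && echo "HDFS put/get works"
```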
If you are managing your own ZooKeeper, start it and confirm it is running; otherwise, HBase will start up ZooKeeper for you as part of its start process.
Start HBase with the following command:
bin/start-hbase.sh
Run the above from the HBASE_HOME directory.
You should now have a running HBase instance. HBase logs can be found in the logs subdirectory. Check them out, especially if HBase had trouble starting.
HBase also puts up a UI listing vital attributes. By default it is deployed on the Master host at port 16010 (HBase RegionServers listen on port 16020 by default and put up an informational HTTP server at 16030). If the Master were running on a host named master.example.org on the default port, to see the Master's homepage you'd point your browser at http://master.example.org:16010.
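If you prefer to check the UI from a terminal, a sketch like the following builds the Master URL from a hostname and the default port stated above. The master_ui_url function is a hypothetical helper, and the commented curl line assumes curl is installed.

```shell
# Sketch: build the Master UI URL from a host and an optional port.
# Hypothetical helper; the default port 16010 is taken from the text above.
master_ui_url() {
  echo "http://$1:${2:-16010}"
}

master_ui_url master.example.org   # prints: http://master.example.org:16010

# Possible check, assuming curl is available:
#   curl -s -f "$(master_ui_url master.example.org)" >/dev/null && echo "Master UI is up"
```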
Prior to HBase 0.98, the Master UI was deployed on port 60010 by default, and the HBase RegionServers would listen on port 60020 by default and put up an informational HTTP server at 60030.
Once HBase has started, see Section 1.2.3, “Shell Exercises” for how to create tables, add data, scan your insertions, and finally disable and drop your tables.
To stop HBase after exiting the HBase shell, enter
$ ./bin/stop-hbase.sh
stopping hbase...............
Shutdown can take a moment to complete. It can take longer if your cluster is comprised of many machines. If you are running a distributed operation, be sure to wait until HBase has shut down completely before stopping the Hadoop daemons.
[10] The pseudo-distributed vs fully-distributed nomenclature comes from Hadoop.
[11] See Section 2.2.2.1.1, “Pseudo-distributed Extras” for notes on how to start extra Masters and RegionServers when running pseudo-distributed.