1.4. HBase run modes: Standalone and Distributed

HBase has two run modes: Section 1.4.1, “Standalone HBase” and Section 1.4.2, “Distributed”. Out of the box, HBase runs in standalone mode. To set up a distributed deploy, you will need to configure HBase by editing files in the HBase conf directory.

Whatever your mode, you will need to edit conf/hbase-env.sh to tell HBase which java to use. In this file you set HBase environment variables such as the heapsize and other options for the JVM, the preferred location for log files, etc. Set JAVA_HOME to point at the root of your java install.

1.4.1. Standalone HBase

This is the default mode. Standalone mode is what is described in the ??? section. In standalone mode, HBase does not use HDFS -- it uses the local filesystem instead -- and it runs all HBase daemons and a local ZooKeeper all up in the same JVM. Zookeeper binds to a well known port so clients may talk to HBase.

1.4.2. Distributed

Distributed mode can be subdivided into distributed but all daemons run on a single node -- a.k.a pseudo-distributed-- and fully-distributed where the daemons are spread across all nodes in the cluster [10].

Distributed modes require an instance of the Hadoop Distributed File System (HDFS). See the Hadoop requirements and instructions for how to set up a HDFS. Before proceeding, ensure you have an appropriate, working HDFS.

Below we describe the different distributed setups. Starting, verification and exploration of your install, whether a pseudo-distributed or fully-distributed configuration is described in a section that follows, Section 1.4.3, “Running and Confirming Your Installation”. The same verification script applies to both deploy types.

1.4.2.1. Pseudo-distributed

A pseudo-distributed mode is simply a distributed mode run on a single host. Use this configuration testing and prototyping on HBase. Do not use this configuration for production nor for evaluating HBase performance.

Once you have confirmed your HDFS setup, edit conf/hbase-site.xml. This is the file into which you add local customizations and overrides for <xreg></xreg> and Section 1.4.2.2.3, “HDFS Client Configuration”. Point HBase at the running Hadoop HDFS instance by setting the hbase.rootdir property. This property points HBase at the Hadoop filesystem instance to use. For example, adding the properties below to your hbase-site.xml says that HBase should use the /hbase directory in the HDFS whose namenode is at port 8020 on your local machine, and that it should run with one replica only (recommended for pseudo-distributed mode):

<configuration>
  ...
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
    <description>The directory shared by RegionServers.
    </description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>The replication count for HLog and HFile storage. Should not be greater than HDFS datanode count.
    </description>
  </property>
  ...
</configuration>

Note

Let HBase create the hbase.rootdir directory. If you don't, you'll get warning saying HBase needs a migration run because the directory is missing files expected by HBase (it'll create them if you let it).

Note

Above we bind to localhost. This means that a remote client cannot connect. Amend accordingly, if you want to connect from a remote location.

Now skip to Section 1.4.3, “Running and Confirming Your Installation” for how to start and verify your pseudo-distributed install. [11]

1.4.2.2. Fully-distributed

For running a fully-distributed operation on more than one host, make the following configurations. In hbase-site.xml, add the property hbase.cluster.distributed and set it to true and point the HBase hbase.rootdir at the appropriate HDFS NameNode and location in HDFS where you would like HBase to write data. For example, if you namenode were running at namenode.example.org on port 8020 and you wanted to home your HBase in HDFS at /hbase, make the following configuration.

<configuration>
  ...
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.org:8020/hbase</value>
    <description>The directory shared by RegionServers.
    </description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
  </property>
  ...
</configuration>
1.4.2.2.1. regionservers

In addition, a fully-distributed mode requires that you modify conf/regionservers. The Section 1.7.1.2, “regionservers file lists all hosts that you would have running HRegionServers, one host per line (This file in HBase is like the Hadoop slaves file). All servers listed in this file will be started and stopped when HBase cluster start or stop is run.

1.4.2.2.2. ZooKeeper and HBase

See section Section 1.5, “ZooKeeper” for ZooKeeper setup for HBase.

1.4.2.2.3. HDFS Client Configuration

Of note, if you have made HDFS client configuration on your Hadoop cluster -- i.e. configuration you want HDFS clients to use as opposed to server-side configurations -- HBase will not see this configuration unless you do one of the following:

  • Add a pointer to your HADOOP_CONF_DIR to the HBASE_CLASSPATH environment variable in hbase-env.sh.

  • Add a copy of hdfs-site.xml (or hadoop-site.xml) or, better, symlinks, under ${HBASE_HOME}/conf, or

  • if only a small set of HDFS client configurations, add them to hbase-site.xml.

An example of such an HDFS client configuration is dfs.replication. If for example, you want to run with a replication factor of 5, hbase will create files with the default of 3 unless you do the above to make the configuration available to HBase.

1.4.3. Running and Confirming Your Installation

Make sure HDFS is running first. Start and stop the Hadoop HDFS daemons by running bin/start-hdfs.sh over in the HADOOP_HOME directory. You can ensure it started properly by testing the put and get of files into the Hadoop filesystem. HBase does not normally use the mapreduce daemons. These do not need to be started.

If you are managing your own ZooKeeper, start it and confirm its running else, HBase will start up ZooKeeper for you as part of its start process.

Start HBase with the following command:

bin/start-hbase.sh
Run the above from the HBASE_HOME directory.

You should now have a running HBase instance. HBase logs can be found in the logs subdirectory. Check them out especially if HBase had trouble starting.

HBase also puts up a UI listing vital attributes. By default its deployed on the Master host at port 60010 (HBase RegionServers listen on port 60020 by default and put up an informational http server at 60030). If the Master were running on a host named master.example.org on the default port, to see the Master's homepage you'd point your browser at http://master.example.org:60010.

Once HBase has started, see the ??? for how to create tables, add data, scan your insertions, and finally disable and drop your tables.

To stop HBase after exiting the HBase shell enter

$ ./bin/stop-hbase.sh
stopping hbase...............

Shutdown can take a moment to complete. It can take longer if your cluster is comprised of many machines. If you are running a distributed operation, be sure to wait until HBase has shut down completely before stopping the Hadoop daemons.



[10] The pseudo-distributed vs fully-distributed nomenclature comes from Hadoop.

[11] See Pseudo-distributed mode extras for notes on how to start extra Masters and RegionServers when running pseudo-distributed.