org.apache.hadoop.hbase
Class HRegion

java.lang.Object
  extended by org.apache.hadoop.hbase.HRegion
All Implemented Interfaces:
HConstants

public class HRegion
extends Object
implements HConstants

HRegion stores data for a certain region of a table. It stores all columns for each row. A given table consists of one or more HRegions.

We maintain multiple HStores for a single HRegion.

An HStore is a set of rows with some column data; together, they make up all the data for the rows.

Each HRegion has a 'startKey' and 'endKey'.

The first is inclusive, the second is exclusive (except for the final region). The endKey of region 0 is the same as the startKey of region 1 (if it exists). The startKey of the first region is null, and the endKey of the final region is null.
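
The [startKey, endKey) rule above can be sketched as a small standalone check. This is an illustrative helper using String keys in place of Text, with null standing in for the open-ended first and last regions; the actual check lives in HRegion.rowIsInRange below.

```java
// Sketch of the [startKey, endKey) membership rule described above.
// Hypothetical standalone helper; not the HRegion implementation.
public class KeyRange {
    static boolean rowInRange(String startKey, String endKey, String row) {
        // A null startKey marks the first region (open on the left).
        boolean afterStart = (startKey == null) || row.compareTo(startKey) >= 0; // inclusive
        // A null endKey marks the final region (open on the right).
        boolean beforeEnd = (endKey == null) || row.compareTo(endKey) < 0;       // exclusive
        return afterStart && beforeEnd;
    }

    public static void main(String[] args) {
        System.out.println(rowInRange("b", "m", "b"));  // true: startKey is inclusive
        System.out.println(rowInRange("b", "m", "m"));  // false: endKey is exclusive
        System.out.println(rowInRange(null, "m", "a")); // true: first region, null startKey
        System.out.println(rowInRange("m", null, "z")); // true: final region, null endKey
    }
}
```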

Locking at the HRegion level serves only one purpose: preventing the region from being closed (and consequently split) while other operations are ongoing. Each row level operation obtains both a row lock and a region read lock for the duration of the operation. While a scanner is being constructed, getScanner holds a read lock. If the scanner is successfully constructed, it holds a read lock until it is closed. A close takes out a write lock and consequently will block for ongoing operations and will block new operations from starting while the close is in progress.
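
The scheme above corresponds to a standard read/write lock. A minimal sketch of the pattern, under the simplifying assumption that per-row locks are omitted (the real HRegion takes those as well):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the region-level locking scheme described above.
// Row-level operations and scanners hold the read lock; close takes the
// write lock, so it waits for in-flight operations and blocks new ones.
public class RegionLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean closed = false;

    void rowOperation(Runnable op) {
        lock.readLock().lock();           // many row ops may proceed concurrently
        try {
            if (closed) throw new IllegalStateException("region closed");
            op.run();
        } finally {
            lock.readLock().unlock();
        }
    }

    void close() {
        lock.writeLock().lock();          // blocks until ongoing row ops finish
        try {
            closed = true;                // no new operations after this point
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean isClosed() { return closed; }
}
```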

An HRegion is defined by its table and its key extent.

It consists of at least one HStore. The number of HStores should be configurable, so that data which is accessed together is stored in the same HStore. Right now, we approximate that by building a single HStore for each column family. (This config info will be communicated via the tabledesc.)

The HTableDescriptor contains metainfo about the HRegion's table. regionName is a unique identifier for this HRegion. (startKey, endKey] defines the keyspace for this HRegion.


Field Summary
 
Fields inherited from interface org.apache.hadoop.hbase.HConstants
ALL_META_COLUMNS, ALL_VERSIONS, COL_REGIONINFO, COL_REGIONINFO_ARRAY, COL_SERVER, COL_SPLITA, COL_SPLITB, COL_STARTCODE, COLUMN_FAMILY, COLUMN_FAMILY_ARRAY, COLUMN_FAMILY_STR, DEFAULT_HOST, DEFAULT_MASTER_ADDRESS, DEFAULT_MASTER_INFOPORT, DEFAULT_MASTER_PORT, DEFAULT_MAX_FILE_SIZE, DEFAULT_REGION_SERVER_CLASS, DEFAULT_REGIONSERVER_ADDRESS, DEFAULT_REGIONSERVER_INFOPORT, EMPTY_START_ROW, EMPTY_TEXT, FILE_SYSTEM_VERSION, HBASE_DIR, HREGION_LOGDIR_NAME, HREGION_OLDLOGFILE_NAME, LAST_ROW, LATEST_TIMESTAMP, MASTER_ADDRESS, META_TABLE_NAME, REGION_SERVER_CLASS, REGIONSERVER_ADDRESS, ROOT_TABLE_NAME, THREAD_WAKE_FREQUENCY, UTF8_ENCODING, VERSION_FILE_NAME
 
Constructor Summary
HRegion(org.apache.hadoop.fs.Path basedir, HLog log, org.apache.hadoop.fs.FileSystem fs, HBaseConfiguration conf, HRegionInfo regionInfo, org.apache.hadoop.fs.Path initialFiles, FlushRequester requester)
          HRegion constructor.
HRegion(org.apache.hadoop.fs.Path basedir, HLog log, org.apache.hadoop.fs.FileSystem fs, HBaseConfiguration conf, HRegionInfo regionInfo, org.apache.hadoop.fs.Path initialFiles, FlushRequester requester, org.apache.hadoop.util.Progressable reporter)
          HRegion constructor.
 
Method Summary
static void addRegionToMETA(HRegion meta, HRegion r)
          Inserts a new region's meta information into the passed meta region.
 void batchUpdate(long timestamp, BatchUpdate b)
           
 List<HStoreFile> close()
          Close down this HRegion.
 boolean compactStores()
          Compact all the stores.
static HRegion createHRegion(HRegionInfo info, org.apache.hadoop.fs.Path rootDir, HBaseConfiguration conf)
          Convenience method for creating new HRegions.
 void deleteAll(org.apache.hadoop.io.Text row, long ts)
          Delete all cells whose timestamp is equal to or older than the passed timestamp.
 void deleteAll(org.apache.hadoop.io.Text row, org.apache.hadoop.io.Text column, long ts)
          Delete all cells whose timestamp is equal to or older than the passed timestamp.
 void deleteFamily(org.apache.hadoop.io.Text row, org.apache.hadoop.io.Text family, long timestamp)
          Delete all cells for a row with matching column family with timestamps less than or equal to timestamp.
 boolean equals(Object o)
          
 byte[] get(org.apache.hadoop.io.Text row, org.apache.hadoop.io.Text column)
          Fetch a single data item.
 byte[][] get(org.apache.hadoop.io.Text row, org.apache.hadoop.io.Text column, int numVersions)
          Fetch multiple versions of a single data item.
 byte[][] get(org.apache.hadoop.io.Text row, org.apache.hadoop.io.Text column, long timestamp, int numVersions)
          Fetch multiple versions of a single data item, with timestamp.
 Map<org.apache.hadoop.io.Text,byte[]> getClosestRowBefore(org.apache.hadoop.io.Text row)
          Return all the data for the row that matches row exactly, or for the row that immediately precedes it.
 org.apache.hadoop.io.Text getEndKey()
           
 Map<org.apache.hadoop.io.Text,byte[]> getFull(org.apache.hadoop.io.Text row)
          Fetch all the columns for the indicated row.
 Map<org.apache.hadoop.io.Text,byte[]> getFull(org.apache.hadoop.io.Text row, long ts)
          Fetch all the columns for the indicated row at a specified timestamp.
 long getLastFlushTime()
           
 HLog getLog()
           
static org.apache.hadoop.fs.Path getRegionDir(org.apache.hadoop.fs.Path rootdir, HRegionInfo info)
          Computes the Path of the HRegion
 HRegionInfo getRegionInfo()
           
 org.apache.hadoop.io.Text getRegionName()
           
 HScannerInterface getScanner(org.apache.hadoop.io.Text[] cols, org.apache.hadoop.io.Text firstRow, long timestamp, RowFilterInterface filter)
          Return an iterator that scans over the HRegion, returning the indicated columns for only the rows that match the data filter.
 org.apache.hadoop.io.Text getStartKey()
           
 int hashCode()
          
 boolean isClosed()
           
static void makeColumnFamilyDirs(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path basedir, String encodedRegionName, org.apache.hadoop.io.Text colFamily, HTableDescriptor tabledesc)
          Make the directories for a specific column family
static HRegion merge(HRegion a, HRegion b)
          Merge two regions whether they are adjacent or not.
static HRegion openHRegion(HRegionInfo info, org.apache.hadoop.fs.Path rootDir, HLog log, HBaseConfiguration conf)
          Convenience method to open an HRegion outside of an HRegionServer context.
static boolean rowIsInRange(HRegionInfo info, org.apache.hadoop.io.Text row)
          Determines whether the specified row is within the row range of the given HRegionInfo.
 String toString()
          
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

HRegion

public HRegion(org.apache.hadoop.fs.Path basedir,
               HLog log,
               org.apache.hadoop.fs.FileSystem fs,
               HBaseConfiguration conf,
               HRegionInfo regionInfo,
               org.apache.hadoop.fs.Path initialFiles,
               FlushRequester requester)
        throws IOException
HRegion constructor.

Parameters:
log - The HLog is the outbound log for any updates to the HRegion. (There is a single HLog for all the HRegions on a single HRegionServer.) The log file is a logfile from the previous execution, custom-computed for this HRegion: the HRegionServer computes and sorts the appropriate log info for this HRegion. If there is a previous log file (implying that the HRegion has been written to before), it is read from the supplied path.
basedir - qualified path of directory where region should be located, usually the table directory.
fs - is the filesystem.
conf - is global configuration settings.
regionInfo - HRegionInfo that describes the region
initialFiles - If there are initial files (implying that the HRegion is new), then read them from the supplied path.
requester - an object that implements FlushRequester or null
Throws:
IOException

HRegion

public HRegion(org.apache.hadoop.fs.Path basedir,
               HLog log,
               org.apache.hadoop.fs.FileSystem fs,
               HBaseConfiguration conf,
               HRegionInfo regionInfo,
               org.apache.hadoop.fs.Path initialFiles,
               FlushRequester requester,
               org.apache.hadoop.util.Progressable reporter)
        throws IOException
HRegion constructor.

Parameters:
log - The HLog is the outbound log for any updates to the HRegion. (There is a single HLog for all the HRegions on a single HRegionServer.) The log file is a logfile from the previous execution, custom-computed for this HRegion: the HRegionServer computes and sorts the appropriate log info for this HRegion. If there is a previous log file (implying that the HRegion has been written to before), it is read from the supplied path.
basedir - qualified path of directory where region should be located, usually the table directory.
fs - is the filesystem.
conf - is global configuration settings.
regionInfo - HRegionInfo that describes the region
initialFiles - If there are initial files (implying that the HRegion is new), then read them from the supplied path.
requester - an object that implements FlushRequester or null
reporter - Called periodically so the hosting server can report to the master that region deployment is making progress; otherwise the master might think the deploy failed. Can be null.
Throws:
IOException
Method Detail

merge

public static HRegion merge(HRegion a,
                            HRegion b)
                     throws IOException
Merge two regions whether they are adjacent or not.

Parameters:
a - region a
b - region b
Returns:
new merged region
Throws:
IOException

getRegionInfo

public HRegionInfo getRegionInfo()
Returns:
an HRegionInfo object for this region

isClosed

public boolean isClosed()
Returns:
true if region is closed

close

public List<HStoreFile> close()
                       throws IOException
Close down this HRegion. Flush the cache, shut down each HStore, don't service any more calls.

This method could take some time to execute, so don't call it from a time-sensitive thread.

Returns:
List of all the storage files that the HRegion's component HStores make use of: one HStoreFile object per file. Returns an empty list if the region is already closed, and null if the region judges that it should not close.
Throws:
IOException

getStartKey

public org.apache.hadoop.io.Text getStartKey()
Returns:
start key for region

getEndKey

public org.apache.hadoop.io.Text getEndKey()
Returns:
end key for region

getRegionName

public org.apache.hadoop.io.Text getRegionName()
Returns:
region name

getLog

public HLog getLog()
Returns:
HLog in use for this region

getLastFlushTime

public long getLastFlushTime()
Returns:
the last time the region was flushed

compactStores

public boolean compactStores()
                      throws IOException
Compact all the stores. This should be called periodically to make sure the stores are kept manageable.

This operation could block for a long time, so don't call it from a time-sensitive thread.

Note that no locking is necessary at this level because compaction only conflicts with a region split, and that cannot happen because the region server does them sequentially and not in parallel.

Returns:
true if a compaction was performed; false otherwise.
Throws:
IOException

get

public byte[] get(org.apache.hadoop.io.Text row,
                  org.apache.hadoop.io.Text column)
           throws IOException
Fetch a single data item.

Parameters:
row -
column -
Returns:
column value
Throws:
IOException

get

public byte[][] get(org.apache.hadoop.io.Text row,
                    org.apache.hadoop.io.Text column,
                    int numVersions)
             throws IOException
Fetch multiple versions of a single data item.

Parameters:
row -
column -
numVersions -
Returns:
array of values one element per version
Throws:
IOException

get

public byte[][] get(org.apache.hadoop.io.Text row,
                    org.apache.hadoop.io.Text column,
                    long timestamp,
                    int numVersions)
             throws IOException
Fetch multiple versions of a single data item, with timestamp.

Parameters:
row -
column -
timestamp -
numVersions -
Returns:
array of values one element per version that matches the timestamp
Throws:
IOException

getFull

public Map<org.apache.hadoop.io.Text,byte[]> getFull(org.apache.hadoop.io.Text row)
                                              throws IOException
Fetch all the columns for the indicated row. Returns a TreeMap that maps column names to values. We should eventually use Bloom filters here, to reduce running time. If the database has many column families and is very sparse, then we could be checking many files needlessly. A small Bloom for each row would help us determine which column groups are useful for that row. That would let us avoid a bunch of disk activity.

Parameters:
row -
Returns:
Map values
Throws:
IOException

getFull

public Map<org.apache.hadoop.io.Text,byte[]> getFull(org.apache.hadoop.io.Text row,
                                                     long ts)
                                              throws IOException
Fetch all the columns for the indicated row at a specified timestamp. Returns a TreeMap that maps column names to values. We should eventually use Bloom filters here, to reduce running time. If the database has many column families and is very sparse, then we could be checking many files needlessly. A small Bloom for each row would help us determine which column groups are useful for that row. That would let us avoid a bunch of disk activity.

Parameters:
row -
ts -
Returns:
Map values
Throws:
IOException

getClosestRowBefore

public Map<org.apache.hadoop.io.Text,byte[]> getClosestRowBefore(org.apache.hadoop.io.Text row)
                                                          throws IOException
Return all the data for the row that matches row exactly, or for the row that immediately precedes it.

Parameters:
row - row key
Returns:
map of values
Throws:
IOException
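
The "exact match or closest preceding row" contract above can be modeled with a sorted map. A standalone sketch using String row keys (a hypothetical helper, not the HRegion implementation):

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the getClosestRowBefore contract described above:
// return the row itself if present, else the closest row before it.
public class ClosestRowSketch {
    static String closestRowBefore(TreeMap<String, String> rows, String row) {
        // floorEntry returns the entry with the greatest key <= row, or null if none exists
        Map.Entry<String, String> e = rows.floorEntry(row);
        return e == null ? null : e.getKey();
    }

    public static void main(String[] args) {
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put("apple", "...");
        rows.put("mango", "...");
        System.out.println(closestRowBefore(rows, "mango"));    // exact match: mango
        System.out.println(closestRowBefore(rows, "cherry"));   // preceding row: apple
        System.out.println(closestRowBefore(rows, "aardvark")); // nothing before: null
    }
}
```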

getScanner

public HScannerInterface getScanner(org.apache.hadoop.io.Text[] cols,
                                    org.apache.hadoop.io.Text firstRow,
                                    long timestamp,
                                    RowFilterInterface filter)
                             throws IOException
Return an iterator that scans over the HRegion, returning the indicated columns for only the rows that match the data filter. This Iterator must be closed by the caller.

Parameters:
cols - columns to scan. If a column name is a column family name, all columns of that family are returned. It's also possible to pass a regex in the column qualifier. A column qualifier is judged to be a regex if it contains at least one of the following characters: \+|^&*$[]]}{)(.
firstRow - row which is the starting point of the scan
timestamp - only return rows whose timestamp is <= this value
filter - row filter
Returns:
HScannerInterface
Throws:
IOException
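
The "judged to be a regex" rule for column qualifiers, quoted in the cols parameter above, amounts to scanning the qualifier for any of the listed metacharacters. An illustrative standalone sketch (the character set is taken from the doc text; this is not the HRegion implementation):

```java
// Sketch of the "qualifier contains a regex metacharacter" test described above.
// Hypothetical standalone helper for illustration only.
public class RegexQualifierSketch {
    // Metacharacters listed in the getScanner documentation.
    private static final String META = "\\+|^&*$[]}{)(";

    static boolean looksLikeRegex(String qualifier) {
        for (int i = 0; i < qualifier.length(); i++) {
            if (META.indexOf(qualifier.charAt(i)) >= 0) {
                return true; // treat qualifier as a pattern, not a literal column name
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeRegex("anchor:link.*")); // true: contains '*'
        System.out.println(looksLikeRegex("anchor:home"));   // false: plain qualifier
    }
}
```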

batchUpdate

public void batchUpdate(long timestamp,
                        BatchUpdate b)
                 throws IOException
Parameters:
timestamp -
b -
Throws:
IOException

deleteAll

public void deleteAll(org.apache.hadoop.io.Text row,
                      org.apache.hadoop.io.Text column,
                      long ts)
               throws IOException
Delete all cells whose timestamp is equal to or older than the passed timestamp.

Parameters:
row -
column -
ts - Delete all entries that have this timestamp or older
Throws:
IOException

deleteAll

public void deleteAll(org.apache.hadoop.io.Text row,
                      long ts)
               throws IOException
Delete all cells whose timestamp is equal to or older than the passed timestamp.

Parameters:
row -
ts - Delete all entries that have this timestamp or older
Throws:
IOException
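
The "this timestamp or older" rule used by both deleteAll overloads can be sketched over a per-cell version map keyed by timestamp. A hypothetical standalone illustration, not HRegion code:

```java
import java.util.TreeMap;

// Sketch of the "delete all entries with this timestamp or older" rule
// described above, over a hypothetical map from timestamp to cell value.
public class DeleteOlderSketch {
    static void deleteAll(TreeMap<Long, byte[]> versions, long ts) {
        // headMap(ts, true) is a live view of the entries with timestamp <= ts;
        // clearing the view removes exactly those versions.
        versions.headMap(ts, true).clear();
    }

    public static void main(String[] args) {
        TreeMap<Long, byte[]> versions = new TreeMap<>();
        versions.put(10L, new byte[0]);
        versions.put(20L, new byte[0]);
        versions.put(30L, new byte[0]);
        deleteAll(versions, 20L);
        System.out.println(versions.keySet()); // only the version at 30 remains
    }
}
```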

deleteFamily

public void deleteFamily(org.apache.hadoop.io.Text row,
                         org.apache.hadoop.io.Text family,
                         long timestamp)
                  throws IOException
Delete all cells for a row with matching column family with timestamps less than or equal to timestamp.

Parameters:
row - The row to operate on
family - The column family to match
timestamp - Timestamp to match
Throws:
IOException

equals

public boolean equals(Object o)

Overrides:
equals in class Object

hashCode

public int hashCode()

Overrides:
hashCode in class Object

toString

public String toString()

Overrides:
toString in class Object

createHRegion

public static HRegion createHRegion(HRegionInfo info,
                                    org.apache.hadoop.fs.Path rootDir,
                                    HBaseConfiguration conf)
                             throws IOException
Convenience method for creating new HRegions. Used by createTable and by the bootstrap code in the HMaster constructor. Note that this method creates an HLog for the created region; the log must be closed explicitly. Use getLog() to get access.

Parameters:
info - Info for region to create.
rootDir - Root directory for HBase instance
conf -
Returns:
new HRegion
Throws:
IOException

openHRegion

public static HRegion openHRegion(HRegionInfo info,
                                  org.apache.hadoop.fs.Path rootDir,
                                  HLog log,
                                  HBaseConfiguration conf)
                           throws IOException
Convenience method to open an HRegion outside of an HRegionServer context.

Parameters:
info - Info for region to be opened.
rootDir - Root directory for HBase instance
log - HLog for the region to use. This method calls HLog#setSequenceNumber(long), passing the result of HRegion#getMinSequenceId(), to ensure the log id is properly kept up to date. HRegionStore does this every time it opens a new region.
conf -
Returns:
new HRegion
Throws:
IOException

addRegionToMETA

public static void addRegionToMETA(HRegion meta,
                                   HRegion r)
                            throws IOException
Inserts a new region's meta information into the passed meta region. Used by the HMaster bootstrap code when adding a new table to the ROOT table.

Parameters:
meta - META HRegion to be updated
r - HRegion to add to meta
Throws:
IOException

getRegionDir

public static org.apache.hadoop.fs.Path getRegionDir(org.apache.hadoop.fs.Path rootdir,
                                                     HRegionInfo info)
Computes the Path of the HRegion

Parameters:
rootdir - qualified path of HBase root directory
info - HRegionInfo for the region
Returns:
qualified path of region directory

rowIsInRange

public static boolean rowIsInRange(HRegionInfo info,
                                   org.apache.hadoop.io.Text row)
Determines whether the specified row is within the row range of the given HRegionInfo.

Parameters:
info - HRegionInfo that specifies the row range
row - row to be checked
Returns:
true if the row is within the range specified by the HRegionInfo

makeColumnFamilyDirs

public static void makeColumnFamilyDirs(org.apache.hadoop.fs.FileSystem fs,
                                        org.apache.hadoop.fs.Path basedir,
                                        String encodedRegionName,
                                        org.apache.hadoop.io.Text colFamily,
                                        HTableDescriptor tabledesc)
                                 throws IOException
Make the directories for a specific column family

Parameters:
fs - the file system
basedir - base directory where region will live (usually the table dir)
encodedRegionName - encoded region name
colFamily - the column family
tabledesc - table descriptor of table
Throws:
IOException


Copyright © 2008 The Apache Software Foundation