org.apache.hadoop.hbase.regionserver
Class HRegion

java.lang.Object
  extended by org.apache.hadoop.hbase.regionserver.HRegion
All Implemented Interfaces:
HConstants

public class HRegion
extends Object
implements HConstants

HRegion stores data for a certain region of a table. It stores all columns for each row. A given table consists of one or more HRegions.

We maintain multiple HStores for a single HRegion.

An HStore is a set of rows with some column data; together, they make up all the data for the rows.

Each HRegion has a 'startKey' and 'endKey'.

The first is inclusive, the second is exclusive (except for the final region). The endKey of region 0 is the same as the startKey for region 1 (if it exists). The startKey for the first region is null, and the endKey for the final region is also null.
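
The key-range rule above can be sketched as a standalone comparison (a hypothetical KeyRange helper for illustration, not HBase's actual code; the real check is the rowIsInRange method documented on this page):

```java
// Hypothetical sketch of the region key-range rule: startKey inclusive,
// endKey exclusive, with null standing for the open boundary of the
// first/last region. Row keys compare as unsigned lexicographic byte arrays.
public class KeyRange {
    static int compare(byte[] a, byte[] b) {
        // lexicographic unsigned byte comparison, the order HBase uses for rows
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    static boolean rowIsInRange(byte[] startKey, byte[] endKey, byte[] row) {
        boolean afterStart = startKey == null || compare(row, startKey) >= 0; // inclusive
        boolean beforeEnd  = endKey == null || compare(row, endKey) < 0;      // exclusive
        return afterStart && beforeEnd;
    }
}
```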

Locking at the HRegion level serves only one purpose: preventing the region from being closed (and consequently split) while other operations are ongoing. Each row level operation obtains both a row lock and a region read lock for the duration of the operation. While a scanner is being constructed, getScanner holds a read lock. If the scanner is successfully constructed, it holds a read lock until it is closed. A close takes out a write lock and consequently will block for ongoing operations and will block new operations from starting while the close is in progress.
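
The locking protocol above can be sketched with a standard read-write lock (a hypothetical stand-in, assuming java.util.concurrent; HRegion's internal implementation may differ):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the region-level lock protocol described above:
// row operations and scanners take the read lock; close takes the write
// lock, so it waits for in-flight operations and blocks new ones.
public class RegionLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile boolean closed = false;

    public void rowOperation(Runnable op) {
        lock.readLock().lock();           // prevents close/split while op runs
        try {
            if (closed) throw new IllegalStateException("region closed");
            op.run();
        } finally {
            lock.readLock().unlock();
        }
    }

    public void close() {
        lock.writeLock().lock();          // blocks until ongoing operations finish
        try {
            closed = true;                // no further operations may start
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```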

An HRegion is defined by its table and its key extent.

It consists of at least one HStore. The number of HStores should be configurable, so that data which is accessed together is stored in the same HStore. Right now, we approximate that by building a single HStore for each column family. (This config info will be communicated via the tabledesc.)

The HTableDescriptor contains metainfo about the HRegion's table. regionName is a unique identifier for this HRegion. (startKey, endKey] defines the keyspace for this HRegion.


Field Summary
 
Fields inherited from interface org.apache.hadoop.hbase.HConstants
ALL_META_COLUMNS, ALL_VERSIONS, COL_REGIONINFO, COL_REGIONINFO_ARRAY, COL_SERVER, COL_SPLITA, COL_SPLITB, COL_STARTCODE, COLUMN_FAMILY, COLUMN_FAMILY_ARRAY, COLUMN_FAMILY_HISTORIAN, COLUMN_FAMILY_HISTORIAN_STR, COLUMN_FAMILY_STR, DEFAULT_CLIENT_RETRIES, DEFAULT_HOST, DEFAULT_MASTER_ADDRESS, DEFAULT_MASTER_INFOPORT, DEFAULT_MASTER_PORT, DEFAULT_MAX_FILE_SIZE, DEFAULT_REGION_SERVER_CLASS, DEFAULT_REGIONSERVER_ADDRESS, DEFAULT_REGIONSERVER_INFOPORT, DEFAULT_SIZE_RESERVATION_BLOCK, EMPTY_BYTE_ARRAY, EMPTY_END_ROW, EMPTY_START_ROW, FILE_SYSTEM_VERSION, FOREVER, HBASE_CLIENT_RETRIES_NUMBER_KEY, HBASE_DIR, HREGION_LOGDIR_NAME, HREGION_OLDLOGFILE_NAME, IN_MEMORY, LAST_ROW, LATEST_TIMESTAMP, MAJOR_COMPACTION_PERIOD, MASTER_ADDRESS, META_ROW_DELIMITER, META_TABLE_NAME, NAME, NINES, REGION_SERVER_CLASS, REGION_SERVER_IMPL, REGIONSERVER_ADDRESS, RETRY_BACKOFF, ROOT_TABLE_NAME, THREAD_WAKE_FREQUENCY, UTF8_ENCODING, VERSION_FILE_NAME, VERSIONS, ZERO_L, ZEROES
 
Constructor Summary
HRegion(org.apache.hadoop.fs.Path basedir, HLog log, org.apache.hadoop.fs.FileSystem fs, HBaseConfiguration conf, HRegionInfo regionInfo, org.apache.hadoop.fs.Path initialFiles, FlushRequester flushListener)
          HRegion constructor.
HRegion(org.apache.hadoop.fs.Path basedir, HLog log, org.apache.hadoop.fs.FileSystem fs, HBaseConfiguration conf, HRegionInfo regionInfo, org.apache.hadoop.fs.Path initialFiles, FlushRequester flushListener, org.apache.hadoop.util.Progressable reporter)
          HRegion constructor.
 
Method Summary
static void addRegionToMETA(HRegion meta, HRegion r)
          Inserts a new region's meta information into the passed meta region.
 void batchUpdate(BatchUpdate b)
           
protected  void checkReadOnly()
           
 List<HStoreFile> close()
          Close down this HRegion.
 byte[] compactStores()
          Called by compaction thread and after region is opened to compact the HStores if necessary.
static HRegion createHRegion(HRegionInfo info, org.apache.hadoop.fs.Path rootDir, HBaseConfiguration conf)
          Convenience method creating new HRegions.
 void deleteAll(byte[] row, byte[] column, long ts)
          Delete all cells of the same age as the passed timestamp or older.
 void deleteAll(byte[] row, long ts)
          Delete all cells of the same age as the passed timestamp or older.
 void deleteFamily(byte[] row, byte[] family, long timestamp)
          Delete all cells for a row with matching column family with timestamps less than or equal to timestamp.
static void deleteRegion(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir, HRegionInfo info)
          Deletes all the files for an HRegion.
protected  void doReconstructionLog(org.apache.hadoop.fs.Path oldLogFile, long maxSeqId, org.apache.hadoop.util.Progressable reporter)
           
 boolean equals(Object o)
          
 boolean flushcache()
          Flush the cache.
 Cell get(byte[] row, byte[] column)
          Fetch a single data item.
 Cell[] get(byte[] row, byte[] column, int numVersions)
          Fetch multiple versions of a single data item
 Cell[] get(byte[] row, byte[] column, long timestamp, int numVersions)
          Fetch multiple versions of a single data item, with timestamp.
 org.apache.hadoop.fs.Path getBaseDir()
           
 RowResult getClosestRowBefore(byte[] row)
          Return all the data for the row that matches row exactly, or the one that immediately precedes it, at or immediately before ts.
 HBaseConfiguration getConf()
           
 byte[] getEndKey()
           
 org.apache.hadoop.fs.FileSystem getFilesystem()
           
 Map<byte[],Cell> getFull(byte[] row, Set<byte[]> columns, long ts)
          Fetch all the columns for the indicated row at a specified timestamp.
 long getLargestHStoreSize()
           
 long getLastFlushTime()
           
 HLog getLog()
           
 org.apache.hadoop.fs.Path getRegionDir()
           
static org.apache.hadoop.fs.Path getRegionDir(org.apache.hadoop.fs.Path rootdir, HRegionInfo info)
          Computes the Path of the HRegion
static org.apache.hadoop.fs.Path getRegionDir(org.apache.hadoop.fs.Path tabledir, int name)
          Computes the Path of the HRegion
 long getRegionId()
           
 HRegionInfo getRegionInfo()
           
 byte[] getRegionName()
           
 InternalScanner getScanner(byte[][] cols, byte[] firstRow, long timestamp, RowFilterInterface filter)
          Return an iterator that scans over the HRegion, returning the indicated columns for only the rows that match the data filter.
 byte[] getStartKey()
           
 HStore getStore(byte[] column)
          Return HStore instance.
 HTableDescriptor getTableDesc()
           
 int hashCode()
          
protected  HStore instantiateHStore(org.apache.hadoop.fs.Path baseDir, HColumnDescriptor c, org.apache.hadoop.fs.Path oldLogFile, org.apache.hadoop.util.Progressable reporter)
           
 boolean isClosed()
           
static void makeColumnFamilyDirs(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path basedir, int encodedRegionName, byte[] colFamily, HTableDescriptor tabledesc)
          Make the directories for a specific column family
static HRegion merge(HRegion a, HRegion b)
          Merge two regions whether they are adjacent or not.
static HRegion mergeAdjacent(HRegion srcA, HRegion srcB)
          Merge two HRegions.
static void offlineRegionInMETA(HRegionInterface srvr, byte[] metaRegionName, HRegionInfo info)
          Utility method used by the HMaster to mark regions offline.
static HRegion openHRegion(HRegionInfo info, org.apache.hadoop.fs.Path rootDir, HLog log, HBaseConfiguration conf)
          Convenience method to open an HRegion outside of an HRegionServer context.
static void removeRegionFromMETA(HRegionInterface srvr, byte[] metaRegionName, byte[] regionName)
          Delete a region's meta information from the passed meta region.
static boolean rowIsInRange(HRegionInfo info, byte[] row)
          Determines whether the specified row is within the row range of the given HRegionInfo.
 String toString()
          
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

HRegion

public HRegion(org.apache.hadoop.fs.Path basedir,
               HLog log,
               org.apache.hadoop.fs.FileSystem fs,
               HBaseConfiguration conf,
               HRegionInfo regionInfo,
               org.apache.hadoop.fs.Path initialFiles,
               FlushRequester flushListener)
        throws IOException
HRegion constructor.

Parameters:
basedir - qualified path of directory where region should be located, usually the table directory.
log - The HLog is the outbound log for any updates to the HRegion (There's a single HLog for all the HRegions on a single HRegionServer.) The log file is a logfile from the previous execution that's custom-computed for this HRegion. The HRegionServer computes and sorts the appropriate log info for this HRegion. If there is a previous log file (implying that the HRegion has been written-to before), then read it from the supplied path.
fs - is the filesystem.
conf - is global configuration settings.
regionInfo - HRegionInfo that describes the region
initialFiles - If there are initial files (implying that the HRegion is new), then read them from the supplied path.
flushListener - an object that implements CacheFlushListener, or null
Throws:
IOException

HRegion

public HRegion(org.apache.hadoop.fs.Path basedir,
               HLog log,
               org.apache.hadoop.fs.FileSystem fs,
               HBaseConfiguration conf,
               HRegionInfo regionInfo,
               org.apache.hadoop.fs.Path initialFiles,
               FlushRequester flushListener,
               org.apache.hadoop.util.Progressable reporter)
        throws IOException
HRegion constructor.

Parameters:
log - The HLog is the outbound log for any updates to the HRegion (There's a single HLog for all the HRegions on a single HRegionServer.) The log file is a logfile from the previous execution that's custom-computed for this HRegion. The HRegionServer computes and sorts the appropriate log info for this HRegion. If there is a previous log file (implying that the HRegion has been written-to before), then read it from the supplied path.
basedir - qualified path of directory where region should be located, usually the table directory.
fs - is the filesystem.
conf - is global configuration settings.
regionInfo - HRegionInfo that describes the region
initialFiles - If there are initial files (implying that the HRegion is new), then read them from the supplied path.
flushListener - an object that implements CacheFlushListener, or null
reporter - Called periodically so the hosting server can report to the master that the region deploy is making progress -- otherwise the master might think the deploy failed. May be null.
Throws:
IOException
Method Detail

mergeAdjacent

public static HRegion mergeAdjacent(HRegion srcA,
                                    HRegion srcB)
                             throws IOException
Merge two HRegions. The regions must be adjacent and must not overlap.

Parameters:
srcA -
srcB -
Returns:
new merged HRegion
Throws:
IOException

merge

public static HRegion merge(HRegion a,
                            HRegion b)
                     throws IOException
Merge two regions whether they are adjacent or not.

Parameters:
a - region a
b - region b
Returns:
new merged region
Throws:
IOException

getRegionInfo

public HRegionInfo getRegionInfo()
Returns:
a HRegionInfo object for this region

isClosed

public boolean isClosed()
Returns:
true if region is closed

close

public List<HStoreFile> close()
                       throws IOException
Close down this HRegion. Flush the cache, shut down each HStore, don't service any more calls.

This method could take some time to execute, so don't call it from a time-sensitive thread.

Returns:
List of all the storage files (HStoreFile objects) that the HRegion's component HStores make use of. Returns an empty list if the region is already closed, and null if it is judged that the region should not close.
Throws:
IOException

getStartKey

public byte[] getStartKey()
Returns:
start key for region

getEndKey

public byte[] getEndKey()
Returns:
end key for region

getRegionId

public long getRegionId()
Returns:
region id

getRegionName

public byte[] getRegionName()
Returns:
region name

getTableDesc

public HTableDescriptor getTableDesc()
Returns:
HTableDescriptor for this region

getLog

public HLog getLog()
Returns:
HLog in use for this region

getConf

public HBaseConfiguration getConf()
Returns:
Configuration object

getRegionDir

public org.apache.hadoop.fs.Path getRegionDir()
Returns:
region directory Path

getFilesystem

public org.apache.hadoop.fs.FileSystem getFilesystem()
Returns:
FileSystem being used by this region

getLastFlushTime

public long getLastFlushTime()
Returns:
the last time the region was flushed

getLargestHStoreSize

public long getLargestHStoreSize()
Returns:
size of the largest HStore

compactStores

public byte[] compactStores()
                     throws IOException
Called by compaction thread and after region is opened to compact the HStores if necessary.

This operation could block for a long time, so don't call it from a time-sensitive thread. Note that no locking is necessary at this level because compaction only conflicts with a region split, and that cannot happen because the region server does them sequentially and not in parallel.

Returns:
mid key if split is needed
Throws:
IOException

flushcache

public boolean flushcache()
                   throws IOException
Flush the cache. When this method is called the cache will be flushed unless:
  1. the cache is empty
  2. the region is closed.
  3. a flush is already in progress
  4. writes are disabled

This method may block for some time, so it should not be called from a time-sensitive thread.

Returns:
true if cache was flushed
Throws:
IOException
DroppedSnapshotException - Thrown when replay of hlog is required because a Snapshot was not properly persisted.
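
The four skip conditions above can be sketched as a guard (a hypothetical FlushGuard with stand-in fields, not HRegion's actual state or flush logic):

```java
// Hypothetical sketch of the flushcache skip conditions listed above:
// empty cache, closed region, flush already in progress, writes disabled.
public class FlushGuard {
    volatile long memcacheSize = 0;
    volatile boolean closed = false, flushing = false, writesDisabled = false;

    /** @return true if a flush was performed */
    public synchronized boolean flushcache() {
        if (memcacheSize == 0 || closed || flushing || writesDisabled) {
            return false;             // one of the four conditions holds: skip
        }
        flushing = true;
        try {
            memcacheSize = 0;         // stand-in for persisting the snapshot to HStores
            return true;
        } finally {
            flushing = false;
        }
    }
}
```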

get

public Cell get(byte[] row,
                byte[] column)
         throws IOException
Fetch a single data item.

Parameters:
row -
column -
Returns:
column value
Throws:
IOException

get

public Cell[] get(byte[] row,
                  byte[] column,
                  int numVersions)
           throws IOException
Fetch multiple versions of a single data item

Parameters:
row -
column -
numVersions -
Returns:
array of values one element per version
Throws:
IOException

get

public Cell[] get(byte[] row,
                  byte[] column,
                  long timestamp,
                  int numVersions)
           throws IOException
Fetch multiple versions of a single data item, with timestamp.

Parameters:
row -
column -
timestamp -
numVersions -
Returns:
array of values one element per version that matches the timestamp
Throws:
IOException

getFull

public Map<byte[],Cell> getFull(byte[] row,
                                Set<byte[]> columns,
                                long ts)
                         throws IOException
Fetch all the columns for the indicated row at a specified timestamp. Returns a TreeMap that maps column names to values. We should eventually use Bloom filters here, to reduce running time. If the database has many column families and is very sparse, then we could be checking many files needlessly. A small Bloom for each row would help us determine which column groups are useful for that row. That would let us avoid a bunch of disk activity.

Parameters:
row -
columns - Array of columns you'd like to retrieve. When null, get all.
ts -
Returns:
Map values
Throws:
IOException
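
The per-row Bloom idea floated above can be sketched as a tiny filter recording which column families hold data for a row, so reads can skip families that certainly have none (a hypothetical RowBloom, not anything in this API):

```java
import java.util.BitSet;

// Hypothetical sketch of the per-row Bloom filter suggested above. A set bit
// can collide (false positives), but a clear bit is definitive: that family
// has no data for this row, so its files need not be checked.
public class RowBloom {
    private final BitSet bits = new BitSet(64);

    void recordFamily(String family) {
        // two cheap hash positions per family
        bits.set(Math.floorMod(family.hashCode(), 64));
        bits.set(Math.floorMod(family.hashCode() * 31 + 7, 64));
    }

    /** false means the family definitely has no data for this row. */
    boolean mightContain(String family) {
        return bits.get(Math.floorMod(family.hashCode(), 64))
            && bits.get(Math.floorMod(family.hashCode() * 31 + 7, 64));
    }
}
```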

getClosestRowBefore

public RowResult getClosestRowBefore(byte[] row)
                              throws IOException
Return all the data for the row that matches row exactly, or the one that immediately precedes it, at or immediately before ts.

Parameters:
row - row key
Returns:
map of values
Throws:
IOException
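
The "exact match or nearest predecessor" semantics above can be sketched with a sorted map (a hypothetical ClosestRow helper using a TreeMap as a stand-in for the region's row index):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of getClosestRowBefore semantics: return the value for
// the row equal to the key, or for the greatest row strictly before it.
public class ClosestRow {
    static String closestRowBefore(TreeMap<String, String> rows, String row) {
        Map.Entry<String, String> e = rows.floorEntry(row); // <= row
        return e == null ? null : e.getValue();
    }
}
```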

getScanner

public InternalScanner getScanner(byte[][] cols,
                                  byte[] firstRow,
                                  long timestamp,
                                  RowFilterInterface filter)
                           throws IOException
Return an iterator that scans over the HRegion, returning the indicated columns for only the rows that match the data filter. This Iterator must be closed by the caller.

Parameters:
cols - columns to scan. If a column name is a column family, all columns of the specified column family are returned. It's also possible to pass a regex in the column qualifier. A column qualifier is judged to be a regex if it contains at least one of the following characters: \+|^&*$[]]}{)(.
firstRow - row which is the starting point of the scan
timestamp - only return rows whose timestamp is <= this value
filter - row filter
Returns:
InternalScanner
Throws:
IOException
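
Because the scanner holds a region read lock until it is closed, the caller should always close it in a finally block. A generic sketch of that usage pattern, with a hypothetical Scanner interface standing in for InternalScanner (whose methods also throw IOException):

```java
// Hypothetical usage sketch: always release the scanner, and with it the
// region read lock, even if iteration fails partway through.
interface Scanner {
    boolean next();   // advance to the next row; false when exhausted
    void close();     // releases the region read lock
}

public class ScanUsage {
    static int drain(Scanner s) {
        int rows = 0;
        try {
            while (s.next()) rows++;   // consume every row
        } finally {
            s.close();                 // runs even if next() throws
        }
        return rows;
    }
}
```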

batchUpdate

public void batchUpdate(BatchUpdate b)
                 throws IOException
Parameters:
b -
Throws:
IOException

deleteAll

public void deleteAll(byte[] row,
                      byte[] column,
                      long ts)
               throws IOException
Delete all cells of the same age as the passed timestamp or older.

Parameters:
row -
column -
ts - Delete all entries that have this timestamp or older
Throws:
IOException
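
The timestamp rule above ("this timestamp or older") can be sketched over a list of cell-version timestamps (a hypothetical DeleteAllSketch, not the store-level implementation):

```java
import java.util.List;

// Hypothetical sketch of the deleteAll timestamp rule: every version whose
// timestamp is <= ts is removed; strictly newer versions survive.
public class DeleteAllSketch {
    static void deleteAll(List<Long> versionTimestamps, long ts) {
        versionTimestamps.removeIf(t -> t <= ts);
    }
}
```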

deleteAll

public void deleteAll(byte[] row,
                      long ts)
               throws IOException
Delete all cells of the same age as the passed timestamp or older.

Parameters:
row -
ts - Delete all entries that have this timestamp or older
Throws:
IOException

deleteFamily

public void deleteFamily(byte[] row,
                         byte[] family,
                         long timestamp)
                  throws IOException
Delete all cells for a row with matching column family with timestamps less than or equal to timestamp.

Parameters:
row - The row to operate on
family - The column family to match
timestamp - Timestamp to match
Throws:
IOException

checkReadOnly

protected void checkReadOnly()
                      throws IOException
Throws:
IOException - Thrown if the region is in read-only mode.

doReconstructionLog

protected void doReconstructionLog(org.apache.hadoop.fs.Path oldLogFile,
                                   long maxSeqId,
                                   org.apache.hadoop.util.Progressable reporter)
                            throws UnsupportedEncodingException,
                                   IOException
Throws:
UnsupportedEncodingException
IOException

instantiateHStore

protected HStore instantiateHStore(org.apache.hadoop.fs.Path baseDir,
                                   HColumnDescriptor c,
                                   org.apache.hadoop.fs.Path oldLogFile,
                                   org.apache.hadoop.util.Progressable reporter)
                            throws IOException
Throws:
IOException

getStore

public HStore getStore(byte[] column)
Return HStore instance. Use with caution. Exposed for use of fixup utilities.

Parameters:
column - Name of column family hosted by this region.
Returns:
Store that goes with the family on passed column. TODO: Make this lookup faster.

equals

public boolean equals(Object o)

Overrides:
equals in class Object

hashCode

public int hashCode()

Overrides:
hashCode in class Object

toString

public String toString()

Overrides:
toString in class Object

getBaseDir

public org.apache.hadoop.fs.Path getBaseDir()
Returns:
Path of region base directory

createHRegion

public static HRegion createHRegion(HRegionInfo info,
                                    org.apache.hadoop.fs.Path rootDir,
                                    HBaseConfiguration conf)
                             throws IOException
Convenience method creating new HRegions. Used by createTable and by the bootstrap code in the HMaster constructor. Note, this method creates an HLog for the created region. It needs to be closed explicitly. Use getLog() to get access.

Parameters:
info - Info for region to create.
rootDir - Root directory for HBase instance
conf -
Returns:
new HRegion
Throws:
IOException

openHRegion

public static HRegion openHRegion(HRegionInfo info,
                                  org.apache.hadoop.fs.Path rootDir,
                                  HLog log,
                                  HBaseConfiguration conf)
                           throws IOException
Convenience method to open an HRegion outside of an HRegionServer context.

Parameters:
info - Info for region to be opened.
rootDir - Root directory for HBase instance
log - HLog for region to use. This method will call HLog#setSequenceNumber(long), passing it the result of the call to HRegion#getMinSequenceId(), to ensure the log id is properly kept up to date. HRegionStore does this every time it opens a new region.
conf -
Returns:
new HRegion
Throws:
IOException

addRegionToMETA

public static void addRegionToMETA(HRegion meta,
                                   HRegion r)
                            throws IOException
Inserts a new region's meta information into the passed meta region. Used by the HMaster bootstrap code when adding a new table to the ROOT table.

Parameters:
meta - META HRegion to be updated
r - HRegion to add to meta
Throws:
IOException

removeRegionFromMETA

public static void removeRegionFromMETA(HRegionInterface srvr,
                                        byte[] metaRegionName,
                                        byte[] regionName)
                                 throws IOException
Delete a region's meta information from the passed meta region.

Parameters:
srvr - META server to be updated
metaRegionName - Meta region name
regionName - HRegion to remove from meta
Throws:
IOException

offlineRegionInMETA

public static void offlineRegionInMETA(HRegionInterface srvr,
                                       byte[] metaRegionName,
                                       HRegionInfo info)
                                throws IOException
Utility method used by the HMaster to mark regions offline.

Parameters:
srvr - META server to be updated
metaRegionName - Meta region name
info - HRegion to update in meta
Throws:
IOException

deleteRegion

public static void deleteRegion(org.apache.hadoop.fs.FileSystem fs,
                                org.apache.hadoop.fs.Path rootdir,
                                HRegionInfo info)
                         throws IOException
Deletes all the files for an HRegion.

Parameters:
fs - the file system object
rootdir - qualified path of HBase root directory
info - HRegionInfo for region to be deleted
Throws:
IOException

getRegionDir

public static org.apache.hadoop.fs.Path getRegionDir(org.apache.hadoop.fs.Path tabledir,
                                                     int name)
Computes the Path of the HRegion

Parameters:
tabledir - qualified path for table
name - ENCODED region name
Returns:
Path of HRegion directory
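
The layout implied by the two getRegionDir overloads is the table directory plus the encoded region name. A sketch of that path computation, using java.nio in place of Hadoop's Path so the example is self-contained (the real on-disk layout may differ):

```java
import java.nio.file.Path;

// Hypothetical sketch of the region-directory layout implied above:
// <tabledir>/<encodedRegionName>.
public class RegionDirs {
    static Path getRegionDir(Path tabledir, int encodedName) {
        return tabledir.resolve(Integer.toString(encodedName));
    }
}
```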

getRegionDir

public static org.apache.hadoop.fs.Path getRegionDir(org.apache.hadoop.fs.Path rootdir,
                                                     HRegionInfo info)
Computes the Path of the HRegion

Parameters:
rootdir - qualified path of HBase root directory
info - HRegionInfo for the region
Returns:
qualified path of region directory

rowIsInRange

public static boolean rowIsInRange(HRegionInfo info,
                                   byte[] row)
Determines whether the specified row is within the row range of the given HRegionInfo.

Parameters:
info - HRegionInfo that specifies the row range
row - row to be checked
Returns:
true if the row is within the range specified by the HRegionInfo

makeColumnFamilyDirs

public static void makeColumnFamilyDirs(org.apache.hadoop.fs.FileSystem fs,
                                        org.apache.hadoop.fs.Path basedir,
                                        int encodedRegionName,
                                        byte[] colFamily,
                                        HTableDescriptor tabledesc)
                                 throws IOException
Make the directories for a specific column family

Parameters:
fs - the file system
basedir - base directory where region will live (usually the table dir)
encodedRegionName - encoded region name
colFamily - the column family
tabledesc - table descriptor of table
Throws:
IOException


Copyright © 2008 The Apache Software Foundation