org.apache.hadoop.hbase
Class HRegion

java.lang.Object
  extended by org.apache.hadoop.hbase.HRegion
All Implemented Interfaces:
HConstants

public class HRegion
extends Object
implements HConstants

HRegion stores data for a certain region of a table. It stores all columns for each row. A given table consists of one or more HRegions.

We maintain multiple HStores for a single HRegion.

An HStore is a set of rows with some column data; together, they make up all the data for the rows.

Each HRegion has a 'startKey' and 'endKey'.

The first is inclusive, the second is exclusive (except for the final region). The endKey of region 0 is the same as startKey for region 1 (if it exists). The startKey for the first region is null. The endKey for the final region is null.
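The key-range rule can be illustrated with a minimal sketch. It is not taken from the HRegion source: the class name KeyRangeSketch and the use of String keys (instead of Text, whose ordering may differ) are assumptions made for illustration. A null startKey or endKey is treated as an open bound, as described above.

```java
/** Illustrative sketch only: start key inclusive, end key exclusive, null means unbounded. */
final class KeyRangeSketch {
    /**
     * Returns true if row falls in [startKey, endKey).
     * A null startKey means "no lower bound" (first region);
     * a null endKey means "no upper bound" (final region).
     */
    static boolean contains(String startKey, String endKey, String row) {
        boolean atOrAfterStart = startKey == null || row.compareTo(startKey) >= 0;
        boolean beforeEnd = endKey == null || row.compareTo(endKey) < 0;
        return atOrAfterStart && beforeEnd;
    }
}
```

With two regions split at "m", the row "m" belongs to the second region (start inclusive) and not to the first (end exclusive).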

Locking at the HRegion level serves only one purpose: preventing the region from being closed (and consequently split) while other operations are ongoing. Each row level operation obtains both a row lock and a region read lock for the duration of the operation. While a scanner is being constructed, getScanner holds a read lock. If the scanner is successfully constructed, it holds a read lock until it is closed. A close takes out a write lock and consequently will block for ongoing operations and will block new operations from starting while the close is in progress.
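The region-level locking described above can be sketched with a standard read/write lock. This is a simplified model, not the HRegion implementation: the class RegionLockSketch and its method names are hypothetical, and the per-row lock is left out so that only the interaction between ordinary operations (read lock) and close (write lock) is shown.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative sketch only: operations share a read lock, close takes the write lock. */
final class RegionLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean closed = false; // guarded by lock

    /** Runs op under the region read lock; returns false if the region is already closed. */
    boolean runRowOperation(Runnable op) {
        lock.readLock().lock();
        try {
            if (closed) {
                return false;
            }
            op.run();
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Blocks until in-flight operations release the read lock; returns false if already closed. */
    boolean close() {
        lock.writeLock().lock();
        try {
            if (closed) {
                return false;
            }
            closed = true;
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Many operations may hold the read lock at once, so they do not block each other; only close, which needs the write lock, waits for them and keeps new ones out while it runs.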

An HRegion is defined by its table and its key extent.

It consists of at least one HStore. The number of HStores should be configurable, so that data which is accessed together is stored in the same HStore. Right now, we approximate that by building a single HStore for each column family. (This config info will be communicated via the tabledesc.)

The HTableDescriptor contains metainfo about the HRegion's table. regionName is a unique identifier for this HRegion. [startKey, endKey) defines the keyspace for this HRegion.


Field Summary
protected  long threadWakeFrequency
           
 
Fields inherited from interface org.apache.hadoop.hbase.HConstants
ALL_META_COLUMNS, ALL_VERSIONS, COL_REGIONINFO, COL_REGIONINFO_ARRAY, COL_SERVER, COL_SPLITA, COL_SPLITB, COL_STARTCODE, COLUMN_FAMILY, COLUMN_FAMILY_ARRAY, COLUMN_FAMILY_STR, DEFAULT_HBASE_DIR, DEFAULT_HOST, DEFAULT_MASTER_ADDRESS, DEFAULT_MASTER_INFOPORT, DEFAULT_MASTER_PORT, DEFAULT_MAX_FILE_SIZE, DEFAULT_REGION_SERVER_CLASS, DEFAULT_REGIONSERVER_ADDRESS, DEFAULT_REGIONSERVER_INFOPORT, EMPTY_START_ROW, EMPTY_TEXT, FILE_SYSTEM_VERSION, HBASE_DIR, HREGION_LOGDIR_NAME, HREGION_OLDLOGFILE_NAME, LATEST_TIMESTAMP, MASTER_ADDRESS, META_TABLE_NAME, REGION_SERVER_CLASS, REGIONSERVER_ADDRESS, ROOT_TABLE_NAME, THREAD_WAKE_FREQUENCY, UTF8_ENCODING, VERSION_FILE_NAME
 
Constructor Summary
HRegion(Path basedir, HLog log, FileSystem fs, HBaseConfiguration conf, HRegionInfo regionInfo, Path initialFiles, CacheFlushListener listener)
          HRegion constructor.
 
Method Summary
 void batchUpdate(long timestamp, BatchUpdate b)
           
 List<HStoreFile> close()
          Close down this HRegion.
 void deleteAll(Text row, long ts)
          Delete all cells of the same age as the passed timestamp or older.
 void deleteAll(Text row, Text column, long ts)
          Delete all cells of the same age as the passed timestamp or older.
 void deleteFamily(Text row, Text family, long timestamp)
          Delete all cells for a row with matching column family with timestamps less than or equal to timestamp.
 byte[] get(Text row, Text column)
          Fetch a single data item.
 byte[][] get(Text row, Text column, int numVersions)
          Fetch multiple versions of a single data item.
 byte[][] get(Text row, Text column, long timestamp, int numVersions)
          Fetch multiple versions of a single data item, with timestamp.
 Map<Text,byte[]> getClosestRowBefore(Text row, long ts)
          Return all the data for the row that matches row exactly, or the one that immediately precedes it, at or immediately before ts.
 Text getEndKey()
           
 Map<Text,byte[]> getFull(Text row)
          Fetch all the columns for the indicated row.
 Map<Text,byte[]> getFull(Text row, long ts)
          Fetch all the columns for the indicated row at a specified timestamp.
 long getLastFlushTime()
           
 HLog getLog()
           
static Path getRegionDir(Path rootdir, HRegionInfo info)
          Computes the Path of the HRegion.
 HRegionInfo getRegionInfo()
           
 Text getRegionName()
           
 HScannerInterface getScanner(Text[] cols, Text firstRow, long timestamp, RowFilterInterface filter)
          Return an iterator that scans over the HRegion, returning the indicated columns for only the rows that match the data filter.
 Text getStartKey()
           
 String toString()
          
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

threadWakeFrequency

protected final long threadWakeFrequency
Constructor Detail

HRegion

public HRegion(Path basedir,
               HLog log,
               FileSystem fs,
               HBaseConfiguration conf,
               HRegionInfo regionInfo,
               Path initialFiles,
               CacheFlushListener listener)
        throws IOException
HRegion constructor.

Parameters:
log - The HLog is the outbound log for any updates to the HRegion (There's a single HLog for all the HRegions on a single HRegionServer.) The log file is a logfile from the previous execution that's custom-computed for this HRegion. The HRegionServer computes and sorts the appropriate log info for this HRegion. If there is a previous log file (implying that the HRegion has been written-to before), then read it from the supplied path.
basedir - qualified path of directory where region should be located, usually the table directory.
fs - is the filesystem.
conf - is global configuration settings.
regionInfo - HRegionInfo that describes the region
initialFiles - If there are initial files (implying that the HRegion is new), then read them from the supplied path.
listener - an object that implements CacheFlushListener or null
Throws:
IOException
Method Detail

getRegionInfo

public HRegionInfo getRegionInfo()
Returns:
a HRegionInfo object for this region

close

public List<HStoreFile> close()
                       throws IOException
Close down this HRegion. Flush the cache, shut down each HStore, don't service any more calls.

This method could take some time to execute, so don't call it from a time-sensitive thread.

Returns:
List of all the HStoreFile objects that the HRegion's component HStores make use of. Returns an empty list if the region is already closed, and null if it is judged that the region should not close.
Throws:
IOException

getStartKey

public Text getStartKey()
Returns:
start key for region

getEndKey

public Text getEndKey()
Returns:
end key for region

getRegionName

public Text getRegionName()
Returns:
region name

getLog

public HLog getLog()
Returns:
HLog in use for this region

getLastFlushTime

public long getLastFlushTime()
Returns:
the last time the region was flushed

get

public byte[] get(Text row,
                  Text column)
           throws IOException
Fetch a single data item.

Parameters:
row -
column -
Returns:
column value
Throws:
IOException

get

public byte[][] get(Text row,
                    Text column,
                    int numVersions)
             throws IOException
Fetch multiple versions of a single data item.

Parameters:
row -
column -
numVersions -
Returns:
array of values one element per version
Throws:
IOException

get

public byte[][] get(Text row,
                    Text column,
                    long timestamp,
                    int numVersions)
             throws IOException
Fetch multiple versions of a single data item, with timestamp.

Parameters:
row -
column -
timestamp -
numVersions -
Returns:
array of values one element per version that matches the timestamp
Throws:
IOException
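
The versioned get calls can be pictured with a small in-memory model of one cell. This is a sketch under assumptions, not the HRegion code: the class VersionedCellSketch is hypothetical, values are Strings rather than byte[], and it assumes that a timestamp argument selects versions at or older than that timestamp, returned newest first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/** Illustrative sketch only: one cell holding several timestamped versions. */
final class VersionedCellSketch {
    // Versions keyed by timestamp, iterated newest first.
    private final TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    /** Returns up to numVersions values whose timestamp is <= timestamp, newest first. */
    List<String> get(long timestamp, int numVersions) {
        List<String> out = new ArrayList<>();
        // In a descending map, tailMap(ts, true) covers ts itself and everything older.
        for (String value : versions.tailMap(timestamp, true).values()) {
            if (out.size() >= numVersions) {
                break;
            }
            out.add(value);
        }
        return out;
    }
}
```

With versions stored at timestamps 10, 20 and 30, asking for up to five versions at timestamp 25 yields the values stored at 20 and 10, in that order.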

getFull

public Map<Text,byte[]> getFull(Text row)
                         throws IOException
Fetch all the columns for the indicated row. Returns a TreeMap that maps column names to values. We should eventually use Bloom filters here, to reduce running time. If the database has many column families and is very sparse, then we could be checking many files needlessly. A small Bloom for each row would help us determine which column groups are useful for that row. That would let us avoid a bunch of disk activity.

Parameters:
row -
Returns:
Map values
Throws:
IOException

getFull

public Map<Text,byte[]> getFull(Text row,
                                long ts)
                         throws IOException
Fetch all the columns for the indicated row at a specified timestamp. Returns a TreeMap that maps column names to values. We should eventually use Bloom filters here, to reduce running time. If the database has many column families and is very sparse, then we could be checking many files needlessly. A small Bloom for each row would help us determine which column groups are useful for that row. That would let us avoid a bunch of disk activity.

Parameters:
row -
ts -
Returns:
Map values
Throws:
IOException

getClosestRowBefore

public Map<Text,byte[]> getClosestRowBefore(Text row,
                                            long ts)
                                     throws IOException
Return all the data for the row that matches row exactly, or the one that immediately precedes it, at or immediately before ts.

Parameters:
row - row key
ts -
Returns:
map of values
Throws:
IOException
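
The row lookup performed by getClosestRowBefore resembles a floor lookup in a sorted map. The sketch below shows only that idea: ClosestRowSketch is a hypothetical class, keys and values are Strings, and the timestamp dimension is ignored.

```java
import java.util.TreeMap;

/** Illustrative sketch only: find the row equal to, or immediately preceding, a given key. */
final class ClosestRowSketch {
    private final TreeMap<String, String> rows = new TreeMap<>();

    void put(String row, String value) {
        rows.put(row, value);
    }

    /** Returns the greatest stored row key that is <= row, or null if there is none. */
    String closestRowAtOrBefore(String row) {
        return rows.floorKey(row);
    }
}
```

An exact match wins when it exists; otherwise the nearest smaller key is returned, and a key smaller than every stored row yields null.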

getScanner

public HScannerInterface getScanner(Text[] cols,
                                    Text firstRow,
                                    long timestamp,
                                    RowFilterInterface filter)
                             throws IOException
Return an iterator that scans over the HRegion, returning the indicated columns for only the rows that match the data filter. This Iterator must be closed by the caller.

Parameters:
cols - columns to scan. If column name is a column family, all columns of the specified column family are returned. It's also possible to pass a regex in the column qualifier. A column qualifier is judged to be a regex if it contains at least one of the following characters: \+|^&*$[]]}{)(.
firstRow - row which is the starting point of the scan
timestamp - only return rows whose timestamp is <= this value
filter - row filter
Returns:
HScannerInterface
Throws:
IOException
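
The regex rule for the cols parameter can be sketched as a simple character test. This is not the HBase implementation: ColumnRegexSketch and its methods are hypothetical, the column name is assumed to have the form family:qualifier, and the period that ends the character list in the description is assumed to be sentence punctuation rather than one of the trigger characters.

```java
/** Illustrative sketch only: decide whether a column qualifier should be treated as a regex. */
final class ColumnRegexSketch {
    // The characters listed in the cols description: \ + | ^ & * $ [ ] } { ) (
    private static final String REGEX_CHARS = "\\+|^&*$[]}{)(";

    /** Returns the part after the first ':' or an empty string if there is no qualifier. */
    static String qualifierOf(String column) {
        int colon = column.indexOf(':');
        return colon < 0 ? "" : column.substring(colon + 1);
    }

    /** Returns true if the qualifier contains at least one of the listed characters. */
    static boolean looksLikeRegex(String column) {
        String qualifier = qualifierOf(column);
        for (int i = 0; i < qualifier.length(); i++) {
            if (REGEX_CHARS.indexOf(qualifier.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions "info:name" is a plain column, "info:" names a whole family, and "info:na*" would be treated as a regex.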

batchUpdate

public void batchUpdate(long timestamp,
                        BatchUpdate b)
                 throws IOException
Parameters:
timestamp -
b -
Throws:
IOException

deleteAll

public void deleteAll(Text row,
                      Text column,
                      long ts)
               throws IOException
Delete all cells of the same age as the passed timestamp or older.

Parameters:
row -
column -
ts - Delete all entries that have this timestamp or older
Throws:
IOException

deleteAll

public void deleteAll(Text row,
                      long ts)
               throws IOException
Delete all cells of the same age as the passed timestamp or older.

Parameters:
row -
ts - Delete all entries that have this timestamp or older
Throws:
IOException
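
The "this timestamp or older" rule shared by both deleteAll variants can be modeled on a single cell. The sketch shows only the visible effect of the call; how the region actually records a deletion is not described on this page. DeleteAllSketch is a hypothetical class and values are Strings rather than byte[].

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Illustrative sketch only: drop every version whose timestamp is <= ts. */
final class DeleteAllSketch {
    // Versions keyed by timestamp, oldest first.
    private final TreeMap<Long, String> versions = new TreeMap<>();

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    /** Removes all versions with timestamp <= ts and returns how many were removed. */
    int deleteAll(long ts) {
        // headMap(ts, true) is a live view of all entries at or older than ts.
        NavigableMap<Long, String> atOrOlder = versions.headMap(ts, true);
        int removed = atOrOlder.size();
        atOrOlder.clear(); // clearing the view removes the entries from the backing map
        return removed;
    }

    int size() {
        return versions.size();
    }
}
```

Versions newer than ts survive the call, which is the difference between this operation and removing the cell outright.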

deleteFamily

public void deleteFamily(Text row,
                         Text family,
                         long timestamp)
                  throws IOException
Delete all cells for a row with matching column family with timestamps less than or equal to timestamp.

Parameters:
row - The row to operate on
family - The column family to match
timestamp - Timestamp to match
Throws:
IOException

toString

public String toString()

Overrides:
toString in class Object

getRegionDir

public static Path getRegionDir(Path rootdir,
                                HRegionInfo info)
Computes the Path of the HRegion.

Parameters:
rootdir - qualified path of HBase root directory
info - HRegionInfo for the region
Returns:
qualified path of region directory


Copyright © 2006 The Apache Software Foundation