org.apache.hadoop.hbase.regionserver
Class HRegion

java.lang.Object
  extended by org.apache.hadoop.hbase.regionserver.HRegion
All Implemented Interfaces:
HeapSize

public class HRegion
extends Object
implements HeapSize

HRegion stores data for a certain region of a table. It stores all columns for each row. A given table consists of one or more HRegions.

We maintain multiple HStores for a single HRegion.

A Store is a set of rows with some column data; together, the Stores make up all the data for the rows.

Each HRegion has a 'startKey' and 'endKey'.

The first is inclusive, the second is exclusive (except for the final region). The endKey of region 0 is the same as the startKey for region 1 (if it exists). The startKey for the first region is null. The endKey for the final region is null.
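
For illustration, here is a minimal sketch of the resulting [startKey, endKey) containment check, in the spirit of the static rowIsInRange(HRegionInfo, byte[]) helper documented below (the class and method names here are hypothetical):

   import org.apache.hadoop.hbase.util.Bytes;

   public class KeyRangeSketch {
     // An empty (or null) startKey marks the first region; an empty (or null)
     // endKey marks the final region, per the semantics described above.
     static boolean inRange(byte[] startKey, byte[] endKey, byte[] row) {
       boolean afterStart = startKey == null || startKey.length == 0
           || Bytes.compareTo(row, startKey) >= 0;  // startKey is inclusive
       boolean beforeEnd = endKey == null || endKey.length == 0
           || Bytes.compareTo(row, endKey) < 0;     // endKey is exclusive
       return afterStart && beforeEnd;
     }
   }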

Locking at the HRegion level serves only one purpose: preventing the region from being closed (and consequently split) while other operations are ongoing. Each row level operation obtains both a row lock and a region read lock for the duration of the operation. While a scanner is being constructed, getScanner holds a read lock. If the scanner is successfully constructed, it holds a read lock until it is closed. A close takes out a write lock and consequently will block for ongoing operations and will block new operations from starting while the close is in progress.
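
As a rough mental model only (not HBase's actual implementation), this close-versus-operations interplay behaves like a ReentrantReadWriteLock; the class below is a hypothetical sketch:

   import java.util.concurrent.locks.ReentrantReadWriteLock;

   class RegionCloseGuardSketch {
     private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

     void rowOperation(Runnable op) {
       lock.readLock().lock();      // many row operations may proceed concurrently
       try {
         op.run();                  // the per-row lock would also be held here
       } finally {
         lock.readLock().unlock();
       }
     }

     void close(Runnable shutdown) {
       lock.writeLock().lock();     // waits for in-flight operations; blocks new ones
       try {
         shutdown.run();
       } finally {
         lock.writeLock().unlock();
       }
     }
   }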

An HRegion is defined by its table and its key extent.

It consists of at least one Store. The number of Stores should be configurable, so that data which is accessed together is stored in the same Store. Right now, we approximate that by building a single Store for each column family. (This config info will be communicated via the tabledesc.)

The HTableDescriptor contains metainfo about the HRegion's table. regionName is a unique identifier for this HRegion. [startKey, endKey) defines the keyspace for this HRegion.


Field Summary
static long DEEP_OVERHEAD
           
static long FIXED_OVERHEAD
           
static org.apache.commons.logging.Log LOG
           
static String REGION_TEMP_SUBDIR
          Temporary subdirectory of the region directory used for compaction output.
static String REGIONINFO_FILE
          Name of the region info file that resides just under the region directory.
protected  Map<byte[],Store> stores
           
 
Constructor Summary
HRegion()
          Should only be used for testing purposes
HRegion(org.apache.hadoop.fs.Path tableDir, HLog log, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.conf.Configuration conf, HRegionInfo regionInfo, HTableDescriptor htd, RegionServerServices rsServices)
          HRegion constructor.
 
Method Summary
 long addAndGetGlobalMemstoreSize(long memStoreSize)
          Increase the size of the memstore in this region and the size of the global memstore
static void addRegionToMETA(HRegion meta, HRegion r)
          Inserts a new region's meta information into the passed meta region.
 Result append(Append append, Integer lockid, boolean writeToWAL)
          Perform one or more append operations on a row.
 OperationStatus[] batchMutate(Pair<Mutation,Integer>[] mutationsAndLocks)
          Perform a batch of mutations.
 boolean bulkLoadHFiles(List<Pair<byte[],String>> familyPaths)
          Attempts to atomically load a group of hfiles.
 boolean checkAndMutate(byte[] row, byte[] family, byte[] qualifier, CompareFilter.CompareOp compareOp, WritableByteArrayComparable comparator, org.apache.hadoop.io.Writable w, Integer lockId, boolean writeToWAL)
           
protected  void checkReadOnly()
           
 byte[] checkSplit()
          Return the splitpoint.
 List<StoreFile> close()
          Close down this HRegion.
 List<StoreFile> close(boolean abort)
          Close down this HRegion.
static void closeHRegion(HRegion r)
          This will do the necessary cleanup a call to createHRegion(HRegionInfo, Path, Configuration, HTableDescriptor) requires.
 boolean compact(CompactionRequest cr)
           
 void compactStores()
          Helper function that compacts all the stores synchronously. It is used by utilities and testing.
 void compactStores(boolean majorCompaction)
          Helper function that compacts all the stores synchronously. It is used by utilities and testing.
static HDFSBlocksDistribution computeHDFSBlocksDistribution(org.apache.hadoop.conf.Configuration conf, HTableDescriptor tableDescriptor, String regionEncodedName)
          This is a helper function to compute HDFS block distribution on demand
static HRegion createHRegion(HRegionInfo info, org.apache.hadoop.fs.Path rootDir, org.apache.hadoop.conf.Configuration conf, HTableDescriptor hTableDescriptor)
          Convenience method creating new HRegions.
static HRegion createHRegion(HRegionInfo info, org.apache.hadoop.fs.Path rootDir, org.apache.hadoop.conf.Configuration conf, HTableDescriptor hTableDescriptor, HLog hlog)
          Convenience method creating new HRegions.
 void delete(Delete delete, Integer lockid, boolean writeToWAL)
           
static void deleteRegion(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir, HRegionInfo info)
          Deletes all the files for a HRegion
 boolean equals(Object o)
           
 ExecResult exec(Exec call)
          Executes a single CoprocessorProtocol method using the registered protocol handlers.
 boolean flushcache()
          Flush the cache.
 Result get(Get get, Integer lockid)
           
 Result getClosestRowBefore(byte[] row, byte[] family)
          Return all the data for the row that matches row exactly, or the one that immediately precedes it, at or immediately before ts.
 int getCompactPriority()
           
protected  long getCompleteCacheFlushSequenceId(long currentSequenceId)
          Get the sequence number to be associated with this cache flush.
 org.apache.hadoop.conf.Configuration getConf()
           
 RegionCoprocessorHost getCoprocessorHost()
           
 byte[] getEndKey()
           
 org.apache.hadoop.fs.FileSystem getFilesystem()
           
 HDFSBlocksDistribution getHDFSBlocksDistribution()
          Returns the HDFS blocks distribution for this region based on the data captured when its HFiles were created
 long getLargestHStoreSize()
           
 long getLastFlushTime()
           
 Integer getLock(Integer lockid, byte[] row, boolean waitForLock)
          Returns existing row lock if found, otherwise obtains a new row lock and returns it.
 HLog getLog()
           
 AtomicLong getMemstoreSize()
           
 MultiVersionConsistencyControl getMVCC()
           
 long getReadRequestsCount()
           
 List<Pair<Long,Long>> getRecentFlushInfo()
           
 org.apache.hadoop.fs.Path getRegionDir()
           
static org.apache.hadoop.fs.Path getRegionDir(org.apache.hadoop.fs.Path rootdir, HRegionInfo info)
          Computes the Path of the HRegion
static org.apache.hadoop.fs.Path getRegionDir(org.apache.hadoop.fs.Path tabledir, String name)
          Computes the Path of the HRegion
 long getRegionId()
           
 HRegionInfo getRegionInfo()
           
 byte[] getRegionName()
           
 String getRegionNameAsString()
           
 long getRequestsCount()
           
 RegionScanner getScanner(Scan scan)
          Return an iterator that scans over the HRegion, returning the indicated columns and rows specified by the Scan.
protected  RegionScanner getScanner(Scan scan, List<KeyValueScanner> additionalScanners)
           
 long getSmallestReadPoint()
           
 byte[] getStartKey()
           
 Store getStore(byte[] column)
          Return HStore instance.
 List<String> getStoreFileList(byte[][] columns)
          Return list of storeFiles for the set of CFs.
protected  ThreadPoolExecutor getStoreFileOpenAndCloseThreadPool(String threadNamePrefix)
           
protected  ThreadPoolExecutor getStoreOpenAndCloseThreadPool(String threadNamePrefix)
           
 Map<byte[],Store> getStores()
           
 HTableDescriptor getTableDesc()
           
 org.apache.hadoop.fs.Path getTableDir()
           
 long getWriteRequestsCount()
           
 int hashCode()
           
 boolean hasReferences()
           
 long heapSize()
           
 Result increment(Increment increment, Integer lockid, boolean writeToWAL)
          Perform one or more increment operations on a row.
 long incrementColumnValue(byte[] row, byte[] family, byte[] qualifier, long amount, boolean writeToWAL)
           
 long initialize()
          Initialize this region.
 long initialize(CancelableProgressable reporter)
          Initialize this region.
protected  Store instantiateHStore(org.apache.hadoop.fs.Path tableDir, HColumnDescriptor c)
           
protected  RegionScanner instantiateRegionScanner(Scan scan, List<KeyValueScanner> additionalScanners)
           
protected  boolean internalFlushcache(HLog wal, long myseqid, MonitoredTask status)
           
protected  boolean internalFlushcache(MonitoredTask status)
          Flush the memstore.
 boolean isAvailable()
           
 boolean isClosed()
           
 boolean isClosing()
           
 boolean isSplittable()
           
static void main(String[] args)
          Facility for dumping and compacting catalog tables.
static void makeColumnFamilyDirs(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path tabledir, HRegionInfo hri, byte[] colFamily)
          Make the directories for a specific column family
static HRegion merge(HRegion a, HRegion b)
          Merge two regions whether they are adjacent or not.
static HRegion mergeAdjacent(HRegion srcA, HRegion srcB)
          Merge two HRegions.
 void mutateRow(RowMutations rm)
           
 void mutateRowsWithLocks(Collection<Mutation> mutations, Collection<byte[]> rowsToLock)
          Perform atomic mutations within the region.
 boolean needsCompaction()
          Checks every store to see if one has too many store files
static HRegion newHRegion(org.apache.hadoop.fs.Path tableDir, HLog log, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.conf.Configuration conf, HRegionInfo regionInfo, HTableDescriptor htd, RegionServerServices rsServices)
          A utility method to create new instances of HRegion based on the HConstants.REGION_IMPL configuration property.
 Integer obtainRowLock(byte[] row)
          Obtain a lock on the given row.
protected  HRegion openHRegion(CancelableProgressable reporter)
          Open HRegion.
static HRegion openHRegion(HRegionInfo info, HTableDescriptor htd, HLog wal, org.apache.hadoop.conf.Configuration conf)
          Open a Region.
static HRegion openHRegion(HRegionInfo info, HTableDescriptor htd, HLog wal, org.apache.hadoop.conf.Configuration conf, RegionServerServices rsServices, CancelableProgressable reporter)
          Open a Region.
static HRegion openHRegion(org.apache.hadoop.fs.Path tableDir, HRegionInfo info, HTableDescriptor htd, HLog wal, org.apache.hadoop.conf.Configuration conf)
           
static HRegion openHRegion(org.apache.hadoop.fs.Path tableDir, HRegionInfo info, HTableDescriptor htd, HLog wal, org.apache.hadoop.conf.Configuration conf, RegionServerServices rsServices, CancelableProgressable reporter)
          Open a Region.
protected  void prepareToSplit()
          Give the region a chance to prepare before it is split.
 OperationStatus[] put(Pair<Put,Integer>[] putsAndLocks)
          Deprecated. Instead use batchMutate(Pair[])
 void put(Put put)
           
 OperationStatus[] put(Put[] puts)
          Perform a batch put with no pre-specified locks
 void put(Put put, boolean writeToWAL)
           
 void put(Put put, Integer lockid)
           
 void put(Put put, Integer lockid, boolean writeToWAL)
           
<T extends CoprocessorProtocol> boolean registerProtocol(Class<T> protocol, T handler)
          Registers a new CoprocessorProtocol subclass and instance to be available for handling exec(Exec) calls.
 void releaseRowLock(Integer lockId)
          Release the row lock!
protected  long replayRecoveredEditsIfAny(org.apache.hadoop.fs.Path regiondir, long minSeqId, CancelableProgressable reporter, MonitoredTask status)
          Read the edits log placed under this region by the WAL log-splitting process.
protected  boolean restoreEdit(Store s, KeyValue kv)
          Used by tests
static boolean rowIsInRange(HRegionInfo info, byte[] row)
          Determines if the specified row is within the row range of the given HRegionInfo
 void setCoprocessorHost(RegionCoprocessorHost coprocessorHost)
           
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG

stores

protected final Map<byte[],Store> stores

REGION_TEMP_SUBDIR

public static final String REGION_TEMP_SUBDIR
Temporary subdirectory of the region directory used for compaction output.

See Also:
Constant Field Values

REGIONINFO_FILE

public static final String REGIONINFO_FILE
Name of the region info file that resides just under the region directory.

See Also:
Constant Field Values

FIXED_OVERHEAD

public static final long FIXED_OVERHEAD

DEEP_OVERHEAD

public static final long DEEP_OVERHEAD
Constructor Detail

HRegion

public HRegion()
Should only be used for testing purposes


HRegion

public HRegion(org.apache.hadoop.fs.Path tableDir,
               HLog log,
               org.apache.hadoop.fs.FileSystem fs,
               org.apache.hadoop.conf.Configuration conf,
               HRegionInfo regionInfo,
               HTableDescriptor htd,
               RegionServerServices rsServices)
HRegion constructor. This constructor should only be used for testing and extensions. Instances of HRegion should be instantiated with the newHRegion(Path, HLog, FileSystem, Configuration, HRegionInfo, HTableDescriptor, RegionServerServices) method.

Parameters:
tableDir - qualified path of directory where region should be located, usually the table directory.
log - The HLog is the outbound log for any updates to the HRegion (There's a single HLog for all the HRegions on a single HRegionServer.) The log file is a logfile from the previous execution that's custom-computed for this HRegion. The HRegionServer computes and sorts the appropriate log info for this HRegion. If there is a previous log file (implying that the HRegion has been written-to before), then read it from the supplied path.
fs - is the filesystem.
conf - is global configuration settings.
regionInfo - HRegionInfo that describes the region
rsServices - reference to RegionServerServices or null
See Also:
newHRegion(Path, HLog, FileSystem, Configuration, HRegionInfo, HTableDescriptor, RegionServerServices)
Method Detail

getSmallestReadPoint

public long getSmallestReadPoint()

initialize

public long initialize()
                throws IOException
Initialize this region.

Returns:
What the next sequence (edit) id should be.
Throws:
IOException - e

initialize

public long initialize(CancelableProgressable reporter)
                throws IOException
Initialize this region.

Parameters:
reporter - Tickle every so often if initialize is taking a while.
Returns:
What the next sequence (edit) id should be.
Throws:
IOException - e

hasReferences

public boolean hasReferences()
Returns:
True if this region has references.

getHDFSBlocksDistribution

public HDFSBlocksDistribution getHDFSBlocksDistribution()
Returns the HDFS blocks distribution for this region based on the data captured when its HFiles were created

Returns:
The HDFS blocks distribution for the region.

computeHDFSBlocksDistribution

public static HDFSBlocksDistribution computeHDFSBlocksDistribution(org.apache.hadoop.conf.Configuration conf,
                                                                   HTableDescriptor tableDescriptor,
                                                                   String regionEncodedName)
                                                            throws IOException
This is a helper function to compute HDFS block distribution on demand

Parameters:
conf - configuration
tableDescriptor - HTableDescriptor of the table
regionEncodedName - encoded name of the region
Returns:
The HDFS blocks distribution for the given region.
Throws:
IOException

getMemstoreSize

public AtomicLong getMemstoreSize()

addAndGetGlobalMemstoreSize

public long addAndGetGlobalMemstoreSize(long memStoreSize)
Increase the size of the memstore in this region and the size of the global memstore

Parameters:
memStoreSize -
Returns:
the size of memstore in this region

getRegionInfo

public HRegionInfo getRegionInfo()
Returns:
a HRegionInfo object for this region

getRequestsCount

public long getRequestsCount()
Returns:
requestsCount for this region

getReadRequestsCount

public long getReadRequestsCount()
Returns:
readRequestsCount for this region

getWriteRequestsCount

public long getWriteRequestsCount()
Returns:
writeRequestsCount for this region

isClosed

public boolean isClosed()
Returns:
true if region is closed

isClosing

public boolean isClosing()
Returns:
True if closing process has started.

isAvailable

public boolean isAvailable()
Returns:
true if region is available (not closed and not closing)

isSplittable

public boolean isSplittable()
Returns:
true if region is splittable

getMVCC

public MultiVersionConsistencyControl getMVCC()

close

public List<StoreFile> close()
                      throws IOException
Close down this HRegion. Flush the cache, shut down each HStore, don't service any more calls.

This method could take some time to execute, so don't call it from a time-sensitive thread.

Returns:
List of all the storage files that the HRegion's component HStores make use of. It's a list of all HStoreFile objects. Returns an empty list if already closed, or null if it is judged that the region should not close.
Throws:
IOException - e

close

public List<StoreFile> close(boolean abort)
                      throws IOException
Close down this HRegion. Flush the cache unless the abort parameter is true, shut down each HStore, and don't service any more calls. This method could take some time to execute, so don't call it from a time-sensitive thread.

Parameters:
abort - true if server is aborting (only during testing)
Returns:
List of all the storage files that the HRegion's component HStores make use of. It's a list of HStoreFile objects. Can be null if we are not to close at this time or we are already closed.
Throws:
IOException - e

getStoreOpenAndCloseThreadPool

protected ThreadPoolExecutor getStoreOpenAndCloseThreadPool(String threadNamePrefix)

getStoreFileOpenAndCloseThreadPool

protected ThreadPoolExecutor getStoreFileOpenAndCloseThreadPool(String threadNamePrefix)

getStartKey

public byte[] getStartKey()
Returns:
start key for region

getEndKey

public byte[] getEndKey()
Returns:
end key for region

getRegionId

public long getRegionId()
Returns:
region id

getRegionName

public byte[] getRegionName()
Returns:
region name

getRegionNameAsString

public String getRegionNameAsString()
Returns:
region name as string for logging

getTableDesc

public HTableDescriptor getTableDesc()
Returns:
HTableDescriptor for this region

getLog

public HLog getLog()
Returns:
HLog in use for this region

getConf

public org.apache.hadoop.conf.Configuration getConf()
Returns:
Configuration object

getRegionDir

public org.apache.hadoop.fs.Path getRegionDir()
Returns:
region directory Path

getRegionDir

public static org.apache.hadoop.fs.Path getRegionDir(org.apache.hadoop.fs.Path tabledir,
                                                     String name)
Computes the Path of the HRegion

Parameters:
tabledir - qualified path for table
name - ENCODED region name
Returns:
Path of HRegion directory

getFilesystem

public org.apache.hadoop.fs.FileSystem getFilesystem()
Returns:
FileSystem being used by this region

getLastFlushTime

public long getLastFlushTime()
Returns:
the last time the region was flushed

getRecentFlushInfo

public List<Pair<Long,Long>> getRecentFlushInfo()
Returns:
info about the last flushes

getLargestHStoreSize

public long getLargestHStoreSize()
Returns:
returns size of largest HStore.

compactStores

public void compactStores(boolean majorCompaction)
                   throws IOException
Helper function that compacts all the stores synchronously. It is used by utilities and testing.

Parameters:
majorCompaction - True to force a major compaction regardless of thresholds
Throws:
IOException - e

compactStores

public void compactStores()
                   throws IOException
Helper function that compacts all the stores synchronously. It is used by utilities and testing.

Throws:
IOException - e

compact

public boolean compact(CompactionRequest cr)
                throws IOException
Throws:
IOException

flushcache

public boolean flushcache()
                   throws IOException
Flush the cache. When this method is called the cache will be flushed unless:
  1. the cache is empty
  2. the region is closed
  3. a flush is already in progress
  4. writes are disabled

This method may block for some time, so it should not be called from a time-sensitive thread.

Returns:
true if cache was flushed
Throws:
IOException - general io exceptions
DroppedSnapshotException - Thrown when replay of hlog is required because a Snapshot was not properly persisted.
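
A hedged utility/test sketch combining flushcache() with the compactStores(boolean) helper documented above; the wrapper class and method names are made up:

   import java.io.IOException;
   import org.apache.hadoop.hbase.regionserver.HRegion;

   public final class FlushAndCompactSketch {
     // Flush the memstore to disk, then force a synchronous major compaction.
     static void flushAndMajorCompact(HRegion region) throws IOException {
       boolean flushed = region.flushcache(); // false if empty, closed, in progress, or read-only
       region.compactStores(true);            // true forces a major compaction regardless of thresholds
     }
   }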

internalFlushcache

protected boolean internalFlushcache(MonitoredTask status)
                              throws IOException
Flush the memstore. Flushing the memstore is a little tricky. We have a lot of updates in the memstore, all of which have also been written to the log. We need to write those updates in the memstore out to disk, while being able to process reads/writes as much as possible during the flush operation. Also, the log has to state clearly the point in time at which the memstore was flushed. (That way, during recovery, we know when we can rely on the on-disk flushed structures and when we have to recover the memstore from the log.)

So, we have a three-step process:
  1. Flush the memstore to the on-disk stores, noting the current sequence ID for the log.
  2. Write a FLUSHCACHE-COMPLETE message to the log, using the sequence ID that was current at the time of the memstore flush.
  3. Get rid of the memstore structures that are now redundant, as they've been flushed to the on-disk stores.

This method is protected, but can be accessed via several public routes.

This method may block for some time.

Parameters:
status -
Returns:
true if the region needs compacting
Throws:
IOException - general io exceptions
DroppedSnapshotException - Thrown when replay of hlog is required because a Snapshot was not properly persisted.

internalFlushcache

protected boolean internalFlushcache(HLog wal,
                                     long myseqid,
                                     MonitoredTask status)
                              throws IOException
Parameters:
wal - null if we are NOT to go via the hlog/wal.
myseqid - The seqid to use when writing out the flush file if wal is null.
status -
Returns:
true if the region needs compacting
Throws:
IOException
See Also:
internalFlushcache(MonitoredTask)

getCompleteCacheFlushSequenceId

protected long getCompleteCacheFlushSequenceId(long currentSequenceId)
Get the sequence number to be associated with this cache flush. Used by TransactionalRegion to not complete pending transactions.

Parameters:
currentSequenceId -
Returns:
sequence id to complete the cache flush with

getClosestRowBefore

public Result getClosestRowBefore(byte[] row,
                                  byte[] family)
                           throws IOException
Return all the data for the row that matches row exactly, or the one that immediately precedes it, at or immediately before ts.

Parameters:
row - row key
family - column family to find on
Returns:
map of values
Throws:
IOException - read exceptions

getScanner

public RegionScanner getScanner(Scan scan)
                         throws IOException
Return an iterator that scans over the HRegion, returning the indicated columns and rows specified by the Scan.

This Iterator must be closed by the caller.

Parameters:
scan - configured Scan
Returns:
RegionScanner
Throws:
IOException - read exceptions
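
A hedged usage sketch of getScanner(Scan): scan one column family and close the scanner in a finally block, as required of the caller (the wrapper class is hypothetical):

   import java.io.IOException;
   import java.util.ArrayList;
   import java.util.List;
   import org.apache.hadoop.hbase.KeyValue;
   import org.apache.hadoop.hbase.client.Scan;
   import org.apache.hadoop.hbase.regionserver.HRegion;
   import org.apache.hadoop.hbase.regionserver.RegionScanner;

   public final class ScanSketch {
     static void scanFamily(HRegion region, byte[] family) throws IOException {
       Scan scan = new Scan();
       scan.addFamily(family);
       RegionScanner scanner = region.getScanner(scan);
       try {
         List<KeyValue> kvs = new ArrayList<KeyValue>();
         boolean more;
         do {
           kvs.clear();
           more = scanner.next(kvs);   // fills kvs with the next row's cells
           for (KeyValue kv : kvs) {
             System.out.println(kv);
           }
         } while (more);
       } finally {
         scanner.close();              // the caller must close the scanner
       }
     }
   }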

getScanner

protected RegionScanner getScanner(Scan scan,
                                   List<KeyValueScanner> additionalScanners)
                            throws IOException
Throws:
IOException

instantiateRegionScanner

protected RegionScanner instantiateRegionScanner(Scan scan,
                                                 List<KeyValueScanner> additionalScanners)
                                          throws IOException
Throws:
IOException

delete

public void delete(Delete delete,
                   Integer lockid,
                   boolean writeToWAL)
            throws IOException
Parameters:
delete - delete object
lockid - existing lock id, or null for grab a lock
writeToWAL - whether to append to the write-ahead log
Throws:
IOException - read exceptions

put

public void put(Put put)
         throws IOException
Parameters:
put -
Throws:
IOException

put

public void put(Put put,
                boolean writeToWAL)
         throws IOException
Parameters:
put -
writeToWAL -
Throws:
IOException

put

public void put(Put put,
                Integer lockid)
         throws IOException
Parameters:
put -
lockid -
Throws:
IOException

put

public void put(Put put,
                Integer lockid,
                boolean writeToWAL)
         throws IOException
Parameters:
put -
lockid -
writeToWAL -
Throws:
IOException

put

public OperationStatus[] put(Put[] puts)
                      throws IOException
Perform a batch put with no pre-specified locks

Throws:
IOException
See Also:
batchMutate(Pair[])

put

@Deprecated
public OperationStatus[] put(Pair<Put,Integer>[] putsAndLocks)
                      throws IOException
Deprecated. Instead use batchMutate(Pair[])

Perform a batch of puts.

Parameters:
putsAndLocks - the list of puts paired with their requested lock IDs.
Returns:
an array of OperationStatus which internally contains the OperationStatusCode and the exceptionMessage if any.
Throws:
IOException

batchMutate

public OperationStatus[] batchMutate(Pair<Mutation,Integer>[] mutationsAndLocks)
                              throws IOException
Perform a batch of mutations. It supports only Put and Delete mutations and will ignore other types passed.

Parameters:
mutationsAndLocks - the list of mutations paired with their requested lock IDs.
Returns:
an array of OperationStatus which internally contains the OperationStatusCode and the exceptionMessage if any.
Throws:
IOException
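
A hedged sketch of batchMutate: pair each mutation with a null lock id so the region takes the row locks itself (class name hypothetical; assumes OperationStatus exposes its status code):

   import java.io.IOException;
   import org.apache.hadoop.hbase.client.Delete;
   import org.apache.hadoop.hbase.client.Mutation;
   import org.apache.hadoop.hbase.client.Put;
   import org.apache.hadoop.hbase.regionserver.HRegion;
   import org.apache.hadoop.hbase.regionserver.OperationStatus;
   import org.apache.hadoop.hbase.util.Bytes;
   import org.apache.hadoop.hbase.util.Pair;

   public final class BatchMutateSketch {
     static void batch(HRegion region, byte[] family) throws IOException {
       Put put = new Put(Bytes.toBytes("row1"));
       put.add(family, Bytes.toBytes("q"), Bytes.toBytes("v"));
       Delete delete = new Delete(Bytes.toBytes("row2"));

       @SuppressWarnings("unchecked")
       Pair<Mutation, Integer>[] batch = new Pair[] {
           new Pair<Mutation, Integer>(put, null),    // null = no pre-specified lock
           new Pair<Mutation, Integer>(delete, null),
       };
       for (OperationStatus status : region.batchMutate(batch)) {
         System.out.println(status.getOperationStatusCode());
       }
     }
   }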

checkAndMutate

public boolean checkAndMutate(byte[] row,
                              byte[] family,
                              byte[] qualifier,
                              CompareFilter.CompareOp compareOp,
                              WritableByteArrayComparable comparator,
                              org.apache.hadoop.io.Writable w,
                              Integer lockId,
                              boolean writeToWAL)
                       throws IOException
Parameters:
row -
family -
qualifier -
compareOp -
comparator -
lockId -
writeToWAL -
Returns:
true if the new put was executed, false otherwise
Throws:
IOException
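
A hedged sketch of checkAndMutate: apply a Put only if the current cell value equals an expected value; the wrapper is hypothetical, and the null lockId asks the region to take the row lock itself:

   import java.io.IOException;
   import org.apache.hadoop.hbase.client.Put;
   import org.apache.hadoop.hbase.filter.BinaryComparator;
   import org.apache.hadoop.hbase.filter.CompareFilter;
   import org.apache.hadoop.hbase.regionserver.HRegion;

   public final class CheckAndMutateSketch {
     static boolean putIfEquals(HRegion region, byte[] row, byte[] family,
         byte[] qualifier, byte[] expected, Put put) throws IOException {
       return region.checkAndMutate(row, family, qualifier,
           CompareFilter.CompareOp.EQUAL, new BinaryComparator(expected),
           put, null, true);           // w = the Put to apply; writeToWAL = true
     }
   }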

checkReadOnly

protected void checkReadOnly()
                      throws IOException
Throws:
IOException - Throws exception if region is in read-only mode.

replayRecoveredEditsIfAny

protected long replayRecoveredEditsIfAny(org.apache.hadoop.fs.Path regiondir,
                                         long minSeqId,
                                         CancelableProgressable reporter,
                                         MonitoredTask status)
                                  throws UnsupportedEncodingException,
                                         IOException
Read the edits log placed under this region by the WAL log-splitting process. Put the recovered edits back up into this region.

We can ignore any log message that has a sequence ID that's equal to or lower than minSeqId. (Because we know such log messages are already reflected in the HFiles.)

While this is running we are putting pressure on memory yet we are outside of our usual accounting because we are not yet an onlined region (this stuff is being run as part of Region initialization). This means that if we're up against global memory limits, we'll not be flagged to flush because we are not online. We can't be flushed by usual mechanisms anyways; we're not yet online so our relative sequenceids are not yet aligned with HLog sequenceids -- not till we come up online, post processing of split edits.

But to help relieve memory pressure, we at least manage our own heap size flushing if we are in excess of per-region limits. When flushing, though, we have to be careful to avoid using the regionserver/hlog sequenceid. It is running on a different timeline from what is going on in this region context, so if we crashed while replaying these edits but in the midst had a flush that used the regionserver log with a sequenceid in excess of what is in this region and its split editlogs, then we could miss edits the next time we go to recover. So, we have to flush inline, using seqids that make sense in this single-region context only -- until we are online.

Parameters:
regiondir -
minSeqId - Any edit found in the split editlogs needs to be in excess of this minSeqId to be applied; otherwise it is skipped.
reporter -
Returns:
the sequence id of the last edit added to this region out of the recovered edits log or minSeqId if nothing added from editlogs.
Throws:
UnsupportedEncodingException
IOException

restoreEdit

protected boolean restoreEdit(Store s,
                              KeyValue kv)
Used by tests

Parameters:
s - Store to add the edit to.
kv - KeyValue to add.
Returns:
True if we should flush.

instantiateHStore

protected Store instantiateHStore(org.apache.hadoop.fs.Path tableDir,
                                  HColumnDescriptor c)
                           throws IOException
Throws:
IOException

getStore

public Store getStore(byte[] column)
Return HStore instance. Use with caution. Exposed for use of fixup utilities.

Parameters:
column - Name of column family hosted by this region.
Returns:
Store that goes with the family on passed column. TODO: Make this lookup faster.

getStores

public Map<byte[],Store> getStores()

getStoreFileList

public List<String> getStoreFileList(byte[][] columns)
                              throws IllegalArgumentException
Return the list of storeFiles for the given set of column families. Uses closeLock to prevent a race with region close: without it, the region could close while the stores are being iterated one by one, and already-closed stores would report 0 files.

Returns:
List of storeFiles.
Throws:
IllegalArgumentException

obtainRowLock

public Integer obtainRowLock(byte[] row)
                      throws IOException
Obtain a lock on the given row. Blocks until success. I know it's strange to have two mappings:
   ROWS  ==> LOCKS
 
as well as
   LOCKS ==> ROWS
 
But it acts as a guard on the client; a miswritten client just can't submit the name of a row and start writing to it; it must know the correct lockid, which matches the lock list in memory.

It would be more memory-efficient to assume a correctly-written client, which maybe we'll do in the future.

Parameters:
row - Name of row to lock.
Returns:
The id of the held lock.
Throws:
IOException

releaseRowLock

public void releaseRowLock(Integer lockId)
Release the row lock!

Parameters:
lockId - The lock ID to release.
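
A hedged sketch tying obtainRowLock, put(Put, Integer), and releaseRowLock together; the wrapper class is hypothetical:

   import java.io.IOException;
   import org.apache.hadoop.hbase.client.Put;
   import org.apache.hadoop.hbase.regionserver.HRegion;
   import org.apache.hadoop.hbase.util.Bytes;

   public final class RowLockSketch {
     static void lockedPuts(HRegion region, byte[] row, byte[] family) throws IOException {
       Integer lockid = region.obtainRowLock(row);  // blocks until the lock is held
       try {
         Put p1 = new Put(row);
         p1.add(family, Bytes.toBytes("a"), Bytes.toBytes("1"));
         region.put(p1, lockid);                    // reuse the held lock
         Put p2 = new Put(row);
         p2.add(family, Bytes.toBytes("b"), Bytes.toBytes("2"));
         region.put(p2, lockid);
       } finally {
         region.releaseRowLock(lockid);             // always release the lock
       }
     }
   }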

getLock

public Integer getLock(Integer lockid,
                       byte[] row,
                       boolean waitForLock)
                throws IOException
Returns existing row lock if found, otherwise obtains a new row lock and returns it.

Parameters:
lockid - requested by the user, or null if the user didn't already hold lock
row - the row to lock
waitForLock - if true, will block until the lock is available, otherwise will simply return null if it could not acquire the lock.
Returns:
lockid or null if waitForLock is false and the lock was unavailable.
Throws:
IOException

bulkLoadHFiles

public boolean bulkLoadHFiles(List<Pair<byte[],String>> familyPaths)
                       throws IOException
Attempts to atomically load a group of hfiles. This is critical for loading rows with multiple column families atomically.

Parameters:
familyPaths - List of Pair
Returns:
true if successful, false if failed recoverably
Throws:
IOException - if failed unrecoverably.
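
A hedged sketch of bulkLoadHFiles; the file paths are placeholders, and each HFile's keys must fall within this region's range:

   import java.io.IOException;
   import java.util.ArrayList;
   import java.util.List;
   import org.apache.hadoop.hbase.regionserver.HRegion;
   import org.apache.hadoop.hbase.util.Bytes;
   import org.apache.hadoop.hbase.util.Pair;

   public final class BulkLoadSketch {
     static boolean load(HRegion region) throws IOException {
       List<Pair<byte[], String>> familyPaths = new ArrayList<Pair<byte[], String>>();
       familyPaths.add(new Pair<byte[], String>(
           Bytes.toBytes("cf1"), "/tmp/bulkload/cf1/hfile1"));  // hypothetical path
       familyPaths.add(new Pair<byte[], String>(
           Bytes.toBytes("cf2"), "/tmp/bulkload/cf2/hfile1"));  // hypothetical path
       return region.bulkLoadHFiles(familyPaths);  // false on recoverable failure
     }
   }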

equals

public boolean equals(Object o)
Overrides:
equals in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object

toString

public String toString()
Overrides:
toString in class Object

getTableDir

public org.apache.hadoop.fs.Path getTableDir()
Returns:
Path of region base directory

newHRegion

public static HRegion newHRegion(org.apache.hadoop.fs.Path tableDir,
                                 HLog log,
                                 org.apache.hadoop.fs.FileSystem fs,
                                 org.apache.hadoop.conf.Configuration conf,
                                 HRegionInfo regionInfo,
                                 HTableDescriptor htd,
                                 RegionServerServices rsServices)
A utility method to create new instances of HRegion based on the HConstants.REGION_IMPL configuration property.

Parameters:
tableDir - qualified path of directory where region should be located, usually the table directory.
log - The HLog is the outbound log for any updates to the HRegion (There's a single HLog for all the HRegions on a single HRegionServer.) The log file is a logfile from the previous execution that's custom-computed for this HRegion. The HRegionServer computes and sorts the appropriate log info for this HRegion. If there is a previous log file (implying that the HRegion has been written-to before), then read it from the supplied path.
fs - is the filesystem.
conf - is global configuration settings.
regionInfo - HRegionInfo that describes the region
htd -
rsServices -
Returns:
the new instance

createHRegion

public static HRegion createHRegion(HRegionInfo info,
                                    org.apache.hadoop.fs.Path rootDir,
                                    org.apache.hadoop.conf.Configuration conf,
                                    HTableDescriptor hTableDescriptor)
                             throws IOException
Convenience method creating new HRegions. Used by createTable and by the bootstrap code in the HMaster constructor. Note, this method creates an HLog for the created region. It needs to be closed explicitly. Use getLog() to get access. When done with a region created using this method, you will need to explicitly close the HLog it created too; it will not be done for you. Not closing the log will leave at least a daemon thread running. Call closeHRegion(HRegion) and it will do necessary cleanup for you.

Parameters:
info - Info for region to create.
rootDir - Root directory for HBase instance
conf -
hTableDescriptor -
Returns:
new HRegion
Throws:
IOException

closeHRegion

public static void closeHRegion(HRegion r)
                         throws IOException
This will do the necessary cleanup that a call to createHRegion(HRegionInfo, Path, Configuration, HTableDescriptor) requires. This method will close the region and then close its associated HLog file. You can use it even if you called the other createHRegion, the one that takes an HLog instance, but don't be surprised by the resulting call to HLog.closeAndDelete() on the HLog the HRegion was carrying.

Parameters:
r -
Throws:
IOException
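
A hedged sketch of the create/close pairing described above: the no-HLog createHRegion creates its own HLog, so the region should be torn down with closeHRegion so that log gets closed too (the table name, family, and rootDir are placeholders):

   import java.io.IOException;
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.hbase.HBaseConfiguration;
   import org.apache.hadoop.hbase.HColumnDescriptor;
   import org.apache.hadoop.hbase.HRegionInfo;
   import org.apache.hadoop.hbase.HTableDescriptor;
   import org.apache.hadoop.hbase.regionserver.HRegion;

   public final class CreateRegionSketch {
     static void createAndClose() throws IOException {
       Configuration conf = HBaseConfiguration.create();
       HTableDescriptor htd = new HTableDescriptor("testtable");
       htd.addFamily(new HColumnDescriptor("cf"));
       HRegionInfo info = new HRegionInfo(htd.getName(), null, null); // whole keyspace
       HRegion region = HRegion.createHRegion(info, new Path("/tmp/hbase-root"), conf, htd);
       try {
         // ... use the region ...
       } finally {
         HRegion.closeHRegion(region);  // closes the region and its private HLog
       }
     }
   }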

createHRegion

public static HRegion createHRegion(HRegionInfo info,
                                    org.apache.hadoop.fs.Path rootDir,
                                    org.apache.hadoop.conf.Configuration conf,
                                    HTableDescriptor hTableDescriptor,
                                    HLog hlog)
                             throws IOException
Convenience method creating new HRegions. Used by createTable. The HLog for the created region needs to be closed explicitly. Use getLog() to get access.

Parameters:
info - Info for region to create.
rootDir - Root directory for HBase instance
conf -
hTableDescriptor -
hlog - shared HLog
Returns:
new HRegion
Throws:
IOException

openHRegion

public static HRegion openHRegion(HRegionInfo info,
                                  HTableDescriptor htd,
                                  HLog wal,
                                  org.apache.hadoop.conf.Configuration conf)
                           throws IOException
Open a Region.

Parameters:
info - Info for region to be opened.
wal - HLog for region to use. This method will call HLog#setSequenceNumber(long) passing the result of the call to HRegion#getMinSequenceId() to ensure the log id is properly kept up. HRegionStore does this every time it opens a new region.
conf -
Returns:
new HRegion
Throws:
IOException

openHRegion

public static HRegion openHRegion(HRegionInfo info,
                                  HTableDescriptor htd,
                                  HLog wal,
                                  org.apache.hadoop.conf.Configuration conf,
                                  RegionServerServices rsServices,
                                  CancelableProgressable reporter)
                           throws IOException
Open a Region.

Parameters:
info - Info for region to be opened
htd -
wal - HLog for region to use. This method will call HLog#setSequenceNumber(long) passing the result of the call to HRegion#getMinSequenceId() to ensure the log id is properly kept up. HRegionStore does this every time it opens a new region.
conf -
rsServices - An interface we can request flushes against.
reporter - An interface we can report progress against.
Returns:
new HRegion
Throws:
IOException

openHRegion

public static HRegion openHRegion(org.apache.hadoop.fs.Path tableDir,
                                  HRegionInfo info,
                                  HTableDescriptor htd,
                                  HLog wal,
                                  org.apache.hadoop.conf.Configuration conf)
                           throws IOException
Throws:
IOException

openHRegion

public static HRegion openHRegion(org.apache.hadoop.fs.Path tableDir,
                                  HRegionInfo info,
                                  HTableDescriptor htd,
                                  HLog wal,
                                  org.apache.hadoop.conf.Configuration conf,
                                  RegionServerServices rsServices,
                                  CancelableProgressable reporter)
                           throws IOException
Open a Region.

Parameters:
tableDir - Table directory
info - Info for region to be opened.
wal - HLog for region to use. This method will call HLog#setSequenceNumber(long) passing the result of the call to HRegion#getMinSequenceId() to ensure the log id is properly kept up. HRegionStore does this every time it opens a new region.
conf -
reporter - An interface we can report progress against.
Returns:
new HRegion
Throws:
IOException

openHRegion

protected HRegion openHRegion(CancelableProgressable reporter)
                       throws IOException
Open HRegion. Calls initialize and sets sequenceid.

Parameters:
reporter -
Returns:
Returns this
Throws:
IOException

addRegionToMETA

public static void addRegionToMETA(HRegion meta,
                                   HRegion r)
                            throws IOException
Inserts a new region's meta information into the passed meta region. Used by the HMaster bootstrap code when adding a new table to the ROOT table.

Parameters:
meta - META HRegion to be updated
r - HRegion to add to meta
Throws:
IOException

deleteRegion

public static void deleteRegion(org.apache.hadoop.fs.FileSystem fs,
                                org.apache.hadoop.fs.Path rootdir,
                                HRegionInfo info)
                         throws IOException
Deletes all the files for a HRegion

Parameters:
fs - the file system object
rootdir - qualified path of HBase root directory
info - HRegionInfo for region to be deleted
Throws:
IOException

getRegionDir

public static org.apache.hadoop.fs.Path getRegionDir(org.apache.hadoop.fs.Path rootdir,
                                                     HRegionInfo info)
Computes the Path of the HRegion

Parameters:
rootdir - qualified path of HBase root directory
info - HRegionInfo for the region
Returns:
qualified path of region directory

rowIsInRange

public static boolean rowIsInRange(HRegionInfo info,
                                   byte[] row)
Determines if the specified row is within the row range of the given HRegionInfo

Parameters:
info - HRegionInfo that specifies the row range
row - row to be checked
Returns:
true if the row is within the range specified by the HRegionInfo

makeColumnFamilyDirs

public static void makeColumnFamilyDirs(org.apache.hadoop.fs.FileSystem fs,
                                        org.apache.hadoop.fs.Path tabledir,
                                        HRegionInfo hri,
                                        byte[] colFamily)
                                 throws IOException
Make the directories for a specific column family

Parameters:
fs - the file system
tabledir - base directory where region will live (usually the table dir)
hri -
colFamily - the column family
Throws:
IOException

mergeAdjacent

public static HRegion mergeAdjacent(HRegion srcA,
                                    HRegion srcB)
                             throws IOException
Merge two HRegions. The regions must be adjacent and must not overlap.

Parameters:
srcA -
srcB -
Returns:
new merged HRegion
Throws:
IOException

merge

public static HRegion merge(HRegion a,
                            HRegion b)
                     throws IOException
Merge two regions whether they are adjacent or not.

Parameters:
a - region a
b - region b
Returns:
new merged region
Throws:
IOException

get

public Result get(Get get,
                  Integer lockid)
           throws IOException
Parameters:
get - get object
lockid - existing lock id, or null for no previous lock
Returns:
result
Throws:
IOException - read exceptions

mutateRow

public void mutateRow(RowMutations rm)
               throws IOException
Throws:
IOException

mutateRowsWithLocks

public void mutateRowsWithLocks(Collection<Mutation> mutations,
                                Collection<byte[]> rowsToLock)
                         throws IOException
Perform atomic mutations within the region.

Parameters:
mutations - The list of mutations to perform. mutations can contain operations for multiple rows. Caller has to ensure that all rows are contained in this region.
rowsToLock - Rows to lock If multiple rows are locked care should be taken that rowsToLock is sorted in order to avoid deadlocks.
Throws:
IOException
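
A hedged sketch of mutateRowsWithLocks: mutate two rows of this region atomically, keeping rowsToLock sorted as the deadlock-avoidance note above advises (the wrapper class is hypothetical):

   import java.io.IOException;
   import java.util.ArrayList;
   import java.util.List;
   import java.util.SortedSet;
   import java.util.TreeSet;
   import org.apache.hadoop.hbase.client.Mutation;
   import org.apache.hadoop.hbase.client.Put;
   import org.apache.hadoop.hbase.regionserver.HRegion;
   import org.apache.hadoop.hbase.util.Bytes;

   public final class AtomicMultiRowSketch {
     static void mutateTwoRows(HRegion region, byte[] family) throws IOException {
       byte[] rowA = Bytes.toBytes("rowA");
       byte[] rowB = Bytes.toBytes("rowB");

       List<Mutation> mutations = new ArrayList<Mutation>();
       Put a = new Put(rowA);
       a.add(family, Bytes.toBytes("q"), Bytes.toBytes("1"));
       mutations.add(a);
       Put b = new Put(rowB);
       b.add(family, Bytes.toBytes("q"), Bytes.toBytes("2"));
       mutations.add(b);

       SortedSet<byte[]> rowsToLock = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
       rowsToLock.add(rowA);
       rowsToLock.add(rowB);

       region.mutateRowsWithLocks(mutations, rowsToLock); // atomic within the region
     }
   }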

append

public Result append(Append append,
                     Integer lockid,
                     boolean writeToWAL)
              throws IOException
Perform one or more append operations on a row.

Appends performed are done under row lock but reads do not take locks out so this can be seen partially complete by gets and scans.

Parameters:
append -
lockid -
writeToWAL -
Returns:
new keyvalues after the append
Throws:
IOException

increment

public Result increment(Increment increment,
                        Integer lockid,
                        boolean writeToWAL)
                 throws IOException
Perform one or more increment operations on a row.

Increments performed are done under row lock but reads do not take locks out so this can be seen partially complete by gets and scans.

Parameters:
increment -
lockid -
writeToWAL -
Returns:
new keyvalues after increment
Throws:
IOException

incrementColumnValue

public long incrementColumnValue(byte[] row,
                                 byte[] family,
                                 byte[] qualifier,
                                 long amount,
                                 boolean writeToWAL)
                          throws IOException
Parameters:
row -
family -
qualifier -
amount -
writeToWAL -
Returns:
The new value.
Throws:
IOException
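
A hedged sketch of incrementColumnValue as a simple counter bump (the qualifier name is hypothetical):

   import java.io.IOException;
   import org.apache.hadoop.hbase.regionserver.HRegion;
   import org.apache.hadoop.hbase.util.Bytes;

   public final class CounterSketch {
     static long bump(HRegion region, byte[] row, byte[] family) throws IOException {
       return region.incrementColumnValue(row, family,
           Bytes.toBytes("hits"), 1L, true);  // amount = 1, writeToWAL = true
     }
   }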

heapSize

public long heapSize()
Specified by:
heapSize in interface HeapSize
Returns:
Approximate 'exclusive deep size' of implementing object. Includes count of payload and hosting object sizings.

registerProtocol

public <T extends CoprocessorProtocol> boolean registerProtocol(Class<T> protocol,
                                                                T handler)
Registers a new CoprocessorProtocol subclass and instance to be available for handling exec(Exec) calls.

Only a single protocol type/handler combination may be registered per region. After the first registration, subsequent calls with the same protocol type will fail with a return value of false.

Type Parameters:
T - the protocol type
Parameters:
protocol - a CoprocessorProtocol subinterface defining the protocol methods
handler - an instance implementing the interface
Returns:
true if the registration was successful, false otherwise
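
A hedged sketch of registerProtocol; RowCountProtocol is a made-up subinterface, not part of HBase:

   import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;
   import org.apache.hadoop.hbase.regionserver.HRegion;

   public final class ProtocolRegistrationSketch {
     public interface RowCountProtocol extends CoprocessorProtocol {
       long rowCount();
     }

     static void register(HRegion region, RowCountProtocol handler) {
       boolean first = region.registerProtocol(RowCountProtocol.class, handler);
       boolean second = region.registerProtocol(RowCountProtocol.class, handler);
       assert first && !second;  // only the first registration per protocol type succeeds
     }
   }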

exec

public ExecResult exec(Exec call)
                throws IOException
Executes a single CoprocessorProtocol method using the registered protocol handlers. CoprocessorProtocol implementations must be registered via the registerProtocol(Class, org.apache.hadoop.hbase.ipc.CoprocessorProtocol) method before they are available.

Parameters:
call - an Exec instance identifying the protocol, method name, and parameters for the method invocation
Returns:
an ExecResult instance containing the region name of the invocation and the return value
Throws:
IOException - if no registered protocol handler is found or an error occurs during the invocation
See Also:
registerProtocol(Class, org.apache.hadoop.hbase.ipc.CoprocessorProtocol)

prepareToSplit

protected void prepareToSplit()
Give the region a chance to prepare before it is split.


checkSplit

public byte[] checkSplit()
Return the split point. A null return indicates the region isn't splittable. If the split point isn't explicitly specified, it will go over the stores to find the best one. Currently the criterion for the best split point is based on the size of the store.


getCompactPriority

public int getCompactPriority()
Returns:
The priority that this region should have in the compaction queue

needsCompaction

public boolean needsCompaction()
Checks every store to see if one has too many store files

Returns:
true if any store has too many store files

getCoprocessorHost

public RegionCoprocessorHost getCoprocessorHost()
Returns:
the coprocessor host

setCoprocessorHost

public void setCoprocessorHost(RegionCoprocessorHost coprocessorHost)
Parameters:
coprocessorHost - the new coprocessor host

main

public static void main(String[] args)
                 throws IOException
Facility for dumping and compacting catalog tables. Only does catalog tables since these are the only tables whose schema we know for sure. For usage run:
   ./bin/hbase org.apache.hadoop.hbase.regionserver.HRegion
 

Parameters:
args -
Throws:
IOException


Copyright © 2012 The Apache Software Foundation. All Rights Reserved.