java.lang.Object
  org.apache.hadoop.hbase.util.RegionSplitter
public class RegionSplitter
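The RegionSplitter class provides several utilities to help in the
administration lifecycle for developers who choose to manually split regions
instead of having HBase handle that automatically. The two most useful
utilities are creating a table with a specified number of pre-split regions
and executing a rolling split of all regions on an existing table.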
Both operations can be safely done on a live server.
Question: How do I turn off automatic splitting?
Answer: Automatic splitting is controlled by the configuration value
"hbase.hregion.max.filesize". Setting this to Long.MAX_VALUE is not
recommended, because nothing would split a region if you forgot about manual
splits. A suggested setting is 100GB; a region that reached this size would
take more than an hour to major compact.
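As a rough illustration of the suggested setting, the sketch below converts 100GB to the byte value that "hbase.hregion.max.filesize" takes and formats it as a conf.param=value pair like the one the -D option in the usage string accepts. The class and method names are hypothetical, and it assumes that GB here means 2^30 bytes.

```java
/** Sketch only: derives the byte value for a 100GB region size ceiling. */
final class MaxFileSizeSketch {
    // Configuration key quoted in the answer above.
    static final String MAX_FILESIZE_KEY = "hbase.hregion.max.filesize";

    /** Converts gigabytes to bytes, assuming 1GB = 2^30 bytes (an assumption). */
    static long gigabytesToBytes(long gigabytes) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(gigabytes, 1024L * 1024L * 1024L);
    }

    /** Formats the setting as a conf.param=value pair. */
    static String asConfParam(long gigabytes) {
        return MAX_FILESIZE_KEY + "=" + gigabytesToBytes(gigabytes);
    }

    public static void main(String[] args) {
        // prints hbase.hregion.max.filesize=107374182400
        System.out.println(asConfParam(100));
    }
}
```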
Question: Why did the original authors decide to manually split?
Answer: Specific workload characteristics of our use case allowed us
to benefit from a manual split system.
Question: Why is manual splitting good for this workload?
Answer: Although automated splitting is not a bad option, there are
benefits to manual splitting.
Question: What's the optimal number of pre-split regions to create?
Answer: Mileage will vary depending upon your application.
The short answer for our application is that we started with 10 pre-split regions / server and watched our data growth over time. It's better to err on the side of too few regions and perform a rolling split later.
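The starting point in the short answer is simple arithmetic, sketched below. The class name, method name, and the 20-server cluster size are hypothetical; only the ratio of 10 regions per server comes from the text, and it is a starting value for one application, not a general recommendation.

```java
/** Sketch only: initial pre-split region count from a regions-per-server ratio. */
final class PresplitCountSketch {
    // Starting ratio quoted in the short answer above.
    static final int REGIONS_PER_SERVER = 10;

    /** Returns the number of regions to pre-split for the given number of region servers. */
    static int initialRegionCount(int regionServers) {
        if (regionServers <= 0) {
            throw new IllegalArgumentException("regionServers must be positive");
        }
        return Math.multiplyExact(regionServers, REGIONS_PER_SERVER);
    }

    public static void main(String[] args) {
        // A hypothetical 20-server cluster; prints 200
        System.out.println(initialRegionCount(20));
    }
}
```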
The more complicated answer is that this depends upon the largest storefile
in your region. With a growing data size, this storefile will get larger over
time. You want the largest region to be just big enough that the Store compact
selection algorithm only compacts it due to a timed major compaction.
Otherwise, your cluster can be prone to compaction storms as the algorithm
decides to run major compactions on a large series of regions all at once.
Note that compaction storms are due to uniform data growth, not the manual
split decision.
If you pre-split your regions too thinly, you can increase the major compaction interval by configuring HConstants.MAJOR_COMPACTION_PERIOD. If your data size grows too large, use this script to perform a network-IO-safe rolling split of all regions.
Nested Class Summary | |
---|---|
static class | RegionSplitter.MD5StringSplit: MD5StringSplit is the default RegionSplitter.SplitAlgorithm for creating pre-split tables. |
static interface | RegionSplitter.SplitAlgorithm: A generic interface for the RegionSplitter code to use for all its functionality. |
Constructor Summary | |
---|---|
RegionSplitter() |
Method Summary | |
---|---|
static void | main(String[] args): The main function for the RegionSplitter application. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public RegionSplitter()
Method Detail |
---|
public static void main(String[] args) throws IOException, InterruptedException, org.apache.commons.cli.ParseException
Parameters:
args - Usage: RegionSplitter <TABLE> <-c <# regions> -f <family:family:...> | -r [-o <# outstanding splits>]> [-D <conf.param=value>]
Throws:
IOException - HBase IO problem
InterruptedException - user requested exit
org.apache.commons.cli.ParseException - problem parsing user input
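The usage string describes two modes: creating a pre-split table (-c with -f) and a rolling split (-r, optionally with -o). The sketch below assembles argument arrays in that shape. The class name, method names, table name, and column family names are hypothetical, and the token order simply follows the usage string. Passing the result to RegionSplitter.main(String[]) needs HBase on the classpath and a reachable cluster, so the sketch only prints the arguments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: builds argument arrays shaped like the documented usage string. */
final class SplitterArgsSketch {

    /** Arguments for creating a table: TABLE -c regionCount -f family:family:... */
    static String[] createArgs(String table, int regionCount, List<String> families) {
        List<String> args = new ArrayList<>();
        args.add(table);
        args.add("-c");
        args.add(Integer.toString(regionCount));
        args.add("-f");
        args.add(String.join(":", families)); // families are colon-separated
        return args.toArray(new String[0]);
    }

    /** Arguments for a rolling split: TABLE -r -o outstandingSplits */
    static String[] rollingSplitArgs(String table, int outstandingSplits) {
        return new String[] {table, "-r", "-o", Integer.toString(outstandingSplits)};
    }

    public static void main(String[] args) {
        // Hypothetical table and column family names.
        // prints my_table -c 60 -f f1:f2
        System.out.println(String.join(" ", createArgs("my_table", 60, Arrays.asList("f1", "f2"))));
        // prints my_table -r -o 2
        System.out.println(String.join(" ", rollingSplitArgs("my_table", 2)));
    }
}
```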