org.apache.hadoop.hbase.util
Class HBaseFsck

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.hadoop.hbase.util.HBaseFsck
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class HBaseFsck
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool

HBaseFsck (hbck) is a tool for checking and repairing region consistency and table integrity problems in a corrupted HBase installation.

Region consistency checks verify that .META., region deployment on region servers, and the state of data in HDFS (.regioninfo files) all agree.

Table integrity checks verify that every possible row key resolves to exactly one region of a table. This means there are no degenerate or backwards regions, no holes between regions, and no overlapping regions.
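The integrity rules above can be illustrated with a minimal sketch. It is not hbck's implementation: KeyRange and TableIntegritySketch are hypothetical names, keys are plain Strings rather than byte arrays, and an empty string stands for an unbounded start or end key. It only shows how a sorted walk over region boundaries can reveal degenerate regions, backwards regions, holes and overlaps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a region: [start, end), "" means unbounded. */
final class KeyRange {
    final String start;
    final String end;

    KeyRange(String start, String end) {
        this.start = start;
        this.end = end;
    }
}

final class TableIntegritySketch {

    /** Returns the problems found; an empty list means the ranges tile the key space. */
    static List<String> check(List<KeyRange> regions) {
        List<String> problems = new ArrayList<>();
        List<KeyRange> valid = new ArrayList<>();

        // Pass 1: reject regions that are individually malformed.
        for (KeyRange r : regions) {
            if (r.end.isEmpty()) {
                valid.add(r); // open-ended region, cannot be degenerate or backwards
            } else if (r.start.equals(r.end)) {
                problems.add("degenerate [" + r.start + "," + r.end + ")");
            } else if (r.start.compareTo(r.end) > 0) {
                problems.add("backwards [" + r.start + "," + r.end + ")");
            } else {
                valid.add(r);
            }
        }

        // Pass 2: walk the remaining regions in start-key order.
        valid.sort(Comparator.comparing((KeyRange r) -> r.start));
        String covered = "";          // key space is covered up to this key
        boolean coveredToEnd = false; // set once an open-ended region is seen
        for (KeyRange r : valid) {
            if (coveredToEnd || r.start.compareTo(covered) < 0) {
                problems.add("overlap at " + r.start);
            } else if (r.start.compareTo(covered) > 0) {
                problems.add("hole [" + covered + "," + r.start + ")");
            }
            if (r.end.isEmpty()) {
                coveredToEnd = true;
            } else if (!coveredToEnd && r.end.compareTo(covered) > 0) {
                covered = r.end;
            }
        }
        if (!coveredToEnd) {
            problems.add("hole [" + covered + ",)");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Two regions with nothing covering keys from "b" up to "c".
        List<KeyRange> regions = Arrays.asList(new KeyRange("", "b"), new KeyRange("c", ""));
        System.out.println(check(regions)); // prints [hole [b,c)]
    }
}
```

String.compareTo is used here only for brevity; real row keys are compared as bytes, so the ordering in this sketch is an approximation.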

The general repair strategy works in two phases:

  1. Repair Table Integrity on HDFS (merge or fabricate regions)
  2. Repair Region Consistency with .META. and assignments

For table integrity repairs, the tables' region directories are scanned for .regioninfo files. Each table's integrity is then verified. If there are any orphan regions (regions with no .regioninfo files) or holes, new regions are fabricated. Backwards regions are sidelined, as are empty degenerate (endkey==startkey) regions. If there are any overlapping regions, a new region is created and all data is merged into the new region.

Table integrity repairs deal solely with HDFS and could potentially be done offline -- the hbase region servers or master do not need to be running. This phase can eventually be used to completely reconstruct the META table in an offline fashion.

Region consistency requires three conditions: 1) a valid .regioninfo file is present in an HDFS region dir, 2) a valid row with .regioninfo data exists in META, and 3) the region is deployed only at the region server it was assigned to, with proper state in the master.
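As a rough illustration of these three conditions, the following sketch classifies a single region from facts gathered elsewhere. RegionConsistencySketch, its Problem enum and its parameters are hypothetical and are not part of HBaseFsck. The sketch also ignores legitimate cases such as regions of disabled tables, which are not expected to be deployed.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

final class RegionConsistencySketch {

    /** Hypothetical problem categories, one per way a condition can fail. */
    enum Problem {
        MISSING_REGIONINFO_IN_HDFS,
        MISSING_ROW_IN_META,
        NOT_DEPLOYED,
        MULTIPLY_DEPLOYED,
        DEPLOYED_ON_WRONG_SERVER
    }

    /**
     * @param regioninfoInHdfs whether a valid .regioninfo file was found in HDFS
     * @param rowInMeta        whether a valid row for the region was found in META
     * @param assignedServer   server recorded as the assignment (may be null)
     * @param deployedOn       servers that actually report the region as open
     */
    static Set<Problem> check(boolean regioninfoInHdfs, boolean rowInMeta,
                              String assignedServer, List<String> deployedOn) {
        Set<Problem> problems = EnumSet.noneOf(Problem.class);
        if (!regioninfoInHdfs) {
            problems.add(Problem.MISSING_REGIONINFO_IN_HDFS);
        }
        if (!rowInMeta) {
            problems.add(Problem.MISSING_ROW_IN_META);
        }
        if (deployedOn.isEmpty()) {
            problems.add(Problem.NOT_DEPLOYED);
        } else if (deployedOn.size() > 1) {
            problems.add(Problem.MULTIPLY_DEPLOYED);
        } else if (!deployedOn.get(0).equals(assignedServer)) {
            problems.add(Problem.DEPLOYED_ON_WRONG_SERVER);
        }
        return problems;
    }
}
```

An empty result corresponds to a region that satisfies all three conditions; any other result names the conditions that a repair would need to restore.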

Region consistency repairs require hbase to be online so that hbck can contact the HBase master and region servers. The hbck#connect() method must first be called successfully. Much of the region consistency information is transient and less risky to repair.

If hbck is run from the command line, there are a handful of arguments that can be used to limit the kinds of repairs hbck will do. See the code in printUsageAndExit() for more details.


Nested Class Summary
static interface HBaseFsck.ErrorReporter
           
static class HBaseFsck.HbckInfo
          Maintain information about a particular region.
static class HBaseFsck.RegionRepairException
          Exception thrown when an integrity repair operation fails in an unresolvable way.
 class HBaseFsck.TableInfo
          Maintain information about a particular table.
 
Field Summary
static long DEFAULT_SLEEP_BEFORE_RERUN
           
static long DEFAULT_TIME_LAG
           
protected  ExecutorService executor
           
 
Constructor Summary
HBaseFsck(org.apache.hadoop.conf.Configuration conf)
          Constructor
HBaseFsck(org.apache.hadoop.conf.Configuration conf, ExecutorService exec)
          Constructor
 
Method Summary
 void checkRegionBoundaries()
           
 void connect()
          To repair region consistency, one must first call connect() so that online state can be repaired.
protected  HFileCorruptionChecker createHFileCorruptionChecker(boolean sidelineCorruptHFiles)
           
static void debugLsr(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path p)
          ls -r for debugging purposes
static void debugLsr(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path p, HBaseFsck.ErrorReporter errors)
          ls -r for debugging purposes
 void dumpOverlapProblems(com.google.common.collect.Multimap<byte[],HBaseFsck.HbckInfo> regions)
           
 void dumpSidelinedRegions(Map<org.apache.hadoop.fs.Path,HBaseFsck.HbckInfo> regions)
           
 HBaseFsck exec(ExecutorService exec, String[] args)
           
 void fixOrphanTables()
          Fixes orphan tables by creating a .tableinfo file under tableDir.
 HBaseFsck.ErrorReporter getErrors()
           
 HFileCorruptionChecker getHFilecorruptionChecker()
           
 int getMaxMerge()
           
 int getMaxOverlapsToSideline()
           
 com.google.common.collect.Multimap<byte[],HBaseFsck.HbckInfo> getOverlapGroups(String table)
           
 int getRetCode()
           
 void includeTable(String table)
           
static byte[] keyOnly(byte[] b)
           
 void loadHdfsRegionDirs()
          Scan HDFS for all regions, recording their information into regionInfoMap
static void main(String[] args)
          Main program
 int mergeRegionDirs(org.apache.hadoop.fs.Path targetRegionDir, HBaseFsck.HbckInfo contained)
          Merge hdfs data by moving from contained HbckInfo into targetRegionDir.
 void offlineHdfsIntegrityRepair()
          This repair method analyzes hbase data in hdfs and repairs it to satisfy the table integrity rules.
 int onlineConsistencyRepair()
          This repair method requires the cluster to be online since it contacts region servers and the master.
 int onlineHbck()
          Contacts the master and prints out cluster-wide information
protected  HBaseFsck printUsageAndExit()
           
 boolean rebuildMeta(boolean fix)
          Rebuilds meta from information in hdfs/fs.
 int run(String[] args)
           
 void setCheckHdfs(boolean checking)
           
 void setDisplayFullReport()
          Display the full report from fsck.
 void setFixAssignments(boolean shouldFix)
          Fix inconsistencies found by fsck.
 void setFixHdfsHoles(boolean shouldFix)
           
 void setFixHdfsOrphans(boolean shouldFix)
           
 void setFixHdfsOverlaps(boolean shouldFix)
           
 void setFixMeta(boolean shouldFix)
           
 void setFixReferenceFiles(boolean shouldFix)
           
 void setFixSplitParents(boolean shouldFix)
           
 void setFixTableOrphans(boolean shouldFix)
           
 void setFixVersionFile(boolean shouldFix)
           
 void setHFileCorruptionChecker(HFileCorruptionChecker hfcc)
           
 void setIgnorePreCheckPermission(boolean ignorePreCheckPermission)
           
 void setMaxMerge(int mm)
           
 void setMaxOverlapsToSideline(int mo)
           
 void setRetCode(int code)
           
 void setSidelineBigOverlaps(boolean sbo)
           
 void setSidelineDir(String sidelineDir)
           
 void setTimeLag(long seconds)
          Only tables that have not changed their state in META during the last few seconds, as specified by hbase.admin.fsck.timelag, are considered.
 boolean shouldFixVersionFile()
           
 boolean shouldIgnorePreCheckPermission()
           
 boolean shouldSidelineBigOverlaps()
           
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

DEFAULT_TIME_LAG

public static final long DEFAULT_TIME_LAG
See Also:
Constant Field Values

DEFAULT_SLEEP_BEFORE_RERUN

public static final long DEFAULT_SLEEP_BEFORE_RERUN
See Also:
Constant Field Values

executor

protected ExecutorService executor

Constructor Detail

HBaseFsck

public HBaseFsck(org.apache.hadoop.conf.Configuration conf)
          throws MasterNotRunningException,
                 ZooKeeperConnectionException,
                 IOException,
                 ClassNotFoundException
Constructor

Parameters:
conf - Configuration object
Throws:
MasterNotRunningException - if the master is not running
ZooKeeperConnectionException - if unable to connect to ZooKeeper
IOException
ClassNotFoundException

HBaseFsck

public HBaseFsck(org.apache.hadoop.conf.Configuration conf,
                 ExecutorService exec)
          throws MasterNotRunningException,
                 ZooKeeperConnectionException,
                 IOException,
                 ClassNotFoundException
Constructor

Parameters:
conf - Configuration object
exec - ExecutorService object
Throws:
MasterNotRunningException - if the master is not running
ZooKeeperConnectionException - if unable to connect to ZooKeeper
IOException
ClassNotFoundException

Method Detail

connect

public void connect()
             throws IOException
To repair region consistency, one must first call connect() so that online state can be repaired.

Throws:
IOException

offlineHdfsIntegrityRepair

public void offlineHdfsIntegrityRepair()
                                throws IOException,
                                       InterruptedException
This repair method analyzes hbase data in hdfs and repairs it to satisfy the table integrity rules. HBase doesn't need to be online for this operation to work.

Throws:
IOException
InterruptedException

onlineConsistencyRepair

public int onlineConsistencyRepair()
                            throws IOException,
                                   org.apache.zookeeper.KeeperException,
                                   InterruptedException
This repair method requires the cluster to be online since it contacts region servers and the master. It makes each region's state in HDFS, in .META., and deployments consistent.

Returns:
If > 0, the number of errors detected; if < 0, there was an unrecoverable error; if 0, HBase is clean.
Throws:
IOException
org.apache.zookeeper.KeeperException
InterruptedException
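
The three ranges of the return value can be handled as in the sketch below. RepairResultSketch and describe are hypothetical helper names, and the int argument stands in for a value that a caller would have obtained from onlineConsistencyRepair().

```java
final class RepairResultSketch {

    /** Maps a repair result to a short description, following the documented ranges. */
    static String describe(int result) {
        if (result < 0) {
            return "unrecoverable error";
        }
        if (result == 0) {
            return "clean";
        }
        return result + " error(s) detected";
    }

    public static void main(String[] args) {
        System.out.println(describe(0));  // prints clean
        System.out.println(describe(3));  // prints 3 error(s) detected
        System.out.println(describe(-1)); // prints unrecoverable error
    }
}
```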

onlineHbck

public int onlineHbck()
               throws IOException,
                      org.apache.zookeeper.KeeperException,
                      InterruptedException
Contacts the master and prints out cluster-wide information

Returns:
0 on success, non-zero on failure
Throws:
IOException
org.apache.zookeeper.KeeperException
InterruptedException

keyOnly

public static byte[] keyOnly(byte[] b)

checkRegionBoundaries

public void checkRegionBoundaries()

getErrors

public HBaseFsck.ErrorReporter getErrors()

fixOrphanTables

public void fixOrphanTables()
                     throws IOException
Fixes orphan tables by creating a .tableinfo file under tableDir:
1. If TableInfo is cached, recover the .tableinfo file from it.
2. Otherwise, create a default .tableinfo file with the following items:
 2.1 the correct tablename
 2.2 the correct colfamily list
 2.3 the default properties for both HTableDescriptor and HColumnDescriptor

Throws:
IOException

rebuildMeta

public boolean rebuildMeta(boolean fix)
                    throws IOException,
                           InterruptedException
Rebuilds meta from information in hdfs/fs. Depends on configuration settings passed into hbck constructor to point to a particular fs/dir.

Parameters:
fix - flag that determines if method should attempt to fix holes
Returns:
true if successful, false if attempt failed.
Throws:
IOException
InterruptedException

loadHdfsRegionDirs

public void loadHdfsRegionDirs()
                        throws IOException,
                               InterruptedException
Scan HDFS for all regions, recording their information into regionInfoMap

Throws:
IOException
InterruptedException

mergeRegionDirs

public int mergeRegionDirs(org.apache.hadoop.fs.Path targetRegionDir,
                           HBaseFsck.HbckInfo contained)
                    throws IOException
Merge hdfs data by moving from contained HbckInfo into targetRegionDir.

Returns:
number of file move fixes done to merge regions.
Throws:
IOException
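
The idea of merging by moving files and counting the moves can be sketched on a local filesystem with java.nio.file. This is only an illustration under stated assumptions: MergeDirsSketch and mergeInto are hypothetical names, the directories are flat, existing files in the target are never overwritten, and the real method operates on HDFS paths through the Hadoop FileSystem API rather than on local paths.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class MergeDirsSketch {

    /** Moves every regular file from contained into target; returns the number of files moved. */
    static int mergeInto(Path target, Path contained) throws IOException {
        // Collect the candidates first so the directory is not modified while it is listed.
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(contained)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    files.add(entry);
                }
            }
        }

        int moved = 0;
        for (Path src : files) {
            Path dst = target.resolve(src.getFileName());
            if (Files.exists(dst)) {
                continue; // assumption of this sketch: leave name clashes untouched
            }
            Files.move(src, dst);
            moved++;
        }
        return moved;
    }
}
```

Returning the count of moves mirrors the documented return value, which lets a caller tell whether the merge changed anything.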

dumpOverlapProblems

public void dumpOverlapProblems(com.google.common.collect.Multimap<byte[],HBaseFsck.HbckInfo> regions)

dumpSidelinedRegions

public void dumpSidelinedRegions(Map<org.apache.hadoop.fs.Path,HBaseFsck.HbckInfo> regions)

getOverlapGroups

public com.google.common.collect.Multimap<byte[],HBaseFsck.HbckInfo> getOverlapGroups(String table)

setDisplayFullReport

public void setDisplayFullReport()
Display the full report from fsck. This displays all live and dead region servers, and all known regions.


setFixAssignments

public void setFixAssignments(boolean shouldFix)
Fix inconsistencies found by fsck. This should try to fix errors (if any) found by fsck utility.


setFixMeta

public void setFixMeta(boolean shouldFix)

setCheckHdfs

public void setCheckHdfs(boolean checking)

setFixHdfsHoles

public void setFixHdfsHoles(boolean shouldFix)

setFixTableOrphans

public void setFixTableOrphans(boolean shouldFix)

setFixHdfsOverlaps

public void setFixHdfsOverlaps(boolean shouldFix)

setFixHdfsOrphans

public void setFixHdfsOrphans(boolean shouldFix)

setFixVersionFile

public void setFixVersionFile(boolean shouldFix)

shouldFixVersionFile

public boolean shouldFixVersionFile()

setSidelineBigOverlaps

public void setSidelineBigOverlaps(boolean sbo)

shouldSidelineBigOverlaps

public boolean shouldSidelineBigOverlaps()

setFixSplitParents

public void setFixSplitParents(boolean shouldFix)

setFixReferenceFiles

public void setFixReferenceFiles(boolean shouldFix)

shouldIgnorePreCheckPermission

public boolean shouldIgnorePreCheckPermission()

setIgnorePreCheckPermission

public void setIgnorePreCheckPermission(boolean ignorePreCheckPermission)

setMaxMerge

public void setMaxMerge(int mm)
Parameters:
mm - maximum number of regions to merge into a single region.

getMaxMerge

public int getMaxMerge()

setMaxOverlapsToSideline

public void setMaxOverlapsToSideline(int mo)

getMaxOverlapsToSideline

public int getMaxOverlapsToSideline()

includeTable

public void includeTable(String table)

setTimeLag

public void setTimeLag(long seconds)
Only tables that have not changed their state in META during the last few seconds, as specified by hbase.admin.fsck.timelag, are considered.

Parameters:
seconds - the time in seconds
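
The effect of a time lag can be pictured as a filter over modification times, as in the sketch below. TimeLagSketch, stableEntries and the map of last-modified timestamps are hypothetical. The sketch only shows the cutoff arithmetic and does not reflect how hbck reads state from META.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class TimeLagSketch {

    /**
     * Returns the names whose last modification is at least timeLagSeconds old.
     *
     * @param lastModifiedMillis name mapped to last modification time, in epoch milliseconds
     * @param nowMillis          current time, in epoch milliseconds
     * @param timeLagSeconds     the time lag, in seconds
     */
    static List<String> stableEntries(Map<String, Long> lastModifiedMillis,
                                      long nowMillis, long timeLagSeconds) {
        long cutoff = nowMillis - timeLagSeconds * 1000L; // seconds to milliseconds
        List<String> stable = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastModifiedMillis.entrySet()) {
            if (entry.getValue() <= cutoff) {
                stable.add(entry.getKey());
            }
        }
        return stable;
    }
}
```

Entries modified more recently than the cutoff are skipped, on the assumption that their state may still be changing and a check would report false problems.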

setSidelineDir

public void setSidelineDir(String sidelineDir)
Parameters:
sidelineDir - HDFS path to sideline data

createHFileCorruptionChecker

protected HFileCorruptionChecker createHFileCorruptionChecker(boolean sidelineCorruptHFiles)
                                                       throws IOException
Throws:
IOException

getHFilecorruptionChecker

public HFileCorruptionChecker getHFilecorruptionChecker()

setHFileCorruptionChecker

public void setHFileCorruptionChecker(HFileCorruptionChecker hfcc)

setRetCode

public void setRetCode(int code)

getRetCode

public int getRetCode()

printUsageAndExit

protected HBaseFsck printUsageAndExit()

main

public static void main(String[] args)
                 throws Exception
Main program

Parameters:
args - command-line arguments
Throws:
HBaseFsck.RegionRepairException
Exception

run

public int run(String[] args)
        throws Exception
Specified by:
run in interface org.apache.hadoop.util.Tool
Throws:
Exception

exec

public HBaseFsck exec(ExecutorService exec,
                      String[] args)
               throws org.apache.zookeeper.KeeperException,
                      IOException,
                      InterruptedException
Throws:
org.apache.zookeeper.KeeperException
IOException
InterruptedException

debugLsr

public static void debugLsr(org.apache.hadoop.conf.Configuration conf,
                            org.apache.hadoop.fs.Path p)
                     throws IOException
ls -r for debugging purposes

Throws:
IOException

debugLsr

public static void debugLsr(org.apache.hadoop.conf.Configuration conf,
                            org.apache.hadoop.fs.Path p,
                            HBaseFsck.ErrorReporter errors)
                     throws IOException
ls -r for debugging purposes

Throws:
IOException


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.