org.apache.hadoop.hbase.util
Class HBaseFsck

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.hadoop.hbase.util.HBaseFsck
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable

@InterfaceAudience.Public
@InterfaceStability.Evolving
public class HBaseFsck
extends org.apache.hadoop.conf.Configured

HBaseFsck (hbck) is a tool for checking and repairing region consistency and table integrity problems in a corrupted HBase.

Region consistency checks verify that hbase:meta, the region deployments on region servers, and the state of data in HDFS (.regioninfo files) are all consistent with one another.

Table integrity checks verify that all possible row keys resolve to exactly one region of a table. This means there are no individual degenerate or backwards regions; no holes between regions; and that there are no overlapping regions.
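The table integrity rules above can be illustrated with a small, self-contained model. This is a sketch, not hbck's implementation. It assumes each region is reduced to a hypothetical `Range` of string keys, with the empty string standing in for HBase's empty start/end key and `String.compareTo` standing in for byte-wise key ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class IntegritySketch {
    /** Hypothetical region key range [start, end); "" stands for HBase's empty key. */
    static final class Range {
        final String start;
        final String end;

        Range(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + (start.isEmpty() ? "<start>" : start) + ", "
                    + (end.isEmpty() ? "<end>" : end) + ")";
        }
    }

    /** Reports degenerate, backwards, hole and overlap problems for one table. */
    static List<String> check(List<Range> regions) {
        List<String> problems = new ArrayList<>();
        List<Range> chain = new ArrayList<>();
        for (Range r : regions) {
            if (!r.end.isEmpty() && r.end.equals(r.start)) {
                problems.add("degenerate region " + r);   // endkey == startkey
            } else if (!r.end.isEmpty() && r.end.compareTo(r.start) < 0) {
                problems.add("backwards region " + r);    // endkey < startkey
            } else {
                chain.add(r);
            }
        }
        chain.sort(Comparator.comparing((Range r) -> r.start));

        String coveredTo = "";        // keys from the table start up to here are covered
        boolean coveredToEnd = false; // true once some region reaches the table end
        for (Range r : chain) {
            int cmp = r.start.compareTo(coveredTo);
            if (coveredToEnd || cmp < 0) {
                problems.add("overlapping region " + r);  // starts inside covered keys
            } else if (cmp > 0) {
                problems.add("hole " + new Range(coveredTo, r.start));
            }
            if (r.end.isEmpty()) {
                coveredToEnd = true;
            } else if (!coveredToEnd && r.end.compareTo(coveredTo) > 0) {
                coveredTo = r.end;
            }
        }
        if (!coveredToEnd) {
            problems.add("hole " + new Range(coveredTo, ""));  // nothing reaches the table end
        }
        return problems;
    }
}
```

For example, checking the regions `[<start>, c)` and `[d, <end>)` reports the single problem `hole [c, d)`.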

The general repair strategy works in two phases:

  1. Repair Table Integrity on HDFS. (merge or fabricate regions)
  2. Repair Region Consistency with hbase:meta and assignments

For table integrity repairs, the tables' region directories are scanned for .regioninfo files, and each table's integrity is then verified. If there are any orphan regions (regions with no .regioninfo files) or holes, new regions are fabricated. Backwards regions are sidelined, as are empty degenerate (endkey==startkey) regions. If there are any overlapping regions, a new region is created and all data is merged into it.
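The repair strategy described in the previous paragraph can be sketched as a planning step over simplified key ranges. The `RepairPlanSketch` class and its action strings are hypothetical. They only show which regions would be sidelined, which overlap groups would be merged into one covering region, and which holes would be plugged by a fabricated region; hbck itself operates on region directories and HbckInfo objects. Keys are strings, with "" standing in for the empty start/end key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RepairPlanSketch {
    /** Hypothetical region key range [start, end); "" stands for HBase's empty key. */
    static final class Range {
        final String start;
        final String end;
        Range(String start, String end) { this.start = start; this.end = end; }
        @Override public String toString() {
            return "[" + (start.isEmpty() ? "<start>" : start) + ", " + (end.isEmpty() ? "<end>" : end) + ")";
        }
    }

    /** Compares end keys, treating the empty end key as the end of the table. */
    static int compareEnd(String a, String b) {
        if (a.equals(b)) return 0;
        if (a.isEmpty()) return 1;
        if (b.isEmpty()) return -1;
        return a.compareTo(b);
    }

    /** Lists repair actions: sidelines first, then merges and fabrications in key order. */
    static List<String> plan(List<Range> regions) {
        List<String> actions = new ArrayList<>();
        List<Range> valid = new ArrayList<>();
        for (Range r : regions) {
            boolean degenerate = !r.end.isEmpty() && r.end.equals(r.start);
            boolean backwards = !r.end.isEmpty() && r.end.compareTo(r.start) < 0;
            if (degenerate || backwards) {
                actions.add("sideline " + r);
            } else {
                valid.add(r);
            }
        }
        valid.sort(Comparator.comparing((Range r) -> r.start));

        String cursor = "";         // first key not yet covered
        boolean reachedEnd = false; // true once the table end is covered
        int i = 0;
        while (i < valid.size()) {
            Range first = valid.get(i);
            String groupEnd = first.end;
            int j = i + 1;
            // Grow the group while the next region starts inside it (transitive overlap).
            while (j < valid.size()
                    && (groupEnd.isEmpty() || valid.get(j).start.compareTo(groupEnd) < 0)) {
                if (compareEnd(valid.get(j).end, groupEnd) > 0) {
                    groupEnd = valid.get(j).end;
                }
                j++;
            }
            if (first.start.compareTo(cursor) > 0) {
                actions.add("fabricate " + new Range(cursor, first.start)); // plug the hole
            }
            if (j - i > 1) {
                actions.add("merge " + (j - i) + " regions into " + new Range(first.start, groupEnd));
            }
            cursor = groupEnd;
            reachedEnd = groupEnd.isEmpty();
            i = j;
        }
        if (!reachedEnd) {
            actions.add("fabricate " + new Range(cursor, "")); // hole at the table end
        }
        return actions;
    }
}
```

With the regions `[<start>, c)`, `[b, f)`, `[h, <end>)` and the degenerate `[q, q)`, the plan is to sideline `[q, q)`, merge the first two regions into `[<start>, f)`, and fabricate `[f, h)`.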

Table integrity repairs deal solely with HDFS and could potentially be done offline -- the HBase region servers or master do not need to be running. This phase can eventually be used to completely reconstruct the hbase:meta table in an offline fashion.

Region consistency requires three conditions: 1) a valid .regioninfo file is present in the HDFS region dir, 2) a valid row with .regioninfo data exists in hbase:meta, and 3) the region is deployed only on the region server it was assigned to, with the proper state in the master.
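The three region consistency conditions can be expressed as a per-region check. The sketch assumes a hypothetical `RegionState` snapshot that has already been gathered from HDFS, hbase:meta and the region servers (gathering it is the part hbck does online). It also ignores special cases such as regions of disabled tables, which are not expected to be deployed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ConsistencySketch {
    /** Hypothetical snapshot of what HDFS, hbase:meta and the cluster report for one region. */
    static final class RegionState {
        final boolean hasHdfsRegionInfo; // condition 1: valid .regioninfo in the HDFS region dir
        final boolean hasMetaRow;        // condition 2: valid row in hbase:meta
        final String assignedServer;     // server the master assigned it to (null if unassigned)
        final Set<String> deployedOn;    // servers actually serving the region

        RegionState(boolean hasHdfsRegionInfo, boolean hasMetaRow,
                    String assignedServer, Set<String> deployedOn) {
            this.hasHdfsRegionInfo = hasHdfsRegionInfo;
            this.hasMetaRow = hasMetaRow;
            this.assignedServer = assignedServer;
            this.deployedOn = deployedOn;
        }
    }

    /** Returns the violated conditions; an empty list means the region is consistent. */
    static List<String> problems(RegionState s) {
        List<String> out = new ArrayList<>();
        if (!s.hasHdfsRegionInfo) out.add("no valid .regioninfo in HDFS");
        if (!s.hasMetaRow) out.add("no valid row in hbase:meta");
        // Condition 3: deployed on exactly the assigned server and nowhere else.
        if (s.deployedOn.isEmpty()) {
            out.add("not deployed");
        } else if (s.deployedOn.size() > 1) {
            out.add("deployed on multiple servers " + new TreeSet<>(s.deployedOn));
        } else if (s.assignedServer == null || !s.deployedOn.contains(s.assignedServer)) {
            out.add("deployed on " + s.deployedOn + " but assigned to " + s.assignedServer);
        }
        return out;
    }
}
```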

Region consistency repairs require HBase to be online so that hbck can contact the HBase master and region servers. The hbck#connect() method must first be called successfully. Much of the region consistency information is transient, which makes it less risky to repair.

If hbck is run from the command line, there are a handful of arguments that can be used to limit the kinds of repairs hbck will do. See the code in printUsageAndExit() for more details.


Nested Class Summary
static interface HBaseFsck.ErrorReporter
           
static class HBaseFsck.HbckInfo
          Maintain information about a particular region.
static class HBaseFsck.RegionRepairException
          Exception thrown when an integrity repair operation fails in an unresolvable way.
 class HBaseFsck.TableInfo
          Maintain information about a particular table.
 
Field Summary
static long DEFAULT_SLEEP_BEFORE_RERUN
           
static long DEFAULT_TIME_LAG
           
protected  ExecutorService executor
           
 
Constructor Summary
HBaseFsck(org.apache.hadoop.conf.Configuration conf)
          Constructor
HBaseFsck(org.apache.hadoop.conf.Configuration conf, ExecutorService exec)
          Constructor
 
Method Summary
 void checkRegionBoundaries()
           
 void connect()
          To repair region consistency, one must call connect() in order to repair online state.
protected  HFileCorruptionChecker createHFileCorruptionChecker(boolean sidelineCorruptHFiles)
           
static void debugLsr(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path p)
          Recursive listing (ls -R) for debugging purposes
static void debugLsr(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path p, HBaseFsck.ErrorReporter errors)
          Recursive listing (ls -R) for debugging purposes
 void dumpOverlapProblems(com.google.common.collect.Multimap<byte[],HBaseFsck.HbckInfo> regions)
           
 void dumpSidelinedRegions(Map<org.apache.hadoop.fs.Path,HBaseFsck.HbckInfo> regions)
           
 HBaseFsck exec(ExecutorService exec, String[] args)
           
 void fixEmptyMetaCells()
          Fixes the empty REGIONINFO_QUALIFIER rows in hbase:meta.
 void fixOrphanTables()
          Fixes an orphan table by creating a .tableinfo file under tableDir.
 HBaseFsck.ErrorReporter getErrors()
           
 HFileCorruptionChecker getHFilecorruptionChecker()
           
 int getMaxMerge()
           
 int getMaxOverlapsToSideline()
           
 com.google.common.collect.Multimap<byte[],HBaseFsck.HbckInfo> getOverlapGroups(TableName table)
           
 int getRetCode()
           
 void includeTable(TableName table)
           
static byte[] keyOnly(byte[] b)
           
 void loadHdfsRegionDirs()
          Scan HDFS for all regions, recording their information into regionInfoMap
static void main(String[] args)
          Main program
 int mergeRegionDirs(org.apache.hadoop.fs.Path targetRegionDir, HBaseFsck.HbckInfo contained)
          Merge hdfs data by moving from contained HbckInfo into targetRegionDir.
 void offlineHdfsIntegrityRepair()
          This repair method analyzes HBase data in HDFS and repairs it to satisfy the table integrity rules.
 int onlineConsistencyRepair()
          This repair method requires the cluster to be online since it contacts region servers and the master.
 int onlineHbck()
          Contacts the master and prints out cluster-wide information.
protected  HBaseFsck printUsageAndExit()
           
 boolean rebuildMeta(boolean fix)
          Rebuilds meta from information in hdfs/fs.
 void setCheckHdfs(boolean checking)
           
static void setDisplayFullReport()
          Display the full report from fsck.
 void setFixAssignments(boolean shouldFix)
          Fix inconsistencies found by fsck.
 void setFixEmptyMetaCells(boolean shouldFix)
           
 void setFixHdfsHoles(boolean shouldFix)
           
 void setFixHdfsOrphans(boolean shouldFix)
           
 void setFixHdfsOverlaps(boolean shouldFix)
           
 void setFixMeta(boolean shouldFix)
           
 void setFixReferenceFiles(boolean shouldFix)
           
 void setFixSplitParents(boolean shouldFix)
           
 void setFixTableLocks(boolean shouldFix)
          Set table locks fix mode.
 void setFixTableOrphans(boolean shouldFix)
           
 void setFixTableZNodes(boolean shouldFix)
          Set orphaned table ZNodes fix mode.
 void setFixVersionFile(boolean shouldFix)
           
 void setHFileCorruptionChecker(HFileCorruptionChecker hfcc)
           
 void setIgnorePreCheckPermission(boolean ignorePreCheckPermission)
           
 void setMaxMerge(int mm)
           
 void setMaxOverlapsToSideline(int mo)
           
 void setRetCode(int code)
           
 void setSidelineBigOverlaps(boolean sbo)
           
 void setSidelineDir(String sidelineDir)
           
 void setTimeLag(long seconds)
          Only tables that have not changed their state in hbase:meta during the last few seconds, as specified by hbase.admin.fsck.timelag, are checked.
 boolean shouldFixVersionFile()
           
 boolean shouldIgnorePreCheckPermission()
           
 boolean shouldSidelineBigOverlaps()
           
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_TIME_LAG

public static final long DEFAULT_TIME_LAG
See Also:
Constant Field Values

DEFAULT_SLEEP_BEFORE_RERUN

public static final long DEFAULT_SLEEP_BEFORE_RERUN
See Also:
Constant Field Values

executor

protected ExecutorService executor

Constructor Detail

HBaseFsck

public HBaseFsck(org.apache.hadoop.conf.Configuration conf)
          throws MasterNotRunningException,
                 ZooKeeperConnectionException,
                 IOException,
                 ClassNotFoundException
Constructor

Parameters:
conf - Configuration object
Throws:
MasterNotRunningException - if the master is not running
ZooKeeperConnectionException - if unable to connect to ZooKeeper
IOException
ClassNotFoundException

HBaseFsck

public HBaseFsck(org.apache.hadoop.conf.Configuration conf,
                 ExecutorService exec)
          throws MasterNotRunningException,
                 ZooKeeperConnectionException,
                 IOException,
                 ClassNotFoundException
Constructor

Parameters:
conf - Configuration object
Throws:
MasterNotRunningException - if the master is not running
ZooKeeperConnectionException - if unable to connect to ZooKeeper
IOException
ClassNotFoundException

Method Detail

connect

public void connect()
             throws IOException
To repair region consistency, one must call connect() in order to repair online state.

Throws:
IOException

offlineHdfsIntegrityRepair

public void offlineHdfsIntegrityRepair()
                                throws IOException,
                                       InterruptedException
This repair method analyzes HBase data in HDFS and repairs it to satisfy the table integrity rules. HBase doesn't need to be online for this operation to work.

Throws:
IOException
InterruptedException

onlineConsistencyRepair

public int onlineConsistencyRepair()
                            throws IOException,
                                   org.apache.zookeeper.KeeperException,
                                   InterruptedException
This repair method requires the cluster to be online since it contacts region servers and the master. It makes each region's state in HDFS, in hbase:meta, and in its deployments consistent.

Returns:
If > 0, the number of errors detected; if < 0, an unrecoverable error occurred; if 0, HBase is clean.
Throws:
IOException
org.apache.zookeeper.KeeperException
InterruptedException
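The return-value contract above can be wrapped in a small helper so callers do not have to remember the sign convention. The `HbckResultSketch` class and its `Outcome` enum are hypothetical and are not part of HBaseFsck.

```java
class HbckResultSketch {
    /** Hypothetical classification of an onlineConsistencyRepair() result. */
    enum Outcome { CLEAN, ERRORS_DETECTED, UNRECOVERABLE }

    static Outcome interpret(int result) {
        if (result == 0) {
            return Outcome.CLEAN;              // 0: a clean HBase
        }
        return result > 0
                ? Outcome.ERRORS_DETECTED      // > 0: the number of errors detected
                : Outcome.UNRECOVERABLE;       // < 0: an unrecoverable error
    }
}
```

For example, after connect() has succeeded, a caller could branch on `HbckResultSketch.interpret(fsck.onlineConsistencyRepair())`, where `fsck` is an HBaseFsck instance.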

onlineHbck

public int onlineHbck()
               throws IOException,
                      org.apache.zookeeper.KeeperException,
                      InterruptedException,
                      com.google.protobuf.ServiceException
Contacts the master and prints out cluster-wide information.

Returns:
0 on success, non-zero on failure
Throws:
IOException
org.apache.zookeeper.KeeperException
InterruptedException
com.google.protobuf.ServiceException

keyOnly

public static byte[] keyOnly(byte[] b)

checkRegionBoundaries

public void checkRegionBoundaries()

getErrors

public HBaseFsck.ErrorReporter getErrors()

fixEmptyMetaCells

public void fixEmptyMetaCells()
                       throws IOException
Fixes the empty REGIONINFO_QUALIFIER rows in hbase:meta.

Throws:
IOException

fixOrphanTables

public void fixOrphanTables()
                     throws IOException
Fixes an orphan table by creating a .tableinfo file under tableDir:
1. If the TableInfo is cached, recover the .tableinfo from it.
2. Otherwise, create a default .tableinfo file with the following items:
 2.1 the correct table name
 2.2 the correct column family list
 2.3 the default properties for both HTableDescriptor and HColumnDescriptor

Throws:
IOException
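The two recovery cases for an orphan table can be sketched as a single decision. `TableDescriptorModel` is a hypothetical stand-in for the .tableinfo contents, and deriving the column family list from the family directory names found under each region dir is an assumption made for this sketch. It does not use HTableDescriptor or HColumnDescriptor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeSet;

class OrphanTableSketch {
    /** Hypothetical stand-in for the contents of a .tableinfo file. */
    static final class TableDescriptorModel {
        final String tableName;
        final List<String> columnFamilies;
        final boolean usesDefaultProperties;

        TableDescriptorModel(String tableName, List<String> columnFamilies, boolean usesDefaultProperties) {
            this.tableName = tableName;
            this.columnFamilies = columnFamilies;
            this.usesDefaultProperties = usesDefaultProperties;
        }
    }

    /**
     * Case 1: reuse the cached descriptor if there is one.
     * Case 2: otherwise build a default one from the table name and the
     * (assumed) column family directory names under the table's region dirs.
     */
    static TableDescriptorModel recover(String tableName, Optional<TableDescriptorModel> cached,
                                        Map<String, List<String>> familyDirsByRegion) {
        if (cached.isPresent()) {
            return cached.get();
        }
        TreeSet<String> families = new TreeSet<>();  // de-duplicated and sorted
        for (List<String> dirs : familyDirsByRegion.values()) {
            families.addAll(dirs);
        }
        return new TableDescriptorModel(tableName, new ArrayList<>(families), true);
    }
}
```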

rebuildMeta

public boolean rebuildMeta(boolean fix)
                    throws IOException,
                           InterruptedException
Rebuilds meta from information in hdfs/fs. Depends on configuration settings passed into hbck constructor to point to a particular fs/dir.

Parameters:
fix - flag that determines if method should attempt to fix holes
Returns:
true if successful, false if attempt failed.
Throws:
IOException
InterruptedException

loadHdfsRegionDirs

public void loadHdfsRegionDirs()
                        throws IOException,
                               InterruptedException
Scan HDFS for all regions, recording their information into regionInfoMap

Throws:
IOException
InterruptedException
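Scanning region dirs and spotting orphans (region dirs with no .regioninfo file) can be sketched against a local directory tree with java.nio.file. This is an analogue, not the HDFS scan hbck performs. It assumes a `tableDir/<regionDir>/.regioninfo` layout and assumes that dot-directories under the table dir hold metadata rather than regions; `RegionDirScanSketch` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class RegionDirScanSketch {
    /** Returns the names of region dirs under tableDir that have no .regioninfo file. */
    static List<String> findOrphanRegionDirs(Path tableDir) throws IOException {
        List<Path> regionDirs;
        try (Stream<Path> children = Files.list(tableDir)) {
            regionDirs = children
                    .filter(Files::isDirectory)
                    // Assumption: dot-directories hold table metadata, not regions.
                    .filter(p -> !p.getFileName().toString().startsWith("."))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<String> orphans = new ArrayList<>();
        for (Path dir : regionDirs) {
            if (!Files.isRegularFile(dir.resolve(".regioninfo"))) {
                orphans.add(dir.getFileName().toString());
            }
        }
        return orphans;
    }
}
```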

mergeRegionDirs

public int mergeRegionDirs(org.apache.hadoop.fs.Path targetRegionDir,
                           HBaseFsck.HbckInfo contained)
                    throws IOException
Merge hdfs data by moving from contained HbckInfo into targetRegionDir.

Returns:
number of file move fixes done to merge regions.
Throws:
IOException
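The merge step can be illustrated with a local-filesystem analogue: files under each column family directory of the contained region are moved into the same family directory of the target region, and the number of moves is returned. The `regionDir/<family>/<file>` layout is assumed here, and the hypothetical `MergeDirsSketch` uses java.nio.file rather than the Hadoop FileSystem API; hbck's handling of other region contents may differ.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MergeDirsSketch {
    /**
     * Moves every file in contained/<family>/ into target/<family>/ and returns
     * the number of files moved. Top-level files such as .regioninfo are not touched.
     */
    static int mergeInto(Path targetRegionDir, Path containedRegionDir) throws IOException {
        List<Path> familyDirs;
        try (Stream<Path> s = Files.list(containedRegionDir)) {
            familyDirs = s.filter(Files::isDirectory)
                    .filter(p -> !p.getFileName().toString().startsWith("."))
                    .collect(Collectors.toList());
        }
        int moves = 0;
        for (Path familyDir : familyDirs) {
            Path dest = targetRegionDir.resolve(familyDir.getFileName().toString());
            Files.createDirectories(dest);
            List<Path> files;
            try (Stream<Path> s = Files.list(familyDir)) {
                files = s.filter(Files::isRegularFile).collect(Collectors.toList());
            }
            for (Path f : files) {
                // No REPLACE_EXISTING: a name clash throws instead of overwriting data.
                Files.move(f, dest.resolve(f.getFileName().toString()));
                moves++;
            }
        }
        return moves;
    }
}
```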

dumpOverlapProblems

public void dumpOverlapProblems(com.google.common.collect.Multimap<byte[],HBaseFsck.HbckInfo> regions)

dumpSidelinedRegions

public void dumpSidelinedRegions(Map<org.apache.hadoop.fs.Path,HBaseFsck.HbckInfo> regions)

getOverlapGroups

public com.google.common.collect.Multimap<byte[],HBaseFsck.HbckInfo> getOverlapGroups(TableName table)

setDisplayFullReport

public static void setDisplayFullReport()
Display the full report from fsck. This displays all live and dead region servers, and all known regions.


setFixTableLocks

public void setFixTableLocks(boolean shouldFix)
Set table locks fix mode. Deletes table locks that have been held for a long time.


setFixTableZNodes

public void setFixTableZNodes(boolean shouldFix)
Set orphaned table ZNodes fix mode. Sets the table state to disabled in the orphaned table ZNode.


setFixAssignments

public void setFixAssignments(boolean shouldFix)
Fix inconsistencies found by fsck. This tries to fix errors (if any) found by the fsck utility.


setFixMeta

public void setFixMeta(boolean shouldFix)

setFixEmptyMetaCells

public void setFixEmptyMetaCells(boolean shouldFix)

setCheckHdfs

public void setCheckHdfs(boolean checking)

setFixHdfsHoles

public void setFixHdfsHoles(boolean shouldFix)

setFixTableOrphans

public void setFixTableOrphans(boolean shouldFix)

setFixHdfsOverlaps

public void setFixHdfsOverlaps(boolean shouldFix)

setFixHdfsOrphans

public void setFixHdfsOrphans(boolean shouldFix)

setFixVersionFile

public void setFixVersionFile(boolean shouldFix)

shouldFixVersionFile

public boolean shouldFixVersionFile()

setSidelineBigOverlaps

public void setSidelineBigOverlaps(boolean sbo)

shouldSidelineBigOverlaps

public boolean shouldSidelineBigOverlaps()

setFixSplitParents

public void setFixSplitParents(boolean shouldFix)

setFixReferenceFiles

public void setFixReferenceFiles(boolean shouldFix)

shouldIgnorePreCheckPermission

public boolean shouldIgnorePreCheckPermission()

setIgnorePreCheckPermission

public void setIgnorePreCheckPermission(boolean ignorePreCheckPermission)

setMaxMerge

public void setMaxMerge(int mm)
Parameters:
mm - maximum number of regions to merge into a single region.
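How setMaxMerge might interact with setSidelineBigOverlaps and setMaxOverlapsToSideline for one overlap group can be sketched as a decision function. The policy below is an assumption (merge groups no larger than the maximum; otherwise, if enabled, sideline just enough regions to get under it, capped by the sideline limit) and not a statement of what hbck does in every HBase version; the `OverlapPolicySketch` name is hypothetical.

```java
class OverlapPolicySketch {
    /** How many regions of one overlap group to merge now and how many to sideline. */
    static final class Decision {
        final int toMerge;
        final int toSideline;

        Decision(int toMerge, int toSideline) {
            this.toMerge = toMerge;
            this.toSideline = toSideline;
        }
    }

    static Decision decide(int groupSize, int maxMerge,
                           boolean sidelineBigOverlaps, int maxOverlapsToSideline) {
        if (groupSize <= maxMerge) {
            return new Decision(groupSize, 0);  // small enough: merge the whole group
        }
        if (!sidelineBigOverlaps) {
            return new Decision(0, 0);          // too big: report only, leave data alone
        }
        // Assumed: sideline only the excess (capped), leaving the merge for a later run.
        return new Decision(0, Math.min(groupSize - maxMerge, maxOverlapsToSideline));
    }
}
```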

getMaxMerge

public int getMaxMerge()

setMaxOverlapsToSideline

public void setMaxOverlapsToSideline(int mo)

getMaxOverlapsToSideline

public int getMaxOverlapsToSideline()

includeTable

public void includeTable(TableName table)

setTimeLag

public void setTimeLag(long seconds)
Only tables that have not changed their state in hbase:meta during the last few seconds, as specified by hbase.admin.fsck.timelag, are checked.

Parameters:
seconds - the time in seconds
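The time-lag filter can be sketched as a cutoff comparison. The input map of table names to last state-change times in milliseconds is hypothetical; it only stands in for whatever information hbck reads from hbase:meta, and `TimeLagSketch` is not part of HBaseFsck.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TimeLagSketch {
    /** Keeps only tables whose last state change happened more than lagSeconds ago. */
    static List<String> stableTables(Map<String, Long> lastChangeMillis, long nowMillis, long lagSeconds) {
        long cutoff = nowMillis - lagSeconds * 1000L;
        List<String> out = new ArrayList<>();
        // TreeMap gives a deterministic, name-sorted result.
        for (Map.Entry<String, Long> e : new TreeMap<>(lastChangeMillis).entrySet()) {
            if (e.getValue() < cutoff) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}
```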

setSidelineDir

public void setSidelineDir(String sidelineDir)
Parameters:
sidelineDir - HDFS path to sideline data

createHFileCorruptionChecker

protected HFileCorruptionChecker createHFileCorruptionChecker(boolean sidelineCorruptHFiles)
                                                       throws IOException
Throws:
IOException

getHFilecorruptionChecker

public HFileCorruptionChecker getHFilecorruptionChecker()

setHFileCorruptionChecker

public void setHFileCorruptionChecker(HFileCorruptionChecker hfcc)

setRetCode

public void setRetCode(int code)

getRetCode

public int getRetCode()

printUsageAndExit

protected HBaseFsck printUsageAndExit()

main

public static void main(String[] args)
                 throws Exception
Main program

Parameters:
args - command-line arguments
Throws:
HBaseFsck.RegionRepairException
Exception

exec

public HBaseFsck exec(ExecutorService exec,
                      String[] args)
               throws org.apache.zookeeper.KeeperException,
                      IOException,
                      com.google.protobuf.ServiceException,
                      InterruptedException
Throws:
org.apache.zookeeper.KeeperException
IOException
com.google.protobuf.ServiceException
InterruptedException

debugLsr

public static void debugLsr(org.apache.hadoop.conf.Configuration conf,
                            org.apache.hadoop.fs.Path p)
                     throws IOException
Recursive listing (ls -R) for debugging purposes

Throws:
IOException

debugLsr

public static void debugLsr(org.apache.hadoop.conf.Configuration conf,
                            org.apache.hadoop.fs.Path p,
                            HBaseFsck.ErrorReporter errors)
                     throws IOException
Recursive listing (ls -R) for debugging purposes

Throws:
IOException
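A local-filesystem analogue of debugLsr can help when inspecting a copied region layout. It walks a directory tree with java.nio.file and returns each entry relative to the root, with directories marked by a trailing `/`. Unlike debugLsr, the hypothetical `LsrSketch.lsr` takes no Hadoop Configuration or FileSystem, and its output format is made up for this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LsrSketch {
    /** Lists every entry under root, relative to root, with "/" appended to directories. */
    static List<String> lsr(Path root) throws IOException {
        try (Stream<Path> entries = Files.walk(root)) {
            return entries
                    .filter(p -> !p.equals(root))  // skip the root itself
                    .sorted()
                    .map(p -> root.relativize(p).toString().replace('\\', '/')
                            + (Files.isDirectory(p) ? "/" : ""))
                    .collect(Collectors.toList());
        }
    }
}
```

For example, `LsrSketch.lsr(Paths.get("/tmp/hbase")).forEach(System.out::println)` prints one entry per line.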


Copyright © 2007–2015 The Apache Software Foundation. All rights reserved.