12.2. WAL splitting

How edits are recovered from a crashed RegionServer

When a RegionServer crashes, it will lose its ephemeral lease in ZooKeeper...TODO

12.2.1. hbase.hlog.split.skip.errors

When set to true (the default), any error encountered while splitting is logged, the problematic WAL is moved into the .corrupt directory under the HBase rootdir, and processing continues. If set to false, the exception is propagated and the split is logged as failed.[16]
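To make the two settings concrete, here is a minimal Java sketch of the policy described above. It is not HBase's splitting code: WalSplitter, SplitResult and splitAll are hypothetical names, and moving a WAL into the .corrupt directory is modeled only by recording its name.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SplitSkipErrorsSketch {

    /** Hypothetical stand-in for splitting a single WAL; may fail with an IOException. */
    interface WalSplitter {
        void split(String walPath) throws IOException;
    }

    /** WALs that were split cleanly and WALs that were set aside as corrupt. */
    static final class SplitResult {
        final List<String> split = new ArrayList<>();
        final List<String> corrupt = new ArrayList<>();
    }

    /**
     * Splits each WAL in order.
     * skipErrors == true:  log the error, set the WAL aside as corrupt, keep going.
     * skipErrors == false: propagate the first error, failing the whole split.
     */
    static SplitResult splitAll(List<String> wals, WalSplitter splitter, boolean skipErrors)
            throws IOException {
        SplitResult result = new SplitResult();
        for (String wal : wals) {
            try {
                splitter.split(wal);
                result.split.add(wal);
            } catch (IOException e) {
                if (!skipErrors) {
                    throw new IOException("WAL split failed on " + wal, e);
                }
                System.err.println("Error splitting " + wal + ": " + e.getMessage());
                // HBase would move this WAL under the rootdir's .corrupt directory;
                // the sketch only records its name.
                result.corrupt.add(wal);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        List<String> wals = List.of("wal.1", "wal.2", "wal.3");
        // Simulated splitter that fails on the second WAL.
        WalSplitter splitter = path -> {
            if (path.equals("wal.2")) {
                throw new IOException("checksum mismatch");
            }
        };

        SplitResult r = splitAll(wals, splitter, true);
        System.out.println("split=" + r.split + " corrupt=" + r.corrupt);

        try {
            splitAll(wals, splitter, false);
        } catch (IOException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

Running main prints `split=[wal.1, wal.3] corrupt=[wal.2]` and then `failed: WAL split failed on wal.2`: with the flag on, one bad WAL costs only that file; with it off, the first bad WAL aborts the split.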

12.2.2. How EOFExceptions are treated when splitting a crashed RegionServer's WALs

If we get an EOF while splitting logs, we proceed with the split even when hbase.hlog.split.skip.errors is false. An EOF while reading the last log in the set of files to split is near-guaranteed, since the RegionServer likely crashed mid-write of a record. We also continue if the EOF occurs while reading a file other than the last one in the set.[17]
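A minimal Java sketch of this EOF rule, under stated assumptions: each WAL is modeled as a byte array of hypothetical length-prefixed records (a 4-byte length followed by the edit bytes), which is not HBase's actual WAL format, and readEdits, splitLogs and encode are illustrative names. An EOFException ends the current file without failing the split, wherever that file sits in the set; any other IOException is still governed by the skip-errors flag.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EofTolerantSplitSketch {

    /**
     * Reads length-prefixed edits from one WAL. Hitting EOF, whether cleanly at a record
     * boundary or mid-record after a crash, simply ends this file: the complete edits read
     * so far are kept. Other IOExceptions (here, a negative length) propagate to the caller.
     */
    static List<byte[]> readEdits(byte[] wal) throws IOException {
        List<byte[]> edits = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wal))) {
            while (true) {
                int len = in.readInt();
                if (len < 0) {
                    throw new IOException("corrupt record length " + len);
                }
                byte[] edit = new byte[len];
                in.readFully(edit);
                edits.add(edit);
            }
        } catch (EOFException eof) {
            // End of this WAL (possibly a truncated last record): keep what we have.
        }
        return edits;
    }

    /**
     * Splits every WAL. EOF is tolerated in any file, not just the last one, even when
     * skipErrors is false; only non-EOF errors are subject to the skip-errors policy.
     */
    static int splitLogs(List<byte[]> wals, boolean skipErrors) throws IOException {
        int recovered = 0;
        for (byte[] wal : wals) {
            try {
                recovered += readEdits(wal).size();
            } catch (IOException e) {
                if (!skipErrors) {
                    throw e;
                }
                System.err.println("Skipping WAL: " + e.getMessage());
            }
        }
        return recovered;
    }

    /** Encodes edits in the hypothetical format: 4-byte length, then the bytes. */
    static byte[] encode(String... edits) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (String e : edits) {
            byte[] b = e.getBytes(StandardCharsets.UTF_8);
            out.writeInt(b.length);
            out.write(b);
        }
        out.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] full = encode("put r1", "put r2", "delete r3");
        // Simulate a crash mid-write by chopping 3 bytes off the final edit.
        byte[] truncated = Arrays.copyOf(full, full.length - 3);

        System.out.println(readEdits(full).size());      // 3
        System.out.println(readEdits(truncated).size()); // 2
        // The truncated WAL is not last and skipErrors is false, yet the split proceeds.
        System.out.println(splitLogs(List.of(truncated, full), false)); // 5
    }
}
```

Running main prints 3, 2 and 5: the truncated WAL loses only its partial final edit, and because the EOFException is caught inside readEdits it never reaches the skip-errors check in splitLogs.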



[16] See HBASE-2958: when hbase.hlog.split.skip.errors is set to false, we fail the split, but that's it. We need to do more than just fail the split if this flag is set.