org.apache.hadoop.hbase.mapreduce
Class Import

java.lang.Object
  extended by org.apache.hadoop.hbase.mapreduce.Import

@InterfaceAudience.Public
@InterfaceStability.Stable
public class Import
extends Object

Import data written by Export.


Field Summary
static String BULK_OUTPUT_CONF_KEY
           
static String CF_RENAME_PROP
           
static String FILTER_ARGS_CONF_KEY
           
static String FILTER_CLASS_CONF_KEY
           
static String HAS_LARGE_RESULT
           
static String TABLE_NAME
           
static String WAL_DURABILITY
           
 
Constructor Summary
Import()
           
 
Method Summary
static void addFilterAndArguments(org.apache.hadoop.conf.Configuration conf, Class<? extends Filter> clazz, List<String> filterArgs)
          Add a Filter to be instantiated on import.
static void configureCfRenaming(org.apache.hadoop.conf.Configuration conf, Map<String,String> renameMap)
          Sets a configuration property with key CF_RENAME_PROP in conf that tells the mapper how to rename column families.
static org.apache.hadoop.mapreduce.Job createSubmittableJob(org.apache.hadoop.conf.Configuration conf, String[] args)
          Sets up the actual job.
static Cell filterKv(Filter filter, Cell kv)
          Attempt to filter out the given KeyValue.
static void flushRegionsIfNecessary(org.apache.hadoop.conf.Configuration conf)
          If the durability is set to Durability.SKIP_WAL and the data is imported to HBase, all regions of the table need to be flushed, because the data is held only in memory and is not present in the Write-Ahead Log to replay after a crash.
static Filter instantiateFilter(org.apache.hadoop.conf.Configuration conf)
          Create a Filter to apply to all incoming keys (KeyValues), optionally excluding them from the job output.
static void main(String[] args)
          Main entry point.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CF_RENAME_PROP

public static final String CF_RENAME_PROP
See Also:
Constant Field Values

BULK_OUTPUT_CONF_KEY

public static final String BULK_OUTPUT_CONF_KEY
See Also:
Constant Field Values

FILTER_CLASS_CONF_KEY

public static final String FILTER_CLASS_CONF_KEY
See Also:
Constant Field Values

FILTER_ARGS_CONF_KEY

public static final String FILTER_ARGS_CONF_KEY
See Also:
Constant Field Values

TABLE_NAME

public static final String TABLE_NAME
See Also:
Constant Field Values

WAL_DURABILITY

public static final String WAL_DURABILITY
See Also:
Constant Field Values

HAS_LARGE_RESULT

public static final String HAS_LARGE_RESULT
See Also:
Constant Field Values
Constructor Detail

Import

public Import()
Method Detail

instantiateFilter

public static Filter instantiateFilter(org.apache.hadoop.conf.Configuration conf)
Create a Filter to apply to all incoming keys (KeyValues), optionally excluding them from the job output.

Parameters:
conf - Configuration from which to load the filter
Returns:
the filter to use for the task, or null if no filter should be used
Throws:
IllegalArgumentException - if the filter is misconfigured
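
The pattern implied by this contract can be sketched in plain Java. The `Filter` interface and `PrefixFilter` class below are hypothetical stand-ins, not the real HBase types, and the reflective one-argument-constructor convention is an assumption kept only to keep the sketch self-contained; a misconfigured class surfaces as IllegalArgumentException, matching the documented behavior.

```java
import java.lang.reflect.Constructor;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.hbase.filter.Filter.
interface Filter {}

// Hypothetical filter taking its arguments as a List<byte[]>.
class PrefixFilter implements Filter {
    final List<byte[]> args;
    PrefixFilter(List<byte[]> args) { this.args = args; }
}

public class FilterInstantiationSketch {
    // Read a class name (as instantiateFilter would read it from the
    // Configuration) and invoke a List-taking constructor reflectively.
    static Filter instantiate(String className, List<byte[]> filterArgs) {
        try {
            Class<? extends Filter> clazz =
                Class.forName(className).asSubclass(Filter.class);
            Constructor<? extends Filter> ctor = clazz.getConstructor(List.class);
            return ctor.newInstance(filterArgs);
        } catch (ReflectiveOperationException e) {
            // A misconfigured filter becomes an IllegalArgumentException.
            throw new IllegalArgumentException("Could not instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Filter f = instantiate("PrefixFilter", List.of("row1".getBytes()));
        System.out.println(f.getClass().getSimpleName()); // PrefixFilter
    }
}
```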

filterKv

public static Cell filterKv(Filter filter,
                            Cell kv)
                     throws IOException
Attempt to filter out the given KeyValue.

Parameters:
filter - the Filter to apply to the KeyValue, or null if no filtering is configured
kv - KeyValue on which to apply the filter
Returns:
null if the key should not be written, otherwise returns the original KeyValue
Throws:
IOException
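
The null-means-drop contract can be illustrated with a minimal sketch. `Cell` here is a hypothetical stand-in for the HBase type, and the `Predicate` (returning true to keep) simplifies the real Filter API; only the return convention is taken from the documentation above.

```java
import java.util.function.Predicate;

public class FilterKvSketch {
    // Hypothetical stand-in for org.apache.hadoop.hbase.Cell.
    record Cell(String row) {}

    // Mirrors the documented contract: with no filter, pass the cell
    // through; otherwise return null when the cell is rejected, else
    // return the original cell unchanged.
    static Cell filterKv(Predicate<Cell> filter, Cell kv) {
        if (filter == null) return kv;        // no filter configured
        return filter.test(kv) ? kv : null;   // null == "do not write"
    }

    public static void main(String[] args) {
        Predicate<Cell> startsWithA = c -> c.row().startsWith("a");
        System.out.println(filterKv(startsWithA, new Cell("apple"))); // kept
        System.out.println(filterKv(startsWithA, new Cell("berry"))); // null
    }
}
```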

configureCfRenaming

public static void configureCfRenaming(org.apache.hadoop.conf.Configuration conf,
                                       Map<String,String> renameMap)

Sets a configuration property with key CF_RENAME_PROP in conf that tells the mapper how to rename column families.

Alternatively, instead of calling this function, you could set the configuration key CF_RENAME_PROP yourself. The value should look like

srcCf1:destCf1,srcCf2:destCf2,....

This has the same effect on the mapper behavior.

Parameters:
conf - the Configuration in which the CF_RENAME_PROP key will be set
renameMap - a mapping from source CF names to destination CF names
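
Building the documented value format by hand can be sketched as follows; the class and method names here are hypothetical helpers, but the `srcCf:destCf` comma-separated format is exactly the one described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class CfRenameValue {
    // Builds a CF_RENAME_PROP-style value, "srcCf1:destCf1,srcCf2:destCf2",
    // from a source-CF -> destination-CF map, as an alternative to
    // calling configureCfRenaming directly.
    static String toPropValue(Map<String, String> renameMap) {
        return renameMap.entrySet().stream()
            .map(e -> e.getKey() + ":" + e.getValue())
            .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is deterministic.
        Map<String, String> renames = new LinkedHashMap<>();
        renames.put("cf_old", "cf_new");
        renames.put("meta", "metadata");
        System.out.println(toPropValue(renames)); // cf_old:cf_new,meta:metadata
    }
}
```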

addFilterAndArguments

public static void addFilterAndArguments(org.apache.hadoop.conf.Configuration conf,
                                         Class<? extends Filter> clazz,
                                         List<String> filterArgs)
                                  throws IOException
Add a Filter to be instantiated on import.

Parameters:
conf - Configuration to update (will be passed to the job)
clazz - Filter subclass to instantiate on the server.
filterArgs - List of arguments to pass to the filter on instantiation
Throws:
IOException

createSubmittableJob

public static org.apache.hadoop.mapreduce.Job createSubmittableJob(org.apache.hadoop.conf.Configuration conf,
                                                                   String[] args)
                                                            throws IOException
Sets up the actual job.

Parameters:
conf - The current configuration.
args - The command line parameters.
Returns:
The newly created job.
Throws:
IOException - When setting up the job fails.

flushRegionsIfNecessary

public static void flushRegionsIfNecessary(org.apache.hadoop.conf.Configuration conf)
                                    throws IOException,
                                           InterruptedException
If the durability is set to Durability.SKIP_WAL and the data is imported to HBase, all regions of the table need to be flushed, because the data is held only in memory and is not present in the Write-Ahead Log to replay after a crash. This method flushes all regions of the table when data is imported to HBase with Durability.SKIP_WAL.

Throws:
IOException
InterruptedException

main

public static void main(String[] args)
                 throws Exception
Main entry point.

Parameters:
args - The command line parameters.
Throws:
Exception - When running the job fails.


Copyright © 2007–2015 The Apache Software Foundation. All rights reserved.