org.apache.accumulo.examples.simple.mapreduce
Class TeraSortIngest

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.accumulo.examples.simple.mapreduce.TeraSortIngest
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class TeraSortIngest
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool

Generates the *almost* official terasort input data set (see below). The user specifies the number of rows and the output directory, and this class runs a map/reduce program to generate the data. Each row of the data consists of a key, a 10-byte row id, a value, and a terminating \r\n.

This TeraSort is slightly modified to allow for variable-length keys and values. The row id length is not variable. To generate a terabyte of data in the same way TeraSort does, use 10000000000 rows, a 10/10 byte key length, and a 78/78 byte value length. Along with the 10-byte row id and \r\n, this gives 100 bytes per row * 10000000000 rows = 1 TB. The min and max of the key and value length ranges are both inclusive.

Params: [numsplits]. numsplits allows you to specify how many splits, and therefore mappers, to use.
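The sizing arithmetic and the inclusive length ranges described above can be sketched in a few lines of Java. This is only an illustration and not code from TeraSortIngest itself: the class name TeraSortSizing and the helpers bytesPerRow, totalBytes, and randomLength are hypothetical, and the sketch assumes a row is exactly key + 10-byte row id + value + \r\n.

```java
import java.util.Random;

// Hypothetical helper, not part of the Accumulo examples.
class TeraSortSizing {
    static final int ROW_ID_BYTES = 10;          // fixed-length row id
    static final int LINE_TERMINATOR_BYTES = 2;  // "\r\n"

    // Bytes in one generated row for the given key and value lengths.
    static long bytesPerRow(int keyLen, int valueLen) {
        return (long) keyLen + ROW_ID_BYTES + valueLen + LINE_TERMINATOR_BYTES;
    }

    // Total bytes for numRows rows; throws ArithmeticException on overflow.
    static long totalBytes(long numRows, int keyLen, int valueLen) {
        return Math.multiplyExact(numRows, bytesPerRow(keyLen, valueLen));
    }

    // Picks a length in [min, max], with both ends inclusive.
    static int randomLength(Random rng, int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        return min + rng.nextInt(max - min + 1);
    }

    public static void main(String[] args) {
        long rows = 10_000_000_000L;
        System.out.println(bytesPerRow(10, 78));       // prints 100
        System.out.println(totalBytes(rows, 10, 78));  // prints 1000000000000
        // With min == max the length is fixed, as in the 10/10 and 78/78 case.
        System.out.println(randomLength(new Random(), 10, 10)); // prints 10
    }
}
```

With a real range such as 5/8, randomLength may return 5, 6, 7, or 8, so the total size of the generated data then depends on the lengths drawn.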


Nested Class Summary
static class TeraSortIngest.SortGenMapper
          The Mapper class that, given a row number, generates the appropriate output line.
 
Constructor Summary
TeraSortIngest()
           
 
Method Summary
static void main(String[] args)
           
 int run(String[] args)
           
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Constructor Detail

TeraSortIngest

public TeraSortIngest()
Method Detail

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

run

public int run(String[] args)
        throws Exception
Specified by:
run in interface org.apache.hadoop.util.Tool
Throws:
Exception


Copyright © 2012 The Apache Software Foundation. All Rights Reserved.