org.apache.accumulo.examples.simple.mapreduce
Class TeraSortIngest
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.accumulo.examples.simple.mapreduce.TeraSortIngest
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public class TeraSortIngest
- extends org.apache.hadoop.conf.Configured
- implements org.apache.hadoop.util.Tool
Generates the *almost* official TeraSort input data set (see below). The user specifies the number of rows and the output directory, and this class runs a
map/reduce program to generate the data. The format of the data is:
- (10 bytes key) (10 bytes rowid) (78 bytes filler) \r \n
- The keys are random characters from the set ' ' .. '~'.
- The rowid is the right-justified row id as an int.
- The filler consists of 7 runs of 10 characters from 'A' to 'Z'.
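The record layout listed above can be illustrated with a minimal sketch. It is not the code of TeraSortIngest.SortGenMapper: the class name RowFormatSketch, the method makeRow, space padding for the row id, the choice of starting letter, and the short 8-character final run that brings the filler to 78 bytes are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Random;

// Hypothetical illustration of the 100-byte record layout; not the Accumulo implementation.
public class RowFormatSketch {
    static final int KEY_LEN = 10;
    static final int FILLER_LEN = 78;

    // Builds one record for a non-negative row id of at most 10 digits.
    static byte[] makeRow(long rowId, Random rnd) {
        StringBuilder sb = new StringBuilder(100);
        // Key: random printable ASCII characters from ' ' to '~', inclusive.
        for (int i = 0; i < KEY_LEN; i++) {
            sb.append((char) (' ' + rnd.nextInt('~' - ' ' + 1)));
        }
        // Row id: right-justified in 10 columns (space padding is an assumption).
        sb.append(String.format(Locale.ROOT, "%10d", rowId));
        // Filler: runs of one repeated letter from 'A'..'Z', 10 characters per run;
        // the last run is cut to 8 characters so the filler totals 78 bytes (assumption).
        int base = (int) (rowId % 26);
        for (int written = 0, run = 0; written < FILLER_LEN; run++) {
            int len = Math.min(10, FILLER_LEN - written);
            char c = (char) ('A' + (base + run) % 26);
            for (int j = 0; j < len; j++) {
                sb.append(c);
            }
            written += len;
        }
        sb.append("\r\n");
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] row = makeRow(42L, new Random(0));
        System.out.println(row.length); // prints 100
    }
}
```

Each record is 10 + 10 + 78 + 2 = 100 bytes, which matches the layout given in the list.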
This TeraSort is slightly modified to allow for variable-length key and value sizes. The row length isn't variable. To generate a terabyte of data in
the same way TeraSort does, use 10000000000 rows, a 10/10 byte key length, and a 78/78 byte value length. Along with the 10-byte row id and \r\n, this gives you
100-byte rows * 10000000000 rows = 1 TB. The min/max ranges for the key and value parameters are both inclusive.
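The size arithmetic above can be checked with a short sketch. The class and method names (TeraSizeSketch, rowBytes, totalBytes) are hypothetical and are not part of TeraSortIngest; the sketch only assumes the fixed 10-byte row id and 2-byte \r\n terminator described in the text.

```java
// Hypothetical helper for checking the data-set size; not part of the Accumulo API.
public class TeraSizeSketch {
    static final int ROWID_BYTES = 10;     // fixed-width row id
    static final int TERMINATOR_BYTES = 2; // \r\n

    // Bytes per record for a given key length and value (filler) length.
    static long rowBytes(int keyLen, int valueLen) {
        return (long) keyLen + ROWID_BYTES + valueLen + TERMINATOR_BYTES;
    }

    // Total bytes for the whole data set; throws ArithmeticException on overflow.
    static long totalBytes(long rows, int keyLen, int valueLen) {
        return Math.multiplyExact(rows, rowBytes(keyLen, valueLen));
    }

    public static void main(String[] args) {
        System.out.println(rowBytes(10, 78));                    // prints 100
        System.out.println(totalBytes(10_000_000_000L, 10, 78)); // prints 1000000000000
    }
}
```

With 10-byte keys and 78-byte values, 10000000000 rows come to 10^12 bytes, that is, one (decimal) terabyte.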
Nested Class Summary
static class TeraSortIngest.SortGenMapper
The Mapper class that, given a row number, generates the appropriate output line.
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
TeraSortIngest
public TeraSortIngest()
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
run
public int run(String[] args)
throws Exception
- Specified by:
run in interface org.apache.hadoop.util.Tool
- Throws:
Exception
Copyright © 2013 Apache Accumulo Project. All Rights Reserved.