name | value | description |
---- | ----- | ----------- |
io.sort.factor | 10 | The number of streams to merge at once while sorting files. This determines the number of open file handles. |
io.sort.mb | 100 | The total amount of buffer memory to use while sorting files, in megabytes. By default, gives each merge stream 1 MB, which should minimize seeks. |
io.file.buffer.size | 4096 | The size of the buffer used in sequence files. The size of this buffer should probably be a multiple of the hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations. |
io.bytes.per.checksum | 512 | The number of bytes per checksum. Must not be larger than io.file.buffer.size. |
io.skip.checksum.errors | false | If true, when a checksum error is encountered while reading a sequence file, entries are skipped instead of throwing an exception. |
io.map.index.skip | 0 | Number of index entries to skip between each entry. Zero by default. Setting this to values larger than zero can facilitate opening large map files using less memory. |
fs.default.name | local | The name of the default file system. Either the literal string "local" or a host:port for DFS. |
dfs.datanode.port | 50010 | The port number that the DFS datanode server uses as a starting point to look for a free port to listen on. |
dfs.name.dir | /tmp/hadoop/dfs/name | Determines where on the local filesystem the DFS name node should store the name table. |
dfs.data.dir | /tmp/hadoop/dfs/data | Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma- or space-delimited list of directories, then data will be stored in all named directories, typically on different devices. |
dfs.replication | 3 | How many copies of each block we try to keep at all times. The actual number of replications is at most the number of datanodes in the cluster. |
dfs.df.interval | 3000 | Disk usage statistics refresh interval, in milliseconds. |
mapred.job.tracker | local | The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task. |
mapred.job.tracker.info.port | 50030 | The port that the MapReduce job tracker info webserver runs at. |
mapred.task.tracker.output.port | 50040 | The port number that the MapReduce task tracker output server uses as a starting point to look for a free port to listen on. |
mapred.task.tracker.report.port | 50050 | The port number that the MapReduce task tracker report server uses as a starting point to look for a free port to listen on. |
mapred.local.dir | /tmp/hadoop/mapred/local | The local directory where MapReduce stores intermediate data files. May be a space- or comma-separated list of directories on different devices in order to spread disk I/O. |
mapred.system.dir | /tmp/hadoop/mapred/system | The shared directory where MapReduce stores control files. |
mapred.temp.dir | /tmp/hadoop/mapred/temp | A shared directory for temporary files. |
mapred.map.tasks | 2 | The default number of map tasks per job. Typically set to a prime several times greater than the number of available hosts. Ignored when mapred.job.tracker is "local". |
mapred.reduce.tasks | 1 | The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". |
mapred.task.timeout | 600000 | The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string. |
mapred.tasktracker.tasks.maximum | 2 | The maximum number of tasks that will be run simultaneously by a task tracker. |
mapred.child.java.opts | -Xmx200m | Java opts for the task tracker child processes. Subsumes 'mapred.child.heap.size' (if a mapred.child.heap.size value is found in a configuration, its maximum heap size will be used and a warning emitted that heap.size has been deprecated). Also, the following symbols, if present, will be interpolated: @taskid@ is replaced by the current TaskID, and @port@ will be replaced by mapred.task.tracker.report.port + 1 (a second child will fail with a port-in-use error if mapred.tasktracker.tasks.maximum is greater than one). Any other occurrences of '@' will go unchanged. For example, to enable verbose GC logging to a file named for the taskid in /tmp and to set the maximum heap to one gigabyte, pass a value of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc |
mapred.combine.buffer.size | 100000 | The number of entries the combining collector caches before combining them and writing to disk. |
mapred.speculative.execution | true | If true, then multiple instances of some map tasks may be executed in parallel. |
mapred.min.split.size | 0 | The minimum size chunk that map input should be split into. Note that some file formats may have minimum split sizes that take priority over this setting. |
ipc.client.timeout | 60000 | Defines the timeout for IPC calls, in milliseconds. |
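
Most of the properties in the table above can also be overridden per job through the configuration API. The following is a minimal sketch, assuming the classic org.apache.hadoop.mapred API of this Hadoop release; the host names and ports used here are hypothetical and only for illustration.

```java
import org.apache.hadoop.mapred.JobConf;

public class ConfigOverrideSketch {
  public static void main(String[] args) {
    JobConf conf = new JobConf();

    // Point the default file system at a DFS namenode instead of "local"
    // (hypothetical host:port, not a default from the table).
    conf.set("fs.default.name", "namenode.example.com:9000");

    // Run against a real job tracker rather than in-process "local" mode
    // (hypothetical host:port).
    conf.set("mapred.job.tracker", "jobtracker.example.com:9001");

    // Request two copies of each block written by this job instead of
    // the default three.
    conf.setInt("dfs.replication", 2);

    // Size task counts relative to the cluster, as the table suggests:
    // maps a prime several times the host count, reduces a prime close to it.
    conf.setNumMapTasks(11);
    conf.setNumReduceTasks(3);

    // Give child JVMs a larger heap and per-task GC logs, using the
    // mapred.child.java.opts example from the table.
    conf.set("mapred.child.java.opts",
             "-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc");
  }
}
```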
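
The @taskid@ and @port@ substitution rule for mapred.child.java.opts can be pictured with a short sketch. This only illustrates the rule described in the table (the task id and report port are placeholder values), and is not the task tracker's actual code.

```java
public class ChildOptsInterpolation {
  public static void main(String[] args) {
    String opts = "-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc";
    String taskId = "task_0001_m_000000_0"; // placeholder TaskID
    int reportPort = 50050;                 // mapred.task.tracker.report.port

    // @taskid@ -> current TaskID; @port@ -> report port + 1;
    // any other '@' characters are left unchanged.
    String resolved = opts.replace("@taskid@", taskId)
                          .replace("@port@", Integer.toString(reportPort + 1));

    System.out.println(resolved);
    // -Xmx1024m -verbose:gc -Xloggc:/tmp/task_0001_m_000000_0.gc
  }
}
```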