name | value | description |
hadoop.logfile.size | 10000000 | The maximum size of each log file, in bytes. |
hadoop.logfile.count | 10 | The maximum number of log files. |
dfs.namenode.logging.level | info | The logging level for the DFS namenode. Other values are "dir" (trace namespace mutations), "block" (trace block under/over-replication and block creations/deletions), or "all". |
io.sort.factor | 10 | The number of streams to merge at once while sorting files. This determines the number of open file handles. |
io.sort.mb | 100 | The total amount of buffer memory to use while sorting files, in megabytes. By default, gives each merge stream 1MB, which should minimize seeks. |
io.file.buffer.size | 4096 | The size of buffer for use in sequence files. The size of this buffer should probably be a multiple of the hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations. |
io.bytes.per.checksum | 512 | The number of bytes per checksum. Must not be larger than io.file.buffer.size (a consistent pair is sketched after this table). |
io.skip.checksum.errors | false | If true, when a checksum error is encountered while reading a sequence file, entries are skipped instead of throwing an exception. |
io.map.index.skip | 0 | Number of index entries to skip between each entry. Zero by default. Setting this to values larger than zero can facilitate opening large map files using less memory. |
fs.default.name | local | The name of the default file system. Either the literal string "local" or a host:port for DFS (an override example follows this table). |
dfs.datanode.port | 50010 | The port number that the DFS datanode server uses as a starting point to look for a free port to listen on. |
dfs.name.dir | /tmp/hadoop/dfs/name | Determines where on the local filesystem the DFS name node should store the name table. |
dfs.data.dir | /tmp/hadoop/dfs/data | Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. |
dfs.replication | 3 | Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time. |
dfs.replication.max | 512 | Maximum block replication. |
dfs.replication.min | 1 | Minimum block replication. |
dfs.df.interval | 3000 | Disk usage statistics refresh interval, in milliseconds. |
dfs.client.block.write.retries | 3 | The number of retries for writing blocks to the data nodes, before we signal failure to the application. |
mapred.job.tracker | local | The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task. |
mapred.job.tracker.info.port | 50030 | The port that the MapReduce job tracker info webserver runs at. |
mapred.task.tracker.output.port | 50040 | The port number that the MapReduce task tracker output server uses as a starting point to look for a free port to listen on. |
mapred.task.tracker.report.port | 50050 | The port number that the MapReduce task tracker report server uses as a starting point to look for a free port to listen on. |
mapred.local.dir | /tmp/hadoop/mapred/local | The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk I/O. |
mapred.system.dir | /tmp/hadoop/mapred/system | The shared directory where MapReduce stores control files. |
mapred.temp.dir | /tmp/hadoop/mapred/temp | A shared directory for temporary files. |
mapred.map.tasks | 2 | The default number of map tasks per job. Typically set to a prime several times greater than the number of available hosts. Ignored when mapred.job.tracker is "local". |
mapred.reduce.tasks | 1 | The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". |
mapred.task.timeout | 600000 | The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string. |
mapred.tasktracker.tasks.maximum | 2 | The maximum number of tasks that will be run simultaneously by a task tracker. |
mapred.child.java.opts | -Xmx200m | Java opts for the task tracker child processes. Subsumes 'mapred.child.heap.size': if a mapred.child.heap.size value is found in a configuration, its maximum heap size will be used and a warning emitted that heap.size has been deprecated. Also, the following symbols, if present, will be interpolated: @taskid@ is replaced by the current TaskID, and @port@ is replaced by mapred.task.tracker.report.port + 1 (a second child will fail with a port-in-use error if mapred.tasktracker.tasks.maximum is greater than one). Any other occurrences of '@' go unchanged. For example, to enable verbose GC logging to a file named for the taskid in /tmp and to set the heap maximum to one gigabyte, pass a 'value' of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc (see the property sketch after this table). |
mapred.combine.buffer.size | 100000 | The number of entries the combining collector caches before combining them and writing to disk. |
mapred.speculative.execution | true | If true, then multiple instances of some map tasks may be executed in parallel. |
mapred.min.split.size | 0 | The minimum size chunk that map input should be split into. Note that some file formats may have minimum split sizes that take priority over this setting. |
mapred.submit.replication | 10 | The replication level for submitted job files. This should be around the square root of the number of nodes. |
ipc.client.timeout | 60000 | Defines the timeout for IPC calls in milliseconds. |
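
These defaults are normally left alone and overridden per site, in a site configuration file such as hadoop-site.xml. A minimal sketch of such an override, assuming the standard Hadoop XML property format; the host names, ports, and local paths below are placeholders, not values from this table:

  <?xml version="1.0"?>
  <configuration>
    <!-- Point the default file system at a DFS namenode instead of "local". -->
    <property>
      <name>fs.default.name</name>
      <value>namenode.example.com:9000</value>
    </property>
    <!-- Run jobs on a shared job tracker rather than in-process. -->
    <property>
      <name>mapred.job.tracker</name>
      <value>jobtracker.example.com:9001</value>
    </property>
    <!-- A comma-delimited list stores blocks across multiple devices. -->
    <property>
      <name>dfs.data.dir</name>
      <value>/disk1/dfs/data,/disk2/dfs/data</value>
    </property>
  </configuration>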
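
The io.* rows above carry a cross-property constraint: io.bytes.per.checksum must not exceed io.file.buffer.size, and the buffer should be a multiple of the hardware page size. A sketch of a consistent pair, using illustrative values rather than the defaults:

  <!-- 8192 is a multiple of the 4096-byte page size mentioned above. -->
  <property>
    <name>io.file.buffer.size</name>
    <value>8192</value>
  </property>
  <!-- 512 does not exceed io.file.buffer.size, satisfying the constraint. -->
  <property>
    <name>io.bytes.per.checksum</name>
    <value>512</value>
  </property>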
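
The mapred.child.java.opts row gives an example value for verbose GC logging with a per-task log file. Written as a property entry, it would look like the following; only the value string itself comes from the table above:

  <property>
    <name>mapred.child.java.opts</name>
    <!-- @taskid@ is interpolated to the current TaskID when the child launches. -->
    <value>-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc</value>
  </property>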