org.apache.hadoop.mapred
Class JobTracker

java.lang.Object
  extended by org.apache.hadoop.mapred.JobTracker

public class JobTracker
extends Object

JobTracker is the central location for submitting and tracking MapReduce (MR) jobs in a network environment.

Author:
Mike Cafarella

Field Summary
static int FILE_NOT_FOUND
           
static long HEARTBEAT_INTERVAL
           
static org.apache.commons.logging.Log LOG
           
static int SUCCESS
           
static long TASKTRACKER_EXPIRY_INTERVAL
           
static int TRACKERS_OK
           
static int UNKNOWN_TASKTRACKER
           
static long versionID
           
static long versionID
           
 
Method Summary
 Vector completedJobs()
           
 int emitHeartbeat(org.apache.hadoop.mapred.TaskTrackerStatus trackerStatus, boolean initialContact)
          Process incoming heartbeat messages from the task trackers.
 Vector failedJobs()
           
static InetSocketAddress getAddress(Configuration conf)
           
 String getAssignedTracker(String taskId)
          Get tracker name for a given task id.
 ClusterStatus getClusterStatus()
          Get the current status of the cluster
 String getFilesystemName()
          Grab the local fs name
 int getInfoPort()
           
 org.apache.hadoop.mapred.JobInProgress getJob(String jobid)
           
 org.apache.hadoop.mapred.JobProfile getJobProfile(String jobid)
          Get the profile of a job that is already known to the JobTracker
 JobStatus getJobStatus(String jobid)
          Get the status of a job that is already known to the JobTracker
 String getJobTrackerMachine()
           
 TaskReport[] getMapTaskReports(String jobid)
          Get reports on the map tasks that make up the job
 long getProtocolVersion(String protocol, long clientVersion)
          Return protocol version corresponding to protocol interface.
 TaskReport[] getReduceTaskReports(String jobid)
           
 long getStartTime()
           
 org.apache.hadoop.mapred.TaskTrackerStatus getTaskTracker(String trackerID)
           
 int getTotalSubmissions()
           
static JobTracker getTracker()
           
 int getTrackerPort()
           
 JobStatus[] jobsToComplete()
          Get the jobs that are not completed and not failed
 void killJob(String jobid)
          Kill the indicated job
 org.apache.hadoop.mapred.MapOutputLocation[] locateMapOutputs(String jobId, int[] mapTasksNeeded, int reduce)
          A TaskTracker wants to know the physical locations of completed, but not yet closed, tasks.
static void main(String[] argv)
          Start the JobTracker process.
 void offerService()
          Run forever
 org.apache.hadoop.mapred.Task pollForNewTask(String taskTracker)
          A tracker wants to know if there's a Task to run.
 String[] pollForTaskWithClosedJob(String taskTracker)
          A tracker wants to know if any of its Tasks have been closed (because the job completed, whether successfully or not)
 void reportTaskTrackerError(String taskTracker, String errorClass, String errorMessage)
          Report a problem to the job tracker.
 Vector runningJobs()
           
static void startTracker(Configuration conf)
           
static void stopTracker()
           
 JobStatus submitJob(String jobFile)
          JobTracker.submitJob() kicks off a new job.
 Collection taskTrackers()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG

HEARTBEAT_INTERVAL

public static final long HEARTBEAT_INTERVAL
See Also:
Constant Field Values

TASKTRACKER_EXPIRY_INTERVAL

public static final long TASKTRACKER_EXPIRY_INTERVAL
See Also:
Constant Field Values

SUCCESS

public static final int SUCCESS
See Also:
Constant Field Values

FILE_NOT_FOUND

public static final int FILE_NOT_FOUND
See Also:
Constant Field Values

versionID

public static final long versionID
See Also:
Constant Field Values

TRACKERS_OK

public static final int TRACKERS_OK
See Also:
Constant Field Values

UNKNOWN_TASKTRACKER

public static final int UNKNOWN_TASKTRACKER
See Also:
Constant Field Values

versionID

public static final long versionID
See Also:
Constant Field Values
Method Detail

startTracker

public static void startTracker(Configuration conf)
                         throws IOException
Throws:
IOException

getTracker

public static JobTracker getTracker()

stopTracker

public static void stopTracker()
                        throws IOException
Throws:
IOException

getProtocolVersion

public long getProtocolVersion(String protocol,
                               long clientVersion)
Description copied from interface: VersionedProtocol
Return protocol version corresponding to protocol interface.

Parameters:
protocol - The classname of the protocol interface
clientVersion - The version of the protocol that the client speaks
Returns:
the version that the server will speak
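
A minimal sketch of how a caller might use the returned value, assuming the client simply refuses to proceed when the versions differ. VersionCheckSketch, the protocol name, and the version number 3 are hypothetical stand-ins for illustration; they are not Hadoop classes or real constant values.

```java
/** Hypothetical stand-in for a server exposing getProtocolVersion; not a Hadoop class. */
final class VersionCheckSketch {
    /** Illustrative value only; the real versionID constants are not given in this page. */
    static final long SERVER_VERSION = 3L;

    /** Mirrors the documented signature: returns the version the server will speak. */
    static long getProtocolVersion(String protocol, long clientVersion) {
        return SERVER_VERSION;
    }

    /** Assumed client-side policy: the two versions must match exactly. */
    static boolean isCompatible(String protocol, long clientVersion) {
        return getProtocolVersion(protocol, clientVersion) == clientVersion;
    }

    public static void main(String[] args) {
        System.out.println(isCompatible("org.example.SomeProtocol", 3L));
        System.out.println(isCompatible("org.example.SomeProtocol", 2L));
    }
}
```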

getAddress

public static InetSocketAddress getAddress(Configuration conf)

offerService

public void offerService()
Run forever


getTotalSubmissions

public int getTotalSubmissions()

getJobTrackerMachine

public String getJobTrackerMachine()

getTrackerPort

public int getTrackerPort()

getInfoPort

public int getInfoPort()

getStartTime

public long getStartTime()

runningJobs

public Vector runningJobs()

failedJobs

public Vector failedJobs()

completedJobs

public Vector completedJobs()

taskTrackers

public Collection taskTrackers()

getTaskTracker

public org.apache.hadoop.mapred.TaskTrackerStatus getTaskTracker(String trackerID)

emitHeartbeat

public int emitHeartbeat(org.apache.hadoop.mapred.TaskTrackerStatus trackerStatus,
                         boolean initialContact)
Process incoming heartbeat messages from the task trackers.
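
The heartbeat bookkeeping can be pictured roughly as below. This is a sketch under stated assumptions, not the JobTracker implementation: it assumes a tracker must first announce itself with initialContact set, that an unregistered tracker gets UNKNOWN_TASKTRACKER back, and that a tracker counts as expired once it has been silent longer than an expiry interval. HeartbeatSketch, the simplified tracker-name parameter, and the numeric constant values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of heartbeat bookkeeping; not the JobTracker implementation. */
final class HeartbeatSketch {
    // Illustrative values only; the real ones are listed under Constant Field Values.
    static final int TRACKERS_OK = 0;
    static final int UNKNOWN_TASKTRACKER = 1;

    /** Tracker name -> time of the last heartbeat, in milliseconds. */
    private final Map<String, Long> lastSeen = new HashMap<>();

    /** Assumed policy: a tracker must register with initialContact == true first. */
    synchronized int emitHeartbeat(String trackerName, boolean initialContact, long nowMillis) {
        if (!initialContact && !lastSeen.containsKey(trackerName)) {
            return UNKNOWN_TASKTRACKER; // caller is expected to re-register
        }
        lastSeen.put(trackerName, nowMillis);
        return TRACKERS_OK;
    }

    /** Assumed expiry rule: unknown, or silent for longer than expiryMillis. */
    synchronized boolean isExpired(String trackerName, long nowMillis, long expiryMillis) {
        Long seen = lastSeen.get(trackerName);
        return seen == null || nowMillis - seen > expiryMillis;
    }
}
```

The clock is passed in as a parameter so the expiry rule can be exercised without waiting for real time to pass.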


pollForNewTask

public org.apache.hadoop.mapred.Task pollForNewTask(String taskTracker)
A tracker wants to know if there's a Task to run. Returns a task we'd like the TaskTracker to execute right now. Eventually this function should compute load on the various TaskTrackers and incorporate knowledge of DFS file placement. For now, it just takes a single item from the pending task list and hands it back.
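
The "single item out of the pending task list" behaviour described above can be sketched as a plain queue. PendingTaskSketch is a hypothetical class; the first-in-first-out order, the use of task id strings in place of Task objects, and the null return when nothing is pending are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of a pending task list; not the JobTracker implementation. */
final class PendingTaskSketch {
    /** Task ids waiting to be handed to some tracker (assumed FIFO order). */
    private final Deque<String> pending = new ArrayDeque<>();

    synchronized void addTask(String taskId) {
        pending.addLast(taskId);
    }

    /**
     * Hands back one pending task, or null if there is none (assumed).
     * The asking tracker is ignored: no load or data placement is considered.
     */
    synchronized String pollForNewTask(String taskTracker) {
        return pending.pollFirst();
    }
}
```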


pollForTaskWithClosedJob

public String[] pollForTaskWithClosedJob(String taskTracker)
A tracker wants to know if any of its Tasks have been closed (because the job completed, whether successfully or not)


locateMapOutputs

public org.apache.hadoop.mapred.MapOutputLocation[] locateMapOutputs(String jobId,
                                                                     int[] mapTasksNeeded,
                                                                     int reduce)
A TaskTracker wants to know the physical locations of completed, but not yet closed, tasks. This exists so the reduce task thread can locate map task outputs.

Parameters:
jobId - the job id
mapTasksNeeded - an array of the mapIds that we need
reduce - the reduce's id
Returns:
an array of MapOutputLocation
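
A rough sketch of the lookup a reduce task depends on, assuming the tracker keeps a table of completed map outputs and returns entries only for the requested maps that have finished. MapOutputLookupSketch is hypothetical, and the "mapId@host" strings stand in for MapOutputLocation objects, whose fields are not described here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of map output lookup; not the JobTracker implementation. */
final class MapOutputLookupSketch {
    /** mapId -> host holding that map task's completed output. */
    private final Map<Integer, String> completed = new HashMap<>();

    synchronized void markCompleted(int mapId, String host) {
        completed.put(mapId, host);
    }

    /** Returns "mapId@host" for each needed map that has completed; others are skipped (assumed). */
    synchronized String[] locateMapOutputs(int[] mapTasksNeeded) {
        List<String> found = new ArrayList<>();
        for (int mapId : mapTasksNeeded) {
            String host = completed.get(mapId);
            if (host != null) {
                found.add(mapId + "@" + host);
            }
        }
        return found.toArray(new String[0]);
    }
}
```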

getFilesystemName

public String getFilesystemName()
                         throws IOException
Grab the local fs name

Throws:
IOException

reportTaskTrackerError

public void reportTaskTrackerError(String taskTracker,
                                   String errorClass,
                                   String errorMessage)
                            throws IOException
Report a problem to the job tracker.

Parameters:
taskTracker - the name of the task tracker
errorClass - the kind of error (e.g., the class that was thrown)
errorMessage - the human readable error message
Throws:
IOException - if there was a problem in communication or on the remote side

submitJob

public JobStatus submitJob(String jobFile)
                    throws IOException
JobTracker.submitJob() kicks off a new job. It creates a JobInProgress object, which contains both a JobProfile and a JobStatus. Those two sub-objects are sometimes shipped outside of the JobTracker, but JobInProgress adds information that is useful to the JobTracker alone. The JobInProgress is added to the jobInitQueue, which is processed asynchronously to handle split computation and build up the right TaskTracker/Block mapping.

Throws:
IOException
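
The submit-then-initialize-later pattern described for submitJob can be sketched as follows. JobInitSketch and its nested JobInProgress are hypothetical stand-ins, not the Hadoop classes: the sketch assumes that submission only enqueues the job and that a background thread would call initNext() repeatedly to do the expensive setup.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical model of asynchronous job initialization; not the JobTracker implementation. */
final class JobInitSketch {
    /** Stand-in for the real JobInProgress, which holds far more state. */
    static final class JobInProgress {
        final String jobFile;
        volatile boolean initialized;

        JobInProgress(String jobFile) {
            this.jobFile = jobFile;
        }
    }

    private final Queue<JobInProgress> jobInitQueue = new ConcurrentLinkedQueue<>();

    /** Returns immediately; the job is only queued, not yet initialized. */
    JobInProgress submitJob(String jobFile) {
        JobInProgress job = new JobInProgress(jobFile);
        jobInitQueue.add(job);
        return job;
    }

    /** Initializes one queued job; a background thread would call this in a loop. */
    boolean initNext() {
        JobInProgress job = jobInitQueue.poll();
        if (job == null) {
            return false; // nothing waiting
        }
        job.initialized = true; // stand-in for split computation and task mapping
        return true;
    }
}
```

Keeping submission cheap means the caller is not blocked while splits are computed, at the cost that a freshly submitted job is not immediately ready to hand out tasks.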

getClusterStatus

public ClusterStatus getClusterStatus()
Get the current status of the cluster

Returns:
summary of the state of the cluster

killJob

public void killJob(String jobid)
Kill the indicated job


getJobProfile

public org.apache.hadoop.mapred.JobProfile getJobProfile(String jobid)
Get the profile of a job that is already known to the JobTracker


getJobStatus

public JobStatus getJobStatus(String jobid)
Get the status of a job that is already known to the JobTracker


getMapTaskReports

public TaskReport[] getMapTaskReports(String jobid)
Get reports on the map tasks that make up the job


getReduceTaskReports

public TaskReport[] getReduceTaskReports(String jobid)

getAssignedTracker

public String getAssignedTracker(String taskId)
Get tracker name for a given task id.

Parameters:
taskId - the name of the task
Returns:
The name of the task tracker

jobsToComplete

public JobStatus[] jobsToComplete()
Get the jobs that are not completed and not failed

Returns:
array of JobStatus for the running/to-be-run jobs.
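
The "not completed and not failed" rule amounts to a simple filter. The sketch below is illustrative only: JobsToCompleteSketch and its State enum are hypothetical, and job ids stand in for JobStatus objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the jobsToComplete filter; not the JobTracker implementation. */
final class JobsToCompleteSketch {
    /** Assumed job states, for illustration only. */
    enum State { PREP, RUNNING, SUCCEEDED, FAILED }

    /** Returns the ids of jobs that are neither completed nor failed, in map iteration order. */
    static List<String> jobsToComplete(Map<String, State> jobs) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, State> entry : jobs.entrySet()) {
            State state = entry.getValue();
            if (state != State.SUCCEEDED && state != State.FAILED) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```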

getJob

public org.apache.hadoop.mapred.JobInProgress getJob(String jobid)

main

public static void main(String[] argv)
                 throws IOException,
                        InterruptedException
Start the JobTracker process. This is used only for debugging. As a rule, JobTracker should be run as part of the DFS Namenode process.

Throws:
IOException
InterruptedException


Copyright © 2006 The Apache Software Foundation