org.apache.hadoop.mapred
Class JobTracker

java.lang.Object
  extended by org.apache.hadoop.mapred.JobTracker

public class JobTracker
extends Object

JobTracker is the central location for submitting and tracking MapReduce (MR) jobs in a network environment.

Author:
Mike Cafarella

Field Summary
static int FILE_NOT_FOUND
           
static long HEARTBEAT_INTERVAL
           
static Logger LOG
           
static int SUCCESS
           
static long TASKTRACKER_EXPIRY_INTERVAL
           
static int TRACKERS_OK
           
static int UNKNOWN_TASKTRACKER
           
 
Method Summary
 Vector completedJobs()
           
 int emitHeartbeat(org.apache.hadoop.mapred.TaskTrackerStatus trackerStatus, boolean initialContact)
          Process incoming heartbeat messages from the task trackers.
 Vector failedJobs()
           
static InetSocketAddress getAddress(Configuration conf)
           
 ClusterStatus getClusterStatus()
          Get the current status of the cluster
 String getFilesystemName()
          Get the local file system name.
 int getInfoPort()
           
 org.apache.hadoop.mapred.JobInProgress getJob(String jobid)
           
 org.apache.hadoop.mapred.JobProfile getJobProfile(String jobid)
          Grab a handle to a job that is already known to the JobTracker
 org.apache.hadoop.mapred.JobStatus getJobStatus(String jobid)
          Grab a handle to a job that is already known to the JobTracker
 String getJobTrackerMachine()
           
 TaskReport[] getMapTaskReports(String jobid)
          Get reports on the map tasks that make up the given job.
 TaskReport[] getReduceTaskReports(String jobid)
          Get reports on the reduce tasks that make up the given job.
 long getStartTime()
           
 org.apache.hadoop.mapred.TaskTrackerStatus getTaskTracker(String trackerID)
           
 int getTotalSubmissions()
           
static JobTracker getTracker()
           
 int getTrackerPort()
           
 void killJob(String jobid)
          Kill the indicated job
 org.apache.hadoop.mapred.MapOutputLocation[] locateMapOutputs(String taskId, String[][] mapTasksNeeded)
          A TaskTracker wants to know the physical locations of completed, but not yet closed, tasks.
static void main(String[] argv)
          Start the JobTracker process.
 void offerService()
          Run the JobTracker service loop forever; this call does not return.
 org.apache.hadoop.mapred.Task pollForNewTask(String taskTracker)
          A tracker wants to know if there's a Task to run.
 String[] pollForTaskWithClosedJob(String taskTracker)
          A tracker wants to know if any of its Tasks have been closed (because the job completed, whether successfully or not)
 Vector runningJobs()
           
static void startTracker(Configuration conf)
           
 org.apache.hadoop.mapred.JobStatus submitJob(String jobFile)
          JobTracker.submitJob() kicks off a new job.
 Collection taskTrackers()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final Logger LOG

HEARTBEAT_INTERVAL

public static final long HEARTBEAT_INTERVAL
See Also:
Constant Field Values

TASKTRACKER_EXPIRY_INTERVAL

public static final long TASKTRACKER_EXPIRY_INTERVAL
See Also:
Constant Field Values

SUCCESS

public static final int SUCCESS
See Also:
Constant Field Values

FILE_NOT_FOUND

public static final int FILE_NOT_FOUND
See Also:
Constant Field Values

TRACKERS_OK

public static final int TRACKERS_OK
See Also:
Constant Field Values

UNKNOWN_TASKTRACKER

public static final int UNKNOWN_TASKTRACKER
See Also:
Constant Field Values
Method Detail

startTracker

public static void startTracker(Configuration conf)
                         throws IOException
Throws:
IOException

getTracker

public static JobTracker getTracker()

getAddress

public static InetSocketAddress getAddress(Configuration conf)

offerService

public void offerService()
Run the JobTracker service loop forever; this call does not return.
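
A minimal sketch of bringing up a standalone JobTracker with the static methods above. The Configuration package and its no-argument constructor are assumptions, not documented on this page; offerService() blocks for the life of the process.

import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;  // package path assumed; the signatures above take a Configuration
import org.apache.hadoop.mapred.JobTracker;

public class JobTrackerLauncher {

    public static void main(String[] args) throws IOException {
        // Assumption: a no-arg Configuration picks up the site settings that
        // getAddress(conf) and startTracker(conf) resolve.
        Configuration conf = new Configuration();

        // Bind the tracker and register the singleton returned by getTracker().
        JobTracker.startTracker(conf);

        InetSocketAddress addr = JobTracker.getAddress(conf);
        System.out.println("JobTracker listening at " + addr);

        // Run the service loop forever; this call does not return.
        JobTracker.getTracker().offerService();
    }
}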


getTotalSubmissions

public int getTotalSubmissions()

getJobTrackerMachine

public String getJobTrackerMachine()

getTrackerPort

public int getTrackerPort()

getInfoPort

public int getInfoPort()

getStartTime

public long getStartTime()

runningJobs

public Vector runningJobs()

failedJobs

public Vector failedJobs()

completedJobs

public Vector completedJobs()

taskTrackers

public Collection taskTrackers()

getTaskTracker

public org.apache.hadoop.mapred.TaskTrackerStatus getTaskTracker(String trackerID)

emitHeartbeat

public int emitHeartbeat(org.apache.hadoop.mapred.TaskTrackerStatus trackerStatus,
                         boolean initialContact)
Process incoming heartbeat messages from the task trackers.


pollForNewTask

public org.apache.hadoop.mapred.Task pollForNewTask(String taskTracker)
A tracker wants to know if there's a Task to run. Returns a task we'd like the TaskTracker to execute right now. Eventually this function should compute load on the various TaskTrackers and incorporate knowledge of DFS file placement. For now, it simply grabs a single item off the pending task list and hands it back.


pollForTaskWithClosedJob

public String[] pollForTaskWithClosedJob(String taskTracker)
A tracker wants to know if any of its Tasks have been closed (because the job completed, whether successfully or not)
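
emitHeartbeat, pollForNewTask and pollForTaskWithClosedJob together form the tracker side of the heartbeat protocol. The sketch below shows one cycle as a TaskTracker-like caller might drive it. It is placed in the org.apache.hadoop.mapred package because TaskTrackerStatus and Task appear unlinked above and may not be visible outside that package; the TaskTrackerStatus instance and tracker name are assumed to be built elsewhere.

package org.apache.hadoop.mapred;

// Sketch only: one heartbeat/poll cycle against a JobTracker in the same process.
public class HeartbeatCycleSketch {

    public static void runOnce(JobTracker jobTracker,
                               TaskTrackerStatus status,
                               String trackerName,
                               boolean initialContact) {
        // Report our current state. The return value is presumably one of the
        // int constants above (TRACKERS_OK, UNKNOWN_TASKTRACKER, ...).
        int code = jobTracker.emitHeartbeat(status, initialContact);
        if (code == JobTracker.UNKNOWN_TASKTRACKER) {
            // The JobTracker does not know us; re-register on the next cycle.
            return;
        }

        // Ask for a task to execute right now, if any.
        Task task = jobTracker.pollForNewTask(trackerName);
        if (task != null) {
            // launch the task locally (outside the scope of this sketch)
        }

        // Ask which of our tasks belong to jobs that have since been closed.
        String[] closedTasks = jobTracker.pollForTaskWithClosedJob(trackerName);
        if (closedTasks != null) {
            // tear those tasks down (outside the scope of this sketch)
        }
    }
}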


locateMapOutputs

public org.apache.hadoop.mapred.MapOutputLocation[] locateMapOutputs(String taskId,
                                                                     String[][] mapTasksNeeded)
A TaskTracker wants to know the physical locations of completed, but not yet closed, tasks. This exists so the reduce task thread can locate map task outputs.

Parameters:
taskId - the reduce task id
mapTasksNeeded - an array of UTF8 strings naming the map task ids whose output is needed.
Returns:
an array of MapOutputLocation
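
A small sketch of the reduce-side lookup described above; the reduce task id and the mapTasksNeeded array are placeholders supplied by the caller. It is kept in the org.apache.hadoop.mapred package because MapOutputLocation appears unlinked above.

package org.apache.hadoop.mapred;

// Sketch only: ask the JobTracker where the outputs of the named map tasks can be fetched.
public class MapOutputLookupSketch {

    public static MapOutputLocation[] locate(JobTracker jobTracker,
                                             String reduceTaskId,
                                             String[][] mapTasksNeeded) {
        MapOutputLocation[] locations = jobTracker.locateMapOutputs(reduceTaskId, mapTasksNeeded);
        if (locations != null) {
            System.out.println(reduceTaskId + ": " + locations.length + " map outputs located");
        }
        return locations;
    }
}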

getFilesystemName

public String getFilesystemName()
                         throws IOException
Get the local file system name.

Throws:
IOException

submitJob

public org.apache.hadoop.mapred.JobStatus submitJob(String jobFile)
                                             throws IOException
JobTracker.submitJob() kicks off a new job. It creates a 'JobInProgress' object, which contains both a JobProfile and a JobStatus. Those two sub-objects are sometimes shipped outside the JobTracker, but the JobInProgress adds information that is useful only to the JobTracker itself. The JIP is added to the jobInitQueue, which is processed asynchronously to handle split computation and to build up the right TaskTracker/Block mapping.

Throws:
IOException
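
A minimal client-side sketch of this submission path. It assumes startTracker(conf) has already run in the same process, and it does not dereference the returned JobStatus because its accessors are not documented on this page; the class sits in the org.apache.hadoop.mapred package because JobStatus appears unlinked above.

package org.apache.hadoop.mapred;

import java.io.IOException;

// Sketch only: hand a previously written job file to the running tracker.
public class SubmitJobSketch {

    public static JobStatus submit(String jobFile) throws IOException {
        // Assumes the singleton tracker has already been started via startTracker(conf).
        JobTracker jobTracker = JobTracker.getTracker();

        // The JobInProgress built from jobFile is queued on jobInitQueue and
        // initialized asynchronously, so this call returns quickly.
        return jobTracker.submitJob(jobFile);
    }
}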

getClusterStatus

public ClusterStatus getClusterStatus()
Get the current status of the cluster

Returns:
summary of the state of the cluster
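
A small usage sketch; the returned ClusterStatus is handed back untouched because its accessors (tracker count, task counts, ...) are documented on its own page.

import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobTracker;

// Sketch only: fetch the cluster summary from the running tracker.
public class ClusterStatusProbe {

    public static ClusterStatus probe() {
        return JobTracker.getTracker().getClusterStatus();
    }
}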

killJob

public void killJob(String jobid)
Kill the indicated job


getJobProfile

public org.apache.hadoop.mapred.JobProfile getJobProfile(String jobid)
Grab a handle to a job that is already known to the JobTracker


getJobStatus

public org.apache.hadoop.mapred.JobStatus getJobStatus(String jobid)
Grab a handle to a job that is already known to the JobTracker


getMapTaskReports

public TaskReport[] getMapTaskReports(String jobid)
Get reports on the map tasks that make up the given job.


getReduceTaskReports

public TaskReport[] getReduceTaskReports(String jobid)
Get reports on the reduce tasks that make up the given job.
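
A hedged monitoring sketch that combines the job-inspection methods above; the jobid is a placeholder, and the JobProfile, JobStatus and TaskReport values are not dereferenced because their accessors are documented elsewhere. It sits in the org.apache.hadoop.mapred package because JobProfile and JobStatus appear unlinked above.

package org.apache.hadoop.mapred;

// Sketch only: inspect a job the tracker already knows about.
public class JobMonitorSketch {

    public static void inspect(JobTracker jobTracker, String jobid) {
        // Lightweight handles to the job; their accessors are not shown on this page.
        JobProfile profile = jobTracker.getJobProfile(jobid);
        JobStatus status = jobTracker.getJobStatus(jobid);

        // Per-task reports for the map and reduce phases.
        TaskReport[] mapReports = jobTracker.getMapTaskReports(jobid);
        TaskReport[] reduceReports = jobTracker.getReduceTaskReports(jobid);
        System.out.println(jobid + ": " + mapReports.length + " map task reports, "
            + reduceReports.length + " reduce task reports");

        // To abort the job instead:
        // jobTracker.killJob(jobid);
    }
}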

getJob

public org.apache.hadoop.mapred.JobInProgress getJob(String jobid)

main

public static void main(String[] argv)
                 throws IOException,
                        InterruptedException
Start the JobTracker process. This is used only for debugging. As a rule, JobTracker should be run as part of the DFS Namenode process.

Throws:
IOException
InterruptedException


Copyright © 2006 The Apache Software Foundation