org.apache.hadoop.mapred
Class PhasedFileSystem

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.hadoop.fs.FileSystem
          extended by org.apache.hadoop.mapred.PhasedFileSystem
All Implemented Interfaces:
Configurable

public class PhasedFileSystem
extends FileSystem

This class acts as a proxy to the actual file system being used. It writes files to a temporary location and, on commit, moves them to their final location. On abort, or on a failure during commit, the temporary file is deleted. PhasedFileSystem works in the context of a task, so a different instance of PhasedFileSystem should be used for every task. Temporary files are written under the directory named by the "mapred.system.dir" property. If one task opens a large number of files in succession, it is better to call commit(Path) on individual files as each is finished. Otherwise, commit() can be used to commit all open files at once.
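
The write-then-commit behaviour described above can be illustrated with a small sketch. It is not the Hadoop implementation: it uses only java.nio.file on the local disk, and the class name PhasedWriteSketch, its fields, and the "part-"/".tmp" temporary file naming are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the phased-write idea; not the Hadoop class. */
final class PhasedWriteSketch {
    private final Path tempDir;  // assumed per-task temporary directory
    private final Map<Path, Path> pending = new LinkedHashMap<>();  // final path -> temp path

    PhasedWriteSketch(Path tempDir) {
        this.tempDir = tempDir;
    }

    /** Opens a stream that writes to a temporary file standing in for finalPath. */
    OutputStream create(Path finalPath) throws IOException {
        Files.createDirectories(tempDir);
        Path tmp = Files.createTempFile(tempDir, "part-", ".tmp");
        Path previous = pending.put(finalPath, tmp);
        if (previous != null) {
            Files.deleteIfExists(previous);  // a re-created file replaces its older temp copy
        }
        return Files.newOutputStream(tmp);
    }

    /** Commits one file: moves its temp copy into place, or drops it if the final file exists. */
    void commit(Path finalPath) throws IOException {
        Path tmp = pending.remove(finalPath);
        if (tmp == null) {
            throw new IOException("No pending file for " + finalPath);
        }
        try {
            if (Files.exists(finalPath)) {
                Files.delete(tmp);  // final file already present: discard the temp copy
            } else {
                Path parent = finalPath.getParent();
                if (parent != null) {
                    Files.createDirectories(parent);
                }
                Files.move(tmp, finalPath);
            }
        } catch (IOException e) {
            Files.deleteIfExists(tmp);  // failed commit: clean up the temp file
            throw e;
        }
    }

    /** Commits every pending file. */
    void commit() throws IOException {
        for (Path finalPath : new ArrayList<>(pending.keySet())) {
            commit(finalPath);
        }
    }

    /** Deletes every uncommitted temp file; the instance stays usable afterwards. */
    void abort() throws IOException {
        for (Path tmp : pending.values()) {
            Files.deleteIfExists(tmp);
        }
        pending.clear();
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("phased-demo");
        PhasedWriteSketch fs = new PhasedWriteSketch(root.resolve("tmp"));
        Path out = root.resolve("out").resolve("part-00000");
        try (OutputStream os = fs.create(out)) {
            os.write("hello".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("visible before commit: " + Files.exists(out));
        fs.commit();
        System.out.println("visible after commit: " + Files.exists(out));
    }
}
```

In this sketch the stream must be closed before the file is committed, and nothing appears at the final path until commit is called.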


Field Summary
 
Fields inherited from class org.apache.hadoop.fs.FileSystem
LOG
 
Constructor Summary
protected PhasedFileSystem(Configuration conf)
          This Constructor should not be used in this or any derived class.
  PhasedFileSystem(FileSystem fs, JobConf conf)
          Wraps a FileSystem object in a PhasedFileSystem.
  PhasedFileSystem(FileSystem fs, String jobid, String tipid, String taskid)
          Wraps a FileSystem object in a PhasedFileSystem.
 
Method Summary
 void abort()
          Aborts the file creation; all uncommitted files created by this PhasedFileSystem instance are deleted.
 void abort(Path p)
          Aborts a single file.
 void close()
          Closes base file system.
 void commit()
          Commits files to their final locations as passed in create* methods.
 void commit(Path fPath)
          Commits a single file to its final location as passed in create* methods.
 void completeLocalOutput(Path fsOutputFile, Path tmpLocalFile)
          Called when we're all done writing to the target.
 void copyFromLocalFile(Path src, Path dst)
          The src file is on the local disk.
 void copyToLocalFile(Path src, Path dst, boolean copyCrc)
          The src file is under FS, and the dst is on the local disk.
 FSOutputStream createRaw(Path f, boolean overwrite, short replication, long blockSize)
          Opens an OutputStream at the indicated Path.
 FSOutputStream createRaw(Path f, boolean overwrite, short replication, long blockSize, Progressable progress)
          Opens an OutputStream at the indicated Path with write-progress reporting.
 boolean deleteRaw(Path f)
          Deletes Path
 boolean exists(Path f)
          Check if exists.
 long getBlockSize(Path f)
          Get the block size for a particular file.
 long getDefaultBlockSize()
          Return the number of bytes that large input files should optimally be split into to minimize I/O time.
 short getDefaultReplication()
          Get the default replication.
 String[][] getFileCacheHints(Path f, long start, long len)
          Return a 2D array of size 1x1 or greater, containing hostnames where portions of the given file can be found.
 long getLength(Path f)
          The number of bytes in a file.
 String getName()
           
 short getReplication(Path src)
          Get replication.
 URI getUri()
          Returns a URI whose scheme and authority identify this FileSystem.
 Path getWorkingDirectory()
          Get the current working directory for the given file system
 void initialize(URI uri, Configuration conf)
          Called after a new FileSystem instance is constructed.
 boolean isDirectory(Path f)
          True iff the named path is a directory.
 Path[] listPathsRaw(Path f)
          List files in a directory.
 void lock(Path f, boolean shared)
          Obtain a lock on the given Path
 boolean mkdirs(Path f)
          Make the given file and all non-existent parents into directories.
 void moveFromLocalFile(Path src, Path dst)
          The src file is on the local disk.
 FSInputStream openRaw(Path f)
          Opens an InputStream for the indicated Path, whether local or via DFS.
 void release(Path f)
          Release the lock
 boolean renameRaw(Path src, Path dst)
          Renames Path src to Path dst.
 void reportChecksumFailure(Path f, FSInputStream in, long start, long length, int crc)
          Report a checksum error to the file system.
 boolean setReplicationRaw(Path src, short replication)
          Set replication for an existing file.
 void setWorkingDirectory(Path new_dir)
          Set the current working directory for the given file system.
 Path startLocalOutput(Path fsOutputFile, Path tmpLocalFile)
          Returns a local File that the user can write output to.
 
Methods inherited from class org.apache.hadoop.fs.FileSystem
checkPath, copyToLocalFile, create, create, create, create, create, create, create, create, createNewFile, delete, get, get, getChecksumFile, getLocal, getNamed, globPaths, globPaths, isChecksumFile, isFile, listPaths, listPaths, listPaths, listPaths, makeQualified, open, open, parseArgs, rename, setReplication
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PhasedFileSystem

public PhasedFileSystem(FileSystem fs,
                        String jobid,
                        String tipid,
                        String taskid)
Wraps a FileSystem object in a PhasedFileSystem.

Parameters:
fs - base file system object
jobid - JobId
tipid - tipId
taskid - taskId

PhasedFileSystem

public PhasedFileSystem(FileSystem fs,
                        JobConf conf)
Wraps a FileSystem object in a PhasedFileSystem.

Parameters:
fs - base file system object
conf - JobConf

PhasedFileSystem

protected PhasedFileSystem(Configuration conf)
This Constructor should not be used in this or any derived class.

Parameters:
conf -
Method Detail

getUri

public URI getUri()
Description copied from class: FileSystem
Returns a URI whose scheme and authority identify this FileSystem.

Specified by:
getUri in class FileSystem

initialize

public void initialize(URI uri,
                       Configuration conf)
                throws IOException
Description copied from class: FileSystem
Called after a new FileSystem instance is constructed.

Specified by:
initialize in class FileSystem
Parameters:
uri - a uri whose authority section names the host, port, etc. for this FileSystem
conf - the configuration
Throws:
IOException

createRaw

public FSOutputStream createRaw(Path f,
                                boolean overwrite,
                                short replication,
                                long blockSize)
                         throws IOException
Description copied from class: FileSystem
Opens an OutputStream at the indicated Path.

Specified by:
createRaw in class FileSystem
Parameters:
f - the file name to open
overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
replication - required block replication for the file.
Throws:
IOException

createRaw

public FSOutputStream createRaw(Path f,
                                boolean overwrite,
                                short replication,
                                long blockSize,
                                Progressable progress)
                         throws IOException
Description copied from class: FileSystem
Opens an OutputStream at the indicated Path with write-progress reporting.

Specified by:
createRaw in class FileSystem
Parameters:
f - the file name to open
overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
replication - required block replication for the file.
Throws:
IOException

commit

public void commit(Path fPath)
            throws IOException
Commits a single file to its final location as passed in create* methods. If a file already exists in the final location, the temporary file is deleted.

Parameters:
fPath - path to final file.
Throws:
IOException - thrown if commit fails

commit

public void commit()
            throws IOException
Commits files to their final locations as passed in create* methods. If a file already exists in the final location, the temporary file is deleted. This method ignores CRC files (those ending with .crc). This method does not close the file system, so it can still be used to create new files.

Throws:
IOException - if any file fails to commit

abort

public void abort(Path p)
           throws IOException
Aborts a single file. The temporary file that was created for it is deleted.

Parameters:
p - the path to final file as passed in create* call
Throws:
IOException - if File delete fails

abort

public void abort()
           throws IOException
Aborts the file creation; all uncommitted files created by this PhasedFileSystem instance are deleted. This does not close baseFS, because a handle to baseFS may still exist and can be used to create new files.

Throws:
IOException

close

public void close()
           throws IOException
Closes base file system.

Overrides:
close in class FileSystem
Throws:
IOException

getReplication

public short getReplication(Path src)
                     throws IOException
Description copied from class: FileSystem
Get replication.

Specified by:
getReplication in class FileSystem
Parameters:
src - file name
Returns:
file replication
Throws:
IOException

setReplicationRaw

public boolean setReplicationRaw(Path src,
                                 short replication)
                          throws IOException
Description copied from class: FileSystem
Set replication for an existing file.

Specified by:
setReplicationRaw in class FileSystem
Parameters:
src - file name
replication - new replication
Returns:
true if successful; false if file does not exist or is a directory
Throws:
IOException

renameRaw

public boolean renameRaw(Path src,
                         Path dst)
                  throws IOException
Description copied from class: FileSystem
Renames Path src to Path dst. Can take place on local fs or remote DFS.

Specified by:
renameRaw in class FileSystem
Throws:
IOException

deleteRaw

public boolean deleteRaw(Path f)
                  throws IOException
Description copied from class: FileSystem
Deletes Path

Specified by:
deleteRaw in class FileSystem
Throws:
IOException

exists

public boolean exists(Path f)
               throws IOException
Description copied from class: FileSystem
Check if exists.

Specified by:
exists in class FileSystem
Throws:
IOException

isDirectory

public boolean isDirectory(Path f)
                    throws IOException
Description copied from class: FileSystem
True iff the named path is a directory.

Specified by:
isDirectory in class FileSystem
Throws:
IOException

getLength

public long getLength(Path f)
               throws IOException
Description copied from class: FileSystem
The number of bytes in a file.

Specified by:
getLength in class FileSystem
Throws:
IOException

listPathsRaw

public Path[] listPathsRaw(Path f)
                    throws IOException
Description copied from class: FileSystem
List files in a directory.

Specified by:
listPathsRaw in class FileSystem
Throws:
IOException

setWorkingDirectory

public void setWorkingDirectory(Path new_dir)
Description copied from class: FileSystem
Set the current working directory for the given file system. All relative paths will be resolved relative to it.

Specified by:
setWorkingDirectory in class FileSystem

getWorkingDirectory

public Path getWorkingDirectory()
Description copied from class: FileSystem
Get the current working directory for the given file system

Specified by:
getWorkingDirectory in class FileSystem
Returns:
the directory pathname

mkdirs

public boolean mkdirs(Path f)
               throws IOException
Description copied from class: FileSystem
Make the given file and all non-existent parents into directories. Has the semantics of Unix 'mkdir -p'. Existence of the directory hierarchy is not an error.

Specified by:
mkdirs in class FileSystem
Throws:
IOException
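
The 'mkdir -p' semantics mentioned above can be shown with a local analogue. This is a sketch using java.nio.file.Files.createDirectories rather than the Hadoop call; the class name MkdirsSketch and the directory names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical local analogue of mkdirs; not the Hadoop implementation. */
final class MkdirsSketch {
    /** Creates dir and any missing parents; an existing hierarchy is not an error. */
    static boolean mkdirs(Path dir) throws IOException {
        Files.createDirectories(dir);
        return Files.isDirectory(dir);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("mkdirs-demo");
        Path nested = root.resolve("a").resolve("b").resolve("c");
        System.out.println(mkdirs(nested));  // first call creates a, a/b and a/b/c
        System.out.println(mkdirs(nested));  // second call finds them and still succeeds
    }
}
```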

lock

public void lock(Path f,
                 boolean shared)
          throws IOException
Description copied from class: FileSystem
Obtain a lock on the given Path

Specified by:
lock in class FileSystem
Throws:
IOException

release

public void release(Path f)
             throws IOException
Description copied from class: FileSystem
Release the lock

Specified by:
release in class FileSystem
Throws:
IOException

copyFromLocalFile

public void copyFromLocalFile(Path src,
                              Path dst)
                       throws IOException
Description copied from class: FileSystem
The src file is on the local disk. Add it to FS at the given dst name; the source is kept intact afterwards.

Specified by:
copyFromLocalFile in class FileSystem
Throws:
IOException

moveFromLocalFile

public void moveFromLocalFile(Path src,
                              Path dst)
                       throws IOException
Description copied from class: FileSystem
The src file is on the local disk. Add it to FS at the given dst name, removing the source afterwards.

Specified by:
moveFromLocalFile in class FileSystem
Throws:
IOException

copyToLocalFile

public void copyToLocalFile(Path src,
                            Path dst,
                            boolean copyCrc)
                     throws IOException
Description copied from class: FileSystem
The src file is under FS, and the dst is on the local disk. Copy it from FS control to the local dst name. If src and dst are directories, the copyCrc parameter determines whether to copy CRC files.

Specified by:
copyToLocalFile in class FileSystem
Throws:
IOException

startLocalOutput

public Path startLocalOutput(Path fsOutputFile,
                             Path tmpLocalFile)
                      throws IOException
Description copied from class: FileSystem
Returns a local File that the user can write output to. The caller provides both the eventual FS target name and the local working file. If the FS is local, we write directly into the target. If the FS is remote, we write into the tmp local area.

Specified by:
startLocalOutput in class FileSystem
Throws:
IOException

completeLocalOutput

public void completeLocalOutput(Path fsOutputFile,
                                Path tmpLocalFile)
                         throws IOException
Description copied from class: FileSystem
Called when we're all done writing to the target. A local FS will do nothing, because we've written to exactly the right place. A remote FS will copy the contents of tmpLocalFile to the correct target at fsOutputFile.

Specified by:
completeLocalOutput in class FileSystem
Throws:
IOException
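
The pairing of startLocalOutput and completeLocalOutput can be sketched as follows. This is an illustration of the contract described in the two entries above, not the Hadoop code: the class LocalOutputSketch and its localFs flag are assumptions, and the "remote" target is simulated by another path on the local disk.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical model of the local-output contract; not the Hadoop implementation. */
final class LocalOutputSketch {
    private final boolean localFs;  // assumption: stands in for "the FS is local"

    LocalOutputSketch(boolean localFs) {
        this.localFs = localFs;
    }

    /** Local FS: write straight into the target. Remote FS: write into the tmp local file. */
    Path startLocalOutput(Path fsOutputFile, Path tmpLocalFile) {
        return localFs ? fsOutputFile : tmpLocalFile;
    }

    /** Local FS: nothing to do. Remote FS: copy the tmp local file to the target. */
    void completeLocalOutput(Path fsOutputFile, Path tmpLocalFile) throws IOException {
        if (localFs) {
            return;  // the data is already in exactly the right place
        }
        Files.copy(tmpLocalFile, fsOutputFile, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("local-output-demo");
        Path target = root.resolve("target-part-0");  // simulated remote location
        Path tmp = root.resolve("tmp-part-0");
        LocalOutputSketch remote = new LocalOutputSketch(false);
        Path writeTo = remote.startLocalOutput(target, tmp);
        Files.write(writeTo, "data".getBytes(StandardCharsets.UTF_8));
        remote.completeLocalOutput(target, tmp);
        System.out.println("target exists: " + Files.exists(target));
    }
}
```

Callers write only to the path returned by startLocalOutput and pass the same two paths to completeLocalOutput, so the same calling code works for both the local and the remote case.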

reportChecksumFailure

public void reportChecksumFailure(Path f,
                                  FSInputStream in,
                                  long start,
                                  long length,
                                  int crc)
Description copied from class: FileSystem
Report a checksum error to the file system.

Specified by:
reportChecksumFailure in class FileSystem
Parameters:
f - the file name containing the error
in - the stream open on the file
start - the position of the beginning of the bad data in the file
length - the length of the bad data in the file
crc - the expected CRC32 of the data

getBlockSize

public long getBlockSize(Path f)
                  throws IOException
Description copied from class: FileSystem
Get the block size for a particular file.

Specified by:
getBlockSize in class FileSystem
Parameters:
f - the filename
Returns:
the number of bytes in a block
Throws:
IOException

getDefaultBlockSize

public long getDefaultBlockSize()
Description copied from class: FileSystem
Return the number of bytes that large input files should optimally be split into to minimize I/O time.

Specified by:
getDefaultBlockSize in class FileSystem

getDefaultReplication

public short getDefaultReplication()
Description copied from class: FileSystem
Get the default replication.

Specified by:
getDefaultReplication in class FileSystem

getFileCacheHints

public String[][] getFileCacheHints(Path f,
                                    long start,
                                    long len)
                             throws IOException
Description copied from class: FileSystem
Return a 2D array of size 1x1 or greater, containing hostnames where portions of the given file can be found. For a nonexistent file or region, null will be returned. This call is most helpful with DFS, where it returns hostnames of machines that contain the given file. The FileSystem will simply return an element containing 'localhost'.

Specified by:
getFileCacheHints in class FileSystem
Throws:
IOException

getName

public String getName()
Specified by:
getName in class FileSystem

openRaw

public FSInputStream openRaw(Path f)
                      throws IOException
Description copied from class: FileSystem
Opens an InputStream for the indicated Path, whether local or via DFS.

Specified by:
openRaw in class FileSystem
Throws:
IOException


Copyright © 2006 The Apache Software Foundation