org.apache.hadoop.mapred
Class PhasedFileSystem
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.hadoop.fs.FileSystem
org.apache.hadoop.mapred.PhasedFileSystem
- All Implemented Interfaces:
- Configurable
public class PhasedFileSystem
- extends FileSystem
This class acts as a proxy to the actual file system being used.
It writes files to a temporary location and, on
commit, moves them to their final location. On abort, or on a failure during
commit, the temporary file is deleted.
PhasedFileSystem works in the context of a task. A different instance of
PhasedFileSystem should be used for every task.
Temporary files are written in ("mapred.system.dir")//
If a task opens a large number of files in succession, it is
better to commit each file with commit(Path) as soon as it is done. Otherwise,
commit() can be used to commit all open files at once.
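The write-to-temporary, commit-by-move cycle described above can be modelled on a local disk with java.nio.file. The sketch below is hypothetical and is not the Hadoop API: the class PhasedWriter, its method names, and the temporary-directory layout are assumptions chosen for illustration. It follows the documented rules: create writes to a uniquely named temporary file, commit(Path) moves that file to its final location (or deletes it if the final file already exists), commit() commits everything still pending, and abort() deletes all uncommitted temporary files.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// Hypothetical local-disk model of PhasedFileSystem's write/commit/abort cycle.
// Not the Hadoop API; names and directory layout are illustrative assumptions.
class PhasedWriter {
    private final Path tempDir;                               // per-task scratch directory
    private final Map<Path, Path> pending = new HashMap<>();  // final path -> temporary path

    PhasedWriter(Path tempDir) {
        this.tempDir = tempDir;
    }

    // Opens a stream on a fresh temporary file and records its final destination.
    OutputStream create(Path finalPath) throws IOException {
        Files.createDirectories(tempDir);
        Path tmp = Files.createTempFile(tempDir, "phased-", ".tmp");
        Path previous = pending.put(finalPath, tmp);
        if (previous != null) {
            Files.deleteIfExists(previous); // re-created before commit: drop the older temp file
        }
        return Files.newOutputStream(tmp);
    }

    // Moves one temporary file to its final location; if the final file
    // already exists, the temporary copy is deleted instead.
    void commit(Path finalPath) throws IOException {
        Path tmp = pending.remove(finalPath);
        if (tmp == null) {
            throw new IOException("no uncommitted file for " + finalPath);
        }
        if (Files.exists(finalPath)) {
            Files.delete(tmp);
            return;
        }
        Path parent = finalPath.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.move(tmp, finalPath);
    }

    // Commits every pending file; the writer remains usable afterwards.
    void commit() throws IOException {
        for (Path p : new ArrayList<>(pending.keySet())) {
            commit(p);
        }
    }

    // Deletes the temporary files of all uncommitted outputs.
    void abort() throws IOException {
        for (Path tmp : pending.values()) {
            Files.deleteIfExists(tmp);
        }
        pending.clear();
    }
}
```

Each stream should be closed before its file is committed; the sketch does not track open streams.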
Method Summary
Modifier and Type | Method | Description
void | abort() | Aborts the file creation; all uncommitted files created by this PhasedFileSystem instance are deleted.
void | abort(Path p) | Aborts a single file.
void | close() | Closes the base file system.
void | commit() | Commits files to their final locations as passed in create* methods.
void | commit(Path fPath) | Commits a single file to its final location as passed in create* methods.
void | completeLocalOutput(Path fsOutputFile, Path tmpLocalFile) | Called when we're all done writing to the target.
void | copyFromLocalFile(Path src, Path dst) | The src file is on the local disk.
void | copyToLocalFile(Path src, Path dst, boolean copyCrc) | The src file is under FS, and the dst is on the local disk.
FSOutputStream | createRaw(Path f, boolean overwrite, short replication, long blockSize) | Opens an OutputStream at the indicated Path.
FSOutputStream | createRaw(Path f, boolean overwrite, short replication, long blockSize, Progressable progress) | Opens an OutputStream at the indicated Path with write-progress reporting.
boolean | deleteRaw(Path f) | Deletes the given Path.
boolean | exists(Path f) | Check if the given Path exists.
long | getBlockSize(Path f) | Get the block size for a particular file.
long | getDefaultBlockSize() | Return the number of bytes that large input files should optimally be split into to minimize I/O time.
short | getDefaultReplication() | Get the default replication.
String[][] | getFileCacheHints(Path f, long start, long len) | Return a 2D array of size 1x1 or greater, containing hostnames where portions of the given file can be found.
long | getLength(Path f) | The number of bytes in a file.
String | getName() |
short | getReplication(Path src) | Get replication.
URI | getUri() | Returns a URI whose scheme and authority identify this FileSystem.
Path | getWorkingDirectory() | Get the current working directory for the given file system.
void | initialize(URI uri, Configuration conf) | Called after a new FileSystem instance is constructed.
boolean | isDirectory(Path f) | True iff the named path is a directory.
Path[] | listPathsRaw(Path f) | List files in a directory.
void | lock(Path f, boolean shared) | Deprecated.
boolean | mkdirs(Path f) | Make the given file and all non-existent parents into directories.
void | moveFromLocalFile(Path src, Path dst) | The src file is on the local disk.
FSInputStream | openRaw(Path f) | Opens an InputStream for the indicated Path, whether local or via DFS.
void | release(Path f) | Deprecated.
boolean | renameRaw(Path src, Path dst) | Renames Path src to Path dst.
void | reportChecksumFailure(Path f, FSInputStream in, long inPos, FSInputStream sums, long sumsPos) | Report a checksum error to the file system.
boolean | setReplicationRaw(Path src, short replication) | Set replication for an existing file.
void | setWorkingDirectory(Path new_dir) | Set the current working directory for the given file system.
Path | startLocalOutput(Path fsOutputFile, Path tmpLocalFile) | Returns a local File that the user can write output to.
Methods inherited from class org.apache.hadoop.fs.FileSystem |
checkPath, copyToLocalFile, create, create, create, create, create, create, create, create, createNewFile, delete, get, get, getChecksumFile, getChecksumFileLength, getContentLength, getLocal, getNamed, globPaths, globPaths, isChecksumFile, isFile, listPaths, listPaths, listPaths, listPaths, makeQualified, open, open, parseArgs, rename, setReplication |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
PhasedFileSystem
public PhasedFileSystem(FileSystem fs,
String jobid,
String tipid,
String taskid)
- This constructor wraps a FileSystem object in a
PhasedFileSystem.
- Parameters:
fs
- base file system object
jobid
- job ID
tipid
- task-in-progress (TIP) ID
taskid
- task ID
PhasedFileSystem
public PhasedFileSystem(FileSystem fs,
JobConf conf)
- This constructor wraps a FileSystem object in a
PhasedFileSystem.
- Parameters:
fs
- base file system object
conf
- JobConf
PhasedFileSystem
protected PhasedFileSystem(Configuration conf)
- This constructor should not be used in this or any derived class.
- Parameters:
conf
- the configuration
getUri
public URI getUri()
- Description copied from class:
FileSystem
- Returns a URI whose scheme and authority identify this FileSystem.
- Specified by:
getUri
in class FileSystem
initialize
public void initialize(URI uri,
Configuration conf)
throws IOException
- Description copied from class:
FileSystem
- Called after a new FileSystem instance is constructed.
- Specified by:
initialize
in class FileSystem
- Parameters:
uri
- a uri whose authority section names the host, port, etc.
for this FileSystem
conf
- the configuration
- Throws:
IOException
createRaw
public FSOutputStream createRaw(Path f,
boolean overwrite,
short replication,
long blockSize)
throws IOException
- Description copied from class:
FileSystem
- Opens an OutputStream at the indicated Path.
- Specified by:
createRaw
in class FileSystem
- Parameters:
f
- the file name to open
overwrite
- if a file with this name already exists, then if true,
the file will be overwritten, and if false an error will be thrown.
replication
- required block replication for the file.
- Throws:
IOException
createRaw
public FSOutputStream createRaw(Path f,
boolean overwrite,
short replication,
long blockSize,
Progressable progress)
throws IOException
- Description copied from class:
FileSystem
- Opens an OutputStream at the indicated Path with write-progress
reporting.
- Specified by:
createRaw
in class FileSystem
- Parameters:
f
- the file name to open
overwrite
- if a file with this name already exists, then if true,
the file will be overwritten, and if false an error will be thrown.
replication
- required block replication for the file.
- Throws:
IOException
commit
public void commit(Path fPath)
throws IOException
- Commits a single file to its final location as passed in the create* methods.
If a file already exists in the final location, the temporary file is deleted.
- Parameters:
fPath
- path to final file.
- Throws:
IOException
- thrown if commit fails
commit
public void commit()
throws IOException
- Commits files to their final locations as passed in the create* methods.
If a file already exists in the final location, the temporary file is deleted.
This method ignores CRC files (ending with .crc). It does not close
the file system, so it can still be used to create new files.
- Throws:
IOException
- if any file fails to commit
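The note that commit() ignores files ending in .crc can be read as a filter applied to the pending set before committing. The helper below is a hypothetical illustration of that filter only; the class CrcFilter and its method are assumptions, not part of Hadoop.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper: selects the pending outputs that commit() would act on,
// skipping checksum side files whose names end in ".crc".
class CrcFilter {
    static List<Path> committable(Collection<Path> pending) {
        List<Path> result = new ArrayList<>();
        for (Path p : pending) {
            Path name = p.getFileName();
            if (name != null && !name.toString().endsWith(".crc")) {
                result.add(p);
            }
        }
        return result;
    }
}
```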
abort
public void abort(Path p)
throws IOException
- Aborts a single file. The temporary created file is deleted.
- Parameters:
p
- the path to final file as passed in create* call
- Throws:
IOException
- if the file delete fails
abort
public void abort()
throws IOException
- Aborts the file creation: all uncommitted files created by this PhasedFileSystem
instance are deleted. This does not close the base file system, because a handle to it
may still exist and can be used to create new files.
- Throws:
IOException
close
public void close()
throws IOException
- Closes base file system.
- Overrides:
close
in class FileSystem
- Throws:
IOException
getReplication
public short getReplication(Path src)
throws IOException
- Description copied from class:
FileSystem
- Get replication.
- Specified by:
getReplication
in class FileSystem
- Parameters:
src
- file name
- Returns:
- file replication
- Throws:
IOException
setReplicationRaw
public boolean setReplicationRaw(Path src,
short replication)
throws IOException
- Description copied from class:
FileSystem
- Set replication for an existing file.
- Specified by:
setReplicationRaw
in class FileSystem
- Parameters:
src
- file name
replication
- new replication
- Returns:
- true if successful;
false if file does not exist or is a directory
- Throws:
IOException
renameRaw
public boolean renameRaw(Path src,
Path dst)
throws IOException
- Description copied from class:
FileSystem
- Renames Path src to Path dst. Can take place on local fs
or remote DFS.
- Specified by:
renameRaw
in class FileSystem
- Throws:
IOException
deleteRaw
public boolean deleteRaw(Path f)
throws IOException
- Description copied from class:
FileSystem
- Deletes the given Path.
- Specified by:
deleteRaw
in class FileSystem
- Throws:
IOException
exists
public boolean exists(Path f)
throws IOException
- Description copied from class:
FileSystem
- Check if the given Path exists.
- Specified by:
exists
in class FileSystem
- Throws:
IOException
isDirectory
public boolean isDirectory(Path f)
throws IOException
- Description copied from class:
FileSystem
- True iff the named path is a directory.
- Specified by:
isDirectory
in class FileSystem
- Throws:
IOException
getLength
public long getLength(Path f)
throws IOException
- Description copied from class:
FileSystem
- The number of bytes in a file.
- Specified by:
getLength
in class FileSystem
- Throws:
IOException
listPathsRaw
public Path[] listPathsRaw(Path f)
throws IOException
- Description copied from class:
FileSystem
- List files in a directory.
- Specified by:
listPathsRaw
in class FileSystem
- Throws:
IOException
setWorkingDirectory
public void setWorkingDirectory(Path new_dir)
- Description copied from class:
FileSystem
- Set the current working directory for the given file system.
All relative paths will be resolved relative to it.
- Specified by:
setWorkingDirectory
in class FileSystem
getWorkingDirectory
public Path getWorkingDirectory()
- Description copied from class:
FileSystem
- Get the current working directory for the given file system.
- Specified by:
getWorkingDirectory
in class FileSystem
- Returns:
- the directory pathname
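The working-directory rule (all relative paths are resolved against it) can be illustrated with java.nio.file.Path.resolve, which returns an absolute argument unchanged and appends a relative one. This is an analogy on local paths, not Hadoop's Path class; the class WorkingDir is hypothetical, and resolving a relative new_dir against the current directory is an assumption.

```java
import java.nio.file.Path;

// Hypothetical holder mimicking setWorkingDirectory/getWorkingDirectory semantics
// with java.nio.file.Path (not org.apache.hadoop.fs.Path).
class WorkingDir {
    private Path cwd;

    WorkingDir(Path initial) {
        this.cwd = initial;
    }

    // Assumption: a relative new_dir is taken relative to the current directory.
    void setWorkingDirectory(Path newDir) {
        cwd = cwd.resolve(newDir).normalize();
    }

    Path getWorkingDirectory() {
        return cwd;
    }

    // Relative paths resolve against the working directory; absolute ones pass through.
    Path makeAbsolute(Path p) {
        return cwd.resolve(p);
    }
}
```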
mkdirs
public boolean mkdirs(Path f)
throws IOException
- Description copied from class:
FileSystem
- Make the given file and all non-existent parents into
directories. Has the semantics of Unix 'mkdir -p'.
Existence of the directory hierarchy is not an error.
- Specified by:
mkdirs
in class FileSystem
- Throws:
IOException
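On a local disk, java.nio.file.Files.createDirectories has the same 'mkdir -p' semantics described for mkdirs: missing parents are created, and an existing directory hierarchy is not an error. The wrapper below is a hypothetical local analogue, not PhasedFileSystem's implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical local analogue of mkdirs(Path) using Files.createDirectories.
class MkdirsDemo {
    // Creates dir and any missing parents; succeeds again if the hierarchy already exists.
    static boolean mkdirs(Path dir) throws IOException {
        Files.createDirectories(dir);
        return Files.isDirectory(dir);
    }
}
```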
lock
@Deprecated
public void lock(Path f,
boolean shared)
throws IOException
- Deprecated.
- Description copied from class:
FileSystem
- Obtain a lock on the given Path.
- Specified by:
lock
in class FileSystem
- Throws:
IOException
release
@Deprecated
public void release(Path f)
throws IOException
- Deprecated.
- Description copied from class:
FileSystem
- Release the lock.
- Specified by:
release
in class FileSystem
- Throws:
IOException
copyFromLocalFile
public void copyFromLocalFile(Path src,
Path dst)
throws IOException
- Description copied from class:
FileSystem
- The src file is on the local disk. Add it to FS at
the given dst name; the source is kept intact afterwards.
- Specified by:
copyFromLocalFile
in class FileSystem
- Throws:
IOException
moveFromLocalFile
public void moveFromLocalFile(Path src,
Path dst)
throws IOException
- Description copied from class:
FileSystem
- The src file is on the local disk. Add it to FS at
the given dst name, removing the source afterwards.
- Specified by:
moveFromLocalFile
in class FileSystem
- Throws:
IOException
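The only documented difference between copyFromLocalFile and moveFromLocalFile is what happens to the local source. The sketch below shows that difference with java.nio.file, using a second local path as a stand-in for the target file system; the class LocalUpload is hypothetical, and failing when dst already exists is a choice of this sketch, not documented behavior.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical local-only illustration of the two upload variants; the
// destination is just another local path standing in for the file system.
class LocalUpload {
    // Like copyFromLocalFile: the source is kept intact.
    static void copyFromLocal(Path src, Path dst) throws IOException {
        Files.copy(src, dst); // fails if dst already exists
    }

    // Like moveFromLocalFile: the source is removed once the data is in place.
    static void moveFromLocal(Path src, Path dst) throws IOException {
        Files.copy(src, dst);
        Files.delete(src);
    }
}
```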
copyToLocalFile
public void copyToLocalFile(Path src,
Path dst,
boolean copyCrc)
throws IOException
- Description copied from class:
FileSystem
- The src file is under FS, and the dst is on the local disk.
Copy it from FS control to the local dst name.
If src and dst are directories, the copyCrc parameter
determines whether to copy CRC files.
- Specified by:
copyToLocalFile
in class FileSystem
- Throws:
IOException
startLocalOutput
public Path startLocalOutput(Path fsOutputFile,
Path tmpLocalFile)
throws IOException
- Description copied from class:
FileSystem
- Returns a local File that the user can write output to. The caller
provides both the eventual FS target name and the local working
file. If the FS is local, we write directly into the target. If
the FS is remote, we write into the tmp local area.
- Specified by:
startLocalOutput
in class FileSystem
- Throws:
IOException
completeLocalOutput
public void completeLocalOutput(Path fsOutputFile,
Path tmpLocalFile)
throws IOException
- Description copied from class:
FileSystem
- Called when we're all done writing to the target. A local FS will
do nothing, because we've written to exactly the right place. A remote
FS will copy the contents of tmpLocalFile to the correct target at
fsOutputFile.
- Specified by:
completeLocalOutput
in class FileSystem
- Throws:
IOException
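startLocalOutput and completeLocalOutput form a two-step protocol: ask for a local path, write to it, then signal completion. The hypothetical model below makes the local/remote branch explicit with an assumed remote flag; the "remote" target is just another local path so the sketch runs anywhere. It is not Hadoop's implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical model of the startLocalOutput/completeLocalOutput contract.
class LocalOutputProtocol {
    private final boolean remote; // assumed flag: is the target file system remote?

    LocalOutputProtocol(boolean remote) {
        this.remote = remote;
    }

    // Local FS: write straight into the target. Remote FS: write into the tmp local file.
    Path startLocalOutput(Path fsOutputFile, Path tmpLocalFile) {
        return remote ? tmpLocalFile : fsOutputFile;
    }

    // Local FS: nothing to do. Remote FS: copy tmpLocalFile's contents to the target.
    void completeLocalOutput(Path fsOutputFile, Path tmpLocalFile) throws IOException {
        if (remote) {
            Files.copy(tmpLocalFile, fsOutputFile);
        }
    }
}
```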
reportChecksumFailure
public void reportChecksumFailure(Path f,
FSInputStream in,
long inPos,
FSInputStream sums,
long sumsPos)
- Description copied from class:
FileSystem
- Report a checksum error to the file system.
- Specified by:
reportChecksumFailure
in class FileSystem
- Parameters:
f
- the file name containing the error
in
- the stream open on the file
inPos
- the position of the beginning of the bad data in the file
sums
- the stream open on the checksum file
sumsPos
- the position of the beginning of the bad data in the checksum file
getBlockSize
public long getBlockSize(Path f)
throws IOException
- Description copied from class:
FileSystem
- Get the block size for a particular file.
- Specified by:
getBlockSize
in class FileSystem
- Parameters:
f
- the filename
- Returns:
- the number of bytes in a block
- Throws:
IOException
getDefaultBlockSize
public long getDefaultBlockSize()
- Description copied from class:
FileSystem
- Return the number of bytes that large input files should optimally
be split into to minimize I/O time.
- Specified by:
getDefaultBlockSize
in class FileSystem
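As a worked example of what the default block size implies, a file spans ceil(length / blockSize) blocks. With an assumed 64 MB block size (the real default depends on configuration), a 200 MB file spans four blocks. The helper below is a hypothetical ceiling division, not Hadoop's InputFormat split logic.

```java
// Hypothetical helper: number of block-sized pieces a file of the given length spans.
// Plain ceiling division written to avoid overflow; not Hadoop's split computation.
class BlockMath {
    static long pieces(long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        if (fileLength <= 0) {
            return 0;
        }
        return fileLength / blockSize + (fileLength % blockSize == 0 ? 0 : 1);
    }
}
```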
getDefaultReplication
public short getDefaultReplication()
- Description copied from class:
FileSystem
- Get the default replication.
- Specified by:
getDefaultReplication
in class FileSystem
getFileCacheHints
public String[][] getFileCacheHints(Path f,
long start,
long len)
throws IOException
- Description copied from class:
FileSystem
- Return a 2D array of size 1x1 or greater, containing hostnames
where portions of the given file can be found. For a nonexistent
file or regions, null will be returned.
This call is most helpful with DFS, where it returns
hostnames of machines that contain the given file.
The local FileSystem will simply return an element containing 'localhost'.
- Specified by:
getFileCacheHints
in class FileSystem
- Throws:
IOException
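The local-disk case of this contract can be sketched as a method that returns null for a missing file or a region beyond the file, and a single 'localhost' entry otherwise. The class LocalCacheHints is hypothetical, and treating "start at or past end of file" as the meaning of a nonexistent region is an assumption of this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical local-disk reading of the getFileCacheHints contract.
class LocalCacheHints {
    static String[][] getFileCacheHints(Path f, long start, long len) throws IOException {
        // Nonexistent file, or a region starting at or past end of file: no hints.
        if (!Files.isRegularFile(f) || start >= Files.size(f)) {
            return null;
        }
        // Every byte of a local file lives on this machine: a 1x1 result.
        return new String[][] {{"localhost"}};
    }
}
```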
getName
public String getName()
- Specified by:
getName
in class FileSystem
openRaw
public FSInputStream openRaw(Path f)
throws IOException
- Description copied from class:
FileSystem
- Opens an InputStream for the indicated Path, whether local
or via DFS.
- Specified by:
openRaw
in class FileSystem
- Throws:
IOException
Copyright © 2006 The Apache Software Foundation