Description:
This processor writes FlowFiles to an HDFS cluster. It creates any missing directories named by the
Directory property before storing files in them.
When files are written to HDFS, the file's owner is the user identity of the NiFi process, the file's group is the
group of the parent directory, and the read/write/execute permissions use the default umask. The owner can be
overridden using the Remote Owner property, the group can be overridden using the Remote Group
property, and the read/write/execute permissions can be overridden using the Permissions umask property.
NOTE: This processor can change owner or group only if the user identity of the NiFi process has super user
privilege in HDFS to do so.
NOTE: The Permissions umask property cannot add execute permissions to regular files.
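As a rough illustration of the note above, the sketch below applies a umask the way HDFS does when it creates a path: regular files start from a base mode of 666 and directories from 777, and the bits set in the umask are cleared from that base. Because the file base has no execute bits, no umask can add them. The name `apply_umask` is hypothetical and is not part of NiFi or Hadoop.

```python
FILE_BASE = 0o666  # regular files never start with execute bits
DIR_BASE = 0o777   # directories start with all bits set


def apply_umask(umask: str, is_dir: bool = False) -> str:
    """Return the octal permission string left after clearing the umask bits."""
    mask = int(umask, 8)
    base = DIR_BASE if is_dir else FILE_BASE
    return format(base & ~mask & 0o777, "03o")


print(apply_umask("022"))               # 644
print(apply_umask("022", is_dir=True))  # 755
print(apply_umask("000"))               # 666, still no execute bits
```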
Uses Attributes:
Attribute Name | Description
filename | The name of the file written to HDFS comes from the value of this attribute.
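To make the role of the filename attribute concrete, here is a minimal sketch of how a target path could be assembled from the Directory property and a FlowFile's attributes. It handles only simple `${name}` references and is not NiFi's Expression Language evaluator. The function `resolve_target_path` and the sample attribute values are hypothetical.

```python
import posixpath
import re


def resolve_target_path(directory: str, attributes: dict) -> str:
    """Substitute simple ${name} references, then append the filename attribute."""
    def lookup(match):
        # Assumption: unknown attributes become an empty string in this sketch.
        return attributes.get(match.group(1), "")

    resolved = re.sub(r"\$\{([^}]+)\}", lookup, directory)
    return posixpath.normpath(posixpath.join(resolved, attributes["filename"]))


attrs = {"path": "2024/01/", "filename": "events.log"}
print(resolve_target_path("/in/data/${path}", attrs))  # /in/data/2024/01/events.log
```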
Properties:
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are
considered optional. If a property has a default value, it is indicated. If a property supports the use of the
NiFi Expression Language (or simply, "expression language"), that is also indicated.
- Hadoop Configuration Resources
- A file or comma-separated list of files that contain the Hadoop file system configuration.
Without this, Hadoop searches the classpath for 'core-site.xml' and 'hdfs-site.xml' files, or
reverts to a default configuration if none are found.
- Default value: none
- Directory
- The HDFS directory to which FlowFile content should be written. This property supports the
expression language so you can keep the FlowFile's directory structure by using the ${path} attribute
reference, e.g. /in/data/${path}.
- Default value: none
- Supports expression language: true
- Conflict Resolution Strategy
- Indicates what should happen when a file with the same name already exists in the output directory.
Valid options are:
- replace - existing file is overwritten by new file
- ignore - existing file is untouched, FlowFile routed to success
- fail - existing file is untouched, FlowFile routed to failure
- Default value: fail
- Block Size
- Size of each block as written to HDFS. This is a data size integer that must include units of B,
KB, MB, GB, or TB. This overrides the Hadoop Configuration.
- Default value: none
- IO Buffer Size
- Amount of memory to use to buffer file contents during IO. This is a data size integer that must
include units of B, KB, MB, GB, or TB. This overrides the Hadoop Configuration.
- Default value: none
- Replication
- Number of times that HDFS will replicate each file. This must be an integer greater than 0. This
overrides the Hadoop Configuration.
- Default value: none
- Permissions umask
- A umask represented as an octal number which determines the permissions of files written to HDFS.
This overrides the Hadoop Configuration dfs.umaskmode.
- Default value: none
- Remote Owner
- Changes the owner of the HDFS file to this value after it is written. This only works if NiFi is
running as a user that has HDFS super user privilege to change owner.
- Default value: none
- Remote Group
- Changes the group of the HDFS file to this value after it is written. This only works if NiFi is
running as a user that has HDFS super user privilege to change group.
- Default value: none
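The data size format used by Block Size and IO Buffer Size (an integer followed by B, KB, MB, GB, or TB) can be checked with a small parser. This is a sketch rather than NiFi's own validator: `parse_data_size` is a hypothetical name, and the 1024-based multipliers are an assumption.

```python
import re

# Assumption: units are binary multiples (1 KB = 1024 B).
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
PATTERN = re.compile(r"^\s*(\d+)\s*(B|KB|MB|GB|TB)\s*$", re.IGNORECASE)


def parse_data_size(text: str) -> int:
    """Convert a value such as '128 MB' to a byte count; reject values without units."""
    match = PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a data size with units: {text!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit.upper()]


print(parse_data_size("128 MB"))  # 134217728
print(parse_data_size("4KB"))     # 4096
```

A bare number such as `128` raises `ValueError` here, mirroring the requirement that units must be included.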
Relationships:
- success
- Files that have been successfully written to HDFS are transferred to this relationship.
- failure
- Files that could not be written to HDFS for any reason are transferred to this relationship.
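The way the Conflict Resolution Strategy options map onto these relationships can be summarized in a small decision function. This is a sketch of the documented behavior, not the processor's code. The name `route` and its return values are hypothetical.

```python
def route(strategy: str, target_exists: bool) -> tuple:
    """Return (write_file, relationship) for a FlowFile, per the documented options."""
    if strategy not in ("replace", "ignore", "fail"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    if not target_exists or strategy == "replace":
        return (True, "success")   # new file written, or existing file overwritten
    if strategy == "ignore":
        return (False, "success")  # existing file untouched
    return (False, "failure")      # "fail": existing file untouched


print(route("fail", target_exists=True))    # (False, 'failure')
print(route("ignore", target_exists=True))  # (False, 'success')
```

The sketch covers only name conflicts. A write that fails for any other reason would still be routed to failure.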