@InterfaceStability.Evolving public interface StreamWriter extends DataSourceWriter
A DataSourceWriter for use with structured streaming.
Streaming queries are divided into intervals of data called epochs, with a monotonically
increasing numeric ID. This writer handles commits and aborts for each successive epoch.

Modifier and Type | Method and Description
---|---
void | abort(long epochId, WriterCommitMessage[] messages): Aborts this writing job because some data writers failed and keep failing when retried, or the Spark job fails for some unknown reason, or commit(WriterCommitMessage[]) fails.
default void | abort(WriterCommitMessage[] messages): Aborts this writing job because some data writers failed and keep failing when retried, or the Spark job fails for some unknown reason, or DataSourceWriter.onDataWriterCommit(WriterCommitMessage) fails, or DataSourceWriter.commit(WriterCommitMessage[]) fails.
void | commit(long epochId, WriterCommitMessage[] messages): Commits this writing job for the specified epoch with a list of commit messages.
default void | commit(WriterCommitMessage[] messages): Commits this writing job with a list of commit messages.

Methods inherited from interface DataSourceWriter: createWriterFactory, onDataWriterCommit, useCommitCoordinator
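To make the per-epoch contract concrete, here is a minimal skeleton of an implementation. It is a sketch only, assuming the Spark 2.4 signatures shown on this page (where createWriterFactory returns DataWriterFactory<InternalRow>); the class name and comments are illustrative, and the writer factory is elided.

```java
import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.sources.v2.writer.DataWriterFactory;
import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage;
import org.apache.spark.sql.sources.v2.writer.streaming.StreamWriter;

// Illustrative skeleton: the bodies show where epoch-level commit/abort
// logic belongs, not how any particular sink implements it.
public class ExampleStreamWriter implements StreamWriter {

  @Override
  public DataWriterFactory<InternalRow> createWriterFactory() {
    // Would return a factory that creates one DataWriter per partition;
    // elided here to keep the sketch focused on epoch handling.
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public void commit(long epochId, WriterCommitMessage[] messages) {
    // Atomically publish the data written for this epoch.
  }

  @Override
  public void abort(long epochId, WriterCommitMessage[] messages) {
    // Best-effort cleanup of data left behind for this epoch.
  }
}
```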
void commit(long epochId, WriterCommitMessage[] messages)

Commits this writing job for the specified epoch with a list of commit messages. The commit messages are collected from successful data writers and are produced by DataWriter.commit().

If this method fails (by throwing an exception), this writing job is considered to have failed, and the execution engine will attempt to call abort(WriterCommitMessage[]).

The execution engine may call commit() multiple times for the same epoch in some circumstances. To support exactly-once data semantics, implementations must ensure that multiple commits for the same epoch are idempotent.
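For example, one common way to make replayed commits idempotent is to track a high-water mark of committed epochs in the sink. A minimal sketch, where the in-memory field and the publish step stand in for the sink's own durable metadata and transaction mechanism:

```java
import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage;

// Illustrative only: a real sink would persist the high-water mark durably
// (e.g. in a metadata table), not in an in-memory field.
public abstract class IdempotentEpochCommit {
  private long lastCommittedEpoch = -1L;

  public synchronized void commit(long epochId, WriterCommitMessage[] messages) {
    if (epochId <= lastCommittedEpoch) {
      return; // replayed commit for an already-committed epoch: do nothing
    }
    publish(epochId, messages);   // atomically make the epoch's data visible
    lastCommittedEpoch = epochId; // record progress only after publishing
  }

  // Sink-specific publication step; assumed here, not part of the Spark API.
  protected abstract void publish(long epochId, WriterCommitMessage[] messages);
}
```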
void abort(long epochId, WriterCommitMessage[] messages)

Aborts this writing job because some data writers failed and keep failing when retried, or the Spark job fails for some unknown reason, or commit(WriterCommitMessage[]) fails.

If this method fails (by throwing an exception), the underlying data source may require manual cleanup.

Unless the abort is triggered by the failure of a commit, the given messages will have some null slots: only some of the data writers may have committed before the abort, and some may have committed without their commit messages reaching the driver before the abort was triggered. So this is just a "best effort" for data sources to clean up the data left by data writers.
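Inside a StreamWriter implementation, a null-tolerant abort might therefore look like this sketch; deleteUncommittedData is a hypothetical sink-specific helper, not part of the Spark API:

```java
@Override
public void abort(long epochId, WriterCommitMessage[] messages) {
  for (WriterCommitMessage message : messages) {
    if (message == null) {
      continue; // null slot: this writer never committed, nothing to clean up
    }
    deleteUncommittedData(epochId, message); // hypothetical sink-specific helper
  }
}
```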
default void commit(WriterCommitMessage[] messages)

Description copied from interface: DataSourceWriter

Commits this writing job with a list of commit messages. The commit messages are collected from successful data writers and are produced by DataWriter.commit().

If this method fails (by throwing an exception), this writing job is considered to have failed, and DataSourceWriter.abort(WriterCommitMessage[]) would be called. The state of the destination is undefined and DataSourceWriter.abort(WriterCommitMessage[]) may not be able to deal with it.

Note that speculative execution may cause multiple tasks to run for a partition. By default, Spark uses the commit coordinator to allow at most one task to commit. Implementations can disable this behavior by overriding DataSourceWriter.useCommitCoordinator(). If disabled, multiple tasks may have committed successfully, and one successful commit message per task will be passed to this commit method. The remaining commit messages are ignored by Spark.

Specified by:
commit in interface DataSourceWriter
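A sink whose commits are already safe to apply once per task could opt out of the coordinator as described above; a minimal sketch of that override:

```java
@Override
public boolean useCommitCoordinator() {
  // Opt out of Spark's commit coordination; the sink must then tolerate
  // one commit message per successful (possibly speculative) task.
  return false;
}
```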
default void abort(WriterCommitMessage[] messages)

Description copied from interface: DataSourceWriter

Aborts this writing job because some data writers failed and keep failing when retried, or the Spark job fails for some unknown reason, or DataSourceWriter.onDataWriterCommit(WriterCommitMessage) fails, or DataSourceWriter.commit(WriterCommitMessage[]) fails.
If this method fails (by throwing an exception), the underlying data source may require manual
cleanup.
Unless the abort is triggered by the failure of a commit, the given messages should have some null slots: only some of the data writers may have committed before the abort, and some may have committed without their commit messages reaching the driver before the abort was triggered. So this is just a "best effort" for data sources to clean up the data left by data writers.

Specified by:
abort in interface DataSourceWriter