@InterfaceStability.Evolving
public interface Partitioning
An interface to represent the output data partitioning for a data source, which is returned by SupportsReportPartitioning.outputPartitioning(). Note that this should work like a snapshot: once created, it should be deterministic and always report the same number of partitions and the same "satisfy" result for a given distribution.

Modifier and Type | Method and Description |
---|---|
int | numPartitions(): Returns the number of partitions (i.e., InputPartitions) the data source outputs. |
boolean | satisfy(Distribution distribution): Returns true if this partitioning can satisfy the given distribution, which means Spark does not need to shuffle the output data of this data source for certain operations. |
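
int numPartitions()
Returns the number of partitions (i.e., InputPartitions) the data source outputs.

boolean satisfy(Distribution distribution)
Returns true if this partitioning can satisfy the given distribution, which means Spark does not need to shuffle the output data of this data source for certain operations.
Note that Spark may add new concrete implementations of Distribution in new releases.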
Implementations of this method should account for that and always return false for unrecognized distributions. It is recommended to check each new Spark release and support new distributions where possible, so that Spark can avoid shuffles in more cases.
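
The contract above can be illustrated with a minimal sketch. It is not taken from Spark's sources. `KeyPartitioning` is a hypothetical class, and the `Distribution`, `Partitioning`, and `ClusteredDistribution` types declared here are simplified stand-ins so the example compiles without Spark on the classpath. A real data source would import the corresponding types from Spark instead. The sketch assumes the source's output is already grouped by a fixed set of key columns.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Stand-ins (assumptions) mirroring the Spark types so the sketch is
// self-contained. In a real data source, import them from Spark instead.
interface Distribution {}

interface Partitioning {
    int numPartitions();
    boolean satisfy(Distribution distribution);
}

// Stand-in for a clustered distribution: rows that share the same values
// for the clustered columns must end up in the same partition.
final class ClusteredDistribution implements Distribution {
    final String[] clusteredColumns;

    ClusteredDistribution(String[] clusteredColumns) {
        this.clusteredColumns = clusteredColumns;
    }
}

// Hypothetical partitioning for a source whose output is already grouped
// by a fixed set of key columns.
final class KeyPartitioning implements Partitioning {
    // Final fields make the object behave like a snapshot: every call
    // reports the same partition count and the same satisfy() result.
    private final int numPartitions;
    private final Set<String> keyColumns;

    KeyPartitioning(int numPartitions, String... keyColumns) {
        this.numPartitions = numPartitions;
        this.keyColumns = new HashSet<>(Arrays.asList(keyColumns));
    }

    @Override
    public int numPartitions() {
        return numPartitions;
    }

    @Override
    public boolean satisfy(Distribution distribution) {
        if (distribution instanceof ClusteredDistribution) {
            String[] requested = ((ClusteredDistribution) distribution).clusteredColumns;
            // Data grouped by keyColumns is also grouped by any superset
            // of those columns, so no shuffle is needed in that case.
            return new HashSet<>(Arrays.asList(requested)).containsAll(keyColumns);
        }
        // Unrecognized distribution (possibly added in a newer release):
        // report false so Spark falls back to shuffling.
        return false;
    }
}
```

With this sketch, `new KeyPartitioning(4, "id")` satisfies a clustered distribution on `id` and `ts`, but not one on `ts` alone, and not any distribution type it does not know about. Returning false in the fallback branch costs at most an extra shuffle, whereas returning true for an unknown distribution could produce incorrect results.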