public class PlumbingStreams
extends java.lang.Object
Plumbing utilities for TStream.
Methods that manipulate the flow of tuples in a streaming topology,
but are not part of the logic of the application.

Constructor and Description |
---|
PlumbingStreams() |

Modifier and Type | Method and Description |
---|---|
static <T> TStream<java.util.List<T>> | barrier(java.util.List<TStream<T>> streams): A tuple synchronization barrier. |
static <T> TStream<java.util.List<T>> | barrier(java.util.List<TStream<T>> streams, int queueCapacity): A tuple synchronization barrier. |
static <T> TStream<T> | blockingDelay(TStream<T> stream, long delay, java.util.concurrent.TimeUnit unit): Insert a blocking delay between tuples. |
static <T> TStream<T> | blockingOneShotDelay(TStream<T> stream, long delay, java.util.concurrent.TimeUnit unit): Insert a blocking delay before forwarding the first tuple and no delay for subsequent tuples. |
static <T> TStream<T> | blockingThrottle(TStream<T> stream, long delay, java.util.concurrent.TimeUnit unit): Maintain a constant blocking delay between tuples. |
static <T,U,R> TStream<R> | concurrent(TStream<T> stream, java.util.List<Function<TStream<T>,TStream<U>>> pipelines, Function<java.util.List<U>,R> combiner): Perform analytics concurrently. |
static <T,U,R> TStream<R> | concurrentMap(TStream<T> stream, java.util.List<Function<T,U>> mappers, Function<java.util.List<U>,R> combiner): Perform analytics concurrently. |
static <T> TStream<T> | gate(TStream<T> stream, java.util.concurrent.Semaphore semaphore): Control the flow of tuples to an output stream. |
static <T> TStream<T> | isolate(TStream<T> stream, boolean ordered): Isolate upstream processing from downstream processing. |
static <T> TStream<T> | isolate(TStream<T> stream, int queueCapacity): Isolate upstream processing from downstream processing. |
static <T,R> TStream<R> | parallel(TStream<T> stream, int width, ToIntFunction<T> splitter, BiFunction<TStream<T>,java.lang.Integer,TStream<R>> pipeline): Perform an analytic pipeline on tuples in parallel. |
static <T,R> TStream<R> | parallelBalanced(TStream<T> stream, int width, BiFunction<TStream<T>,java.lang.Integer,TStream<R>> pipeline): Perform an analytic pipeline on tuples in parallel. |
static <T,U> TStream<U> | parallelMap(TStream<T> stream, int width, ToIntFunction<T> splitter, BiFunction<T,java.lang.Integer,U> mapper): Perform an analytic function on tuples in parallel. |
static <T,K> TStream<T> | pressureReliever(TStream<T> stream, Function<T,K> keyFunction, int count): Relieve pressure on upstream processing by discarding tuples. |
static <T> ToIntFunction<T> | roundRobinSplitter(int width): A round-robin splitter ToIntFunction. |

public static <T> TStream<T> blockingDelay(TStream<T> stream, long delay, java.util.concurrent.TimeUnit unit)
Insert a blocking delay between tuples. The returned stream is the input stream delayed by delay.
Delays less than 1msec are translated to a 0 delay.
This function always adds the delay amount after receiving a tuple before forwarding it.
Downstream tuple processing delays will affect the overall delay of a subsequent tuple.
E.g., the input stream contains two tuples t1 and t2 and the delay is 100ms. The forwarding of t1 is delayed by 100ms. Then if a downstream processing delay of 80ms occurs, this function receives t2 80ms after it forwarded t1 and it will delay another 100ms before forwarding t2. Hence the overall delay between forwarding t1 and t2 is 180ms.
See blockingThrottle.
T - Tuple type
stream - the stream to delay
delay - Amount of time to delay a tuple.
unit - Time unit for delay.

public static <T> TStream<T> blockingThrottle(TStream<T> stream, long delay, java.util.concurrent.TimeUnit unit)
Maintain a constant blocking delay between tuples. The returned stream is the input stream throttled by delay.
Delays less than 1msec are translated to a 0 delay.
Sample use:
    TStream<String> stream = topology.strings("a", "b", "c");
    // Create a stream with tuples throttled to 1 second intervals.
    TStream<String> throttledStream = blockingThrottle(stream, 1, TimeUnit.SECONDS);
    // Print out the throttled tuples as they arrive.
    throttledStream.peek(t -> System.out.println(new Date() + " - " + t));
The function adjusts for downstream processing delays.
The first tuple is not delayed. If delay has already elapsed since the prior tuple was forwarded, the tuple is forwarded immediately.
Otherwise, forwarding the tuple is delayed to achieve a delay amount since forwarding the prior tuple.
E.g., the input stream contains two tuples t1 and t2 and the delay is 100ms. As the first tuple, t1 is forwarded without delay. Then if a downstream processing delay of 80ms occurs, this function receives t2 80ms after it forwarded t1 and it will only delay another 20ms (100ms - 80ms) before forwarding t2. Hence the overall delay between forwarding t1 and t2 remains 100ms.
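The difference between blockingDelay and blockingThrottle comes down to simple arithmetic on forwarding times. The following is a minimal sketch of that arithmetic only, not the library's implementation. It assumes a single-threaded blocking flow in which the next tuple is received as soon as downstream processing of the previous one completes, and that the throttle forwards its first tuple immediately. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the timing rules described above (times in ms).
class DelayVsThrottleSketch {

    // blockingDelay rule: always wait delayMs after receiving a tuple.
    static List<Long> delayForwardTimes(int tuples, long delayMs, long downstreamMs) {
        List<Long> forwardTimes = new ArrayList<>();
        long received = 0;
        for (int i = 0; i < tuples; i++) {
            long forwarded = received + delayMs;
            forwardTimes.add(forwarded);
            // Assumption: the next tuple arrives once downstream is done.
            received = forwarded + downstreamMs;
        }
        return forwardTimes;
    }

    // blockingThrottle rule: wait only for whatever part of delayMs has not
    // already elapsed since the prior tuple was forwarded.
    static List<Long> throttleForwardTimes(int tuples, long delayMs, long downstreamMs) {
        List<Long> forwardTimes = new ArrayList<>();
        long received = 0;
        long previous = 0;
        for (int i = 0; i < tuples; i++) {
            long forwarded = (i == 0) ? received : Math.max(received, previous + delayMs);
            forwardTimes.add(forwarded);
            previous = forwarded;
            received = forwarded + downstreamMs;
        }
        return forwardTimes;
    }

    // Time between forwarding the first and second tuple.
    static long gap(List<Long> times) {
        return times.get(1) - times.get(0);
    }

    public static void main(String[] args) {
        System.out.println("delay gap:    " + gap(delayForwardTimes(2, 100, 80)));
        System.out.println("throttle gap: " + gap(throttleForwardTimes(2, 100, 80)));
    }
}
```

With a 100ms delay and 80ms of downstream processing, the sketch reproduces the 180ms and 100ms gaps from the two worked examples.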
T - tuple type
stream - the stream to throttle
delay - Amount of time to delay a tuple.
unit - Time unit for delay.

public static <T> TStream<T> blockingOneShotDelay(TStream<T> stream, long delay, java.util.concurrent.TimeUnit unit)
Insert a blocking delay before forwarding the first tuple and no delay for subsequent tuples.
Delays less than 1msec are translated to a 0 delay.
Sample use:
    TStream<String> stream = topology.strings("a", "b", "c");
    // Create a stream where the first tuple is delayed by 5 seconds.
    TStream<String> oneShotDelayedStream =
        blockingOneShotDelay(stream, 5, TimeUnit.SECONDS);
T - tuple type
stream - input stream
delay - Amount of time to delay a tuple.
unit - Time unit for delay.

public static <T,K> TStream<T> pressureReliever(TStream<T> stream, Function<T,K> keyFunction, int count)
Relieve pressure on upstream processing by discarding tuples.
Any downstream processing of the returned stream is isolated from stream so that any slowdown does not affect stream.
When the downstream processing cannot keep up with the rate of stream, tuples will be dropped from the returned stream.
Up to count of the most recent tuples per key from stream are maintained when downstream processing is slow; any older tuples that have not been submitted to the returned stream will be discarded.
Tuple order is maintained within a partition but is not guaranteed to be maintained across partitions.
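The per-key retention rule can be illustrated with a small buffer that keeps at most count pending tuples per key and evicts the oldest one when a new tuple arrives. This is a sketch of the described semantics only, not the library's implementation. PressureRelieverSketch, offer and poll are hypothetical names, and java.util.function.Function stands in for the library's Function type.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical buffer: keeps the most recent `count` pending tuples per key.
class PressureRelieverSketch<T, K> {
    private final Function<T, K> keyFunction;
    private final int count;
    private final Map<K, Deque<T>> pending = new HashMap<>();

    PressureRelieverSketch(Function<T, K> keyFunction, int count) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be >= 1");
        }
        this.keyFunction = keyFunction;
        this.count = count;
    }

    // Upstream side: never blocks. Returns the tuple that was discarded to
    // make room, or null if nothing was discarded.
    synchronized T offer(T tuple) {
        Deque<T> queue = pending.computeIfAbsent(keyFunction.apply(tuple), k -> new ArrayDeque<>());
        T dropped = (queue.size() == count) ? queue.pollFirst() : null;
        queue.addLast(tuple);
        return dropped;
    }

    // Downstream side: oldest pending tuple for the key, or null if none.
    synchronized T poll(K key) {
        Deque<T> queue = pending.get(key);
        return (queue == null) ? null : queue.pollFirst();
    }

    static List<String> demo() {
        PressureRelieverSketch<String, Character> reliever =
            new PressureRelieverSketch<String, Character>(s -> s.charAt(0), 2);
        List<String> log = new ArrayList<>();
        for (String tuple : new String[] {"a1", "a2", "a3", "b1"}) {
            String dropped = reliever.offer(tuple);
            if (dropped != null) {
                log.add("dropped " + dropped);
            }
        }
        log.add("a: " + reliever.poll('a') + ", " + reliever.poll('a') + ", " + reliever.poll('a'));
        log.add("b: " + reliever.poll('b'));
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In the demo, downstream consumes nothing while four tuples arrive, so with count set to 2 the oldest tuple for key 'a' is discarded and the rest remain in arrival order per key.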
T - Tuple type.
K - Key type.
stream - Stream to be isolated from downstream processing.
keyFunction - Function defining the key of each tuple.
count - Maximum number of tuples to maintain when downstream processing is backing up.
Returns: Stream that is isolated from stream.
See also: isolate

public static <T> TStream<T> isolate(TStream<T> stream, boolean ordered)
Isolate upstream processing from downstream processing.
Implementations may throw OutOfMemoryExceptions if the processing against the returned stream cannot keep up with the arrival rate of tuples on stream.
T - Tuple type
stream - Stream to be isolated from downstream processing.
ordered - true to maintain arrival order on the returned stream, false if arrival order need not be guaranteed.
Returns: Stream that is isolated from stream.

public static <T> TStream<T> isolate(TStream<T> stream, int queueCapacity)
Isolate upstream processing from downstream processing.
If the processing against the returned stream cannot keep up with the arrival rate of tuples on stream, upstream processing will block until there is space in the queue between the streams.
Processing of tuples occurs in the order they were received.
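The bounded-queue behavior resembles a producer and consumer joined by a fixed-capacity blocking queue. The sketch below uses java.util.concurrent.ArrayBlockingQueue from the JDK to show the idea: the upstream thread blocks in put() while the queue is full, and tuples come out in arrival order. It is an illustration under that assumption, not the library's implementation. BoundedIsolateSketch and run are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical model of isolate(stream, queueCapacity).
class BoundedIsolateSketch {

    static List<Integer> run(int queueCapacity, int tupleCount) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(queueCapacity);
        List<Integer> received = new ArrayList<>();

        // Downstream processing runs on its own thread.
        Thread downstream = new Thread(() -> {
            try {
                for (int i = 0; i < tupleCount; i++) {
                    received.add(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        downstream.start();

        try {
            // Upstream: put() blocks while the queue is full.
            for (int i = 0; i < tupleCount; i++) {
                queue.put(i);
            }
            // join() makes the downstream thread's writes visible here.
            downstream.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(run(1, 5));
    }
}
```

A capacity of 1 keeps upstream at most one tuple ahead of downstream; a larger capacity absorbs longer bursts at the cost of memory.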
T - Tuple type
stream - Stream to be isolated from downstream processing.
queueCapacity - size of the queue between stream and the returned stream.
Returns: Stream that is isolated from stream.
See also: pressureReliever

public static <T,U,R> TStream<R> concurrentMap(TStream<T> stream, java.util.List<Function<T,U>> mappers, Function<java.util.List<U>,R> combiner)
Perform analytics concurrently.
This is a convenience function that calls concurrent(TStream, List, Function) after creating pipeline and combiner functions from the supplied mappers and combiner arguments.
That is, it is logically, if not exactly, the same as:
    List<Function<TStream<T>,TStream<U>>> pipelines = new ArrayList<>();
    for (Function<T,U> mapper : mappers)
        pipelines.add(s -> s.map(mapper));
    concurrent(stream, pipelines, combiner);
T - Tuple type on input stream.
U - Tuple type generated by mappers.
R - Tuple type of the result.
stream - input stream
mappers - functions to be run concurrently. Each mapper MUST return a non-null result. A runtime error will be generated if a null result is returned.
combiner - function to create a result tuple from the list of results from mappers. The input list order is 1:1 with the mappers list. I.e., list entry [0] is the result from mappers[0], list entry [1] is the result from mappers[1], etc.

public static <T,U,R> TStream<R> concurrent(TStream<T> stream, java.util.List<Function<TStream<T>,TStream<U>>> pipelines, Function<java.util.List<U>,R> combiner)
Perform analytics concurrently.
Process input tuples one at a time, invoking the specified analytics (pipelines) concurrently, combine the results, and then process the next input tuple in the same manner.
Logically, instead of doing this:
    sensorReadings<T> -> A1 -> A2 -> A3 -> results<R>
create a graph that's logically like this:
                         |-> A1 ->|
    sensorReadings<T> -> |-> A2 ->| -> results<R>
                         |-> A3 ->|
more specifically a graph like this:
              |-> isolate(1) -> pipeline1 -> |
    stream -> |-> isolate(1) -> pipeline2 -> |-> barrier(10) -> combiner
              |-> isolate(1) -> pipeline3 -> |
                . . .
The typical use case for this is when an application has a collection of independent analytics to perform on each tuple and the analytics are sufficiently long running such that performing them concurrently is desired.
Note, this is in contrast to "parallel" stream processing, which in Java 8 Streams and other contexts means processing multiple tuples in parallel, each on a replicated processing pipeline.
Threadsafety - one of the following must be true:
- the tuples from stream are threadsafe
- pipelines do not modify the input tuples
- pipelines provide their own synchronization controls to protect concurrent modifications of the input tuples
Logically, a thread is allocated for each of the pipelines.
The actual degree of concurrency may be TopologyProvider dependent.
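The one-tuple-at-a-time, fan-out-then-combine flow can be approximated outside a topology with a thread pool. The sketch below is a rough analogue of the concurrentMap form using only JDK classes: each input tuple is handed to every mapper on its own thread, the results are gathered in mapper order, and the combiner produces one result per input tuple. It is not the library's implementation. ConcurrentMapSketch is a hypothetical name, and java.util.function.Function stands in for the library's Function type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical thread-pool analogue of concurrentMap.
class ConcurrentMapSketch {

    static <T, U, R> List<R> concurrentMap(List<T> tuples,
                                           List<Function<T, U>> mappers,
                                           Function<List<U>, R> combiner) {
        // Logically one thread per mapper.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, mappers.size()));
        try {
            List<R> results = new ArrayList<>();
            for (T tuple : tuples) {
                // Start every mapper on the current tuple.
                List<Future<U>> futures = new ArrayList<>();
                for (Function<T, U> mapper : mappers) {
                    Callable<U> task = () -> mapper.apply(tuple);
                    futures.add(pool.submit(task));
                }
                // Gather results in mapper order: entry [i] is from mappers[i].
                List<U> mapped = new ArrayList<>();
                for (Future<U> future : futures) {
                    U value = future.get();
                    if (value == null) {
                        throw new IllegalStateException("mapper returned null");
                    }
                    mapped.add(value);
                }
                // Combine, then move on to the next input tuple.
                results.add(combiner.apply(mapped));
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    static List<Integer> demo() {
        List<Function<Integer, Integer>> mappers = new ArrayList<>();
        mappers.add(x -> x + 1);
        mappers.add(x -> x * 10);
        return concurrentMap(Arrays.asList(1, 2), mappers, list -> list.get(0) + list.get(1));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

For inputs 1 and 2 the demo combines (1 + 1) + (1 * 10) and (2 + 1) + (2 * 10), so the result list is [12, 23] regardless of which mapper thread finishes first.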
T - Tuple type on input stream.
U - Tuple type generated by pipelines.
R - Tuple type of the result.
stream - input stream
pipelines - a list of functions to add a pipeline to the topology. Each pipeline.apply() is called with stream as the input, yielding the pipeline's result stream. For each input tuple, a pipeline MUST create exactly one output tuple. Tuple flow into the pipelines will cease if that requirement is not met.
combiner - function to create a result tuple from the list of results from pipelines. The input tuple list's order is 1:1 with the pipelines list. I.e., list entry [0] is the result from pipelines[0], list entry [1] is the result from pipelines[1], etc.
See also: barrier

public static <T> TStream<java.util.List<T>> barrier(java.util.List<TStream<T>> streams)
A tuple synchronization barrier.
Same as barrier(streams, 1)
T - Tuple type
streams - input streams
See also: barrier(List, int)

public static <T> TStream<java.util.List<T>> barrier(java.util.List<TStream<T>> streams, int queueCapacity)
A tuple synchronization barrier.
A barrier has n input streams with tuple type T and one output stream with tuple type List<T>.
Once the barrier receives one tuple on each of its input streams, it generates an output tuple containing one tuple from each input stream.
It then waits until it has received another tuple from each input stream.
Input stream 0's tuple is in the output tuple's list[0], stream 1's tuple in list[1], and so on.
The barrier's output stream is isolated from the input streams.
The barrier has a queue of size queueCapacity for each input stream. When a tuple for an input stream is received it is added to its queue. The stream will block if the queue is full.
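The synchronization rule can be sketched with one queue per input: an output list is emitted only when every queue holds at least one tuple. This is a simplified, single-threaded illustration, not the library's implementation. It does not model queueCapacity or blocking, and BarrierSketch and accept are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical single-threaded model of a tuple synchronization barrier.
class BarrierSketch<T> {
    private final List<Deque<T>> queues = new ArrayList<>();

    BarrierSketch(int inputs) {
        for (int i = 0; i < inputs; i++) {
            queues.add(new ArrayDeque<>());
        }
    }

    // Adds a tuple for the given input. Returns an output list once every
    // input has a tuple queued, otherwise null.
    List<T> accept(int input, T tuple) {
        queues.get(input).addLast(tuple);
        for (Deque<T> queue : queues) {
            if (queue.isEmpty()) {
                return null; // still waiting on at least one input
            }
        }
        // list[i] holds the oldest queued tuple from input i.
        List<T> output = new ArrayList<>();
        for (Deque<T> queue : queues) {
            output.add(queue.pollFirst());
        }
        return output;
    }

    static List<List<String>> demo() {
        BarrierSketch<String> barrier = new BarrierSketch<>(2);
        List<List<String>> emitted = new ArrayList<>();
        String[][] arrivals = {{"0", "a0"}, {"0", "a1"}, {"1", "b0"}, {"1", "b1"}};
        for (String[] arrival : arrivals) {
            List<String> output = barrier.accept(Integer.parseInt(arrival[0]), arrival[1]);
            if (output != null) {
                emitted.add(output);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the demo, input 0 delivers two tuples before input 1 delivers any, so nothing is emitted until b0 arrives; each later arrival on input 1 releases the next queued tuple from input 0.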
T - Type of the tuple.
streams - the list of input streams
queueCapacity - the size of each input stream's queue
See also: Barrier

public static <T,U> TStream<U> parallelMap(TStream<T> stream, int width, ToIntFunction<T> splitter, BiFunction<T,java.lang.Integer,U> mapper)
Perform an analytic function on tuples in parallel.
Same as parallel(stream, width, splitter, (s,ch) -> s.map(t -> mapper.apply(t, ch)))
T - Input stream tuple type
U - Result stream tuple type
stream - input stream
width - number of channels
splitter - the tuple channel allocation function
mapper - analytic function
See also: roundRobinSplitter, concurrentMap

public static <T,R> TStream<R> parallel(TStream<T> stream, int width, ToIntFunction<T> splitter, BiFunction<TStream<T>,java.lang.Integer,TStream<R>> pipeline)
Perform an analytic pipeline on tuples in parallel.
Splits stream into width parallel processing channels, partitioning tuples among the channels using splitter.
Each channel runs a copy of pipeline.
The resulting stream is isolated from the upstream parallel channels.
The ordering of tuples in stream is not maintained in the results from parallel.
pipeline is not required to yield a result for each input tuple.
A common splitter function is a roundRobinSplitter.
The generated graph looks like this:
                                       |-> isolate(10) -> pipeline-ch1 -> |
    stream -> split(width,splitter) -> |-> isolate(10) -> pipeline-ch2 -> |-> union -> isolate(width)
                                       |-> isolate(10) -> pipeline-ch3 -> |
                                         . . .
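The split step and the per-channel function can be sketched on plain lists. The example below partitions tuples among width channels using a splitter and applies a mapper that also receives the channel index, in the spirit of parallelMap. It is a sequential illustration, not the library's implementation. Mapping the splitter's value onto a channel with a modulo is an assumption made here, ParallelSplitSketch is a hypothetical name, and the java.util.function types stand in for the library's own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.ToIntFunction;

// Hypothetical list-based model of split + per-channel map.
class ParallelSplitSketch {

    static <T, U> List<List<U>> parallelMap(List<T> tuples,
                                            int width,
                                            ToIntFunction<T> splitter,
                                            BiFunction<T, Integer, U> mapper) {
        // One result list per channel.
        List<List<U>> channels = new ArrayList<>();
        for (int ch = 0; ch < width; ch++) {
            channels.add(new ArrayList<>());
        }
        for (T tuple : tuples) {
            // Assumption: the splitter's value is reduced modulo width.
            int ch = Math.floorMod(splitter.applyAsInt(tuple), width);
            channels.get(ch).add(mapper.apply(tuple, ch));
        }
        return channels;
    }

    static List<List<String>> demo() {
        return parallelMap(
            Arrays.asList("apple", "kiwi", "fig", "plum", "pear"),
            2,
            String::length,              // channel chosen from the word length
            (t, ch) -> t + "@" + ch);    // the mapper sees the channel index
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Order is preserved within each channel, but once the channels are merged the original stream order is not, which matches the ordering caveat above.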
T - Input stream tuple type
R - Result stream tuple type
stream - the input stream
width - number of parallel processing channels
splitter - the tuple channel allocation function
pipeline - the pipeline for each channel. pipeline.apply(inputStream,channel) is called to generate the pipeline for each channel.
See also: roundRobinSplitter, concurrent

public static <T,R> TStream<R> parallelBalanced(TStream<T> stream, int width, BiFunction<TStream<T>,java.lang.Integer,TStream<R>> pipeline)
Perform an analytic pipeline on tuples in parallel.
Splits stream into width parallel processing channels, partitioning tuples among the channels in a load balanced fashion.
Each channel runs a copy of pipeline.
The resulting stream is isolated from the upstream parallel channels.
The ordering of tuples in stream is not maintained in the results from parallelBalanced.
A pipeline MUST yield a result for each input tuple. Failure to do so will result in the channel remaining in a busy state and no longer available to process additional tuples.
A LoadBalancedSplitter is used to distribute tuples.
The generated graph looks like this:
                                       |-> isolate(1) -> pipeline-ch1 -> peek(splitter.channelDone()) -> |
    stream -> split(width,splitter) -> |-> isolate(1) -> pipeline-ch2 -> peek(splitter.channelDone()) -> |-> union -> isolate(width)
                                       |-> isolate(1) -> pipeline-ch3 -> peek(splitter.channelDone()) -> |
                                         . . .
Note, this implementation requires that the splitter is used from only a single JVM. The DirectProvider provider meets this requirement.
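The requirement that every input tuple yield a result follows from how a load-balancing splitter has to track busy channels. The sketch below is a hypothetical model of that bookkeeping, not the LoadBalancedSplitter implementation: a channel is marked busy when it is handed a tuple and only becomes free again when channelDone is signalled. Where a real splitter would presumably wait for a free channel, this sketch returns -1 so that the demo cannot block.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical busy/free bookkeeping for load-balanced channel selection.
class LoadBalancedSketch {
    private final boolean[] busy;

    LoadBalancedSketch(int width) {
        this.busy = new boolean[width];
    }

    // Picks the first free channel and marks it busy.
    // Returns -1 if all channels are busy (assumption: a real splitter waits).
    synchronized int acquireChannel() {
        for (int ch = 0; ch < busy.length; ch++) {
            if (!busy[ch]) {
                busy[ch] = true;
                return ch;
            }
        }
        return -1;
    }

    // Signalled when a channel's pipeline has produced its result.
    synchronized void channelDone(int channel) {
        busy[channel] = false;
    }

    static List<Integer> demo() {
        LoadBalancedSketch splitter = new LoadBalancedSketch(2);
        List<Integer> picks = new ArrayList<>();
        picks.add(splitter.acquireChannel()); // channel 0 is free
        picks.add(splitter.acquireChannel()); // channel 1 is free
        picks.add(splitter.acquireChannel()); // nothing free
        splitter.channelDone(0);              // channel 0 produced its result
        picks.add(splitter.acquireChannel()); // channel 0 is free again
        return picks;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If a pipeline never produces a result for a tuple, channelDone is never signalled for that channel and it stays busy, which is the failure mode described above.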
T - Input stream tuple type
R - Result stream tuple type
stream - the input stream
width - number of parallel processing channels
pipeline - the pipeline for each channel. pipeline.apply(inputStream,channel) is called to generate the pipeline for each channel.
See also: parallel(TStream, int, ToIntFunction, BiFunction), LoadBalancedSplitter

public static <T> ToIntFunction<T> roundRobinSplitter(int width)
A round-robin splitter ToIntFunction.
The splitter function cycles among the width channels on successive calls to roundRobinSplitter.applyAsInt(), returning 0, 1, ..., width-1, 0, 1, ..., width-1.
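A round-robin splitter amounts to a counter that wraps at width. The following is a minimal sketch of that idea, not the library's implementation. It uses java.util.function.ToIntFunction from the JDK in place of the library's ToIntFunction, and RoundRobinSketch is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToIntFunction;

// Hypothetical round-robin splitter built on a wrapping counter.
class RoundRobinSketch {

    static <T> ToIntFunction<T> roundRobinSplitter(int width) {
        AtomicInteger next = new AtomicInteger();
        // getAndUpdate returns the current channel, then advances and wraps.
        return tuple -> next.getAndUpdate(i -> (i + 1) % width);
    }

    static List<Integer> demo() {
        ToIntFunction<String> splitter = roundRobinSplitter(3);
        List<Integer> channels = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            channels.add(splitter.applyAsInt("tuple-" + i));
        }
        return channels;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The tuple's content is ignored, so channel choice depends only on call order. Each call to roundRobinSplitter creates its own counter.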
T - Tuple type
width - number of splitter channels
See also: TStream.split, parallel

public static <T> TStream<T> gate(TStream<T> stream, java.util.concurrent.Semaphore semaphore)
Control the flow of tuples to an output stream.
A Semaphore is used to control the flow of tuples through the gate. The gate acquires a permit from the semaphore to pass the tuple through, blocking until a permit is acquired (and applying backpressure upstream while blocked).
Elsewhere, some code calls Semaphore.release(int) to make permits available.
If a TopologyProvider is used that can distribute a topology's streams to different JVMs, the gate and the code releasing the permits must be in the same JVM.
Sample use:
Suppose you wanted to control processing such that concurrent pipelines processed each tuple in lock-step.
I.e., you want all of the pipelines to start processing a tuple at the same time and not start a new tuple until the current tuple has been fully processed by each of them:
    TStream<Integer> readings = ...;
    Semaphore gateControl = new Semaphore(1); // allow the first to pass through
    TStream<Integer> gated = gate(readings, gateControl);
    // Create the concurrent pipeline combiner and have it
    // signal that concurrent processing of the tuple has completed.
    // In this sample the combiner just returns the received list of
    // each pipeline result.
    Function<List<Integer>,List<Integer>> combiner =
        list -> {
            gateControl.release();
            return list;
        };
    TStream<List<Integer>> results = PlumbingStreams.concurrent(gated, pipelines, combiner);
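The permit accounting behind the gate can be seen with java.util.concurrent.Semaphore alone. The sketch below uses tryAcquire() so that the demo never blocks; per the description above, the gate itself blocks until a permit is acquired. GateSketch is a hypothetical name and no topology is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

// Hypothetical walk-through of the permit flow for a one-permit gate.
class GateSketch {

    static List<String> demo() {
        Semaphore gateControl = new Semaphore(1); // allow the first tuple through
        List<String> log = new ArrayList<>();

        // t1 takes the only permit and passes through the gate.
        log.add("t1 " + (gateControl.tryAcquire() ? "passes" : "waits"));

        // t2 finds no permit. The real gate would block at this point.
        log.add("t2 " + (gateControl.tryAcquire() ? "passes" : "waits"));

        // Downstream signals that processing of t1 has completed.
        gateControl.release();

        // Now t2 can acquire the permit and pass.
        log.add("t2 " + (gateControl.tryAcquire() ? "passes" : "waits"));
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Starting the semaphore with a single permit is what produces the lock-step behavior: at most one tuple is in flight past the gate until something downstream releases a permit.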
T - Tuple type
stream - the input stream
semaphore - gate control

Copyright © 2016 The Apache Software Foundation. All Rights Reserved - bbe71fa-20161201-1641