public class Stream extends Object implements IAggregatableStream, ResourceDeclarer<Stream>
A Stream represents the core data model in Trident, and can be thought of as a “stream” of tuples that are processed as a series of small batches. A stream is partitioned accross the nodes in the cluster, and operations are applied to a stream in parallel accross each partition.
There are five types of operations that can be performed on streams in Trident
1. Partiton-Local Operations - Operations that are applied locally to each partition and do not involve network transfer 2. Repartitioning Operations - Operations that change how tuples are partitioned across tasks(thus causing network transfer), but do not change the content of the stream. 3. Aggregation Operations - Operations that may repartition a stream (thus causing network transfer) 4. Grouping Operations - Operations that may repartition a stream on specific fields and group together tuples whose fields values are equal. 5. Merge and Join Operations - Operations that combine different streams together.
Modifier | Constructor and Description |
---|---|
protected |
Stream(TridentTopology topology,
String name,
Node node) |
Modifier and Type | Method and Description |
---|---|
Stream |
addSharedMemory(SharedMemory request)
Add in request for shared memory that this component will use.
|
Stream |
aggregate(Aggregator agg,
Fields functionFields) |
Stream |
aggregate(CombinerAggregator agg,
Fields functionFields) |
Stream |
aggregate(Fields inputFields,
Aggregator agg,
Fields functionFields) |
Stream |
aggregate(Fields inputFields,
CombinerAggregator agg,
Fields functionFields) |
Stream |
aggregate(Fields inputFields,
ReducerAggregator agg,
Fields functionFields) |
Stream |
aggregate(ReducerAggregator agg,
Fields functionFields) |
Stream |
applyAssembly(Assembly assembly)
Applies an
Assembly to this Stream . |
Stream |
batchGlobal()
Repartitioning Operation.
|
Stream |
broadcast()
Repartitioning Operation.
|
ChainedAggregatorDeclarer |
chainedAgg() |
Stream |
each(Fields inputFields,
Filter filter) |
Stream |
each(Fields inputFields,
Function function,
Fields functionFields) |
Stream |
each(Function function,
Fields functionFields) |
Stream |
filter(Fields inputFields,
Filter filter)
Returns a stream consisting of the elements of this stream that match the given filter.
|
Stream |
filter(Filter filter)
Returns a stream consisting of the elements of this stream that match the given filter.
|
Stream |
flatMap(FlatMapFunction function)
Returns a stream consisting of the results of replacing each value of this stream with the contents produced by applying the provided mapping function to each value.
|
Stream |
flatMap(FlatMapFunction function,
Fields outputFields)
Returns a stream consisting of the results of replacing each value of this stream with the contents produced by applying the provided mapping function to each value.
|
Fields |
getOutputFields() |
Stream |
global()
Repartitioning Operation.
|
GroupedStream |
groupBy(Fields fields)
Grouping Operation.
|
Stream |
identityPartition()
Repartitioning Operation.
|
Stream |
localOrShuffle()
Repartitioning Operation.
|
Stream |
map(MapFunction function)
Returns a stream consisting of the result of applying the given mapping function to the values of this stream.
|
Stream |
map(MapFunction function,
Fields outputFields)
Returns a stream consisting of the result of applying the given mapping function to the values of this stream.
|
Stream |
max(Comparator<TridentTuple> comparator)
This aggregator operation computes the maximum of tuples in a stream by using the given
comparator with TridentTuple s. |
Stream |
maxBy(String inputFieldName)
This aggregator operation computes the maximum of tuples by the given
inputFieldName and it is assumed that its value is an instance of Comparable . |
<T> Stream |
maxBy(String inputFieldName,
Comparator<T> comparator)
This aggregator operation computes the maximum of tuples by the given
inputFieldName in a stream by using the given comparator . |
Stream |
min(Comparator<TridentTuple> comparator)
This aggregator operation computes the minimum of tuples in a stream by using the given
comparator with TridentTuple s. |
Stream |
minBy(String inputFieldName)
This aggregator operation computes the minimum of tuples by the given
inputFieldName and it is assumed that its value is an instance of Comparable . |
<T> Stream |
minBy(String inputFieldName,
Comparator<T> comparator)
This aggregator operation computes the minimum of tuples by the given
inputFieldName in a stream by using the given comparator . |
Stream |
name(String name)
Applies a label to the stream.
|
Stream |
parallelismHint(int hint)
Applies a parallelism hint to a stream.
|
Stream |
partition(CustomStreamGrouping partitioner)
Repartitioning Operation.
|
Stream |
partition(Grouping grouping)
Repartitioning Operation.
|
Stream |
partitionAggregate(Aggregator agg,
Fields functionFields) |
Stream |
partitionAggregate(CombinerAggregator agg,
Fields functionFields) |
Stream |
partitionAggregate(Fields inputFields,
Aggregator agg,
Fields functionFields) |
Stream |
partitionAggregate(Fields inputFields,
CombinerAggregator agg,
Fields functionFields) |
Stream |
partitionAggregate(Fields inputFields,
ReducerAggregator agg,
Fields functionFields) |
Stream |
partitionAggregate(ReducerAggregator agg,
Fields functionFields) |
Stream |
partitionBy(Fields fields)
Repartitioning Operation.
|
TridentState |
partitionPersist(StateFactory stateFactory,
Fields inputFields,
StateUpdater updater) |
TridentState |
partitionPersist(StateFactory stateFactory,
Fields inputFields,
StateUpdater updater,
Fields functionFields) |
TridentState |
partitionPersist(StateFactory stateFactory,
StateUpdater updater) |
TridentState |
partitionPersist(StateFactory stateFactory,
StateUpdater updater,
Fields functionFields) |
TridentState |
partitionPersist(StateSpec stateSpec,
Fields inputFields,
StateUpdater updater) |
TridentState |
partitionPersist(StateSpec stateSpec,
Fields inputFields,
StateUpdater updater,
Fields functionFields) |
TridentState |
partitionPersist(StateSpec stateSpec,
StateUpdater updater) |
TridentState |
partitionPersist(StateSpec stateSpec,
StateUpdater updater,
Fields functionFields) |
Stream |
peek(Consumer action)
Returns a stream consisting of the trident tuples of this stream, additionally performing the provided action on each trident tuple as they are consumed from the resulting stream.
|
TridentState |
persistentAggregate(StateFactory stateFactory,
CombinerAggregator agg,
Fields functionFields) |
TridentState |
persistentAggregate(StateFactory stateFactory,
Fields inputFields,
CombinerAggregator agg,
Fields functionFields) |
TridentState |
persistentAggregate(StateFactory stateFactory,
Fields inputFields,
ReducerAggregator agg,
Fields functionFields) |
TridentState |
persistentAggregate(StateFactory stateFactory,
ReducerAggregator agg,
Fields functionFields) |
TridentState |
persistentAggregate(StateSpec spec,
CombinerAggregator agg,
Fields functionFields) |
TridentState |
persistentAggregate(StateSpec spec,
Fields inputFields,
CombinerAggregator agg,
Fields functionFields) |
TridentState |
persistentAggregate(StateSpec spec,
Fields inputFields,
ReducerAggregator agg,
Fields functionFields) |
TridentState |
persistentAggregate(StateSpec spec,
ReducerAggregator agg,
Fields functionFields) |
Stream |
project(Fields keepFields)
Filters out fields from a stream, resulting in a Stream containing only the fields specified by
keepFields . |
Stream |
setCPULoad(Number load)
Sets the CPU Load resource for the current operation.
|
Stream |
setMemoryLoad(Number onHeap)
Sets the Memory Load resources for the current operation.
|
Stream |
setMemoryLoad(Number onHeap,
Number offHeap)
Sets the Memory Load resources for the current operation.
|
Stream |
shuffle()
Repartitioning Operation.
|
Stream |
slidingWindow(BaseWindowedBolt.Duration windowDuration,
BaseWindowedBolt.Duration slidingInterval,
WindowsStoreFactory windowStoreFactory,
Fields inputFields,
Aggregator aggregator,
Fields functionFields)
Returns a stream of tuples which are aggregated results of a window which slides at duration of
slidingInterval and completes a window at windowDuration . |
Stream |
slidingWindow(int windowCount,
int slideCount,
WindowsStoreFactory windowStoreFactory,
Fields inputFields,
Aggregator aggregator,
Fields functionFields)
Returns a stream of tuples which are aggregated results of a sliding window with every
windowCount of tuples and slides the window after slideCount . |
Stream |
stateQuery(TridentState state,
Fields inputFields,
QueryFunction function,
Fields functionFields) |
Stream |
stateQuery(TridentState state,
QueryFunction function,
Fields functionFields) |
Stream |
toStream() |
Stream |
tumblingWindow(BaseWindowedBolt.Duration windowDuration,
WindowsStoreFactory windowStoreFactory,
Fields inputFields,
Aggregator aggregator,
Fields functionFields)
Returns a stream of tuples which are aggregated results of a window that tumbles at duration of
windowDuration . |
Stream |
tumblingWindow(int windowCount,
WindowsStoreFactory windowStoreFactory,
Fields inputFields,
Aggregator aggregator,
Fields functionFields)
Returns a stream of tuples which are aggregated results of a tumbling window with every
windowCount of tuples. |
Stream |
window(WindowConfig windowConfig,
Fields inputFields,
Aggregator aggregator,
Fields functionFields)
Returns a stream of aggregated results based on the given window configuration which uses inmemory windowing tuple store.
|
Stream |
window(WindowConfig windowConfig,
WindowsStoreFactory windowStoreFactory,
Fields inputFields,
Aggregator aggregator,
Fields functionFields)
Returns stream of aggregated results based on the given window configuration.
|
protected Stream(TridentTopology topology, String name, Node node)
public Stream name(String name)
Applies a label to the stream. Naming a stream will append the label to the name of the bolt(s) created by Trident and will be visible in the Storm UI.
name
- public Stream parallelismHint(int hint)
Applies a parallelism hint to a stream.
public Stream setCPULoad(Number load)
Sets the CPU Load resource for the current operation.
setCPULoad
in interface ResourceDeclarer<Stream>
load
- the amount of CPUpublic Stream setMemoryLoad(Number onHeap)
Sets the Memory Load resources for the current operation. offHeap becomes default.
setMemoryLoad
in interface ResourceDeclarer<Stream>
onHeap
- the amount of on heap memorypublic Stream setMemoryLoad(Number onHeap, Number offHeap)
Sets the Memory Load resources for the current operation.
setMemoryLoad
in interface ResourceDeclarer<Stream>
onHeap
- the amount of on heap memoryoffHeap
- the amount of off heap memorypublic Stream addSharedMemory(SharedMemory request)
ResourceDeclarer
Add in request for shared memory that this component will use. See SharedOnHeap
, SharedOffHeapWithinNode
, and SharedOffHeapWithinWorker
for convenient ways to create shared memory requests.
addSharedMemory
in interface ResourceDeclarer<Stream>
request
- the shared memory request for this componentpublic Stream project(Fields keepFields)
Filters out fields from a stream, resulting in a Stream containing only the fields specified by keepFields
.
For example, if you had a Stream mystream
containing the fields ["a", "b", "c","d"]
, calling"
java mystream.project(new Fields("b", "d"))
would produce a stream containing only the fields ["b", "d"]
.
keepFields
- The fields in the Stream to keeppublic GroupedStream groupBy(Fields fields)
public Stream partition(CustomStreamGrouping partitioner)
public Stream partition(Grouping grouping)
This method takes in a custom partitioning function that implements CustomStreamGrouping
public Stream shuffle()
Use random round robin algorithm to evenly redistribute tuples across all target partitions.
public Stream localOrShuffle()
Use random round robin algorithm to evenly redistribute tuples across all target partitions, with a preference for local tasks.
public Stream global()
All tuples are sent to the same partition. The same partition is chosen for all batches in the stream.
public Stream batchGlobal()
All tuples in the batch are sent to the same partition. Different batches in the stream may go to different partitions.
public Stream broadcast()
Every tuple is replicated to all target partitions. This can useful during DRPC – for example, if you need to do a stateQuery on every partition of data.
public Stream identityPartition()
public Stream applyAssembly(Assembly assembly)
Applies an Assembly
to this Stream
.
Assembly
public Stream each(Fields inputFields, Function function, Fields functionFields)
each
in interface IAggregatableStream
public Stream partitionAggregate(Fields inputFields, Aggregator agg, Fields functionFields)
partitionAggregate
in interface IAggregatableStream
public Stream partitionAggregate(Aggregator agg, Fields functionFields)
public Stream partitionAggregate(CombinerAggregator agg, Fields functionFields)
public Stream partitionAggregate(Fields inputFields, CombinerAggregator agg, Fields functionFields)
public Stream partitionAggregate(ReducerAggregator agg, Fields functionFields)
public Stream partitionAggregate(Fields inputFields, ReducerAggregator agg, Fields functionFields)
public Stream stateQuery(TridentState state, Fields inputFields, QueryFunction function, Fields functionFields)
public Stream stateQuery(TridentState state, QueryFunction function, Fields functionFields)
public TridentState partitionPersist(StateFactory stateFactory, Fields inputFields, StateUpdater updater, Fields functionFields)
public TridentState partitionPersist(StateSpec stateSpec, Fields inputFields, StateUpdater updater, Fields functionFields)
public TridentState partitionPersist(StateFactory stateFactory, Fields inputFields, StateUpdater updater)
public TridentState partitionPersist(StateSpec stateSpec, Fields inputFields, StateUpdater updater)
public TridentState partitionPersist(StateFactory stateFactory, StateUpdater updater, Fields functionFields)
public TridentState partitionPersist(StateSpec stateSpec, StateUpdater updater, Fields functionFields)
public TridentState partitionPersist(StateFactory stateFactory, StateUpdater updater)
public TridentState partitionPersist(StateSpec stateSpec, StateUpdater updater)
public Stream filter(Filter filter)
Returns a stream consisting of the elements of this stream that match the given filter.
filter
- the filter to apply to each trident tuple to determine if it should be included.public Stream filter(Fields inputFields, Filter filter)
Returns a stream consisting of the elements of this stream that match the given filter.
inputFields
- the fields of the input trident tuple to be selected.filter
- the filter to apply to each trident tuple to determine if it should be included.public Stream map(MapFunction function)
Returns a stream consisting of the result of applying the given mapping function to the values of this stream.
function
- a mapping function to be applied to each value in this stream.public Stream map(MapFunction function, Fields outputFields)
Returns a stream consisting of the result of applying the given mapping function to the values of this stream. This method replaces old output fields with new output fields, achieving T -> V conversion.
function
- a mapping function to be applied to each value in this stream.outputFields
- new output fieldspublic Stream flatMap(FlatMapFunction function)
Returns a stream consisting of the results of replacing each value of this stream with the contents produced by applying the provided mapping function to each value. This has the effect of applying a one-to-many transformation to the values of the stream, and then flattening the resulting elements into a new stream.
function
- a mapping function to be applied to each value in this stream which produces new values.public Stream flatMap(FlatMapFunction function, Fields outputFields)
Returns a stream consisting of the results of replacing each value of this stream with the contents produced by applying the provided mapping function to each value. This has the effect of applying a one-to-many transformation to the values of the stream, and then flattening the resulting elements into a new stream. This method replaces old output fields with new output fields, achieving T -> V conversion.
function
- a mapping function to be applied to each value in this stream which produces new values.outputFields
- new output fieldspublic Stream peek(Consumer action)
Returns a stream consisting of the trident tuples of this stream, additionally performing the provided action on each trident tuple as they are consumed from the resulting stream. This is mostly useful for debugging to see the tuples as they flow past a certain point in a pipeline.
action
- the action to perform on the trident tuple as they are consumed from the streampublic ChainedAggregatorDeclarer chainedAgg()
public Stream minBy(String inputFieldName)
This aggregator operation computes the minimum of tuples by the given inputFieldName
and it is assumed that its value is an instance of Comparable
. If the value of tuple with field inputFieldName
is not an instance of Comparable
then it throws ClassCastException
inputFieldName
- input field namepublic <T> Stream minBy(String inputFieldName, Comparator<T> comparator)
This aggregator operation computes the minimum of tuples by the given inputFieldName
in a stream by using the given comparator
. If the value of tuple with field inputFieldName
is not an instance of T
then it throws ClassCastException
inputFieldName
- input field namecomparator
- comparator used in for finding minimum of two tuple values of inputFieldName
.T
- type of tuple’s given input field value.public Stream min(Comparator<TridentTuple> comparator)
This aggregator operation computes the minimum of tuples in a stream by using the given comparator
with TridentTuple
s.
comparator
- comparator used in for finding minimum of two tuple values.public Stream maxBy(String inputFieldName)
This aggregator operation computes the maximum of tuples by the given inputFieldName
and it is assumed that its value is an instance of Comparable
. If the value of tuple with field inputFieldName
is not an instance of Comparable
then it throws ClassCastException
inputFieldName
- input field namepublic <T> Stream maxBy(String inputFieldName, Comparator<T> comparator)
This aggregator operation computes the maximum of tuples by the given inputFieldName
in a stream by using the given comparator
. If the value of tuple with field inputFieldName
is not an instance of T
then it throws ClassCastException
inputFieldName
- input field namecomparator
- comparator used in for finding maximum of two tuple values of inputFieldName
.T
- type of tuple’s given input field value.public Stream max(Comparator<TridentTuple> comparator)
This aggregator operation computes the maximum of tuples in a stream by using the given comparator
with TridentTuple
s.
comparator
- comparator used in for finding maximum of two tuple values.public Stream aggregate(Aggregator agg, Fields functionFields)
public Stream aggregate(Fields inputFields, Aggregator agg, Fields functionFields)
public Stream aggregate(CombinerAggregator agg, Fields functionFields)
public Stream aggregate(Fields inputFields, CombinerAggregator agg, Fields functionFields)
public Stream aggregate(ReducerAggregator agg, Fields functionFields)
public Stream aggregate(Fields inputFields, ReducerAggregator agg, Fields functionFields)
public Stream tumblingWindow(int windowCount, WindowsStoreFactory windowStoreFactory, Fields inputFields, Aggregator aggregator, Fields functionFields)
Returns a stream of tuples which are aggregated results of a tumbling window with every windowCount
of tuples.
windowCount
- represents number of tuples in the windowwindowStoreFactory
- intermediary tuple store for storing windowing tuplesinputFields
- projected fields for aggregatoraggregator
- aggregator to run on the window of tuples to compute the result and emit to the stream.functionFields
- fields of values to emit with aggregation.public Stream tumblingWindow(BaseWindowedBolt.Duration windowDuration, WindowsStoreFactory windowStoreFactory, Fields inputFields, Aggregator aggregator, Fields functionFields)
Returns a stream of tuples which are aggregated results of a window that tumbles at duration of windowDuration
.
windowDuration
- represents tumbling window duration configurationwindowStoreFactory
- intermediary tuple store for storing windowing tuplesinputFields
- projected fields for aggregatoraggregator
- aggregator to run on the window of tuples to compute the result and emit to the stream.functionFields
- fields of values to emit with aggregation.public Stream slidingWindow(int windowCount, int slideCount, WindowsStoreFactory windowStoreFactory, Fields inputFields, Aggregator aggregator, Fields functionFields)
Returns a stream of tuples which are aggregated results of a sliding window with every windowCount
of tuples and slides the window after slideCount
.
windowCount
- represents tuples count of a windowslideCount
- the number of tuples after which the window slideswindowStoreFactory
- intermediary tuple store for storing windowing tuplesinputFields
- projected fields for aggregatoraggregator
- aggregator to run on the window of tuples to compute the result and emit to the stream.functionFields
- fields of values to emit with aggregation.public Stream slidingWindow(BaseWindowedBolt.Duration windowDuration, BaseWindowedBolt.Duration slidingInterval, WindowsStoreFactory windowStoreFactory, Fields inputFields, Aggregator aggregator, Fields functionFields)
Returns a stream of tuples which are aggregated results of a window which slides at duration of slidingInterval
and completes a window at windowDuration
.
windowDuration
- represents window duration configurationslidingInterval
- the time duration after which the window slideswindowStoreFactory
- intermediary tuple store for storing windowing tuplesinputFields
- projected fields for aggregatoraggregator
- aggregator to run on the window of tuples to compute the result and emit to the stream.functionFields
- fields of values to emit with aggregation.public Stream window(WindowConfig windowConfig, Fields inputFields, Aggregator aggregator, Fields functionFields)
Returns a stream of aggregated results based on the given window configuration which uses inmemory windowing tuple store.
windowConfig
- window configuration like window length and slide length.inputFields
- input fieldsaggregator
- aggregator to run on the window of tuples to compute the result and emit to the stream.functionFields
- fields of values to emit with aggregation.public Stream window(WindowConfig windowConfig, WindowsStoreFactory windowStoreFactory, Fields inputFields, Aggregator aggregator, Fields functionFields)
Returns stream of aggregated results based on the given window configuration.
windowConfig
- window configuration like window length and slide length.windowStoreFactory
- intermediary tuple store for storing tuples for windowinginputFields
- input fieldsaggregator
- aggregator to run on the window of tuples to compute the result and emit to the stream.functionFields
- fields of values to emit with aggregation.public TridentState persistentAggregate(StateFactory stateFactory, CombinerAggregator agg, Fields functionFields)
public TridentState persistentAggregate(StateSpec spec, CombinerAggregator agg, Fields functionFields)
public TridentState persistentAggregate(StateFactory stateFactory, Fields inputFields, CombinerAggregator agg, Fields functionFields)
public TridentState persistentAggregate(StateSpec spec, Fields inputFields, CombinerAggregator agg, Fields functionFields)
public TridentState persistentAggregate(StateFactory stateFactory, ReducerAggregator agg, Fields functionFields)
public TridentState persistentAggregate(StateSpec spec, ReducerAggregator agg, Fields functionFields)
public TridentState persistentAggregate(StateFactory stateFactory, Fields inputFields, ReducerAggregator agg, Fields functionFields)
public TridentState persistentAggregate(StateSpec spec, Fields inputFields, ReducerAggregator agg, Fields functionFields)
public Stream toStream()
toStream
in interface IAggregatableStream
public Fields getOutputFields()
getOutputFields
in interface IAggregatableStream
Copyright © 2020 The Apache Software Foundation. All rights reserved.