# Your First Pipeline

This walkthrough covers the `in_memory_pipeline` example, which demonstrates the core Volley API.

## What It Does
The pipeline:
- Reads events from an in-memory iterator
- Filters records with an expression predicate (`amount > 100`)
- Partitions by a key field (`user_id`)
- Groups into 5-minute tumbling windows
- Computes a sum over `amount` per key and window
- Collects results in memory
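Assembled end to end, those steps look roughly like this (a sketch combining the snippets explained below; `events` is assumed to be an iterator of `StreamRecord` values carrying `user_id` and `amount` fields):

```rust
// End-to-end sketch assembled from the snippets in this walkthrough;
// `events` is assumed to yield StreamRecord values with `user_id` and
// `amount` fields.
let env = StreamExecutionEnvironment::new();

env.from_iter(events)
    .filter_expr(col("amount").gt(lit(100)))
    .key_by(col("user_id"))
    .window(TumblingWindows::of(Duration::from_secs(300)))
    .aggregate(AggregationType::Sum, "amount")
    .collect()
    .execute("aggregation-job")
    .await?;
```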
## Key Concepts

### Creating the Environment

```rust
let env = StreamExecutionEnvironment::new();
```
This creates the pipeline builder. All pipeline construction starts here.
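The snippets in this walkthrough omit imports. A minimal sketch of what the call sites assume (the `volley::prelude` path is an assumption; check the crate's actual exports):

```rust
// Assumed imports for the snippets below; the module path is a guess --
// consult the crate docs for the real re-exports.
use std::time::Duration;
use volley::prelude::*; // hypothetical prelude re-exporting the API used here
```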
### Adding a Source

```rust
env.from_iter(events)
```
`from_iter()` creates a `DataStream` from a Rust iterator of `StreamRecord` values. For production use, you'd use `from_kafka()` or a blob store source.
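Because any iterator of `StreamRecord`s is accepted, you can adapt your own event type up front. A sketch under that assumption, where `raw_events` and `to_stream_record` are hypothetical stand-ins for your data and its conversion:

```rust
// `raw_events` and `to_stream_record` are hypothetical placeholders; the
// point is only that from_iter() accepts any iterator of StreamRecord.
let events = raw_events            // e.g. a Vec<RawEvent> you already have
    .into_iter()
    .map(to_stream_record);        // RawEvent -> StreamRecord

env.from_iter(events)
```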
### Applying Operators

```rust
.filter_expr(col("amount").gt(lit(100)))
```
The operators available on a `DataStream` are `filter_expr`, `map`, `flat_map`, and `apply_operator`.
### Keying and Windowing

```rust
.key_by(col("user_id"))
.window(TumblingWindows::of(Duration::from_secs(300)))
.aggregate(AggregationType::Sum, "amount")
```
`.key_by()` transitions to a `KeyedStream` (hash-partitioned by key). `.window()` transitions to a `WindowedKeyedStream`. `.aggregate()` computes one result per key per window.
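Tumbling windows are fixed-size and non-overlapping, so every record lands in exactly one bucket. A conceptual sketch of the assignment arithmetic (this illustrates the windowing model, not Volley's internals):

```rust
// Each record belongs to the window containing its timestamp; for
// 5-minute (300 s) tumbling windows that is simple integer arithmetic.
fn window_start(event_time_secs: u64, size_secs: u64) -> u64 {
    (event_time_secs / size_secs) * size_secs
}

fn main() {
    assert_eq!(window_start(1_000, 300), 900);   // t = 1000 s -> window [900, 1200)
    assert_eq!(window_start(1_200, 300), 1_200); // a boundary timestamp opens the next window
}
```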
### Executing

```rust
.collect()
.execute("aggregation-job")
.await?;
```
`.collect()` attaches an in-memory sink. `.execute()` starts the pipeline as concurrent Tokio tasks and returns when all data is processed.
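Since `.execute()` is awaited and returns a `Result`, the call site needs an async context. A minimal sketch of the surrounding entry point, assuming the Tokio runtime macro (the error type here is a placeholder):

```rust
// The pipeline runs on Tokio, so the entry point needs a runtime; the
// error type is a placeholder -- use whatever volley actually returns.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // ... build the pipeline as shown above, ending in ...
    // .execute("aggregation-job").await?;
    Ok(())
}
```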
## Run It

```sh
cargo run --example in_memory_pipeline -p volley-examples
```
Expected output:

```text
Pipeline 'aggregation-job' started
Source: MemorySource (4 records)
Operators: filter_expr → key_by → window → aggregate
Sink: MemorySink
Processing complete: 4 records in, 2 aggregated results out
```
## Next Steps
- Kafka Pipeline — connect to real data
- Operators Guide — write custom operators
- Windowing — deep dive into window types
- All Examples — browse the full examples list