NEXMark Benchmarks

NEXMark is the standard workload used to compare stream processing systems. It simulates an online auction — people, bids, auctions, a steady event rate — and defines a handful of queries (projection, windowed aggregation, stream-to-stream join) that together exercise the throughput, latency, and stateful-operator paths that matter in real pipelines.

Volley ships five of those queries as runnable examples so you can measure what your hardware does with your data shape, and compare against Flink or any other system that also publishes NEXMark numbers.

Queries

Query	Description	What it tests
Q1	Currency Conversion	Raw throughput — stateless map/projection
Q4	Average Selling Price per Category	Windowed join between bids and auctions; post-join filtering; min-watermark across dual sources
Q5	Hot Items	Sliding window aggregation under sustained load
Q7	Highest Bid per Session	Session window merging; high-cardinality keyed state; RocksDB under sustained load
Q8	Monitor New Users	Windowed stream-to-stream join

Running

# Q8 — Windowed join (multi-source pipeline)
cargo run --example nexmark_q8 -p volley-benchmark --release

# Q5 — Sliding window aggregation (recommended starting point)
cargo run --example nexmark_q5 -p volley-benchmark --release

# Q4 — Windowed join (average selling price per category)
cargo run --example nexmark_q4 -p volley-benchmark --release

# Q7 — Session windows (highest bid per session per bidder)
cargo run --example nexmark_q7 -p volley-benchmark --release

# Q1 — Stateless projection baseline
cargo run --example nexmark_q1 -p volley-benchmark --release

Configuration

Variable	Default	Description
`NEXMARK_DURATION`	`10`	Benchmark duration in seconds
`NEXMARK_BATCH_SIZE`	`4000`	Records per Arrow RecordBatch
`NEXMARK_PARALLELISM`	`4`	Parallel operator instances (Q5)
`NEXMARK_AUCTIONS`	`100`	Number of active auctions (Q5)
`NEXMARK_WINDOW_SIZE`	`10`	Window size in seconds (Q5)
`NEXMARK_WINDOW_SLIDE`	`2`	Window slide in seconds (Q5)
`NEXMARK_INTER_EVENT_MS`	`1`	Inter-event time in milliseconds

Metrics

Sustained throughput (records/sec) — measured over the full run, not peak
p50/p99/p999 latency — per-batch processing time from source to sink
Memory footprint — RSS delta under load
Recovery time — restore from checkpoint after failure (planned)

Comparison Baseline

When comparing against Apache Flink:

Use tuned Flink — RocksDB state backend, off-heap memory, equivalent parallelism
Match batch semantics — set Flink’s table.exec.mini-batch.size to match Volley’s batch size
Same hardware, same event rate, same total events
Discard first 10% of measurements for JIT warmup (Flink) and cache warming (both)

See volley-benchmark/README.md for full methodology.

Stress testing

In addition to the short benchmark runs above, Volley ships a stress testing suite for long-running pass/fail validation. Stress scenarios run for hours against Kafka-backed pipelines, auto-calibrate the throughput ceiling, and assert hard SLO thresholds.

Scenarios

Scenario	What it tests
`stress_sustained`	Sliding window aggregation at 80% of calibrated ceiling for 2 h — verifies no throughput drift, latency regression, or memory growth
`stress_backpressure`	Three-phase overload test: baseline → 120% ceiling → recovery — asserts catch-up within 60 s
`stress_checkpoint_recovery`	Crash mid-run and restart from checkpoint — asserts state restoration within 30 s
`stress_nexmark_q4`	Kafka-backed Q4 (windowed join) under sustained load
`stress_nexmark_q7`	Kafka-backed Q7 (session windows, high cardinality) under sustained load

Running

# All scenarios — default 2 h each (requires Docker for Kafka)
./scripts/stress-test.sh

# CI mode — 10 min per scenario
./scripts/stress-test.sh --ci

# Single scenario with custom settings
STRESS_DURATION=30m STRESS_TARGET_RATE=5000 ./scripts/stress-test.sh sustained

Pass/fail thresholds

Threshold	Default limit
Throughput drift	≤ 5% drop from calibrated operating point
p99 latency	≤ 2× warmup baseline
p999 latency	≤ 5× warmup baseline
Memory growth	≤ 20% RSS increase from warmup to end
Checkpoint duration	≤ 2× warmup baseline

Reports are written as JSON to target/stress-reports/ and uploaded as a CI artifact when run via the stress.yml GitHub Actions workflow.

Keyboard shortcuts

Volley Documentation