NEXMark Benchmarks

NEXMark is the standard workload used to compare stream processing systems. It simulates an online auction — people, bids, auctions, a steady event rate — and defines a handful of queries (projection, windowed aggregation, stream-to-stream join) that together exercise the throughput, latency, and stateful-operator paths that matter in real pipelines.

Volley ships five of those queries as runnable examples so you can measure what your hardware does with your data shape, and compare against Flink or any other system that also publishes NEXMark numbers.

Queries

| Query | Description | What it tests |
|-------|-------------|---------------|
| Q1 | Currency Conversion | Raw throughput: stateless map/projection |
| Q4 | Average Selling Price per Category | Windowed join between bids and auctions; post-join filtering; min-watermark across dual sources |
| Q5 | Hot Items | Sliding window aggregation under sustained load |
| Q7 | Highest Bid per Session | Session window merging; high-cardinality keyed state; RocksDB under sustained load |
| Q8 | Monitor New Users | Windowed stream-to-stream join |
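
To make the cost of Q5's sliding window concrete, the sketch below shows how a single event is assigned to every overlapping window. It is a standalone illustration, not Volley's implementation: the function name `sliding_window_starts` is hypothetical, timestamps are whole seconds, and windows are assumed to be half-open intervals aligned to multiples of the slide.

```rust
/// Return the start times (in seconds) of every sliding window that contains
/// an event at `event_ts`. Windows are assumed to be [start, start + size)
/// and aligned to multiples of `slide`. Hypothetical helper for illustration.
fn sliding_window_starts(event_ts: u64, size: u64, slide: u64) -> Vec<u64> {
    assert!(slide > 0 && size >= slide, "expects 0 < slide <= size");

    // Latest aligned window start that is <= event_ts.
    let mut start = event_ts - (event_ts % slide);
    let mut starts = Vec::new();

    // Walk backwards one slide at a time while the window still covers the event.
    while start + size > event_ts {
        starts.push(start);
        if start < slide {
            break; // windows cannot start before time 0 in this sketch
        }
        start -= slide;
    }

    starts.reverse(); // oldest window first
    starts
}

fn main() {
    // A 10 s window sliding every 2 s: each event lands in size / slide = 5 windows.
    let starts = sliding_window_starts(13, 10, 2);
    println!("event at t=13 belongs to windows starting at {:?}", starts);
}
```

With a 10 s window and a 2 s slide, every event updates five windows, which is why a sliding aggregation stresses keyed state more than a tumbling one of the same size.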

Running

# Q8 — Windowed join (multi-source pipeline)
cargo run --example nexmark_q8 -p volley-benchmark --release

# Q5 — Sliding window aggregation (recommended starting point)
cargo run --example nexmark_q5 -p volley-benchmark --release

# Q4 — Windowed join (average selling price per category)
cargo run --example nexmark_q4 -p volley-benchmark --release

# Q7 — Session windows (highest bid per session per bidder)
cargo run --example nexmark_q7 -p volley-benchmark --release

# Q1 — Stateless projection baseline
cargo run --example nexmark_q1 -p volley-benchmark --release

Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| NEXMARK_DURATION | 10 | Benchmark duration in seconds |
| NEXMARK_BATCH_SIZE | 4000 | Records per Arrow RecordBatch |
| NEXMARK_PARALLELISM | 4 | Parallel operator instances (Q5) |
| NEXMARK_AUCTIONS | 100 | Number of active auctions (Q5) |
| NEXMARK_WINDOW_SIZE | 10 | Window size in seconds (Q5) |
| NEXMARK_WINDOW_SLIDE | 2 | Window slide in seconds (Q5) |
| NEXMARK_INTER_EVENT_MS | 1 | Inter-event time in milliseconds |
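
If you write your own harness around these variables, one way to read them is sketched below. The variable names and defaults come from the table above; the `NexmarkConfig` struct, the `parse_or` helper, and the choice to fall back silently to the default on an unparsable value are assumptions for illustration and may not match how the Volley examples handle bad input.

```rust
use std::env;
use std::str::FromStr;

/// Parse an optional raw value, falling back to `default` when the value is
/// missing or does not parse. (Assumed behavior, for illustration only.)
fn parse_or<T: FromStr>(raw: Option<String>, default: T) -> T {
    raw.and_then(|s| s.trim().parse().ok()).unwrap_or(default)
}

/// Hypothetical container for the benchmark settings listed in the table.
#[derive(Debug, PartialEq)]
struct NexmarkConfig {
    duration_secs: u64,
    batch_size: usize,
    parallelism: usize,
    auctions: u64,
    window_size_secs: u64,
    window_slide_secs: u64,
    inter_event_ms: u64,
}

impl NexmarkConfig {
    /// Read every variable from the process environment, using the documented defaults.
    fn from_env() -> Self {
        let var = |name: &str| env::var(name).ok();
        Self {
            duration_secs: parse_or(var("NEXMARK_DURATION"), 10),
            batch_size: parse_or(var("NEXMARK_BATCH_SIZE"), 4000),
            parallelism: parse_or(var("NEXMARK_PARALLELISM"), 4),
            auctions: parse_or(var("NEXMARK_AUCTIONS"), 100),
            window_size_secs: parse_or(var("NEXMARK_WINDOW_SIZE"), 10),
            window_slide_secs: parse_or(var("NEXMARK_WINDOW_SLIDE"), 2),
            inter_event_ms: parse_or(var("NEXMARK_INTER_EVENT_MS"), 1),
        }
    }
}

fn main() {
    let cfg = NexmarkConfig::from_env();
    println!("{cfg:?}");
}
```

Running it as `NEXMARK_DURATION=60 NEXMARK_PARALLELISM=8 ./your-binary` would override two fields and leave the rest at their defaults.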

Metrics

  • Sustained throughput (records/sec) — measured over the full run, not peak
  • p50/p99/p999 latency — per-batch processing time from source to sink
  • Memory footprint — RSS delta under load
  • Recovery time — restore from checkpoint after failure (planned)
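
As a rough sketch of how the first two metrics can be derived from raw measurements, the code below computes sustained throughput over a whole run and nearest-rank percentiles over per-batch latencies. The function names are hypothetical, and the nearest-rank method is an assumption; the benchmark may use a histogram or a different interpolation.

```rust
use std::time::Duration;

/// Nearest-rank percentile over an already sorted slice.
/// `per_mille` is the percentile in tenths of a percent: 500 = p50,
/// 990 = p99, 999 = p999. Integer math avoids floating-point rank errors.
fn percentile(sorted: &[Duration], per_mille: usize) -> Option<Duration> {
    if sorted.is_empty() || per_mille == 0 || per_mille > 1000 {
        return None;
    }
    // rank = ceil(per_mille / 1000 * n), 1-based
    let rank = (per_mille * sorted.len() + 999) / 1000;
    Some(sorted[rank - 1])
}

/// Records per second over the full run (not the peak of any sub-interval).
fn sustained_throughput(total_records: u64, elapsed: Duration) -> Option<f64> {
    let secs = elapsed.as_secs_f64();
    if secs > 0.0 {
        Some(total_records as f64 / secs)
    } else {
        None
    }
}

fn main() {
    // Synthetic per-batch latencies of 1 ms to 1000 ms, deliberately unsorted.
    let mut latencies: Vec<Duration> =
        (1..=1000u64).rev().map(Duration::from_millis).collect();
    latencies.sort();

    println!("p50  = {:?}", percentile(&latencies, 500));
    println!("p99  = {:?}", percentile(&latencies, 990));
    println!("p999 = {:?}", percentile(&latencies, 999));
    println!(
        "throughput = {:?} records/sec",
        sustained_throughput(40_000, Duration::from_secs(10))
    );
}
```

Note that p999 only becomes meaningful once the run contains well over a thousand batches; with fewer samples it collapses to the maximum.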

Comparison Baseline

When comparing against Apache Flink:

  1. Use tuned Flink: RocksDB state backend, off-heap memory, equivalent parallelism
  2. Match batch semantics: set Flink’s table.exec.mini-batch.size to match Volley’s batch size, with table.exec.mini-batch.enabled set to true so the size setting takes effect
  3. Use the same hardware, the same event rate, and the same total number of events
  4. Discard the first 10% of measurements to exclude JIT warmup (Flink) and cache warming (both systems)
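
The warmup rule in the last step can be applied mechanically to any series of measurements. The helper below is a minimal sketch under the assumption that samples are stored in arrival order; the name `discard_warmup` is hypothetical, and the count is rounded down so short runs never lose more than the stated share.

```rust
/// Drop the first `percent` percent of samples (rounded down) and return the rest.
/// Samples are assumed to be in arrival order, so the dropped prefix is the warmup.
fn discard_warmup<T>(samples: &[T], percent: usize) -> &[T] {
    let percent = percent.min(100);
    let skip = samples.len() * percent / 100;
    &samples[skip..]
}

fn main() {
    // Ten throughput samples (records/sec); the first one is depressed by warmup.
    let samples = [1200u64, 3900, 4000, 4010, 3990, 4005, 3995, 4000, 4002, 3998];
    let steady = discard_warmup(&samples, 10);

    let mean = steady.iter().sum::<u64>() as f64 / steady.len() as f64;
    println!("kept {} of {} samples, mean = {:.1}", steady.len(), samples.len(), mean);
}
```

Apply the same cut to both systems before computing any percentile or mean, so neither side is penalized for startup effects.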

See volley-benchmark/README.md for full methodology.

Stress testing

In addition to the short benchmark runs above, Volley ships a stress testing suite for long-running pass/fail validation. Stress scenarios run for hours against Kafka-backed pipelines, auto-calibrate the throughput ceiling, and assert hard SLO thresholds.

Scenarios

| Scenario | What it tests |
|----------|---------------|
| stress_sustained | Sliding window aggregation at 80% of calibrated ceiling for 2 h; verifies no throughput drift, latency regression, or memory growth |
| stress_backpressure | Three-phase overload test (baseline → 120% ceiling → recovery); asserts catch-up within 60 s |
| stress_checkpoint_recovery | Crash mid-run and restart from checkpoint; asserts state restoration within 30 s |
| stress_nexmark_q4 | Kafka-backed Q4 (windowed join) under sustained load |
| stress_nexmark_q7 | Kafka-backed Q7 (session windows, high cardinality) under sustained load |
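
The percentages in the scenario descriptions translate into target event rates once the ceiling has been calibrated. The sketch below shows that arithmetic. The 80% and 120% figures come from the table; running the baseline and recovery phases of the backpressure test at 80% is an assumption made here for illustration, and `rate_at`, `Phase`, and `backpressure_plan` are hypothetical names.

```rust
/// Scale a calibrated ceiling (records/sec) by an integer percentage.
fn rate_at(ceiling: u64, percent: u64) -> u64 {
    ceiling * percent / 100
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Baseline,
    Overload,
    Recovery,
}

/// Target rate for each phase of a three-phase overload test.
/// Assumption: baseline and recovery both run at 80% of the ceiling.
fn backpressure_plan(ceiling: u64) -> Vec<(Phase, u64)> {
    vec![
        (Phase::Baseline, rate_at(ceiling, 80)),
        (Phase::Overload, rate_at(ceiling, 120)),
        (Phase::Recovery, rate_at(ceiling, 80)),
    ]
}

fn main() {
    let ceiling = 5_000; // example calibrated ceiling in records/sec
    for (phase, rate) in backpressure_plan(ceiling) {
        println!("{:?}: {} records/sec", phase, rate);
    }
}
```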

Running

# All scenarios — default 2 h each (requires Docker for Kafka)
./scripts/stress-test.sh

# CI mode — 10 min per scenario
./scripts/stress-test.sh --ci

# Single scenario with custom settings
STRESS_DURATION=30m STRESS_TARGET_RATE=5000 ./scripts/stress-test.sh sustained

Pass/fail thresholds

| Threshold | Default limit |
|-----------|---------------|
| Throughput drift | ≤ 5% drop from calibrated operating point |
| p99 latency | ≤ 2× warmup baseline |
| p999 latency | ≤ 5× warmup baseline |
| Memory growth | ≤ 20% RSS increase from warmup to end |
| Checkpoint duration | ≤ 2× warmup baseline |
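
The default limits above amount to five comparisons between a reference measurement and the end-of-run measurement. The sketch below encodes them so the pass/fail logic is easy to reason about. The `Sample` struct and its field names are hypothetical and do not describe the JSON report schema; only the multipliers are taken from the table.

```rust
/// One set of measurements. For the reference sample, `throughput` is the
/// calibrated operating point and the other fields are the warmup baseline.
/// Field names are illustrative, not the report schema.
#[derive(Debug, Clone, Copy)]
struct Sample {
    throughput: f64,    // records/sec
    p99_ms: f64,        // p99 latency in milliseconds
    p999_ms: f64,       // p999 latency in milliseconds
    rss_bytes: f64,     // resident set size
    checkpoint_ms: f64, // checkpoint duration in milliseconds
}

/// Return the name of every threshold that `observed` violates.
/// An empty result means the run passes.
fn check(reference: &Sample, observed: &Sample) -> Vec<&'static str> {
    let mut failures = Vec::new();
    if observed.throughput < reference.throughput * 0.95 {
        failures.push("throughput drift"); // more than a 5% drop
    }
    if observed.p99_ms > reference.p99_ms * 2.0 {
        failures.push("p99 latency");
    }
    if observed.p999_ms > reference.p999_ms * 5.0 {
        failures.push("p999 latency");
    }
    if observed.rss_bytes > reference.rss_bytes * 1.20 {
        failures.push("memory growth"); // more than a 20% RSS increase
    }
    if observed.checkpoint_ms > reference.checkpoint_ms * 2.0 {
        failures.push("checkpoint duration");
    }
    failures
}

fn main() {
    let reference = Sample {
        throughput: 4000.0,
        p99_ms: 10.0,
        p999_ms: 20.0,
        rss_bytes: 1.0e9,
        checkpoint_ms: 100.0,
    };
    // Throughput, p99, and RSS are out of bounds in this example.
    let observed = Sample {
        throughput: 3700.0,
        p99_ms: 25.0,
        rss_bytes: 1.3e9,
        ..reference
    };
    println!("failures: {:?}", check(&reference, &observed));
}
```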

Reports are written as JSON to target/stress-reports/ and uploaded as a CI artifact when run via the stress.yml GitHub Actions workflow.