# Per-Record Tracing
Volley can trace individual records as they flow through a pipeline, creating OpenTelemetry spans at each processing stage. This gives you a complete picture of what happened to a record: which operators processed it, how long each step took, what the input and output looked like, and whether any errors occurred.
## Quick Start

```rust
use volley_core::prelude::*;
use volley_core::observability::{TracingConfig, SamplingStrategy};

let results = StreamExecutionEnvironment::new()
    .from_source(my_source)
    .with_tracing(TracingConfig {
        sampling: SamplingStrategy::Ratio(0.01),
        ..Default::default()
    })
    .map(|record| {
        // Your transformation
        Ok(record)
    })
    .to_sink(my_sink)
    .execute("traced-pipeline")
    .await?;
```
Open Jaeger or Grafana Tempo to see traces. Each sampled record appears as a trace with child spans for source, operators, and sink.
## How It Works

- Sampling decision happens at the source. Based on your `SamplingStrategy`, the runtime decides whether to trace each record.
- Root span is created for sampled records (`volley.record`). The span's context is attached to the `StreamRecord.trace` field.
- Child spans are created automatically at each pipeline stage (operators, sink). No code changes are needed in your operators.
- Payload previews (optional) capture a JSON snapshot of the first row at each operator's input and output, size-capped to `max_payload_bytes`.
- Cleanup happens at pipeline exit. Any root spans for records that were filtered or lost to errors are ended.
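The trace context rides on the record itself. As a minimal sketch of what that might look like (the type shapes below are assumptions for illustration, not Volley's actual definitions):

```rust
/// Illustrative only: a trace/span identifier pair.
#[derive(Clone, Copy, PartialEq, Debug)]
struct SpanContext {
    trace_id: u128,
    span_id: u64,
}

/// Hypothetical shape of a record carrying its trace context.
struct StreamRecord<T> {
    payload: T,
    /// `Some` only for records the sampler selected at the source;
    /// downstream stages derive child spans from this context.
    trace: Option<SpanContext>,
}
```

Operators never touch this field directly; the runtime reads and updates it at each stage.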
## Sampling

At production throughput (thousands of records per second), tracing every record would overwhelm your tracing backend. Use `Ratio` sampling:

| Throughput | Suggested ratio | Traces/sec |
|---|---|---|
| 1K rec/s | 0.10 (10%) | ~100 |
| 10K rec/s | 0.01 (1%) | ~100 |
| 100K rec/s | 0.001 (0.1%) | ~100 |

Use `Always` only during development or debugging.
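The per-record decision itself is cheap. A minimal sketch of the three strategies, assuming the runtime supplies a uniform random draw `r` in `[0, 1)` (the function name and signature are illustrative, not Volley's API):

```rust
enum SamplingStrategy {
    Never,
    Always,
    Ratio(f64),
}

/// Decide whether to trace one record. `r` stands in for the runtime's
/// uniform random draw in [0, 1); real code would call an RNG here.
fn should_sample(strategy: &SamplingStrategy, r: f64) -> bool {
    match strategy {
        SamplingStrategy::Never => false,
        SamplingStrategy::Always => true,
        SamplingStrategy::Ratio(p) => r < *p,
    }
}
```

With `Ratio(0.01)`, about 1 in 100 draws falls below the threshold, which is where the ~100 traces/sec figures in the table come from.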
## Window and Aggregate Traces

Window and aggregate operators accumulate input records into new output records. The output gets its own trace, linked back to the input traces via OpenTelemetry span links. This preserves lineage without creating misleading parent-child relationships.

The `volley.window.input_trace_count` attribute tells you how many sampled records contributed to each aggregation result.
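As a sketch of the linkage (type and field names here are illustrative assumptions, not Volley internals): the output span carries a list of links to the sampled inputs' contexts instead of a parent, and the attribute above is just that list's length.

```rust
/// Illustrative only: a span link points at another trace's context.
struct SpanLink {
    trace_id: u128,
    span_id: u64,
}

/// What a window's output span might carry instead of a parent span.
struct WindowOutputSpan {
    links: Vec<SpanLink>,
}

impl WindowOutputSpan {
    /// Corresponds to the `volley.window.input_trace_count` attribute.
    fn input_trace_count(&self) -> usize {
        self.links.len()
    }
}
```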
## Kafka Trace Propagation

When using Kafka connectors, trace context propagates automatically via W3C `traceparent` headers:

- Kafka source extracts `traceparent` from message headers
- Kafka sink injects `traceparent` into produced messages

A traced request in an upstream service continues its trace through Volley and into downstream consumers. The runtime respects upstream sampling decisions (parent-based sampling).
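The header format itself comes from the W3C Trace Context specification: `version-traceid-parentid-flags`, where the low bit of the flags byte carries the upstream sampling decision. A minimal parser sketch (this helper is illustrative, not part of Volley's API):

```rust
/// Parse a version-00 W3C `traceparent` value like
/// "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
/// into (trace_id, parent_span_id, sampled).
fn parse_traceparent(value: &str) -> Option<(String, String, bool)> {
    let parts: Vec<&str> = value.split('-').collect();
    // Version "00": exactly four fields with fixed hex widths.
    if parts.len() != 4 || parts[0] != "00" {
        return None;
    }
    if parts[1].len() != 32 || parts[2].len() != 16 {
        return None;
    }
    let flags = u8::from_str_radix(parts[3], 16).ok()?;
    // Bit 0 of the flags byte is the "sampled" flag.
    Some((parts[1].to_string(), parts[2].to_string(), flags & 0x01 == 0x01))
}
```

Parent-based sampling means that when the extracted flag is set, Volley traces the record regardless of its own `SamplingStrategy`.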
## Configuration Reference

| Field | Type | Default | Description |
|---|---|---|---|
| `sampling` | `SamplingStrategy` | `Never` | Which records to trace |
| `max_payload_bytes` | `usize` | `1024` | Max JSON preview size per span event |
| `capture_payload` | `bool` | `true` | Whether to attach input/output previews |
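For intuition, `max_payload_bytes` caps each preview roughly like the sketch below (an assumption about the truncation behavior; the real implementation may differ, e.g. by eliding fields instead of cutting the string):

```rust
/// Cap a JSON preview at `max_bytes`, backing up to a UTF-8 character
/// boundary so the truncated slice is still valid text.
fn cap_preview(json: &str, max_bytes: usize) -> &str {
    if json.len() <= max_bytes {
        return json;
    }
    let mut end = max_bytes;
    while !json.is_char_boundary(end) {
        end -= 1;
    }
    &json[..end]
}
```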
## Viewing Traces

Volley emits traces via the OTLP gRPC exporter configured in `ObservabilityConfig`. Point it at your collector:

```rust
.with_observability(ObservabilityConfig::new()
    .with_node_id("node-1")
    .with_otlp_endpoint("http://otel-collector:4317"))
.with_tracing(TracingConfig {
    sampling: SamplingStrategy::Ratio(0.01),
    ..Default::default()
})
```
Compatible backends: Jaeger, Grafana Tempo, Datadog, Honeycomb, or any OTLP-compatible collector.
## Error Handling

When a record fails during processing (e.g., a transformation returns an error), the record's trace span is:

- Marked with `otel.status_code = ERROR`
- Annotated with the error message via `span.record_error()`
- Ended at the point of failure

Failed records are visible in your tracing backend by filtering on `otel.status_code = ERROR`. This makes tracing a useful debugging tool even when records are dropped or filtered.
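The three steps above can be sketched on a toy span type (illustrative names only; Volley's real spans are OpenTelemetry spans, not this struct):

```rust
#[derive(PartialEq, Debug)]
enum StatusCode {
    Unset,
    Error,
}

/// Toy stand-in for a record's trace span.
struct RecordSpan {
    status: StatusCode,
    events: Vec<String>,
    ended: bool,
}

impl RecordSpan {
    /// What happens when a transformation returns an error.
    fn fail(&mut self, message: &str) {
        self.status = StatusCode::Error; // otel.status_code = ERROR
        self.events.push(format!("exception: {message}")); // span.record_error()
        self.ended = true; // span ends at the point of failure
    }
}
```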
## Performance Impact

Tracing adds minimal overhead when sampling is configured appropriately:

| Sampling | Overhead | Notes |
|---|---|---|
| `Never` | ~0% | No spans created; trace context still propagated for Kafka |
| `Ratio(0.001)` | < 1% | Recommended for high-throughput production |
| `Ratio(0.01)` | ~1-2% | Good for staging or moderate-throughput production |
| `Always` | ~5-10% | Development and debugging only |

The overhead comes from span creation and OTLP export, not from sampling decisions, each of which is a single random-number comparison.
## Troubleshooting

### No traces appearing in the backend

- Verify `with_observability()` is configured with the correct `otlp_endpoint`
- Check that the OTLP collector is reachable from your application
- Confirm `sampling` is not set to `Never` (the default)
- Check collector logs for rejected spans (often caused by resource limits)
### Traces appear but are missing operators

- Ensure `with_tracing()` is called before operators are added to the pipeline
- Custom operators using `apply_operator()` create spans automatically; no manual instrumentation is needed
### Kafka traces not connecting to upstream services

- The upstream producer must inject W3C `traceparent` headers into Kafka message headers
- If the upstream doesn't send `traceparent`, Volley creates a new root trace (no parent linkage)