Per-Record Tracing

Volley can trace individual records as they flow through a pipeline, creating OpenTelemetry spans at each processing stage. This gives you a complete picture of what happened to a record: which operators processed it, how long each step took, what the input and output looked like, and whether any errors occurred.

Quick Start

```rust
use volley_core::prelude::*;
use volley_core::observability::{TracingConfig, SamplingStrategy};

let results = StreamExecutionEnvironment::new()
    .from_source(my_source)
    .with_tracing(TracingConfig {
        sampling: SamplingStrategy::Ratio(0.01),
        ..Default::default()
    })
    .map(|record| {
        // Your transformation
        Ok(record)
    })
    .to_sink(my_sink)
    .execute("traced-pipeline")
    .await?;
```

Open Jaeger or Grafana Tempo to see traces. Each sampled record appears as a trace with child spans for source, operators, and sink.

How It Works

  1. Sampling decision happens at the source. Based on your SamplingStrategy, the runtime decides whether to trace each record.
  2. Root span is created for sampled records (volley.record). The span’s context is attached to the StreamRecord.trace field.
  3. Child spans are created automatically at each pipeline stage (operators, sink). No code changes needed in your operators.
  4. Payload previews (optional) capture a JSON snapshot of the first row at each operator’s input and output, size-capped to max_payload_bytes.
  5. Cleanup happens at pipeline exit. Any root spans for records that were filtered or lost to errors are ended.
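The sampling decision in step 1 reduces to a single threshold comparison. The following sketch models this with a std-only `SamplingStrategy` enum whose variants mirror the API above; the implementation is illustrative, not Volley's internal code:

```rust
// Illustrative model of a per-record sampling decision (not Volley's
// actual implementation). Ratio sampling is one comparison between a
// uniform random value and the configured ratio.
enum SamplingStrategy {
    Never,
    Always,
    Ratio(f64),
}

impl SamplingStrategy {
    /// `uniform` is a value drawn uniformly from [0.0, 1.0) per record.
    fn should_sample(&self, uniform: f64) -> bool {
        match self {
            SamplingStrategy::Never => false,
            SamplingStrategy::Always => true,
            SamplingStrategy::Ratio(r) => uniform < *r,
        }
    }
}

fn main() {
    let strategy = SamplingStrategy::Ratio(0.01);
    assert!(strategy.should_sample(0.005)); // below the 1% threshold
    assert!(!strategy.should_sample(0.5));  // above it
    assert!(!SamplingStrategy::Never.should_sample(0.0));
    assert!(SamplingStrategy::Always.should_sample(0.99));
}
```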

Sampling

At production throughput (thousands of records per second), tracing every record would overwhelm your tracing backend. Use Ratio sampling:

| Throughput | Suggested ratio | Traces/sec |
|---|---|---|
| 1K rec/s | 0.10 (10%) | ~100 |
| 10K rec/s | 0.01 (1%) | ~100 |
| 100K rec/s | 0.001 (0.1%) | ~100 |

Use Always only during development or debugging.
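The suggested ratios in the table all target roughly 100 traces/sec; choosing a ratio for any throughput is just division. The helper below is not part of the Volley API, only a convenience sketch:

```rust
/// Sampling ratio needed to hit a target trace rate at a given record
/// throughput (both in events per second), capped at 1.0.
fn suggested_ratio(records_per_sec: f64, target_traces_per_sec: f64) -> f64 {
    (target_traces_per_sec / records_per_sec).min(1.0)
}

fn main() {
    // Matches the table: 10K rec/s at 1% yields ~100 traces/sec.
    assert_eq!(suggested_ratio(10_000.0, 100.0), 0.01);
    assert_eq!(suggested_ratio(100_000.0, 100.0), 0.001);
    // At low throughput the cap means tracing everything.
    assert_eq!(suggested_ratio(50.0, 100.0), 1.0);
}
```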

Window and Aggregate Traces

Window and aggregate operators accumulate input records into new output records. The output gets its own trace, linked back to the input traces via OpenTelemetry span links. This preserves lineage without creating misleading parent-child relationships.

The volley.window.input_trace_count attribute tells you how many sampled records contributed to each aggregation result.
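The relationship can be modeled with a std-only sketch: the output span carries one span link per sampled input trace, and the count attribute is simply the number of links. The types below are hypothetical, not Volley's or OpenTelemetry's:

```rust
// Hypothetical model: an aggregation output span holds *links* to its
// sampled input traces rather than a parent/child edge.
struct SpanLink {
    trace_id: u128,
}

struct WindowOutputSpan {
    links: Vec<SpanLink>,
}

impl WindowOutputSpan {
    /// Only sampled inputs contribute links, so the attribute
    /// `volley.window.input_trace_count` equals the link count.
    fn input_trace_count(&self) -> usize {
        self.links.len()
    }
}

fn main() {
    // Three sampled records fed this window; unsampled ones add no links.
    let out = WindowOutputSpan {
        links: vec![
            SpanLink { trace_id: 1 },
            SpanLink { trace_id: 2 },
            SpanLink { trace_id: 3 },
        ],
    };
    assert_eq!(out.input_trace_count(), 3);
}
```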

Kafka Trace Propagation

When using Kafka connectors, trace context propagates automatically via W3C traceparent headers:

  • Kafka source extracts traceparent from message headers
  • Kafka sink injects traceparent into produced messages

A traced request in an upstream service continues its trace through Volley and into downstream consumers. The runtime respects upstream sampling decisions (parent-based sampling).
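The W3C `traceparent` header has the shape `version-traceid-spanid-flags`, and bit 0 of the flags byte is the "sampled" flag that parent-based sampling honors. A minimal parser sketch (not Volley's connector code):

```rust
// Minimal W3C `traceparent` parser. Returns (trace_id, span_id, sampled)
// or None if the header is malformed.
fn parse_traceparent(header: &str) -> Option<(String, String, bool)> {
    let parts: Vec<&str> = header.split('-').collect();
    if parts.len() != 4 || parts[1].len() != 32 || parts[2].len() != 16 {
        return None;
    }
    let flags = u8::from_str_radix(parts[3], 16).ok()?;
    // Bit 0 of the flags byte is the "sampled" flag.
    let sampled = flags & 0x01 == 0x01;
    Some((parts[1].to_string(), parts[2].to_string(), sampled))
}

fn main() {
    let h = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
    let (trace_id, span_id, sampled) = parse_traceparent(h).unwrap();
    assert_eq!(trace_id, "4bf92f3577b34da6a3ce929d0e0e4736");
    assert_eq!(span_id, "00f067aa0ba902b7");
    assert!(sampled); // upstream sampled this record; Volley continues the trace
    assert!(parse_traceparent("garbage").is_none());
}
```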

Configuration Reference

| Field | Type | Default | Description |
|---|---|---|---|
| sampling | SamplingStrategy | Never | Which records to trace |
| max_payload_bytes | usize | 1024 | Max JSON preview size per span event |
| capture_payload | bool | true | Whether to attach input/output previews |
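Putting the three fields together in one struct literal (non-default values chosen purely for illustration):

```rust
use volley_core::observability::{TracingConfig, SamplingStrategy};

let tracing = TracingConfig {
    sampling: SamplingStrategy::Ratio(0.01), // default: Never
    max_payload_bytes: 2048,                 // default: 1024
    capture_payload: false,                  // default: true
};
```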

Viewing Traces

Volley emits traces via the OTLP gRPC exporter configured in ObservabilityConfig. Point it at your collector:

```rust
.with_observability(ObservabilityConfig::new()
    .with_node_id("node-1")
    .with_otlp_endpoint("http://otel-collector:4317"))
.with_tracing(TracingConfig {
    sampling: SamplingStrategy::Ratio(0.01),
    ..Default::default()
})
```

Compatible backends: Jaeger, Grafana Tempo, Datadog, Honeycomb, or any OTLP-compatible collector.

Error Handling

When a record fails during processing (e.g., a transformation returns an error), the record’s trace span is:

  1. Marked with otel.status_code = ERROR
  2. Annotated with the error message via span.record_error()
  3. Ended at the point of failure

Failed records are visible in your tracing backend by filtering on otel.status_code = ERROR. This makes tracing a useful debugging tool even when records are dropped or filtered.
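The three steps can be sketched with a std-only model; the real spans are OpenTelemetry spans, so the types below are hypothetical:

```rust
// Hypothetical model of finalizing a record's span on failure:
// status set to ERROR, the message recorded, and the span ended.
#[derive(Debug, PartialEq)]
enum StatusCode {
    Unset,
    Error,
}

struct RecordSpan {
    status: StatusCode,
    error_message: Option<String>,
    ended: bool,
}

impl RecordSpan {
    fn new() -> Self {
        RecordSpan { status: StatusCode::Unset, error_message: None, ended: false }
    }

    /// Mirrors steps 1-3 above: mark ERROR, record the message, end the span.
    fn fail(&mut self, err: &str) {
        self.status = StatusCode::Error;
        self.error_message = Some(err.to_string());
        self.ended = true;
    }
}

fn main() {
    let mut span = RecordSpan::new();
    span.fail("deserialization failed: missing field `id`");
    assert_eq!(span.status, StatusCode::Error);
    assert!(span.error_message.is_some());
    assert!(span.ended);
}
```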

Performance Impact

Tracing adds minimal overhead when sampling is configured appropriately:

| Sampling | Overhead | Notes |
|---|---|---|
| Never | ~0% | No spans created; trace context still propagated for Kafka |
| Ratio(0.001) | < 1% | Recommended for high-throughput production |
| Ratio(0.01) | ~1-2% | Good for staging or moderate-throughput production |
| Always | ~5-10% | Development and debugging only |

The overhead comes from span creation and OTLP export, not from the sampling decisions themselves (each of which is a single random-number comparison).

Troubleshooting

No traces appearing in the backend

  • Verify with_observability() is configured with the correct otlp_endpoint
  • Check that the OTLP collector is reachable from your application
  • Confirm sampling is not set to Never (the default)
  • Check collector logs for rejected spans (often caused by resource limits)

Traces appear but are missing operators

  • Ensure with_tracing() is called before operators are added to the pipeline
  • Custom operators using apply_operator() create spans automatically; no manual instrumentation needed

Kafka traces not connecting to upstream services

  • The upstream producer must inject W3C traceparent headers into Kafka message headers
  • If the upstream doesn’t send traceparent, Volley creates a new root trace (no parent linkage)