Writing Connectors

Connectors are how Volley reads from and writes to external systems. Volley provides built-in connectors for Kafka, cloud blob stores, Delta Lake, and Iceberg. You can also write your own.

Source and Sink Traits

// Source: produces StreamRecords
trait Source: Send {
    async fn poll(&mut self) -> Option<StreamRecord>;
    fn connector_id(&self) -> &str;
}

// Sink: consumes StreamRecords
trait Sink: Send {
    async fn write(&mut self, record: StreamRecord) -> Result<()>;
    async fn flush(&mut self) -> Result<()>;
    fn on_checkpoint(&mut self, epoch: u64);
    fn connector_id(&self) -> &str;
}

Sources produce StreamRecord values via poll(). Sinks consume them via write() and flush(). Sinks additionally hook into the checkpoint lifecycle via on_checkpoint(epoch).

Cloud Blob Store Pattern

All cloud blob store connectors follow a shared architecture:

Cloud Queue/Topic          Cloud Storage
(SQS / Azure Queue /      (S3 / Azure Blob /
 Pub/Sub)                   GCS)
      |                        |
      v                        v
NotificationSource         BlobReader
      |                        |
      +--------+   +-----------+
               |   |
               v   v
          BlobSource (volley-connector-blob-store)
               |
               v
          Decoder (Parquet / JSON / CSV / Avro)
               |
               v
          RecordBatch --> Pipeline

To add a new cloud backend, implement two traits:

  • NotificationSource — polls a queue for new object notifications
  • BlobReader — reads object data from cloud storage

The decoder layer (Parquet, JSON, CSV, Avro -> Arrow RecordBatch) is shared across all cloud backends.

Feature Flags

The volley-connectors umbrella crate uses feature flags:

Feature             What it includes
kafka               KafkaSource, KafkaSink
blob-store          BlobSource, decoders
blob-store-aws      S3Source, SqsNotificationSource
blob-store-azure    AzureBlobSource, AzureQueueNotificationSource
blob-store-gcs      GcsSource, PubSubNotificationSource
file-formats        Parquet, JSON, CSV, Avro decoders
table-formats       Delta Lake + Iceberg sinks
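A deployment enables only what it needs to keep the dependency tree small. A hypothetical manifest entry (the version number, and whether blob-store-aws implies blob-store, are assumptions):

```toml
[dependencies]
volley-connectors = { version = "0.4", default-features = false, features = [
    "kafka",          # KafkaSource, KafkaSink
    "blob-store-aws", # S3Source, SqsNotificationSource
    "file-formats",   # Parquet / JSON / CSV / Avro decoders
] }
```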

Existing Connectors

  • Kafka — source and sink with exactly-once semantics
  • Delta Lake — sink with epoch-tagged commits
  • Iceberg — sink via REST catalog
  • AWS S3 — source via SQS notifications
  • Blob Store — Azure Blob and GCS sources