Version: 0.2

Note: Moonlink is in preview. Expect changes. Join our Community to stay updated on a stable release!

Moonlink

Moonlink is a Rust library that enables sub-second mirroring (CDC) of Postgres tables into Iceberg. It serves as a drop-in replacement for the Debezium + Kafka + Flink + Spark stack.

Under the hood, it extends Iceberg with a real-time storage engine optimized for low-latency, high-throughput ingestion from update-heavy sources like Postgres logical replication.

The Problem

Traditional CDC pipelines struggle with Iceberg as a destination. Iceberg wasn't designed for high-frequency updates and deletes:

Updates require expensive full-table scans
Even with Iceberg v2's equality deletes, the number of delete files grows rapidly
Small file problem degrades query performance

Moonlink takes a fundamentally different approach by building the first streaming storage engine for Iceberg. Which provides:

Sub-second Ingestion: Handles inserts, updates and deletes with minimal latency
Up-to-date Reads: Provides unified view of streaming buffer and Iceberg state

Architecture

Moonlink converts Iceberg into a real-time columnstore by adding a streaming storage engine on top of it. This consists of:

In-Memory Arrow Buffer: Which accumulates recent inserts/updates in columnar format
Hybrid Index: Maps primary key or row hash → (batch/file, row)
Deletion Log + RoaringBitmap Deletion Vectors: Efficiently tracks deletions in Iceberg PuffinFile

Moonlink Architecture

Hot data lands in Moonlink's streaming storage engine, and is periodically flushed into Iceberg. This architecture is inspired by real-time streaming storage systems like Fluss.

Write Path

Raw Inserts

Insert records are accumulated in in-memory Arrow buffers
When size/time thresholds are crossed, batches are serialized to Parquet files

Raw Deletions

Deletion records are converted to positional deletion logs using the Moonlink index
Deletion logs are periodically flushed to Deletion Vectors for efficiency

Read Path

Moonlink provides a unified read view that combines in-memory state with Iceberg files for real-time data access.

If you don't require an 'up-to-date' view of the table, you can query the Iceberg table directly.

Note: Moonlink writes Iceberg tables with deletion vectors (Iceberg v3). Not all engines support this just yet.

How does Moonlink work within pg_mooncake?

pg_mooncake runs moonlink as a Background Worker Process that manages tables, handles CDC ingestion, and processes read requests.

Postgres backends communicate with moonlink via unix streams:

DDL commands: Create/drop table and snapshot creation
- Commands are forwarded to moonlink and execution waits for completion
Read queries:
- Request current Read State (Parquet files, arrow batches, deletion vectors)

The Problem​

Architecture​

Write Path​

Read Path​

How does Moonlink work within pg_mooncake?​

The Problem

Architecture

Write Path

Read Path

How does Moonlink work within pg_mooncake?