Version: 0.2

The Postgres Lakehouse

The Lakehouse architecture allows you to store your data in open table formats on object storage. A key benefit is vendor independence—you can read these tables with any tool of your choice (Snowflake, Spark, DuckDB, etc).

Traditionally, managing and writing data to a lakehouse has required specialized data engineering tools like Spark, EMR, Kafka, and Flink.

pg_mooncake simplifies this: build and manage your data lakehouse using only PostgreSQL. No complex data engineering stack required.
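Getting started requires only the extension itself. The sketch below assumes pg_mooncake is already installed on the server; the `orders` table definition is illustrative and simply provides the source table used in the examples that follow:

-- Enable the extension (assumes pg_mooncake is installed)
CREATE EXTENSION IF NOT EXISTS pg_mooncake;

-- An ordinary Postgres table to mirror into Iceberg (illustrative schema)
CREATE TABLE orders (
    order_id     BIGINT PRIMARY KEY,
    customer_id  BIGINT,
    order_date   DATE,
    total_amount NUMERIC(12, 2)
);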

Write to Iceberg with Postgres

pg_mooncake lets you write Iceberg tables that are fully managed and optimized:

  • Writes Iceberg v3 tables with deletion vectors
  • Handles schema evolution automatically
  • Efficiently processes updates and deletes without destroying Iceberg state
  • Requires no manual maintenance or compaction (no small-file problem)
  • Requires no external catalog; all metadata is stored in Postgres

It replaces complex Debezium + Kafka pipelines and removes the need to manage an external catalog.

Create a columnstore table that automatically exports to Iceberg:

-- Create columnstore table (automatically generates Iceberg)
CALL mooncake.create_table('orders_iceberg', 'orders');

-- Your data is now available as optimized Iceberg tables
SELECT * FROM mooncake.columnstore_tables
WHERE table_name = 'orders_iceberg';

Schema changes and data modifications are handled automatically:

-- Schema changes are handled automatically
ALTER TABLE orders ADD COLUMN priority TEXT;
-- Iceberg table schema updates automatically

-- Updates and deletes use Iceberg v3 deletion vectors for efficiency
UPDATE orders SET priority = 'high' WHERE total_amount > 1000;
DELETE FROM orders WHERE order_date < '2023-01-01';
-- Iceberg state preserved with optimal performance

Read Iceberg Tables

Your Iceberg tables are immediately available to query from multiple analytics engines:

  • Postgres with pg_mooncake - Query directly from PostgreSQL
  • DuckDB - High-performance analytics
  • Any Iceberg-compatible tool - Spark, Trino, etc.

-- Query from Postgres with pg_mooncake
SELECT * FROM orders_iceberg WHERE order_date >= '2024-01-01';

-- Query from DuckDB
SELECT * FROM iceberg_scan('s3://bucket/iceberg/orders_iceberg');

-- Query from Spark
SELECT customer_id, COUNT(*) as order_count
FROM iceberg.default.orders_iceberg
GROUP BY customer_id;
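The DuckDB query assumes the Iceberg extension is available; a minimal one-time setup sketch (the bucket path, as above, is illustrative):

-- Install and load DuckDB's Iceberg extension
INSTALL iceberg;
LOAD iceberg;

-- Scan the table written by pg_mooncake (replace with your bucket path)
SELECT COUNT(*) FROM iceberg_scan('s3://bucket/iceberg/orders_iceberg');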