The Postgres Lakehouse
The Lakehouse architecture lets you store your data in open table formats on object storage. A key benefit is vendor independence: you can read these tables with any tool of your choice (Snowflake, Spark, DuckDB, etc.).
Traditionally, managing and writing data to a lakehouse has required specialized data engineering tools like Spark, EMR, Kafka, and Flink.
pg_mooncake simplifies this: build and manage your data lakehouse using only PostgreSQL. No complex data engineering stack required.
Write to Iceberg with Postgres
pg_mooncake lets you write Iceberg tables that are fully managed and optimized:
- Writes Iceberg v3 tables with deletion vectors
- Handles schema evolution automatically
- Processes updates and deletes efficiently without invalidating Iceberg state
- Requires no manual maintenance or compaction (no small-file problem)
- Needs no external catalog: all metadata is stored in Postgres
This replaces complex Debezium + Kafka setups and removes the need to manage an external catalog.
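The examples in this section operate on a plain Postgres table named orders. A minimal sketch of such a source table, with illustrative columns chosen to match the later queries (the exact schema is an assumption):

```sql
-- Hypothetical source table; columns match the examples in this section.
CREATE TABLE orders (
    order_id     BIGINT PRIMARY KEY,
    customer_id  BIGINT NOT NULL,
    order_date   DATE NOT NULL,
    total_amount NUMERIC(12, 2)
);

-- A sample row so the later UPDATE/DELETE examples have data to act on.
INSERT INTO orders (order_id, customer_id, order_date, total_amount)
VALUES (1, 42, '2024-03-15', 1250.00);
```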
Create a columnstore table that automatically exports to Iceberg:
-- Create columnstore table (automatically generates Iceberg)
CALL mooncake.create_table('orders_iceberg', 'orders');
-- Your data is now available as optimized Iceberg tables
SELECT * FROM mooncake.columnstore_tables
WHERE table_name = 'orders_iceberg';
Schema changes and data modifications are handled automatically:
-- Schema changes are handled automatically
ALTER TABLE orders ADD COLUMN priority TEXT;
-- Iceberg table schema updates automatically
-- Updates and deletes use Iceberg v3 deletion vectors for efficiency
UPDATE orders SET priority = 'high' WHERE total_amount > 1000;
DELETE FROM orders WHERE order_date < '2023-01-01';
-- Iceberg state preserved with optimal performance
Read Iceberg Tables
Your Iceberg tables are immediately available to query from multiple analytics engines:
- Postgres with pg_mooncake - Query directly from PostgreSQL
- DuckDB - High-performance analytics
- Any Iceberg-compatible tool - Spark, Trino, etc.
-- Query from Postgres with pg_mooncake
SELECT * FROM orders_iceberg WHERE order_date >= '2024-01-01';
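Running iceberg_scan in DuckDB first requires loading DuckDB's iceberg extension (plus S3 credentials for the bucket, which depend on your setup); a one-time session sketch:

```sql
-- DuckDB session setup before calling iceberg_scan.
INSTALL iceberg;
LOAD iceberg;
-- S3 access is typically configured separately (e.g. via a DuckDB secret).
```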
-- Query from DuckDB
SELECT * FROM iceberg_scan('s3://bucket/iceberg/orders_iceberg');
-- Query from Spark
SELECT customer_id, COUNT(*) as order_count
FROM iceberg.default.orders_iceberg
GROUP BY customer_id;
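The Spark query above assumes an Iceberg catalog named iceberg has already been configured in the Spark session; an illustrative sketch of such a configuration (the catalog type and warehouse path are assumptions, adjust to your deployment):

```sql
-- Illustrative Spark settings (spark-defaults.conf or session config):
-- spark.sql.catalog.iceberg           = org.apache.iceberg.spark.SparkCatalog
-- spark.sql.catalog.iceberg.type      = hadoop
-- spark.sql.catalog.iceberg.warehouse = s3://bucket/iceberg
```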