Data Warehousing
pg_mooncake transforms PostgreSQL into a high-performance data warehouse. This integration eliminates the need for separate systems, reducing both complexity and cost while delivering fast analytical query performance.
Periodic Data Loading from Rowstore to Columnstore​
A common data warehousing pattern is to periodically transform and load data from operational (rowstore) tables into analytical (columnstore) tables. This approach gives you the best of both worlds: fast OLTP operations on rowstore tables and high-performance analytics on columnstore tables.
Example: Daily Aggregation Pipeline​
This example demonstrates how to create a daily aggregation pipeline that:
- Joins multiple rowstore tables
- Transforms and aggregates the data
- Loads the results into a columnstore table
-- Create columnstore table for aggregated data
CREATE TABLE daily_sales_summary (
date DATE,
product_id INT,
category_id INT,
region_id INT,
total_quantity INT,
total_revenue DECIMAL(18,2)
) USING columnstore;
-- Function to load daily aggregated data
CREATE OR REPLACE FUNCTION load_daily_sales_summary(target_date DATE)
RETURNS void AS $$
BEGIN
INSERT INTO daily_sales_summary
SELECT
target_date AS date,
p.product_id,
p.category_id,
s.region_id,
SUM(o.quantity) AS total_quantity,
SUM(o.quantity * o.unit_price) AS total_revenue
FROM
orders o
JOIN products p ON o.product_id = p.product_id
JOIN stores s ON o.store_id = s.store_id
WHERE
o.order_date = target_date
GROUP BY
p.product_id, p.category_id, s.region_id;
END;
$$ LANGUAGE plpgsql;
-- Schedule with pg_cron
CREATE EXTENSION IF NOT EXISTS pg_cron;
SELECT cron.schedule(
'daily-sales-load',
'0 0 * * *',
$$SELECT load_daily_sales_summary(CURRENT_DATE - INTERVAL '1 day')$$
);
Integration with dbt​
dbt (data build tool) is a popular transformation framework that allows you to build modular, version-controlled data transformations. pg_mooncake works seamlessly with dbt, allowing you to define models that use columnstore tables.
Setting Up dbt with pg_mooncake​
- Configure your dbt project to connect to PostgreSQL
In your profiles.yml
file:
my_project:
target: dev
outputs:
dev:
type: postgres
host: [your-host]
user: [your-user]
password: [your-password]
port: 5432
dbname: [your-database]
schema: [your-schema]
threads: 4
Setting Default Table Access Method​
You can set the default table access method to columnstore for your dbt session by adding an on-run-start hook in your dbt project:
{% macro on_run_start() %}
{% set query %}
SET default_table_access_method = 'columnstore';
{% endset %}
{% do run_query(query) %}
{% endmacro %}
This way, all tables created by dbt will use columnstore by default.
Integration with BI Tools & Semantic Layers​
All pg_mooncake columnstore tables are accessible to BI tools and semantic layers as standard PostgreSQL tables. Any analytics tool with PostgreSQL connectivity can seamlessly query these columnstore tables without additional configuration.
This compatibility ensures your existing reporting infrastructure continues to work while benefiting from the performance advantages of columnar storage.