Skip to main content
Version: 0.2

Archiving Large Postgres Tables

Mooncake's columnstore writes data as Parquet to object storage, providing cost-effective, durable history while Postgres only maintains hot partitions. This approach is ideal for time-partitioned tables where you need to retain historical data while keeping your operational database lean.

Architecture Pattern

The archiving pattern follows these key principles:

  1. Create a columnstore mirror of a time-partitioned Postgres table
  2. Periodically truncate or drop old partitions from the rowstore
  3. Retain complete history in the columnstore while the rowstore only holds recent data

This gives you the best of both worlds: fast operational queries on recent data and cheap, queryable history for analytics.

Implementation Example

Below is a complete example using an INT range as a stand-in for time-based partitioning.

Step 1: Create a Partitioned Rowstore Table

CREATE TABLE transactions (
id INT PRIMARY KEY,
amount DECIMAL(10,2),
user_id INT,
created_at TIMESTAMP
) PARTITION BY RANGE (id);

Step 2: Create a Columnstore Mirror

CALL mooncake.create_table('transactions_archive', 'transactions');

Step 3: Add Partition and Insert Data

-- Create first partition
CREATE TABLE transactions_p1 PARTITION OF transactions
FOR VALUES FROM (0) TO (50);

-- Insert sample data
INSERT INTO transactions_p1 VALUES
(1, 99.99, 101, '2024-01-01 10:00:00'),
(2, 150.50, 102, '2024-01-01 11:00:00'),
(3, 75.25, 103, '2024-01-01 12:00:00');

Step 4: Archive Old Partition

When you're ready to archive old data:

-- Drop the old partition from rowstore
DROP TABLE transactions_p1;

Step 5: Query Historical Data

-- Query the columnstore (history is retained)
SELECT * FROM transactions_archive;

Step 6: Continue with New Partitions

-- Add new partition for recent data
CREATE TABLE transactions_p2 PARTITION OF transactions
FOR VALUES FROM (50) TO (100);

-- Insert new data
INSERT INTO transactions_p2 VALUES
(51, 200.00, 104, '2024-01-02 10:00:00'),
(52, 89.99, 105, '2024-01-02 11:00:00'),
(53, 125.75, 106, '2024-01-02 12:00:00');

-- Query columnstore returns all historical data
SELECT * FROM transactions_archive; -- Returns all six rows

Best Practices

Data Consistency

  • Never write to dropped partitions: After dropping or truncating a rowstore partition, avoid writing to it again to prevent history conflicts
  • Use partitioned tables: The columnstore safely ignores truncates on rowstore partitioned tables

Common Use Cases

  • Transaction logs: Archive old transactions while keeping recent ones in fast storage
  • Event data: Store historical events for analytics while maintaining operational performance
  • Audit trails: Retain complete audit history in queryable format
  • Time series data: Archive old metrics and sensor data