Archiving Large Postgres Tables
Mooncake's columnstore writes data as Parquet to object storage, providing cost-effective, durable history while Postgres only maintains hot partitions. This approach is ideal for time-partitioned tables where you need to retain historical data while keeping your operational database lean.
Architecture Pattern
The archiving pattern follows these key principles:
- Create a columnstore mirror of a time-partitioned Postgres table
- Periodically truncate or drop old partitions from the rowstore
- Retain complete history in the columnstore while the rowstore only holds recent data
This gives you the best of both worlds: fast operational queries on recent data and cheap, queryable history for analytics.
Implementation Example
Below is a complete example using an INT
range as a stand-in for time-based partitioning.
Step 1: Create a Partitioned Rowstore Table
CREATE TABLE transactions (
id INT PRIMARY KEY,
amount DECIMAL(10,2),
user_id INT,
created_at TIMESTAMP
) PARTITION BY RANGE (id);
Step 2: Create a Columnstore Mirror
CALL mooncake.create_table('transactions_archive', 'transactions');
Step 3: Add Partition and Insert Data
-- Create first partition
CREATE TABLE transactions_p1 PARTITION OF transactions
FOR VALUES FROM (0) TO (50);
-- Insert sample data
INSERT INTO transactions_p1 VALUES
(1, 99.99, 101, '2024-01-01 10:00:00'),
(2, 150.50, 102, '2024-01-01 11:00:00'),
(3, 75.25, 103, '2024-01-01 12:00:00');
Step 4: Archive Old Partition
When you're ready to archive old data:
-- Drop the old partition from rowstore
DROP TABLE transactions_p1;
Step 5: Query Historical Data
-- Query the columnstore (history is retained)
SELECT * FROM transactions_archive;
Step 6: Continue with New Partitions
-- Add new partition for recent data
CREATE TABLE transactions_p2 PARTITION OF transactions
FOR VALUES FROM (50) TO (100);
-- Insert new data
INSERT INTO transactions_p2 VALUES
(51, 200.00, 104, '2024-01-02 10:00:00'),
(52, 89.99, 105, '2024-01-02 11:00:00'),
(53, 125.75, 106, '2024-01-02 12:00:00');
-- Query columnstore returns all historical data
SELECT * FROM transactions_archive; -- Returns all six rows
Best Practices
Data Consistency
- Never write to dropped partitions: After dropping or truncating a rowstore partition, avoid writing to it again to prevent history conflicts
- Use partitioned tables: The columnstore safely ignores truncates on rowstore partitioned tables
Common Use Cases
- Transaction logs: Archive old transactions while keeping recent ones in fast storage
- Event data: Store historical events for analytics while maintaining operational performance
- Audit trails: Retain complete audit history in queryable format
- Time series data: Archive old metrics and sensor data