Version: 0.1

Querying columnstore outside of Postgres

A major benefit of pg_mooncake is its use of open formats (Parquet, Iceberg, Delta) that can be accessed by tools beyond PostgreSQL. This enables hybrid workflows combining PostgreSQL with other data processing systems.

Inspecting Table Metadata

Finding Table Locations

To find where your columnstore tables are physically stored, query the mooncake.columnstore_tables catalog table:

SELECT * FROM mooncake.columnstore_tables;

Example output:

    table_name     |                                          path
-------------------+----------------------------------------------------------------------------------------
 users_columnstore | /usr/local/pgsql/data/mooncake_local_tables/mooncake_postgres_users_columnstore_16446/
 users_log         | /usr/local/pgsql/data/mooncake_local_tables/mooncake_postgres_users_log_16455/
 events            | s3://my-bucket/mooncake_postgres_events_16459/

The path column shows either a local filesystem path or a cloud storage path (S3, GCS, etc.) where the table data is stored. That directory holds a complete Delta Lake table: transaction logs under _delta_log plus the Parquet data files themselves.
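For a local table, you can sanity-check the directory layout from Python with the standard library alone. A minimal sketch; the helper name and the mock layout below are illustrative, not part of pg_mooncake:

```python
from pathlib import Path
import tempfile

def list_delta_contents(table_path):
    """Split a Delta table directory into transaction-log entries and Parquet data files."""
    root = Path(table_path)
    logs = sorted(p.name for p in (root / "_delta_log").glob("*.json"))
    data = sorted(p.name for p in root.glob("*.parquet"))
    return logs, data

# Demo against a mock layout; in practice, pass a path taken from
# mooncake.columnstore_tables instead.
tmp = Path(tempfile.mkdtemp())
(tmp / "_delta_log").mkdir()
(tmp / "_delta_log" / "00000000000000000000.json").write_text("{}")
(tmp / "data-0001.parquet").write_bytes(b"")
print(list_delta_contents(tmp))
```

Any tool that understands the Delta protocol can read the same directory; nothing in it is specific to PostgreSQL.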

Viewing Data Files

To inspect the individual Parquet files for a specific table, use the mooncake.data_files view:

SELECT * FROM mooncake.data_files;

Example output:

  oid  |                  file_name                   |          file_metadata
-------+----------------------------------------------+----------------------------------
 32822 | 0ffb9a71-89a5-43d7-a55b-e04db56d680b.parquet | \x1502192c3500180d6475636b64625f
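The file_metadata column is a Postgres bytea, which psql renders in hex format (a leading \x followed by hex digits). If you pull that value out as text, a small sketch for turning it back into raw bytes; decode_bytea_hex is a hypothetical helper, not part of pg_mooncake:

```python
def decode_bytea_hex(value: str) -> bytes:
    """Decode Postgres hex-format bytea output: '\\x' followed by hex digits."""
    if not value.startswith("\\x"):
        raise ValueError("expected hex-format bytea starting with \\x")
    return bytes.fromhex(value[2:])

# The (truncated) example value from the table above; note the embedded
# 'duckdb' marker in the decoded bytes.
meta = decode_bytea_hex("\\x1502192c3500180d6475636b64625f")
print(meta)
```

Client libraries such as psycopg return bytea columns as bytes directly, so this decoding step is only needed when working from textual output.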

Accessing Data with Python

Using Polars

Polars is a fast DataFrame library for Python with first-class Delta Lake support. Because pg_mooncake stores columnstore tables in Delta Lake format, you can point scan_delta() directly at a table's path. It returns a LazyFrame, so the query only executes when you call collect():

  1. Install Polars with the Delta Lake extra, which pulls in the deltalake package that scan_delta() relies on:
pip install 'polars[deltalake]'
  2. Create a LazyFrame from your table location:
import polars as pl

# For local tables
local_path = '/usr/local/pgsql/data/mooncake_local_tables/mooncake_postgres_users_columnstore_16446/'
df = pl.scan_delta(local_path)

# For S3 tables
s3_path = 's3://my-bucket/mooncake_postgres_events_16459/'
df = pl.scan_delta(s3_path)
  3. Run Python transformations, then call collect() to materialize the result:
df.filter(
    (pl.col("pickup_time") >= '2021-01-01T00:00:00+00:00')
    & (pl.col("pickup_time") < '2021-02-01T00:00:00+00:00')
).collect()