S3 File System

The S3 file system allows GlareDB to read data directly from Amazon S3 buckets. This makes it easy to query data stored in cloud-based data lakes without needing to download files manually.

The S3 file system is enabled by default for the CLI, Python bindings, and WebAssembly bindings.

Usage

To read from an S3 bucket, use any supported file-reading function with an s3:// URI:

SELECT * FROM read_csv('s3://my-bucket/cities.csv');

This reads the cities.csv file from the my-bucket bucket.

You can also query S3 files directly by specifying the S3 URI in the FROM clause:

SELECT * FROM 's3://my-bucket/cities.csv';

Optional Parameters

When accessing private S3 buckets, you can provide AWS credentials directly as options:

SELECT * FROM read_csv(
  's3://my-private-bucket/secure.csv',
  access_key_id = 'YOUR_ACCESS_KEY',
  secret_access_key = 'YOUR_SECRET_KEY'
);

Supported Options

Option	Description
`access_key_id`	AWS Access Key ID
`secret_access_key`	AWS Secret Access Key
`region`	AWS regions where the bucket is located, defaults to "us-east-1" if omitted

Supported Formats

All supported file formats, such as CSV and Parquet, can be read using the S3 file system using their respective functions (e.g. read_csv).

CORS Configuration for WebAssembly

When accessing S3 buckets from GlareDB running in a browser using WebAssembly, you need to configure Cross-Origin Resource Sharing (CORS) for your S3 bucket to allow browser-based requests.

To configure CORS for your S3 bucket:

Navigate to your S3 bucket in the AWS Management Console
Go to Permissions -> Cross-origin resource sharing (CORS)
Add the following JSON configuration:

[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET",
            "HEAD"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": []
    }
]

This configuration allows GET and HEAD requests from any origin, which is required for GlareDB's WebAssembly bindings to access your S3 bucket.

For more information about CORS configuration for S3, see the AWS documentation.

Notes

Public files can be accessed without authentication.
Access keys and secrets must be provided when querying non-public buckets. Session-level configuration for keys will be added in the future.