The S3 file system allows GlareDB to read data directly from Amazon S3 buckets. This makes it easy to query data stored in cloud-based data lakes without needing to download files manually.
The S3 file system is enabled by default for the CLI, Python bindings, and WebAssembly bindings.
To read from an S3 bucket, use any supported file-reading function with an `s3://` URI:
```sql
SELECT * FROM read_csv('s3://my-bucket/cities.csv');
```
This reads the `cities.csv` file from the `my-bucket` bucket.
You can also query S3 files directly by specifying the S3 URI in the FROM clause:
```sql
SELECT * FROM 's3://my-bucket/cities.csv';
```
When accessing private S3 buckets, you can provide AWS credentials directly as options:
```sql
SELECT * FROM read_csv(
    's3://my-private-bucket/secure.csv',
    access_key_id = 'YOUR_ACCESS_KEY',
    secret_access_key = 'YOUR_SECRET_KEY'
);
```

| Option | Description |
|---|---|
| `access_key_id` | AWS access key ID |
| `secret_access_key` | AWS secret access key |
| `region` | AWS region where the bucket is located. Defaults to `us-east-1` if omitted. |

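
For a bucket outside the default `us-east-1` region, the `region` option can be passed alongside the credentials. This is a minimal sketch: the bucket name, key placeholders, and the `eu-west-1` region are illustrative values, not requirements.

```sql
-- Read a CSV file from a private bucket located in a non-default region.
-- Bucket, object, credentials, and region are placeholder values.
SELECT * FROM read_csv(
    's3://my-private-bucket/secure.csv',
    access_key_id = 'YOUR_ACCESS_KEY',
    secret_access_key = 'YOUR_SECRET_KEY',
    region = 'eu-west-1'
);
```

If `region` is omitted, the bucket is assumed to be in `us-east-1`.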
All supported file formats, such as CSV and Parquet, can be read from S3 using their respective functions (for example, `read_csv`).
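As an illustration, a Parquet file could be read the same way. This sketch assumes the Parquet reader is named `read_parquet`, mirroring `read_csv`; check the list of supported file functions for the exact name. The object `s3://my-bucket/events.parquet` is hypothetical.

```sql
-- Assumption: read_parquet is the Parquet counterpart of read_csv.
-- 's3://my-bucket/events.parquet' is a hypothetical object used for illustration.
SELECT * FROM read_parquet('s3://my-bucket/events.parquet');
```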
When accessing S3 buckets from GlareDB running in a browser using WebAssembly, you need to configure Cross-Origin Resource Sharing (CORS) for your S3 bucket to allow browser-based requests.
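To configure CORS, add a rule such as the following to the bucket's CORS configuration (in the S3 console, this is on the bucket's Permissions tab, in the Cross-origin resource sharing (CORS) section):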
```json
[
  {
    "AllowedHeaders": ["*"],
    "AllowedMethods": ["GET", "HEAD"],
    "AllowedOrigins": ["*"],
    "ExposeHeaders": []
  }
]
```
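This configuration allows GET and HEAD requests from any origin. GET and HEAD access is required for GlareDB's WebAssembly bindings to read from your S3 bucket. To restrict which sites can make these requests, replace `"*"` in `AllowedOrigins` with the specific origins that serve your application.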
For more information about CORS configuration for S3, see the AWS documentation.