CSV extension

The csv extension enables reading of CSV files.

This extension is included by default in the CLI, Python, and WebAssembly clients for GlareDB. A csv schema will be created automatically containing all CSV related functions.

Reading CSV files

The read_csv table function can be used to read a CSV file:

SELECT * FROM read_csv('path/to/file.csv');

The read_csv file can also be used to read TSV (Tab-Separated Values) files as well:

SELECT * FROM read_csv('path/to/file.tsv');

When reading a CSV or TSV file, the "dialect" used will be inferred. This includes inferring the delimiter (comma for CSV, tab for TSV) as well as if the file contains a header. Once the dialect has been determined, the types for each record will then be inferred.

DESCRIBE can be used to determine the types that were inferred for each column:

DESCRIBE read_csv('path/to/file.csv');

read_csv is an alias for the namespaced csv.read function, and can be used interchangeably:

SELECT * FROM csv.read('path/to/file.csv');

If your CSV files ends with .csv or .tsv, the function call can be omitted. The function to use will be inferred automatically:

SELECT * FROM 'path/to/file.csv';

Multiple files can be provided using either a list of files, or a glob. All files are currently expected to have the same schema.

To read a specific set of files:

SELECT * FROM read_csv(['file1.csv', 'file2.csv']);

To read all CSV files in data/:

SELECT * FROM read_csv('data/*.csv');

If the glob ends with .csv or .tsv, the function call can be omitted:

SELECT * FROM 'data/*.csv';

Function reference

All CSV table functions are located in the csv schema. A complete list can be found using list_functions:

SELECT *
FROM list_functions()
WHERE schema_name = 'csv';

`csv.read`

Alias: read_csv, csv_scan, csv.scan.

The read_csv function takes a path to a CSV file and returns a table containing the parsed data.

SELECT * FROM read_csv('userdata1.csv');

By default, read_csv will:

Automatically infer column data types
Detect and skip the header row (if present)

Additional parameters can be provided for other file systems. For example, we can provide AWS credentials for accessing a CSV file in a private S3 bucket:

SELECT * FROM csv.read('s3://bucket-name/path/to/file.csv',
                       region='us-east-1',
                       access_key_id='YOUR_ACCESS_KEY',
                       secret_access_key='YOUR_SECRET_KEY');

Note: Support for custom parsing options, including explicit type definitions and header row overrides, will be available soon.