The `csv` extension enables reading CSV files. This extension is included by default in the GlareDB CLI, Python, and WebAssembly clients. A `csv` schema is created automatically containing all CSV-related functions.
The `read_csv` table function can be used to read a CSV file:

```sql
SELECT * FROM read_csv('path/to/file.csv');
```
The `read_csv` function can also be used to read TSV (Tab-Separated Values) files:

```sql
SELECT * FROM read_csv('path/to/file.tsv');
```
When reading a CSV or TSV file, the dialect used is inferred. This includes inferring the delimiter (comma for CSV, tab for TSV) and whether the file contains a header row. Once the dialect has been determined, the type of each column is then inferred.
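GlareDB's inference is internal, but the general idea can be sketched with Python's standard-library `csv.Sniffer`, which performs a similar dialect and header detection. This is a minimal illustration of the two-step process, not GlareDB's actual implementation:

```python
import csv
import io

sample = "name\tage\tscore\nalice\t30\t1.5\nbob\t25\t2.0\n"

# Step 1: infer the dialect (delimiter, quoting) and header from a sample.
sniffer = csv.Sniffer()
dialect = sniffer.sniff(sample)
has_header = sniffer.has_header(sample)
print(repr(dialect.delimiter), has_header)  # '\t' True

# Step 2: with the dialect known, infer a type for each column by trying
# progressively wider types and falling back to string.
def infer_type(values):
    for candidate in (int, float):
        try:
            for v in values:
                candidate(v)
            return candidate.__name__
        except ValueError:
            pass
    return "str"

rows = list(csv.reader(io.StringIO(sample), dialect))
header, data = rows[0], rows[1:]
columns = list(zip(*data))  # pivot rows into columns
types = {name: infer_type(col) for name, col in zip(header, columns)}
print(types)  # {'name': 'str', 'age': 'int', 'score': 'float'}
```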
`DESCRIBE` can be used to see the type that was inferred for each column:

```sql
DESCRIBE read_csv('path/to/file.csv');
```
`read_csv` is an alias for the namespaced `csv.read` function, and the two can be used interchangeably:

```sql
SELECT * FROM csv.read('path/to/file.csv');
```
If your CSV file ends with `.csv` or `.tsv`, the function call can be omitted. The function to use is inferred automatically:

```sql
SELECT * FROM 'path/to/file.csv';
```
Multiple files can be provided using either a list of paths or a glob. All files are currently expected to have the same schema.

To read a specific set of files:

```sql
SELECT * FROM read_csv(['file1.csv', 'file2.csv']);
```
To read all CSV files in `data/`:

```sql
SELECT * FROM read_csv('data/*.csv');
```
If the glob ends with `.csv` or `.tsv`, the function call can be omitted:

```sql
SELECT * FROM 'data/*.csv';
```
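Glob expansion over same-schema files can be illustrated client-side with Python's standard library: expand the pattern, then concatenate rows while keeping the shared header only once. This is an illustrative sketch, not how GlareDB resolves globs internally:

```python
import csv
import glob
import os
import tempfile

# Create a small directory with two CSV files sharing one schema.
tmp = tempfile.mkdtemp()
for name, rows in [("file1.csv", [["alice", "30"]]),
                   ("file2.csv", [["bob", "25"]])]:
    with open(os.path.join(tmp, name), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "age"])  # shared header
        writer.writerows(rows)

# Expand the glob and concatenate rows; every file must share the schema.
all_rows = []
for path in sorted(glob.glob(os.path.join(tmp, "*.csv"))):
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # skip the per-file header
        all_rows.extend(reader)

print(all_rows)  # [['alice', '30'], ['bob', '25']]
```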
All CSV table functions are located in the `csv` schema. A complete list can be found using `list_functions`:

```sql
SELECT *
FROM list_functions()
WHERE schema_name = 'csv';
```
`csv.read`

Aliases: `read_csv`, `csv_scan`, `csv.scan`.
The `read_csv` function takes a path to a CSV file and returns a table containing the parsed data:

```sql
SELECT * FROM read_csv('userdata1.csv');
```
By default, `read_csv` will infer the file's dialect and column types, as described above.
Additional parameters can be provided for other file systems. For example, AWS credentials can be provided to access a CSV file in a private S3 bucket:

```sql
SELECT * FROM csv.read('s3://bucket-name/path/to/file.csv',
    region='us-east-1',
    access_key_id='YOUR_ACCESS_KEY',
    secret_access_key='YOUR_SECRET_KEY');
```
Note: Support for custom parsing options, including explicit type definitions and header row overrides, will be available soon.