Iceberg extension

The iceberg extension enables interacting with Apache Iceberg tables and catalogs.

This extension is included by default in the CLI, Python, and WebAssembly clients for GlareDB. An iceberg schema containing all Iceberg-related functions is created automatically.

The iceberg extension is under heavy development, and functionality may be missing or incomplete.

Function reference

All Iceberg table functions accept a table path as the first argument. The table path should be a directory containing metadata/ and data/ sub-directories.
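As a rough illustration of the expected layout, the sketch below checks a local directory for the two sub-directories before it is passed to a table function. The helper name looks_like_iceberg_table is hypothetical and not part of the extension, and the check assumes a local filesystem path rather than an object store URI.

```python
import tempfile
from pathlib import Path


def looks_like_iceberg_table(table_path: str) -> bool:
    """Return True if table_path contains metadata/ and data/ sub-directories.

    Hypothetical helper for local paths only; not part of the extension.
    """
    root = Path(table_path)
    return (root / "metadata").is_dir() and (root / "data").is_dir()


# Build a throwaway directory with the expected layout and check it.
with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "cities"
    (table / "metadata").mkdir(parents=True)
    (table / "data").mkdir()
    layout_ok = looks_like_iceberg_table(str(table))
    # A path that does not exist should fail the check.
    missing_ok = looks_like_iceberg_table(str(Path(tmp) / "missing"))
```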

All table functions also accept an optional version parameter specifying which version of the table to read. The version determines which metadata JSON file is loaded.

For example, to query the metadata of version "00004" of an Iceberg table, specify the version in the function call:

SELECT *
FROM iceberg.metadata('path/to/table', version = '00004')

This will attempt to load the metadata for version "00004". Metadata JSON file names typically include a UUID as well; the UUID does not need to be provided in the version string.

If version is not provided, the latest version is read: the extension attempts to list all metadata JSON files in the metadata/ directory and picks the file with the lexicographically greatest name.
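The selection rule can be sketched in a few lines of Python. This is only an illustration, not the extension's implementation: the function name resolve_metadata_file, the ".metadata.json" suffix check, and the rule for matching a version prefix are assumptions, and the file names in the listing are made up.

```python
from typing import Iterable, Optional

METADATA_SUFFIX = ".metadata.json"


def resolve_metadata_file(names: Iterable[str], version: Optional[str] = None) -> str:
    """Pick a metadata JSON file name from a listing of the metadata/ directory."""
    # Keep only metadata JSON files, sorted lexicographically.
    candidates = sorted(n for n in names if n.endswith(METADATA_SUFFIX))
    if version is not None:
        # Assumed match rule: "<version>-<uuid>.metadata.json"
        # or "<version>.metadata.json".
        candidates = [
            n for n in candidates
            if n.startswith(version + "-") or n == version + METADATA_SUFFIX
        ]
    if not candidates:
        raise FileNotFoundError(f"no metadata JSON found for version {version!r}")
    # The lexicographically greatest name wins.
    return candidates[-1]


# Hypothetical listing; the UUID-like parts are made up for illustration.
listing = [
    "00003-aaaa.metadata.json",
    "00004-bbbb.metadata.json",
    "00005-cccc.metadata.json",
    "snap-123.avro",
]
latest = resolve_metadata_file(listing)
pinned = resolve_metadata_file(listing, version="00004")
```

Because the comparison is lexicographic, zero-padded version prefixes such as "00004" sort in numeric order, whereas unpadded names would not ("v10" sorts before "v9").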

iceberg.metadata

The iceberg.metadata function takes a table path and returns information from the table's metadata.

SELECT *
FROM iceberg.metadata('wh/default.db/cities')

Column          Description
format_version  The Iceberg format version of the table.
table_uuid      The UUID identifier for the table.
location        Base location of the table.

iceberg.snapshots

The iceberg.snapshots function takes a table path and returns information about valid snapshots for this version of the table.

SELECT *
FROM iceberg.snapshots('wh/default.db/cities')

Column           Description
snapshot_id      A unique ID for the snapshot.
sequence_number  Number indicating the order of changes to this table.
manifest_list    Location of the manifest list for this snapshot.

iceberg.manifest_list

The iceberg.manifest_list function takes a table path and returns information about the manifest list for the current snapshot.

SELECT *
FROM iceberg.manifest_list('wh/default.db/cities')

Column           Description
manifest_path    Location of a manifest file.
manifest_length  Length in bytes of the manifest file.
content          Type of files the manifest is for, 'data' or 'deletes'.
sequence_number  Sequence number when manifest was added.

iceberg.data_files

The iceberg.data_files function takes a table path and returns information about the data files for the current snapshot.

SELECT *
FROM iceberg.data_files('wh/default.db/cities')

Column        Description
status        'EXISTING', 'ADDED', or 'DELETED'.
content       Type of content in the data file, 'DATA', 'POSITION DELETES', or 'EQUALITY DELETES'.
file_path     Full URI for the file.
file_format   Format of the file.
record_count  Number of records in the file.