Foundry SQL is generally a subset of Spark SQL ↗ with ANSI compliance.
CREATE [ OR REPLACE ] [ ( col_name1 col_type1, ... ) ] [ USING table_format ] [ AS select_statement ]
If not specified, the table format will be Iceberg. Valid table formats include:
icebergparquetavroCopied!1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17-- examples CREATE TABLE `/path/to/table` AS SELECT * FROM `/path/to/table`; CREATE TABLE `/path/to/table` USING parquet AS SELECT * FROM `/path/to/table`; CREATE TABLE `/path/to/table` ( id INT, name STRING ); CREATE OR REPLACE TABLE `/path/to/table` (id INT, name STRING);
Iceberg tables support data appends with INSERT, UPDATE, and DELETE statements. Non-Iceberg tables do not currently support table alteration.
INSERT [ INTO ] table_identifier { VALUES ( { value | NULL } [ , ... ] ) [ , ( ... ) ] | query }
Copied!1 2 3 4-- examples INSERT INTO `/path/to/table` VALUES (1, 'foo'), (2, 'bar'); INSERT INTO `/path/to/table` SELECT id, name FROM `/path/to/table2`;
UPDATE table_identifier
SET column_name = value [, ...]
[ WHERE condition ]
Copied!1 2 3 4 5 6 7-- examples UPDATE `/path/to/table` SET col = 'new value'; UPDATE `/path/to/table` SET col = 'new value' WHERE col = 'old value';
DELETE FROM table_identifier WHERE condition
Copied!1DELETE FROM `/path/to/table` WHERE col = 'to delete'
[ WITH with_query [ , ... ] ]
select_statement
[ { UNION | INTERSECT | EXCEPT } select_statement, ... ]
[ ORDER BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] } ]
[ WINDOW { window_name AS ( window_spec ) [, ...] } ]
[ LIMIT { ALL | expression } ]
Where with_query is:
cte_name [ ( column_name [, ...] ) ] AS ( select_statement )
Where select_statement is:
SELECT [ ALL | DISTINCT ] { [ named_expression | regex_column_names | * ] [, ...] }
FROM { from_item [, ...] }
[ WHERE boolean_expression ]
[ GROUP BY expression [ , ... ] ]
[ HAVING boolean_expression ]
[ LIMIT { ALL | expression } [ OFFSET expression ] ]
Where named_expression is:
Expression (with alias)
expression [ AS alias ]
Copied!1 2 3 4-- examples SELECT count(*) AS total_count FROM `/path/to/table`; SELECT name, salary * 1.1 AS adjusted_salary FROM `/path/to/employees`;
Window function
See Window Functions section for available functions.
window_function() OVER ( window_spec )
window_function() OVER window_name
Where window_spec is:
[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
Copied!1 2 3 4 5 6 7 8 9 10 11 12 13-- examples SELECT name, ROW_NUMBER() OVER (ORDER BY salary DESC) as rank FROM `/path/to/employees`; SELECT category, product_id, ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) as category_rank FROM `/path/to/products`; -- Using named window SELECT customer_id, order_date, total, ROW_NUMBER() OVER w as order_sequence FROM `/path/to/orders` WINDOW w AS (PARTITION BY customer_id ORDER BY order_date);
Where from_item is:
{ `/path/to/table`[.branch_<branch name>] | `ri.rid.for.table.1234`[.branch_<branch name>] }
Copied!1 2 3 4 5 6 7 8-- examples SELECT * FROM `/path/to/table`; SELECT * FROM `/path/to/table`.branch_master; SELECT * FROM `/path/to/table` WHERE id = 1; SELECT * FROM `ri.rid.for.table.1234`;
relation
{ CROSS | LEFT [ OUTER ] | RIGHT [ OUTER ] | FULL [ OUTER ] }
JOIN relation
ON boolean_expression
Copied!1 2 3 4-- examples SELECT * FROM `/path/to/table` LEFT JOIN `/path/to/table2` ON id = id2; SELECT * FROM `/path/to/table` RIGHT OUTER JOIN `/path/to/table2` ON id = id2;
( select_statement )
Copied!1 2 3 4 5 6-- examples SELECT * FROM (SELECT * FROM `/path/to/table`); SELECT * FROM (SELECT * FROM `/path/to/table`) WHERE id = 1; VALUES (9001, 1001, 1, DATE '2023-06-01', 'credit_card', 1329.98, 'completed', TIMESTAMP '2023-06-01 11:00:00')
Note: Subqueries must be wrapped in parentheses - this is not always required in Spark SQL, but is enforced in Foundry.
Copied!1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41-- examples SELECT DISTINCT o.order_id, c.name, o.total_amount, p.category FROM `/data/orders` o LEFT JOIN `/data/customers` c ON o.customer_id = c.id CROSS JOIN (SELECT product_id, category FROM `/data/products` WHERE category = 'Electronics') p ON o.product_id = p.product_id WHERE o.order_date > DATE '2024-01-01' AND o.total_amount > 1000.0 ORDER BY o.total_amount DESC LIMIT 100; SELECT c.name, COUNT(o.order_id) as order_count, SUM(o.total_amount) as total_spent FROM `/data/customers` c LEFT JOIN `/data/orders` o ON c.id = o.customer_id GROUP BY c.name HAVING COUNT(o.order_id) > 0; SELECT emp_id, name, dept, 'Active' AS status FROM `/hr/employees` e LEFT OUTER JOIN `/hr/departments` d ON e.dept_id = d.id WHERE e.active = true UNION SELECT emp_id, name, dept, 'Inactive' AS status FROM (SELECT * FROM `/hr/employees` WHERE termination_date < current_time()) e RIGHT JOIN `/hr/departments` d ON e.dept_id = d.id WHERE e.active = false ORDER BY dept ASC, status DESC NULLS LAST; WITH sales_summary (cust_id, total_sales, order_count) AS ( SELECT customer_id, SUM(total_amount), COUNT(*) FROM `/data/orders` GROUP BY customer_id ) SELECT * FROM sales_summary WHERE total_sales > 10000;
Foundry SQL supports Iceberg references to query metadata. See the official documentation ↗.
Copied!1 2 3 4-- examples SELECT * FROM `/path/to/table/`.files; SELECT * FROM `/path/to/table/`.snapshots;
Functions are available to transforms data in more complex ways. Refer to the Spark SQL documentation ↗ for detailed information on each function below.
| Function | Description | Example |
|---|---|---|
array(expr, ...) | Create an array with given elements | SELECT array(1, 2, 3) |
array_distinct(array) | Remove duplicate values from array | SELECT array_distinct(tags) FROM table |
array_intersect(array1, array2) | Elements in both arrays without duplicates | SELECT array_intersect(arr1, arr2) FROM table |
array_union(array1, array2) | Elements in either array without duplicates | SELECT array_union(arr1, arr2) FROM table |
array_except(array1, array2) | Elements in array1 but not in array2 | SELECT array_except(arr1, arr2) FROM table |
array_join(array, delimiter [, nullReplacement]) | Concatenate array elements with delimiter | SELECT array_join(tags, ',') FROM table |
array_max(array) | Maximum value in array | SELECT array_max(ARRAY(1, 5, 3)) |
array_min(array) | Minimum value in array | SELECT array_min(ARRAY(1, 5, 3)) |
array_position(array, element) | Position of first occurrence (1-based, 0 if not found) | SELECT array_position(tags, 'item') FROM table |
array_remove(array, element) | Remove all matching elements | SELECT array_remove(tags, 'old') FROM table |
array_size(array) | Number of elements in array | SELECT array_size(tags) FROM table |
array_contains(array, value) | True if array contains value | SELECT array_contains(tags, 'new') FROM table |
array_repeat(element, count) | Create array with element repeated count times | SELECT array_repeat('x', 5) |
arrays_overlap(array1, array2) | True if arrays have common non-null elements | SELECT arrays_overlap(arr1, arr2) FROM table |
flatten(arrayOfArrays) | Flatten nested arrays into single array | SELECT flatten(nested_arrays) FROM table |
sequence(start, stop, step) | Generate sequence of integers | SELECT sequence(1, 10, 2) |
shuffle(array) | Random permutation of array | SELECT shuffle(tags) FROM table |
slice(array, start, length) | Subset of array from start with length | SELECT slice(tags, 1, 2) FROM table |
sort_array(array[, ascendingOrder]) | Sort array in ascending (default) or descending order | SELECT sort_array(tags) FROM table |
get(array, index) | Returns element at given index (0-based) | SELECT get(tags, 0) FROM table |
| Function | Description | Example |
|---|---|---|
map(key1, value1, key2, value2, ...) | Create a map from key-value pairs | SELECT map('a', 1, 'b', 2) |
| Function | Description | Example |
|---|---|---|
struct(expr1, expr2, ...) | Create a struct from expressions | SELECT struct('red', 'large', 2.0) |
| Function | Description | Example |
|---|---|---|
current_timestamp() | Returns current timestamp | SELECT current_timestamp() |
current_date() | Returns current date | SELECT current_date() |
date_add(expr, num_days) | Add days to date | date_add(order_date, 7) |
date_sub(expr, num_days) | Subtract days from date | date_sub(order_date, 7) |
from_unixtime(expr [, format]) | Convert Unix timestamp to timestamp | from_unixtime(1609459200) |
unix_timestamp([expr] [, format]) | Convert timestamp to Unix timestamp | unix_timestamp(current_timestamp()) |
day(date) | Extract day of month from date | SELECT day(order_date) FROM table |
hour(timestamp) | Extract hour from timestamp | SELECT hour(order_timestamp) FROM table |
minute(timestamp) | Extract minute from timestamp | SELECT minute(order_timestamp) FROM table |
month(date) | Extract month from date | SELECT month(order_date) FROM table |
quarter(date) | Extract quarter from date | SELECT quarter(order_date) FROM table |
second(timestamp) | Extract second from timestamp | SELECT second(order_timestamp) FROM table |
year(date) | Extract year from date | SELECT year(order_date) FROM table |
| Function | Description | Example |
|---|---|---|
abs(x) | Absolute value | SELECT abs(-5) |
acos(x) | Arc cosine | SELECT acos(0.5) |
asin(x) | Arc sine | SELECT asin(0.5) |
atan(x) | Arc tangent | SELECT atan(1) |
atan2(y, x) | Arc tangent of y/x | SELECT atan2(1, 1) |
cbrt(x) | Cube root | SELECT cbrt(27) |
ceil(x) | Round up to nearest integer | SELECT ceil(3.7) |
ceiling(x) | Round up to nearest integer | SELECT ceiling(3.7) |
cos(x) | Cosine | SELECT cos(pi()) |
cosh(x) | Hyperbolic cosine | SELECT cosh(0) |
degrees(x) | Convert radians to degrees | SELECT degrees(pi()) |
e() | Euler's number | SELECT e() |
exp(x) | e raised to power x | SELECT exp(1) |
floor(x) | Round down to nearest integer | SELECT floor(3.7) |
ln(x) | Natural logarithm | SELECT ln(e()) |
log(base, x) | Logarithm with specified base | SELECT log(2, 8) |
log10(x) | Base 10 logarithm | SELECT log10(100) |
log2(x) | Base 2 logarithm | SELECT log2(8) |
mod(n, m) | Modulus (remainder) | SELECT mod(10, 3) |
pi() | Pi constant | SELECT pi() |
pow(x, p) | x raised to power p | SELECT pow(2, 3) |
power(x, p) | x raised to power p | SELECT power(2, 3) |
radians(x) | Convert degrees to radians | SELECT radians(180) |
rand() | Random value between 0 and 1 | SELECT rand() |
random() | Random value between 0 and 1 | SELECT random() |
round(x, d) | Round to d decimal places | SELECT round(3.7, 1) |
sign(x) | Sign function (-1, 0, 1) | SELECT sign(-5) |
sin(x) | Sine | SELECT sin(pi() / 2) |
sinh(x) | Hyperbolic sine | SELECT sinh(0) |
sqrt(x) | Square root | SELECT sqrt(16) |
tan(x) | Tangent | SELECT tan(pi() / 4) |
tanh(x) | Hyperbolic tangent | SELECT tanh(0) |
| Function | Description | Example |
|---|---|---|
chr(n) | Returns character from Unicode code point | SELECT chr(65) |
char(n) | Returns character from code point (alias for chr) | SELECT char(65) |
concat_ws(sep, str1, ...) | Concatenate strings with separator | SELECT concat_ws(',', 'a', 'b') |
length(str) | Length of string in characters | SELECT length('hello') |
char_length(str) | Length of string (alias for length) | SELECT char_length('hello') |
character_length(str) | Length of string (alias for length) | SELECT character_length('hello') |
lower(str) | Convert to lowercase | SELECT lower('HELLO') |
upper(str) | Convert to uppercase | SELECT upper('hello') |
lpad(str, len, pad) | Left pad string to length | SELECT lpad('hi', 5, 'x') |
rpad(str, len, pad) | Right pad string to length | SELECT rpad('hi', 5, 'x') |
ltrim(str) | Remove leading whitespace | SELECT ltrim(' hello') |
rtrim(str) | Remove trailing whitespace | SELECT rtrim('hello ') |
trim(str) | Remove leading and trailing whitespace | SELECT trim(' hello ') |
replace(str, search, replace) | Replace all occurrences | SELECT replace('hello', 'l', 'x') |
substr(str, pos, len) | Extract substring | SELECT substr('hello', 2, 3) |
substring(str, pos, len) | Extract substring | SELECT substring('hello', 2, 3) |
split(str, delimiter) | Split string into array | SELECT split('a,b,c', ',') |
split_part(str, delimiter, index) | Get part from split string (1-based) | SELECT split_part('a,b,c', ',', 2) |
instr(str, substr) | Find substring position (1-based, 0 if not found) | SELECT instr('hello', 'll') |
position(substr IN str) | Find substring position (SQL standard syntax) | SELECT position('ll' IN 'hello') |
startswith(str, prefix) | Check if string starts with prefix | SELECT startswith('hello', 'he') |
soundex(str) | Phonetic representation of string | SELECT soundex('Smith') |
luhn_check(str) | Validate string using Luhn algorithm | SELECT luhn_check('79927398713') |
| Function | Description | Example |
|---|---|---|
regexp_like(str, pattern) | Test if string matches regex pattern | SELECT regexp_like('hello123', '[0-9]+') |
regexp_extract(str, pattern) | Extract first substring matching pattern | SELECT regexp_extract('hello123', '[0-9]+') |
regexp_extract_all(str, pattern) | Extract all substrings matching pattern | SELECT regexp_extract_all('a1b2c3', '[0-9]+') |
regexp_replace(str, pattern, replacement) | Replace all matches with replacement string | SELECT regexp_replace('hello123', '[0-9]+', 'X') |
regexp_count(str, pattern) | Count occurrences of pattern in string | SELECT regexp_count('a1b2c3', '[0-9]+') |
| Function | Description | Example |
|---|---|---|
cast(expr AS type) | Convert value to specified type | cast(count() AS float) |
| Function | Description | Example |
|---|---|---|
count() | Count all rows | SELECT count() FROM table |
count(DISTINCT col) | Count distinct values in column | SELECT count(DISTINCT id) FROM table |
sum(col) | Sum values in column | SELECT sum(cost) FROM table |
avg(col) | Mean average values in column | SELECT avg(weight) FROM table |
min(col) | Minimum value in column | SELECT min(price) FROM table |
max(col) | Maximum value in column | SELECT max(price) FROM table |
min_by(x, y) | Value of x associated with minimum value of y | SELECT min_by(name, age) FROM table |
max_by(x, y) | Value of x associated with maximum value of y | SELECT max_by(name, age) FROM table |
count_if(condition) | Count rows where condition is true | SELECT count_if(age > 18) FROM table |
bool_and(condition) | True if all values are true | SELECT bool_and(active) FROM table |
bool_or(condition) | True if any value is true | SELECT bool_or(active) FROM table |
bit_and(col) | Bitwise AND of all non-null integer values | SELECT bit_and(flags) FROM table |
bit_or(col) | Bitwise OR of all non-null integer values | SELECT bit_or(flags) FROM table |
bit_xor(col) | Bitwise XOR of all non-null integer values | SELECT bit_xor(flags) FROM table |
every(condition) | True if all values are true (alias for bool_and) | SELECT every(active) FROM table |
stddev(col) | Sample standard deviation | SELECT stddev(salary) FROM table |
stddev_pop(col) | Population standard deviation | SELECT stddev_pop(salary) FROM table |
stddev_samp(col) | Sample standard deviation | SELECT stddev_samp(salary) FROM table |
variance(col) | Sample variance | SELECT variance(salary) FROM table |
var_pop(col) | Population variance | SELECT var_pop(salary) FROM table |
var_samp(col) | Sample variance | SELECT var_samp(salary) FROM table |
corr(y, x) | Pearson correlation coefficient | SELECT corr(sales, ads) FROM table |
covar_pop(y, x) | Population covariance | SELECT covar_pop(y, x) FROM table |
covar_samp(y, x) | Sample covariance | SELECT covar_samp(y, x) FROM table |
kurtosis(col) | Excess kurtosis | SELECT kurtosis(values) FROM table |
skewness(col) | Skewness | SELECT skewness(values) FROM table |
regr_intercept(y, x) | Linear regression intercept | SELECT regr_intercept(y, x) FROM table |
regr_slope(y, x) | Linear regression slope | SELECT regr_slope(y, x) FROM table |
| Function | Description | Example |
|---|---|---|
ROW_NUMBER() | Assigns sequential row number within partition | ROW_NUMBER() OVER (ORDER BY col) |
RANK() | Rank with gaps for ties | RANK() OVER (ORDER BY salary DESC) |
DENSE_RANK() | Rank without gaps for ties | DENSE_RANK() OVER (ORDER BY salary DESC) |
PERCENT_RANK() | Percentage ranking of value in group | PERCENT_RANK() OVER (ORDER BY salary) |
CUME_DIST() | Cumulative distribution of value | CUME_DIST() OVER (ORDER BY salary) |
NTILE(n) | Divide rows into n buckets | NTILE(4) OVER (ORDER BY salary) |
LAG(expr[, offset[, default]]) | Value from previous row | LAG(salary, 1) OVER (ORDER BY date) |
LEAD(expr[, offset[, default]]) | Value from next row | LEAD(salary, 1) OVER (ORDER BY date) |
FIRST_VALUE(expr) | First value in window frame | FIRST_VALUE(salary) OVER (ORDER BY date) |
LAST_VALUE(expr) | Last value in window frame | LAST_VALUE(salary) OVER (ORDER BY date) |
NTH_VALUE(expr, n) | Nth value in window frame | NTH_VALUE(salary, 2) OVER (ORDER BY date) |
| Aggregate functions | Standard aggregates (count, sum, avg, min, max, stddev, variance, etc.) can be used as window functions | SUM(amount) OVER (PARTITION BY customer_id) |
| Function | Description | Example |
|---|---|---|
zorder() | Iceberg Z-order optimization (Iceberg only) | CALL zorder('table', 'col1,col2') |
Window functions require an OVER clause with optional PARTITION BY and ORDER BY:
Copied!1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18-- examples SELECT id, name, ROW_NUMBER() OVER (ORDER BY name) as row_num FROM `/path/to/table`; SELECT category, name, price, ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) as rank_in_category FROM `/path/to/products`; SELECT order_id, customer_id, order_date, total_amount, SUM(total_amount) OVER (PARTITION BY customer_id ORDER BY order_date) as running_total, AVG(total_amount) OVER (PARTITION BY customer_id) as customer_avg, COUNT(*) OVER (PARTITION BY customer_id) as customer_order_count FROM `/path/to/orders`; SELECT id, amount, ROW_NUMBER() OVER w as sequence_num FROM `/path/to/transactions` WINDOW w AS (PARTITION BY customer_id ORDER BY transaction_date);
| Operator | Description | Example |
|---|---|---|
+ | Plus | SELECT count() + 5 |
- | Minus | SELECT count() - 1 |
* | Times | SELECT sum(weight * 2) |
/ | Divide | SELECT (avg(a) + avg(b)) / 2 |
| Operator | Description | Example |
|---|---|---|
= | Equal to | WHERE id = 1 |
<>, != | Not equal to | WHERE status <> 'inactive' |
< | Less than | WHERE age < 18 |
<= | Less than or equal to | WHERE price <= 100.0 |
> | Greater than | WHERE salary > 50000 |
>= | Greater than or equal to | WHERE score >= 90 |
AND | Logical AND | WHERE active = true AND verified = true |
OR | Logical OR | WHERE status = 'new' OR status = 'pending' |
NOT | Logical NOT | WHERE NOT deleted |
IN | Value in list | WHERE category IN ('A', 'B', 'C') |
LIKE | Pattern matching | WHERE name LIKE 'John%' |
| Expression | Description | Example |
|---|---|---|
CASE ... WHEN | Switch expression | SELECT CASE WHEN mark >= 70 THEN 'A' WHEN mark >= 60 THEN 'B' ELSE 'C' END |
INTERVAL | Time interval literal | SELECT order_date + INTERVAL 30 DAY, WHERE created_at > current_timestamp() - INTERVAL 1 HOUR |
CASE WHEN condition THEN result
[WHEN condition THEN result]
[ELSE result]
END
INTERVAL { yearMonthIntervalQualifier | dayTimeIntervalQualifier }
yearMonthIntervalQualifier
{ YEAR [TO MONTH] |
MONTH }
dayTimeIntervalQualifier
{ DAY [TO { HOUR | MINUTE | SECOND } ] |
HOUR [TO { MINUTE | SECOND } ] |
MINUTE [TO SECOND] |
SECOND }
Copied!1 2 3 4 5-- examples INTERVAL '1' YEAR INTERVAL '6' MONTH INTERVAL '25' DAY INTERVAL '6' YEAR TO MONTH
| Type | Spark equivalent |
|---|---|
BOOLEAN | BOOLEAN |
SHORT | SHORT |
INT | INT |
LONG | LONG |
FLOAT | FLOAT |
DOUBLE | DOUBLE |
DATE | DATE |
TIMESTAMP | TIMESTAMP |
TIMESTAMP_NTZ | TIMESTAMP_NTZ |
STRING | STRING |
BINARY | BINARY |
DECIMAL | DECIMAL |
MAP<key, value> | MAP<key, value> |
ARRAY<element_type> | ARRAY<element_type> |
STRUCT<field1_name: field1_type, field2_name: field2_type, …> | STRUCT<field1_name: field1_type, field2_name: field2_type, …> |
Copied!1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17CREATE TABLE `/path/to/table` ( col1 BOOLEAN, col2 SHORT, col3 INT, col4 LONG, col5 FLOAT, col6 DOUBLE, col7 DATE, col8 TIMESTAMP, col9 TIMESTAMP_NTZ, col10 STRING, col11 BINARY, col12 DECIMAL(10, 2), col13 MAP<STRING, STRING>, col14 ARRAY<STRING>, col15 STRUCT<col1: STRING, col2: STRING> );