SQL dialect

Foundry SQL is generally a subset of Spark SQL ↗ with ANSI compliance.

Table creation

CREATE [ OR REPLACE ] [ ( col_name1 col_type1, ... ) ] [ USING table_format ] [ AS select_statement ]

If not specified, the table format will be Iceberg. Valid table formats include:

  • Iceberg: iceberg
  • Parquet: parquet
  • Avro: avro
Copied!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 -- examples CREATE TABLE `/path/to/table` AS SELECT * FROM `/path/to/table`; CREATE TABLE `/path/to/table` USING parquet AS SELECT * FROM `/path/to/table`; CREATE TABLE `/path/to/table` ( id INT, name STRING ); CREATE OR REPLACE TABLE `/path/to/table` (id INT, name STRING);

Table alteration

Iceberg tables support data appends with INSERT, UPDATE, and DELETE statements. Non-Iceberg tables do not currently support table alteration.

Insert

INSERT [ INTO ] table_identifier { VALUES ( { value | NULL } [ , ... ] ) [ , ( ... ) ] | query }
Copied!
1 2 3 4 -- examples INSERT INTO `/path/to/table` VALUES (1, 'foo'), (2, 'bar'); INSERT INTO `/path/to/table` SELECT id, name FROM `/path/to/table2`;

Update

UPDATE table_identifier
SET column_name = value [, ...]
[ WHERE condition ]
Copied!
1 2 3 4 5 6 7 -- examples UPDATE `/path/to/table` SET col = 'new value'; UPDATE `/path/to/table` SET col = 'new value' WHERE col = 'old value';

Delete

DELETE FROM table_identifier WHERE condition
Copied!
1 DELETE FROM `/path/to/table` WHERE col = 'to delete'

Query composition

[ WITH with_query [ , ... ] ]
select_statement
  [ { UNION | INTERSECT | EXCEPT } select_statement, ... ]
  [ ORDER BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] } ]
  [ WINDOW { window_name AS ( window_spec ) [, ...] } ]
  [ LIMIT { ALL | expression } ]

Where with_query is:

cte_name [ ( column_name [, ...] ) ] AS ( select_statement )

Where select_statement is:

SELECT [ ALL | DISTINCT ] { [ named_expression | regex_column_names | * ] [, ...] }
  FROM { from_item [, ...] }
  [ WHERE boolean_expression ]
  [ GROUP BY expression [ , ... ] ]
  [ HAVING boolean_expression ]
  [ LIMIT { ALL | expression } [ OFFSET expression ] ]

Where named_expression is:

  • Expression (with alias)

    expression [ AS alias ]
    
    Copied!
    1 2 3 4 -- examples SELECT count(*) AS total_count FROM `/path/to/table`; SELECT name, salary * 1.1 AS adjusted_salary FROM `/path/to/employees`;
  • Window function

    See Window Functions section for available functions.

    window_function() OVER ( window_spec )
    window_function() OVER window_name
    

    Where window_spec is:

    [ PARTITION BY expression [, ...] ]
    [ ORDER BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
    
    Copied!
    1 2 3 4 5 6 7 8 9 10 11 12 13 -- examples SELECT name, ROW_NUMBER() OVER (ORDER BY salary DESC) as rank FROM `/path/to/employees`; SELECT category, product_id, ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) as category_rank FROM `/path/to/products`; -- Using named window SELECT customer_id, order_date, total, ROW_NUMBER() OVER w as order_sequence FROM `/path/to/orders` WINDOW w AS (PARTITION BY customer_id ORDER BY order_date);

Where from_item is:

  • Table
    { `/path/to/table`[.branch_<branch name>] | `ri.rid.for.table.1234`[.branch_<branch name>] }
    
    Copied!
    1 2 3 4 5 6 7 8 -- examples SELECT * FROM `/path/to/table`; SELECT * FROM `/path/to/table`.branch_master; SELECT * FROM `/path/to/table` WHERE id = 1; SELECT * FROM `ri.rid.for.table.1234`;
  • Join
    relation
      { CROSS | LEFT [ OUTER ] | RIGHT [ OUTER ] | FULL [ OUTER ] }
      JOIN relation
      ON boolean_expression
    
    Copied!
    1 2 3 4 -- examples SELECT * FROM `/path/to/table` LEFT JOIN `/path/to/table2` ON id = id2; SELECT * FROM `/path/to/table` RIGHT OUTER JOIN `/path/to/table2` ON id = id2;
  • Subquery
    ( select_statement )
    
    Copied!
    1 2 3 4 5 6 -- examples SELECT * FROM (SELECT * FROM `/path/to/table`); SELECT * FROM (SELECT * FROM `/path/to/table`) WHERE id = 1; VALUES (9001, 1001, 1, DATE '2023-06-01', 'credit_card', 1329.98, 'completed', TIMESTAMP '2023-06-01 11:00:00')

    Note: Subqueries must be wrapped in parentheses - this is not always required in Spark SQL, but is enforced in Foundry.

Copied!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 -- examples SELECT DISTINCT o.order_id, c.name, o.total_amount, p.category FROM `/data/orders` o LEFT JOIN `/data/customers` c ON o.customer_id = c.id CROSS JOIN (SELECT product_id, category FROM `/data/products` WHERE category = 'Electronics') p ON o.product_id = p.product_id WHERE o.order_date > DATE '2024-01-01' AND o.total_amount > 1000.0 ORDER BY o.total_amount DESC LIMIT 100; SELECT c.name, COUNT(o.order_id) as order_count, SUM(o.total_amount) as total_spent FROM `/data/customers` c LEFT JOIN `/data/orders` o ON c.id = o.customer_id GROUP BY c.name HAVING COUNT(o.order_id) > 0; SELECT emp_id, name, dept, 'Active' AS status FROM `/hr/employees` e LEFT OUTER JOIN `/hr/departments` d ON e.dept_id = d.id WHERE e.active = true UNION SELECT emp_id, name, dept, 'Inactive' AS status FROM (SELECT * FROM `/hr/employees` WHERE termination_date < current_time()) e RIGHT JOIN `/hr/departments` d ON e.dept_id = d.id WHERE e.active = false ORDER BY dept ASC, status DESC NULLS LAST; WITH sales_summary (cust_id, total_sales, order_count) AS ( SELECT customer_id, SUM(total_amount), COUNT(*) FROM `/data/orders` GROUP BY customer_id ) SELECT * FROM sales_summary WHERE total_sales > 10000;

Iceberg references

Foundry SQL supports Iceberg references to query metadata. See the official documentation ↗.

Copied!
1 2 3 4 -- examples SELECT * FROM `/path/to/table/`.files; SELECT * FROM `/path/to/table/`.snapshots;

Functions

Functions are available to transforms data in more complex ways. Refer to the Spark SQL documentation ↗ for detailed information on each function below.

Array functions

FunctionDescriptionExample
array(expr, ...)Create an array with given elementsSELECT array(1, 2, 3)
array_distinct(array)Remove duplicate values from arraySELECT array_distinct(tags) FROM table
array_intersect(array1, array2)Elements in both arrays without duplicatesSELECT array_intersect(arr1, arr2) FROM table
array_union(array1, array2)Elements in either array without duplicatesSELECT array_union(arr1, arr2) FROM table
array_except(array1, array2)Elements in array1 but not in array2SELECT array_except(arr1, arr2) FROM table
array_join(array, delimiter [, nullReplacement])Concatenate array elements with delimiterSELECT array_join(tags, ',') FROM table
array_max(array)Maximum value in arraySELECT array_max(ARRAY(1, 5, 3))
array_min(array)Minimum value in arraySELECT array_min(ARRAY(1, 5, 3))
array_position(array, element)Position of first occurrence (1-based, 0 if not found)SELECT array_position(tags, 'item') FROM table
array_remove(array, element)Remove all matching elementsSELECT array_remove(tags, 'old') FROM table
array_size(array)Number of elements in arraySELECT array_size(tags) FROM table
array_contains(array, value)True if array contains valueSELECT array_contains(tags, 'new') FROM table
array_repeat(element, count)Create array with element repeated count timesSELECT array_repeat('x', 5)
arrays_overlap(array1, array2)True if arrays have common non-null elementsSELECT arrays_overlap(arr1, arr2) FROM table
flatten(arrayOfArrays)Flatten nested arrays into single arraySELECT flatten(nested_arrays) FROM table
sequence(start, stop, step)Generate sequence of integersSELECT sequence(1, 10, 2)
shuffle(array)Random permutation of arraySELECT shuffle(tags) FROM table
slice(array, start, length)Subset of array from start with lengthSELECT slice(tags, 1, 2) FROM table
sort_array(array[, ascendingOrder])Sort array in ascending (default) or descending orderSELECT sort_array(tags) FROM table
get(array, index)Returns element at given index (0-based)SELECT get(tags, 0) FROM table

Map functions

FunctionDescriptionExample
map(key1, value1, key2, value2, ...)Create a map from key-value pairsSELECT map('a', 1, 'b', 2)

Struct functions

FunctionDescriptionExample
struct(expr1, expr2, ...)Create a struct from expressionsSELECT struct('red', 'large', 2.0)

Date and timestamp functions

FunctionDescriptionExample
current_timestamp()Returns current timestampSELECT current_timestamp()
current_date()Returns current dateSELECT current_date()
date_add(expr, num_days)Add days to datedate_add(order_date, 7)
date_sub(expr, num_days)Subtract days from datedate_sub(order_date, 7)
from_unixtime(expr [, format])Convert Unix timestamp to timestampfrom_unixtime(1609459200)
unix_timestamp([expr] [, format])Convert timestamp to Unix timestampunix_timestamp(current_timestamp())
day(date)Extract day of month from dateSELECT day(order_date) FROM table
hour(timestamp)Extract hour from timestampSELECT hour(order_timestamp) FROM table
minute(timestamp)Extract minute from timestampSELECT minute(order_timestamp) FROM table
month(date)Extract month from dateSELECT month(order_date) FROM table
quarter(date)Extract quarter from dateSELECT quarter(order_date) FROM table
second(timestamp)Extract second from timestampSELECT second(order_timestamp) FROM table
year(date)Extract year from dateSELECT year(order_date) FROM table

Mathematical functions

FunctionDescriptionExample
abs(x)Absolute valueSELECT abs(-5)
acos(x)Arc cosineSELECT acos(0.5)
asin(x)Arc sineSELECT asin(0.5)
atan(x)Arc tangentSELECT atan(1)
atan2(y, x)Arc tangent of y/xSELECT atan2(1, 1)
cbrt(x)Cube rootSELECT cbrt(27)
ceil(x)Round up to nearest integerSELECT ceil(3.7)
ceiling(x)Round up to nearest integerSELECT ceiling(3.7)
cos(x)CosineSELECT cos(pi())
cosh(x)Hyperbolic cosineSELECT cosh(0)
degrees(x)Convert radians to degreesSELECT degrees(pi())
e()Euler's numberSELECT e()
exp(x)e raised to power xSELECT exp(1)
floor(x)Round down to nearest integerSELECT floor(3.7)
ln(x)Natural logarithmSELECT ln(e())
log(base, x)Logarithm with specified baseSELECT log(2, 8)
log10(x)Base 10 logarithmSELECT log10(100)
log2(x)Base 2 logarithmSELECT log2(8)
mod(n, m)Modulus (remainder)SELECT mod(10, 3)
pi()Pi constantSELECT pi()
pow(x, p)x raised to power pSELECT pow(2, 3)
power(x, p)x raised to power pSELECT power(2, 3)
radians(x)Convert degrees to radiansSELECT radians(180)
rand()Random value between 0 and 1SELECT rand()
random()Random value between 0 and 1SELECT random()
round(x, d)Round to d decimal placesSELECT round(3.7, 1)
sign(x)Sign function (-1, 0, 1)SELECT sign(-5)
sin(x)SineSELECT sin(pi() / 2)
sinh(x)Hyperbolic sineSELECT sinh(0)
sqrt(x)Square rootSELECT sqrt(16)
tan(x)TangentSELECT tan(pi() / 4)
tanh(x)Hyperbolic tangentSELECT tanh(0)

String functions

FunctionDescriptionExample
chr(n)Returns character from Unicode code pointSELECT chr(65)
char(n)Returns character from code point (alias for chr)SELECT char(65)
concat_ws(sep, str1, ...)Concatenate strings with separatorSELECT concat_ws(',', 'a', 'b')
length(str)Length of string in charactersSELECT length('hello')
char_length(str)Length of string (alias for length)SELECT char_length('hello')
character_length(str)Length of string (alias for length)SELECT character_length('hello')
lower(str)Convert to lowercaseSELECT lower('HELLO')
upper(str)Convert to uppercaseSELECT upper('hello')
lpad(str, len, pad)Left pad string to lengthSELECT lpad('hi', 5, 'x')
rpad(str, len, pad)Right pad string to lengthSELECT rpad('hi', 5, 'x')
ltrim(str)Remove leading whitespaceSELECT ltrim(' hello')
rtrim(str)Remove trailing whitespaceSELECT rtrim('hello ')
trim(str)Remove leading and trailing whitespaceSELECT trim(' hello ')
replace(str, search, replace)Replace all occurrencesSELECT replace('hello', 'l', 'x')
substr(str, pos, len)Extract substringSELECT substr('hello', 2, 3)
substring(str, pos, len)Extract substringSELECT substring('hello', 2, 3)
split(str, delimiter)Split string into arraySELECT split('a,b,c', ',')
split_part(str, delimiter, index)Get part from split string (1-based)SELECT split_part('a,b,c', ',', 2)
instr(str, substr)Find substring position (1-based, 0 if not found)SELECT instr('hello', 'll')
position(substr IN str)Find substring position (SQL standard syntax)SELECT position('ll' IN 'hello')
startswith(str, prefix)Check if string starts with prefixSELECT startswith('hello', 'he')
soundex(str)Phonetic representation of stringSELECT soundex('Smith')
luhn_check(str)Validate string using Luhn algorithmSELECT luhn_check('79927398713')

Conditional functions

FunctionDescriptionExample
regexp_like(str, pattern)Test if string matches regex patternSELECT regexp_like('hello123', '[0-9]+')
regexp_extract(str, pattern)Extract first substring matching patternSELECT regexp_extract('hello123', '[0-9]+')
regexp_extract_all(str, pattern)Extract all substrings matching patternSELECT regexp_extract_all('a1b2c3', '[0-9]+')
regexp_replace(str, pattern, replacement)Replace all matches with replacement stringSELECT regexp_replace('hello123', '[0-9]+', 'X')
regexp_count(str, pattern)Count occurrences of pattern in stringSELECT regexp_count('a1b2c3', '[0-9]+')

Conversion functions

FunctionDescriptionExample
cast(expr AS type)Convert value to specified typecast(count() AS float)

Aggregate functions

FunctionDescriptionExample
count()Count all rowsSELECT count() FROM table
count(DISTINCT col)Count distinct values in columnSELECT count(DISTINCT id) FROM table
sum(col)Sum values in columnSELECT sum(cost) FROM table
avg(col)Mean average values in columnSELECT avg(weight) FROM table
min(col)Minimum value in columnSELECT min(price) FROM table
max(col)Maximum value in columnSELECT max(price) FROM table
min_by(x, y)Value of x associated with minimum value of ySELECT min_by(name, age) FROM table
max_by(x, y)Value of x associated with maximum value of ySELECT max_by(name, age) FROM table
count_if(condition)Count rows where condition is trueSELECT count_if(age > 18) FROM table
bool_and(condition)True if all values are trueSELECT bool_and(active) FROM table
bool_or(condition)True if any value is trueSELECT bool_or(active) FROM table
bit_and(col)Bitwise AND of all non-null integer valuesSELECT bit_and(flags) FROM table
bit_or(col)Bitwise OR of all non-null integer valuesSELECT bit_or(flags) FROM table
bit_xor(col)Bitwise XOR of all non-null integer valuesSELECT bit_xor(flags) FROM table
every(condition)True if all values are true (alias for bool_and)SELECT every(active) FROM table
stddev(col)Sample standard deviationSELECT stddev(salary) FROM table
stddev_pop(col)Population standard deviationSELECT stddev_pop(salary) FROM table
stddev_samp(col)Sample standard deviationSELECT stddev_samp(salary) FROM table
variance(col)Sample varianceSELECT variance(salary) FROM table
var_pop(col)Population varianceSELECT var_pop(salary) FROM table
var_samp(col)Sample varianceSELECT var_samp(salary) FROM table
corr(y, x)Pearson correlation coefficientSELECT corr(sales, ads) FROM table
covar_pop(y, x)Population covarianceSELECT covar_pop(y, x) FROM table
covar_samp(y, x)Sample covarianceSELECT covar_samp(y, x) FROM table
kurtosis(col)Excess kurtosisSELECT kurtosis(values) FROM table
skewness(col)SkewnessSELECT skewness(values) FROM table
regr_intercept(y, x)Linear regression interceptSELECT regr_intercept(y, x) FROM table
regr_slope(y, x)Linear regression slopeSELECT regr_slope(y, x) FROM table

Window functions

FunctionDescriptionExample
ROW_NUMBER()Assigns sequential row number within partitionROW_NUMBER() OVER (ORDER BY col)
RANK()Rank with gaps for tiesRANK() OVER (ORDER BY salary DESC)
DENSE_RANK()Rank without gaps for tiesDENSE_RANK() OVER (ORDER BY salary DESC)
PERCENT_RANK()Percentage ranking of value in groupPERCENT_RANK() OVER (ORDER BY salary)
CUME_DIST()Cumulative distribution of valueCUME_DIST() OVER (ORDER BY salary)
NTILE(n)Divide rows into n bucketsNTILE(4) OVER (ORDER BY salary)
LAG(expr[, offset[, default]])Value from previous rowLAG(salary, 1) OVER (ORDER BY date)
LEAD(expr[, offset[, default]])Value from next rowLEAD(salary, 1) OVER (ORDER BY date)
FIRST_VALUE(expr)First value in window frameFIRST_VALUE(salary) OVER (ORDER BY date)
LAST_VALUE(expr)Last value in window frameLAST_VALUE(salary) OVER (ORDER BY date)
NTH_VALUE(expr, n)Nth value in window frameNTH_VALUE(salary, 2) OVER (ORDER BY date)
Aggregate functionsStandard aggregates (count, sum, avg, min, max, stddev, variance, etc.) can be used as window functionsSUM(amount) OVER (PARTITION BY customer_id)

Misc functions

FunctionDescriptionExample
zorder()Iceberg Z-order optimization (Iceberg only)CALL zorder('table', 'col1,col2')

Window functions require an OVER clause with optional PARTITION BY and ORDER BY:

Copied!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 -- examples SELECT id, name, ROW_NUMBER() OVER (ORDER BY name) as row_num FROM `/path/to/table`; SELECT category, name, price, ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) as rank_in_category FROM `/path/to/products`; SELECT order_id, customer_id, order_date, total_amount, SUM(total_amount) OVER (PARTITION BY customer_id ORDER BY order_date) as running_total, AVG(total_amount) OVER (PARTITION BY customer_id) as customer_avg, COUNT(*) OVER (PARTITION BY customer_id) as customer_order_count FROM `/path/to/orders`; SELECT id, amount, ROW_NUMBER() OVER w as sequence_num FROM `/path/to/transactions` WINDOW w AS (PARTITION BY customer_id ORDER BY transaction_date);

Arithmetic operators

OperatorDescriptionExample
+PlusSELECT count() + 5
-MinusSELECT count() - 1
*TimesSELECT sum(weight * 2)
/DivideSELECT (avg(a) + avg(b)) / 2

Boolean operators

OperatorDescriptionExample
=Equal toWHERE id = 1
<>, !=Not equal toWHERE status <> 'inactive'
<Less thanWHERE age < 18
<=Less than or equal toWHERE price <= 100.0
>Greater thanWHERE salary > 50000
>=Greater than or equal toWHERE score >= 90
ANDLogical ANDWHERE active = true AND verified = true
ORLogical ORWHERE status = 'new' OR status = 'pending'
NOTLogical NOTWHERE NOT deleted
INValue in listWHERE category IN ('A', 'B', 'C')
LIKEPattern matchingWHERE name LIKE 'John%'

Other expressions

ExpressionDescriptionExample
CASE ... WHENSwitch expressionSELECT CASE WHEN mark >= 70 THEN 'A' WHEN mark >= 60 THEN 'B' ELSE 'C' END
INTERVALTime interval literalSELECT order_date + INTERVAL 30 DAY, WHERE created_at > current_timestamp() - INTERVAL 1 HOUR

Case-when statements

CASE WHEN condition THEN result
     [WHEN condition THEN result]
     [ELSE result]
END

Date/time intervals

INTERVAL { yearMonthIntervalQualifier | dayTimeIntervalQualifier }

yearMonthIntervalQualifier
 { YEAR [TO MONTH] |
   MONTH }

dayTimeIntervalQualifier
 { DAY [TO { HOUR | MINUTE | SECOND } ] |
   HOUR [TO { MINUTE | SECOND } ] |
   MINUTE [TO SECOND] |
   SECOND }
Copied!
1 2 3 4 5 -- examples INTERVAL '1' YEAR INTERVAL '6' MONTH INTERVAL '25' DAY INTERVAL '6' YEAR TO MONTH

Data types

TypeSpark equivalent
BOOLEANBOOLEAN
SHORTSHORT
INTINT
LONGLONG
FLOATFLOAT
DOUBLEDOUBLE
DATEDATE
TIMESTAMPTIMESTAMP
TIMESTAMP_NTZTIMESTAMP_NTZ
STRINGSTRING
BINARYBINARY
DECIMALDECIMAL
MAP<key, value>MAP<key, value>
ARRAY<element_type>ARRAY<element_type>
STRUCT<field1_name: field1_type, field2_name: field2_type, …>STRUCT<field1_name: field1_type, field2_name: field2_type, …>
Copied!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 CREATE TABLE `/path/to/table` ( col1 BOOLEAN, col2 SHORT, col3 INT, col4 LONG, col5 FLOAT, col6 DOUBLE, col7 DATE, col8 TIMESTAMP, col9 TIMESTAMP_NTZ, col10 STRING, col11 BINARY, col12 DECIMAL(10, 2), col13 MAP<STRING, STRING>, col14 ARRAY<STRING>, col15 STRUCT<col1: STRING, col2: STRING> );