Snowflake

Connect Foundry to Snowflake to read and sync data between Snowflake and Foundry.

Supported capabilities

| Capability | Status |
| --- | --- |
| Exploration | 🟢 Generally available |
| Bulk import | 🟢 Generally available |
| Incremental | 🟢 Generally available |
| Virtual tables | 🟢 Generally available |
| Compute pushdown | 🟢 Generally available |
| Export tasks | 🟡 Sunset |
| Table exports | 🟡 Beta |

Setup

  1. Open the Data Connection application and select + New Source in the upper right corner of the screen.
  2. Select Snowflake from the available connector types.
  3. Choose to run the source capabilities on a Foundry worker or on an agent worker.
  4. Follow the additional configuration prompts to continue the setup of your connector using the information in the sections below.

Learn more about setting up a connector in Foundry.

Connection details

Snowflake account identifiers containing underscores (_) must have the underscores replaced with dashes (-). For example, my_account_prod must become my-account-prod. Failure to do so will cause networking issues.

| Option | Required? | Description |
| --- | --- | --- |
| Account identifier | Yes | The identifier that precedes ".snowflakecomputing.com". See Snowflake's official documentation ↗ for more details. |
| Roles | No | The default role to be used by the connection in case the provided credentials have access to multiple roles. |
| Database | Yes | Specify a default database to use once connected. |
| Schema | No | Option to specify a default schema to use once connected. If not specified, all schemas in scope of the credentials will be available. |
| Warehouse | No* | The virtual warehouse to use once connected. For registered virtual tables, this warehouse will be used for any source-side compute. |
| Credentials | Yes | Refer to the authentication section below for more details. |
| Network connectivity | Yes** | Refer to the networking section below for more details. |

* Warehouse details are optional for syncing Foundry datasets, but required for registering virtual tables.
** Network egress policies are required for Foundry worker connections, but not for agent worker connections.

Authentication

You can authenticate with Snowflake in the following ways:

| Method | Description | Documentation |
| --- | --- | --- |
| Username and password [Legacy] | Authenticate with a user account using a username and password. Basic authentication is legacy and is not recommended for production. | Working with passwords ↗ |
| Key-pair authentication | Provide a username and private key. Note that only unencrypted private keys are supported; Foundry will encrypt and store the private key securely. | Key-pair authentication and key-pair rotation ↗ |
| External OAuth (OIDC) [Recommended] | Authenticate as a user using workload identity federation, which allows workloads running in Foundry to access Snowflake without the need for Snowflake secrets. Follow the displayed source system configuration instructions to set up External OAuth. | Workload identity federation ↗ |

Refer to our OIDC documentation for an overview of how OpenID Connect (OIDC) is supported in Foundry.

For all authentication options, ensure that the provided user and role have usage privileges on the target database(s) and schema(s), as well as select privileges on the target table(s).

When registering virtual tables, the user and their role should also have usage privileges on the warehouse.

Snowflake is rolling out changes to require multi-factor authentication (MFA) for human users who sign in with passwords, and to disallow passwords for all service users. As a result, username and password authentication will no longer be suitable. Refer to the official Snowflake documentation ↗ for additional information and guidance on migrating.
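
If you need to migrate an existing connection user away from passwords, a minimal sketch (assuming a hypothetical service user named foundry_svc and an already-generated RSA key pair) could look like the following:

```sql
-- Hypothetical user name; requires appropriate admin privileges.
-- Mark the user as a service user (service users cannot sign in with
-- passwords), then register an RSA public key so the connection can
-- use key-pair authentication instead.
ALTER USER foundry_svc SET TYPE = SERVICE;
ALTER USER foundry_svc SET RSA_PUBLIC_KEY = 'MIIBIjANBgkqhki...';
```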

Networking

For connections running on a Foundry worker, the appropriate egress policies must be added when setting up the source in the Data Connection application.

To identify the hostnames and port numbers of your Snowflake account to be allowlisted, you can run the following command in your Snowflake console. Ensure that at least the entries for SNOWFLAKE_DEPLOYMENT and STAGE are added as egress policies in Foundry.

```sql
SELECT t.VALUE:type::VARCHAR AS type,
       t.VALUE:host::VARCHAR AS host,
       t.VALUE:port AS port
FROM TABLE(FLATTEN(input => PARSE_JSON(SYSTEM$ALLOWLIST()))) AS t;
```

See Snowflake's official documentation ↗ for additional information on identifying hostnames and port numbers to allowlist.

Connections from Foundry to Snowflake normally originate from the default public gateway IPs for your environment. However, traffic within the same cloud provider (for example, AWS to AWS or Azure to Azure) may use different routing and require establishing a connection via PrivateLink. See below for the additional setup required per cloud provider, or contact your Palantir representative for additional guidance.

Snowflake instance hosted on S3

If your Snowflake instance is configured to route internal S3 stage traffic through a VPCE ↗, the Snowflake JDBC driver must be manually configured not to use the custom VPCE domain. Otherwise, the driver will be routed to the custom VPCE domain (which is inaccessible from Foundry's VPC) and connections to URLs of the format <bucketname>.bucket.vpce-<vpceid>.s3.<region>.vpce.amazonaws.com will fail.

You can configure this manually by adding a JDBC connection property in the Connection details of your instance, with a key of S3_STAGE_VPCE_DNS_NAME and an empty value field (the equivalent of setting it to null). S3 stage traffic will then be routed through the AWS S3 Gateway Endpoint (<bucketname>.bucket.s3.<region>.vpce.amazonaws.com), which maintains private connectivity so that traffic is not routed over the public internet.

Review our PrivateLink egress documentation ↗ for more information.

For egress policies that depend on an S3 bucket in the same region as your Foundry instance, ensure you have completed the additional configuration steps detailed in our Amazon S3 bucket policy documentation for the affected bucket(s).

Snowflake instance hosted on Azure

The Snowflake JDBC driver used for the Foundry Snowflake connector may attempt to connect directly to an underlying "internal stage" storage bucket when fetching data. For Snowflake hosted on Azure, because Azure-hosted Foundry enrollments route traffic over Azure service endpoints, network connectivity from Foundry to the underlying stage buckets must be explicitly allowlisted by following the instructions below.

Gather the required information about your Snowflake warehouse

You will need the following information about your Azure-hosted Snowflake warehouse to establish network connectivity to Foundry:

System allowlist domains

Use the SYSTEM$ALLOWLIST command to get the full list of domains that may be required to successfully connect.

  • Note: This is the same command as used above to define egress policies, and it is explained in the network panel callout in Data Connection.
  • This list will include the domain of an Azure storage bucket used as the stage for your Snowflake warehouse.

Azure storage account identifier

For the Azure storage bucket returned from the SYSTEM$ALLOWLIST command, you will also need to retrieve the storage account identifier.

  • If you are using Snowflake Standard Edition or Enterprise Edition, you will need to file a ticket with Snowflake support to request the storage account identifier.
  • If you are using Snowflake Business Critical Edition ↗, you can retrieve the storage account identifier with the following steps:
    • Set the ENABLE_INTERNAL_STAGES_PRIVATELINK ↗ parameter to TRUE for the account.
    • Then, call the SYSTEM$GET_PRIVATELINK_CONFIG() ↗ function, which returns a field called privatelink-internal-stage containing the Azure storage account resource identifier.
      • Note that even if you are not connecting over PrivateLink, you still need to retrieve and provide the storage account resource identifier.
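
For reference, a minimal SQL sketch of those two steps (Business Critical Edition only; requires sufficient account privileges):

```sql
-- Enable PrivateLink for internal stages on the account.
ALTER ACCOUNT SET ENABLE_INTERNAL_STAGES_PRIVATELINK = TRUE;

-- Returns the account's PrivateLink configuration as JSON; the
-- privatelink-internal-stage field contains the Azure storage
-- account resource identifier.
SELECT SYSTEM$GET_PRIVATELINK_CONFIG();
```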

A full Azure Storage account resource identifier will be in the following format:

/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Storage/storageAccounts/{storageAccountName}

More information on how to find an Azure Storage account resource ID directly in the Azure console can be found in the Azure documentation ↗. Restricting cross-account network traffic using VNET rules and the storage account identifier is in line with Microsoft's published best practices ↗ and should be used for all connections to Azure-hosted Snowflake warehouses from within Azure compute instances.

Allow outbound traffic from Foundry to the Azure storage account associated with your Snowflake warehouse

Now that you have gathered the required information about your Snowflake warehouse, you can create the policies needed to enable Foundry access to your Snowflake data.

  1. Create a standard egress policy for the Azure storage internal stage, and attach it to your Snowflake source.

    • Note that you should add policies for everything returned from the SYSTEM$ALLOWLIST command, and not just the storage bucket domain.
  2. Create an Azure storage policy, pasting in the storage account resource identifier.

Navigate back to your Snowflake source in Data Connection and confirm you can explore the source and run syncs.

Iceberg tables (virtual tables only)

The Virtual tables section of this documentation provides details on integrating Iceberg tables registered in Snowflake Horizon Catalog. This functionality requires network connectivity to the external volume where the Iceberg table is stored. You must create network egress policies for each external volume, as Foundry reads and writes to the storage location directly rather than querying the table using a Snowflake compute warehouse.

Refer to the official Snowflake documentation ↗ for more information on external volumes and how to determine table storage locations.
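
As a starting point, the following sketch (using hypothetical object names) shows two ways to look up where an Iceberg table's data is stored:

```sql
-- List the storage locations configured on an external volume.
DESCRIBE EXTERNAL VOLUME my_external_volume;

-- Return metadata for a specific Iceberg table, including the
-- location of its metadata in the external volume.
SELECT SYSTEM$GET_ICEBERG_TABLE_INFORMATION('my_db.my_schema.my_iceberg_table');
```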

Configuring egress policies for an external volume enables network traffic to egress from Foundry to that storage location. Network controls may vary between cloud providers, so ensure that any network controls on the storage location permit traffic from Foundry. Learn more about identifying the IP addresses where Foundry traffic originates.

Additionally, refer to the other sections of this Networking documentation to ensure you correctly set up connections to external volumes hosted in the same cloud region as your Foundry instance.

Virtual tables

This section provides additional details about using virtual tables with a Snowflake source; it is not applicable when syncing to Foundry datasets.

The Snowflake connector now offers enhanced functionality when using virtual tables to access Iceberg tables registered in Horizon Catalog. Foundry uses the Iceberg REST APIs exposed in Horizon Catalog to access tables as well as read and write data in the underlying storage locations, which are configured as external volumes in Snowflake. Additionally, Foundry uses Horizon Catalog credential vending to ensure secure access to cloud object storage. This can also improve the performance of reads and writes against these tables.

The connector exposes Iceberg functionality automatically if you:

  1. Configure network egress policies that allow connectivity from Foundry to the external volume that stores the table.
  2. Configure credentials on the source that have permission to obtain vended credentials from Horizon Catalog.

Foundry uses Iceberg clients to establish connections and read or write tables directly in the storage location without using Snowflake compute. However, Foundry still uses the warehouse configured on the source for certain metadata queries, such as determining the type of table being accessed.

Refer to the official Snowflake documentation ↗ for more information on querying Iceberg tables with an external engine through Snowflake Horizon Catalog. Refer to the Networking section of this documentation for details on enabling network access to external volumes.

Foundry treats Iceberg tables like regular Snowflake tables if any of the above requirements are not met. Connections to Snowflake are made using the same mechanism as for other Snowflake data (such as tables, views, or materialized views) and rely on a Snowflake compute warehouse to read and write from the table.

The table below highlights the virtual table capabilities that are supported for Snowflake.

| Capability | Status |
| --- | --- |
| Bulk registration | 🟡 Beta |
| Automatic registration | 🟢 Generally available |
| Table inputs | 🟢 Generally available: tables, views, and materialized views in Code Repositories and Pipeline Builder |
| Table outputs | 🟢 Generally available: tables in Code Repositories and Pipeline Builder |
| Incremental pipelines | 🟢 Generally available: APPEND only [1] |
| Compute pushdown | 🟢 Generally available; 🟡 Beta in Pipeline Builder |

Consult the virtual tables documentation for details on the supported Foundry workflows where Snowflake tables can be used as inputs or outputs.

[1] To enable incremental support for pipelines backed by Snowflake virtual tables, ensure that Change Tracking ↗ and Time Travel ↗ are enabled with an appropriate retention period. This functionality relies on the CHANGES ↗ clause. The current and added read modes in Python transforms are supported; these expose the relevant rows of the change feed based on the METADATA$ACTION column. The METADATA$ACTION, METADATA$ISUPDATE, and METADATA$ROW_ID columns are made available in Python transforms.
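
As a sketch of the underlying mechanism, with a hypothetical table name, change tracking can be enabled and the change feed inspected directly in Snowflake:

```sql
-- Enable change tracking on the table (Time Travel retention must
-- cover the requested change interval).
ALTER TABLE my_db.my_schema.my_table SET CHANGE_TRACKING = TRUE;

-- Read appended rows from the last hour; the CHANGES clause exposes
-- METADATA$ACTION, METADATA$ISUPDATE, and METADATA$ROW_ID alongside
-- the table columns.
SELECT *
FROM my_db.my_schema.my_table
  CHANGES (INFORMATION => APPEND_ONLY)
  AT (TIMESTAMP => DATEADD('hour', -1, CURRENT_TIMESTAMP()));
```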

Privileges on source credentials

For full feature support, you should provide the following privileges to the credentials configured for the source connection. You should apply these on either the database, schema, or table depending on the desired inheritance model.

| Category | Privilege | Notes |
| --- | --- | --- |
| Prerequisite | USAGE | Must be granted on the Snowflake databases and schemas that will be used in Foundry. |
| Read | SELECT | Required to read Snowflake tables when using syncs or virtual table inputs. |
| Edit | DELETE, INSERT, TRUNCATE, UPDATE | Required to modify Snowflake tables when using virtual table outputs. |
| Create | CREATE SCHEMA, CREATE TABLE | Required to create Snowflake tables when using virtual table outputs. |

When using Iceberg tables, USAGE privilege is required on the external volume where the table is stored.

Additionally, the credentials provided must have usage privileges on the warehouse provided in the source configuration.
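
A minimal sketch of these grants, assuming hypothetical object and role names (my_db, my_schema, my_external_volume, my_warehouse, foundry_role):

```sql
-- Prerequisite usage on the database and schema.
GRANT USAGE ON DATABASE my_db TO ROLE foundry_role;
GRANT USAGE ON SCHEMA my_db.my_schema TO ROLE foundry_role;

-- Read: syncs and virtual table inputs.
GRANT SELECT ON ALL TABLES IN SCHEMA my_db.my_schema TO ROLE foundry_role;

-- Edit and create: virtual table outputs.
GRANT INSERT, UPDATE, DELETE, TRUNCATE ON ALL TABLES IN SCHEMA my_db.my_schema TO ROLE foundry_role;
GRANT CREATE TABLE ON SCHEMA my_db.my_schema TO ROLE foundry_role;
GRANT CREATE SCHEMA ON DATABASE my_db TO ROLE foundry_role;

-- Iceberg tables: usage on the backing external volume.
GRANT USAGE ON EXTERNAL VOLUME my_external_volume TO ROLE foundry_role;

-- Warehouse configured on the source.
GRANT USAGE ON WAREHOUSE my_warehouse TO ROLE foundry_role;
```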

Refer to the official Snowflake documentation ↗ for more information on access control privileges in Snowflake.

Source configuration requirements

When using virtual tables, remember the following source configuration requirements:

  • You must use a Foundry worker source. Virtual tables do not support use of agent worker connections.
  • Ensure that bi-directional connectivity and allowlisting are established as described in the Networking section of this documentation.
  • If using virtual tables in Code Repositories, refer to the Virtual Tables documentation for details of additional source configuration required.
  • You must specify a warehouse in the connection details.
  • The credentials provided must have usage privileges on the warehouse.

See the Connection Details section above for more details.

Compute pushdown

Foundry offers the ability to push down compute to Snowflake when using virtual tables. Virtual table inputs leverage the Snowflake Spark connector ↗, which has built-in support for predicate pushdown.

When using Snowflake virtual tables registered to the same source as inputs and outputs to a pipeline, it is possible to fully federate compute to Snowflake. This feature is currently available in Python transforms. See the Python documentation for details on how to push down compute to Snowflake.

Data model

Note that columns of type array ↗, object ↗, and variant ↗ will be parsed by Foundry as type string. This is because these types are variably typed in the source and have no fixed schema.

For example, the Snowflake array [ 1, 2, 3 ] would be interpreted by Foundry as the string "[1,2,3]".
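
For illustration, the following query produces values of each of these semi-structured types; the resulting columns would appear in Foundry as strings:

```sql
-- Each of these columns is semi-structured in Snowflake and is
-- surfaced as a string column in Foundry.
SELECT ARRAY_CONSTRUCT(1, 2, 3) AS arr_col,
       OBJECT_CONSTRUCT('a', 1) AS obj_col,
       TO_VARIANT('hello')      AS var_col;
```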

See Snowflake's official documentation ↗ for more details.