Databricks

Connect Foundry to Databricks to leverage a range of capabilities on top of data, compute, and models available within Databricks.

Supported capabilities

| Capability | Status |
| --- | --- |
| Exploration | 🟢 Generally available |
| Bulk import | 🟢 Generally available |
| Incremental | 🟢 Generally available |
| Virtual tables | 🟢 Generally available |
| Compute pushdown | 🟢 Generally available |
| External models | 🟢 Generally available |

The Databricks connector now offers enhanced functionality when using virtual tables to expose the features of Delta Lake and Apache Iceberg. Refer to the Virtual tables section of this documentation for details on how to configure the connector to enable this functionality.

Setup

  1. Open the Data Connection application and select + New Source in the upper right corner of the screen.
  2. Select Databricks from the available connector types.
  3. Choose to use a direct connection over the Internet or to connect through an intermediary agent.
  4. Follow the additional configuration prompts to continue the setup of your connector using the information in the sections below.

Learn more about setting up a connector in Foundry.

Connection details

The following configuration options are available for the Databricks connector:

| Option | Required? | Description |
| --- | --- | --- |
| Hostname | Yes | The hostname of the Databricks workspace. |
| HTTP Path | Yes | The HTTP Path value of the Databricks compute resource. |

Please refer to the official Databricks documentation ↗ for information on how to obtain these values.
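
As a point of reference, the hypothetical sketch below shows the typical shape of these values; the warehouse ID is a placeholder, and the real values should be copied from the Connection details tab of your Databricks compute resource.

```python
# Hypothetical example values only; copy the real Hostname and HTTP Path from
# the Connection details tab of your Databricks compute resource.
connection_details = {
    "hostname": "adb-5555555555555555.19.azuredatabricks.net",
    # SQL warehouses typically expose an HTTP Path of this shape; all-purpose
    # clusters use a /sql/protocolv1/... path instead.
    "http_path": "/sql/1.0/warehouses/<warehouse-id>",
}
```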

Authentication

You can authenticate with Databricks in the following ways:

| Method | Description | Documentation |
| --- | --- | --- |
| Basic authentication [Legacy] | Authenticate with a user account using a username and password. Basic authentication is legacy and not recommended for production. | Basic authentication ↗ |
| OAuth machine-to-machine | Authenticate as a service principal using OAuth. Create a service principal in Databricks and generate an OAuth secret to obtain a client ID and secret. | OAuth for service principals (OAuth M2M) ↗ |
| Personal access token | Authenticate as a user or service principal using a personal access token. | Personal access tokens (PAT) ↗ |
| Workload identity federation [Recommended] | Authenticate as a service principal using workload identity federation, which allows workloads running in Foundry to access Databricks APIs without the need for Databricks secrets. Create a service principal federation policy in Databricks and follow the displayed instructions to allow the source to securely authenticate as a service principal. | Databricks OAuth token federation ↗ |

Refer to our OIDC documentation for an overview of how OpenID Connect (OIDC) is supported in Foundry.
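
If you want to verify OAuth machine-to-machine credentials outside Foundry before configuring the source, a minimal sketch along the following lines can help; it assumes the Databricks workspace-level token endpoint and the all-apis scope described in the Databricks OAuth M2M documentation, and every value shown is a placeholder.

```python
# Sanity-check OAuth M2M (service principal) credentials against Databricks.
# Hostname, client ID, and client secret are placeholders.
import requests

HOSTNAME = "adb-5555555555555555.19.azuredatabricks.net"

response = requests.post(
    f"https://{HOSTNAME}/oidc/v1/token",
    auth=("<client-id>", "<client-secret>"),
    data={"grant_type": "client_credentials", "scope": "all-apis"},
    timeout=30,
)
response.raise_for_status()
print("Token acquired; expires in", response.json()["expires_in"], "seconds")
```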

Networking

If you are using a direct connection for connectivity between Databricks and Foundry, the appropriate egress policies must be added when setting up the source in the Data Connection application. If you are using an agent runtime, the server running the agent must have suitable network access.

The Databricks connector requires network access on port 443 to the hostname provided in the connection details. This allows Foundry to connect to the Databricks workspace and Unity Catalog REST APIs.
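
For agent-based setups, a quick TCP check from the server running the agent can confirm that the workspace hostname is reachable on port 443; the sketch below is illustrative only and uses a placeholder hostname.

```python
# Minimal connectivity check from the agent host to the Databricks workspace.
# Substitute your workspace hostname for the placeholder value.
import socket

HOSTNAME = "adb-5555555555555555.19.azuredatabricks.net"

try:
    with socket.create_connection((HOSTNAME, 443), timeout=5):
        print(f"TCP connection to {HOSTNAME}:443 succeeded")
except OSError as err:
    print(f"Cannot reach {HOSTNAME}:443 - check DNS and firewall rules: {err}")
```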

External access to storage locations (virtual tables only)

The Virtual Tables section of this documentation provides details on external access in Unity Catalog and the functionality it enables. External access requires network connectivity to a table's storage location (managed or external). Egress policies will need to be created for each storage location to benefit from the features enabled by external access.

Refer to the official Databricks documentation ↗ for more information on external access and how to determine the storage locations of tables.

Examples

Below we provide example egress policies that may need to be configured to ensure network connectivity to Databricks.

| Type | URL | DNS | Port |
| --- | --- | --- | --- |
| Databricks workspace | https://adb-5555555555555555.19.azuredatabricks.net/ | adb-5555555555555555.19.azuredatabricks.net | 443 |
| Azure storage location [1] | abfss://<container-name>@<account-name>.dfs.core.windows.net/<table-directory> | <account-name>.dfs.core.windows.net, <account-name>.blob.core.windows.net | 443 |
| Google Cloud Storage (GCS) storage location | gs://<bucket-path>/<table-directory> | storage.googleapis.com | 443 |
| S3 storage location | s3://<bucket-path>/<table-directory> | <bucket-path>.s3.<region>.amazonaws.com | 443 |

[1] Be sure to include both the blob.core.<endpoint> and dfs.core.<endpoint> domains when configuring access to Azure storage locations. The <endpoint> suffix may vary depending on the Azure cloud environment.

In a limited number of cases (depending on your Foundry and Databricks environments), it may be necessary to establish a connection via PrivateLink. This is typically the case when both Foundry and Databricks are hosted by the same CSP (for example, AWS-AWS or Azure-Azure). If you believe this applies to your setup, contact your Palantir representative for additional guidance.

For egress policies that depend on an S3 bucket in the same region as your Foundry instance, ensure you have completed the additional configuration steps detailed in our Amazon S3 bucket policy documentation for the affected bucket(s).

More options: SSL and hostname validation

You may additionally need to pass in a JDBC property to allow self-signed certificates.

How to identify if this property is needed:

  • SSL connections validate server certificates. Normally, SSL validation happens through a certificate chain. By default, both agent and direct connection runtimes trust most industry-standard certificate chains.
  • If the server to which you are connecting has a self-signed certificate, or if a firewall performs TLS interception on the connection, the connector must trust the certificate. Learn more about using certificates in agent-based connections.
  • If you are creating a direct connection and are using a self-signed certificate, you will need to add the AllowSelfSignedCerts=1 JDBC property.

How to add the property allowing self-signed certificates:

  • At the bottom of the Connection details page, under Connection settings, select More options, then JDBC properties.
  • Under JDBC properties configuration, select Add property, then New property, then enter AllowSelfSignedCerts as the key and 1 as the value.

When the AllowSelfSignedCerts property is set to 1, SSL verification is disabled. In this case, the connector does not verify the server certificate against the trust store, nor does it check that the server's hostname matches the common name or subject alternative names in the server certificate.

This JDBC property and others are outlined in the Databricks driver documentation ↗. The JDBC properties outlined in this documentation are specific to the Databricks driver and will differ from other source types.

The server must provide the full certificate chain in order for SSL verification to work. The certificate chain for the Databricks server can be obtained by running the command openssl s_client -connect {hostname}:{port} -showcerts. To verify the certificate chain, use the OpenSSL command line utility or any other available tool.
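
As an optional aid, the hedged sketch below performs the same check from Python against the system trust store; a verification failure suggests that the certificate needs to be added to the agent trust store or that the AllowSelfSignedCerts=1 property is required. The hostname is a placeholder.

```python
# Check whether the Databricks server certificate validates against the
# system trust store. The hostname is a placeholder.
import socket
import ssl

HOSTNAME = "adb-5555555555555555.19.azuredatabricks.net"

context = ssl.create_default_context()  # uses the system trust store
try:
    with socket.create_connection((HOSTNAME, 443), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=HOSTNAME) as tls:
            print("Certificate validated; issuer:", tls.getpeercert()["issuer"])
except ssl.SSLCertVerificationError as err:
    # Self-signed certificate or TLS interception by a firewall; the trust
    # store or AllowSelfSignedCerts=1 configuration described above applies.
    print("Certificate verification failed:", err)
```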

Virtual tables

Virtual tables allow you to connect to data registered in Databricks Unity Catalog. You can read from and write to Databricks tables from Foundry, as well as push down compute to Databricks from Foundry pipelines. This section provides additional details on using virtual tables with Databricks and is not applicable when syncing to Foundry datasets.

The Databricks connector now offers enhanced functionality when using virtual tables to expose the features of Delta Lake and Apache Iceberg. This functionality requires external access to be enabled in Unity Catalog. When enabled, external access allows Foundry to access tables using the Unity REST API and Iceberg REST catalog, and read and write data in the underlying storage locations. Unity Catalog credential vending is used to ensure secure access to cloud object storage. In addition to enhanced functionality, this can also improve the performance of reads and writes against these tables.

Refer to the official Databricks documentation ↗ for more information on external access and how to determine the storage locations of tables. Refer to the Networking section of this documentation for details on enabling network access to storage locations.

If external access is not enabled, or if the format of the Unity Catalog object is not supported (for example, views or materialized views), connections to Databricks will be made using JDBC. JDBC is the same mechanism used for syncs. Refer to the official Databricks documentation ↗ for more information on JDBC connectivity to Databricks.
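
For an outside-Foundry illustration of this SQL-warehouse access path, the sketch below uses the Databricks SQL Connector for Python, which is analogous to the JDBC mechanism described above but is not the driver Foundry uses internally; all connection values and object names are placeholders.

```python
# Query a Unity Catalog object through a SQL warehouse. This mirrors the
# JDBC-style access path; every value below is a placeholder.
from databricks import sql  # pip install databricks-sql-connector

with sql.connect(
    server_hostname="adb-5555555555555555.19.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token>",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM <catalog>.<schema>.<view> LIMIT 10")
        for row in cursor.fetchall():
            print(row)
```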

The table below highlights the virtual table capabilities that are supported for Databricks.

| Capability | Status |
| --- | --- |
| Bulk registration | 🟡 Beta |
| Automatic registration | 🟢 Generally available |
| Table inputs | 🟢 Generally available: Code Repositories, Pipeline Builder |
| Table outputs | 🟢 Generally available: Code Repositories, Pipeline Builder |
| Incremental pipelines | 🟢 Generally available [2] |
| Compute pushdown | 🟢 Generally available |

Consult the virtual tables documentation for details on the supported Foundry workflows where Databricks tables can be used as inputs or outputs. Functionality may vary depending on whether external access is enabled.

The following table provides a summary of the supported formats and workflows when external access is or is not enabled.

| Unity Catalog object | External access required | Format | Table inputs | Table outputs |
| --- | --- | --- | --- | --- |
| Managed table | Yes | Avro ↗, Delta ↗, Parquet ↗ | ✔️ | |
| Managed table | Yes | Iceberg ↗ | ✔️ | ✔️ |
| External table | Yes | Delta | ✔️ | ✔️ |
| External table | Yes | Avro, Parquet | ✔️ | |
| Managed table | No | Table ↗, View ↗, Materialized view | ✔️ | |
| External table | No | Table, view, materialized view | ✔️ | |

[2] To enable incremental support for Spark pipelines backed by Databricks virtual tables, external access must be enabled; incremental computation requires the ability to directly interact with Delta or Iceberg tables. Incremental compute on top of Delta tables relies on Change Data Feed ↗. Incremental compute on top of Iceberg tables relies on Incremental Reads ↗.
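
For background on the Delta mechanism referenced in this footnote, a Change Data Feed read in Spark looks roughly like the sketch below; this shows the underlying Databricks/Delta API rather than how Foundry incremental transforms are authored, and the table name and version numbers are placeholders.

```python
# Illustrative Delta Change Data Feed read. Foundry manages incremental state
# for virtual-table pipelines itself; this only shows the underlying mechanism.
# Assumes a Spark session attached to Databricks compute; placeholders throughout.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 5)   # first commit version not yet processed
    .option("endingVersion", 10)    # optional upper bound
    .table("<catalog>.<schema>.<table>")
)
changes.select("_change_type", "_commit_version", "_commit_timestamp").show()
```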

Source configuration requirements

When using virtual tables, remember the following source configuration requirements:

  • You must set up the source as a direct connection. Virtual tables do not support use of intermediary agents.
  • Ensure that bi-directional connectivity and allowlisting are established as described in the Networking section of this documentation, including the recommended networking to storage locations.
  • If using virtual tables in Code Repositories, refer to the Virtual Tables documentation for details of additional source configuration required.
  • You must specify a warehouse in the connection details, using the HTTP path field. Refer to the official Databricks documentation ↗ for more information on how to get connection details for a Databricks compute resource.
  • The credentials provided must have usage privileges on the warehouse.

See the Connection Details section above for more details.

Compute pushdown

Foundry offers the ability to push down compute to Databricks when using virtual tables. When using Databricks virtual tables registered to the same source as inputs and outputs to a pipeline, it is possible to fully federate compute to Databricks. This capability leverages Databricks Connect ↗ and is currently available in Python transforms. See the Python documentation for details on how to push down compute to Databricks.
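
For context on the underlying mechanism, a Databricks Connect session runs standard PySpark DataFrame code on Databricks compute, roughly as in the hedged sketch below; within Foundry the session wiring is handled by the platform when pushdown is configured, so this is only an outside-Foundry illustration with placeholder values.

```python
# Standalone Databricks Connect illustration (pip install databricks-connect).
# In Foundry the session is provided by the platform when pushdown is enabled;
# all values below are placeholders.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.remote(
    host="https://adb-5555555555555555.19.azuredatabricks.net",
    token="<personal-access-token>",
    cluster_id="<cluster-id>",
).getOrCreate()

df = spark.table("<catalog>.<schema>.<table>")
summary = df.groupBy("<column>").count()  # executed on Databricks compute
summary.show()
```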

External models

Databricks models registered in Unity Catalog can be integrated with Foundry as external models.

Refer to the official Databricks documentation ↗ for more information on making models available in Unity Catalog, and to the guide on setting up Databricks external models in Foundry.