External pipelines in Pipeline Builder [Beta]

External pipelines are currently in Beta. Functionality may change during ongoing development.

If you're new to Pipeline Builder, review how to create a batch pipeline in Pipeline Builder before proceeding.

Pipeline Builder now offers external pipelines, which push compute down to external compute engines. This works similarly to compute pushdown in Python transforms, and lets you use Foundry's pipeline management, data lineage, and security functionality on top of external data warehouse compute. As with compute pushdown in Python transforms, all inputs and outputs of external pipelines must be virtual tables.

Tables built with external compute can be combined with datasets and tables built with Foundry-native compute using Foundry’s scheduling tools. This allows you to orchestrate complex multi-technology pipelines and use the most suitable compute at each step.

Diagram showing how Foundry external pipelines use virtual tables to enable you to push down compute to external execution engines.

Supported external compute engines for Pipeline Builder

Currently, Databricks is the only supported external compute engine in Pipeline Builder. To use other external compute engines, such as Snowflake or BigQuery, use transforms with compute pushdown.

Source type | Status        | Notes
BigQuery    | Not available |
Databricks  | Beta          | Serverless (default) or classic compute available.
Snowflake   | Not available |
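
The support table above can be read as a simple routing rule. The following is a minimal sketch, not a Foundry API: the dictionary `PIPELINE_BUILDER_STATUS` and the function `recommended_tool` are hypothetical names introduced only for illustration, and the statuses are copied from the table as it stands during the Beta.

```python
# Hypothetical lookup mirroring the support table above; not a Foundry API.
PIPELINE_BUILDER_STATUS = {
    "BigQuery": "Not available",
    "Databricks": "Beta",
    "Snowflake": "Not available",
}


def recommended_tool(engine: str) -> str:
    """Return which Foundry tool the table suggests for a given engine."""
    status = PIPELINE_BUILDER_STATUS.get(engine)
    if status is None:
        raise ValueError(f"Engine not listed in the support table: {engine}")
    if status == "Not available":
        # Engines without Pipeline Builder support fall back to transforms.
        return "transforms with compute pushdown"
    return "Pipeline Builder external pipeline"


for engine in PIPELINE_BUILDER_STATUS:
    print(f"{engine}: {recommended_tool(engine)}")
```

Because the statuses may change during ongoing development, treat the dictionary as a snapshot of the table rather than a fixed rule.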

Create a new external pipeline

  1. Open Pipeline Builder and select Create new pipeline.
  2. After entering a name for your pipeline and the desired location, choose Batch pipeline > External in the configuration settings and select Next.
  3. Search for and select your supported external source and import it into the pipeline.
  4. Add virtual tables from that source to the graph and construct your pipeline as usual.
  5. Add your pipeline outputs. All pipeline outputs will be virtual tables in the source.
  6. When you are ready to build, save and deploy the pipeline. The pipeline will run using external compute and output the result as a virtual table stored in the source system.

All input and output tables must be configured from the same source you selected as part of the pipeline setup.
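
As an illustration of the same-source rule above, the sketch below checks a planned pipeline before you configure it. It is not part of Pipeline Builder: the `VirtualTable` class, the `tables_outside_source` function, and the source and table names are all hypothetical and exist only to make the constraint concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualTable:
    """Hypothetical stand-in for a virtual table and the source it lives in."""
    name: str
    source: str


def tables_outside_source(pipeline_source, inputs, outputs):
    """Return names of tables that do not belong to the pipeline's source."""
    return [t.name for t in [*inputs, *outputs] if t.source != pipeline_source]


# Example plan: one output is mistakenly pointed at a different source.
inputs = [VirtualTable("orders", "databricks-prod")]
outputs = [
    VirtualTable("orders_clean", "databricks-prod"),
    VirtualTable("orders_export", "snowflake-dev"),
]

offending = tables_outside_source("databricks-prod", inputs, outputs)
print(offending)  # tables that would violate the same-source rule
```

An empty list means every input and output uses the source selected during pipeline setup.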

Screenshot of Pipeline configuration.

Configuring build settings

You can edit your pipeline source and configure source-specific compute options in the build settings panel.

Screenshot of external build settings configuration

Known limitations

External pipelines do not currently support the full set of features and expressions available in standard batch pipelines.

Currently unsupported features and expressions include:

  • Incremental computation
  • LLM features
  • Media set operations
  • Union
  • User-defined functions
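
When planning a migration from a standard batch pipeline, it can help to compare the features you rely on against the list above. The following is a minimal sketch under stated assumptions: the feature names are plain strings copied from this list, and `UNSUPPORTED_IN_EXTERNAL` and `find_unsupported` are hypothetical helpers, not a Pipeline Builder API.

```python
# Features listed above as currently unsupported in external pipelines.
# Stored in lowercase so the comparison is case-insensitive.
UNSUPPORTED_IN_EXTERNAL = frozenset({
    "incremental computation",
    "llm features",
    "media set operations",
    "union",
    "user-defined functions",
})


def find_unsupported(features_used):
    """Return the features from the plan that external pipelines lack."""
    return sorted(
        f for f in features_used
        if f.strip().lower() in UNSUPPORTED_IN_EXTERNAL
    )


# Hypothetical list of features an existing batch pipeline relies on.
planned = ["Join", "Union", "Filter", "User-defined functions"]
blockers = find_unsupported(planned)
print(blockers)
```

Because the list may change during the Beta, check it against the current documentation before relying on the result.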