Virtual tables allow you to query tables in supported data platforms without first storing the data in a Foundry dataset.
A virtual table acts as a pointer to a table in a source system outside of Foundry. Virtual tables abstract away the underlying source system and storage formats, enabling you to build workflows that combine data from different source systems seamlessly. Virtual tables can also be combined with datasets stored in Foundry as part of a flexible architecture where data need not be consolidated in one place. You can also create new virtual tables as outputs from Foundry data transformations, enabling workflows where storage is fully external and Foundry handles orchestration, security, and other functions.
A virtual table is defined by:
As with any resource in Foundry, virtual tables are governed by Foundry's security and permissions model and can be opened or used in various Foundry applications.
The following sources support virtual tables. Refer to the source documentation for more details on how to configure the connection as well as the supported capabilities.
Source | Status | Supported Formats | Manual Registration | Automatic Registration |
---|---|---|---|---|
Amazon S3 | 🟢 Generally available | Avro ↗, Delta ↗, Iceberg ↗, Parquet ↗ | ✔️ | |
Azure Data Lake Storage Gen2 (Azure Blob Storage) | 🟢 Generally available | Avro ↗, Delta ↗, Iceberg ↗, Parquet ↗ | ✔️ | |
BigQuery | 🟢 Generally available | Table, View, Materialized View | ✔️ | ✔️ |
Databricks | 🟢 Generally available | Table, View, Materialized View | ✔️ | ✔️ |
Google Cloud Storage | 🟢 Generally available | Avro ↗, Delta ↗, Iceberg ↗, Parquet ↗ | ✔️ | |
Snowflake | 🟢 Generally available | Table, View, Materialized View | ✔️ | ✔️ |
An Iceberg catalog is required to load virtual tables backed by an Apache Iceberg table. To learn more about Iceberg catalogs, see the Apache Iceberg documentation ↗. Virtual tables support different catalog options depending on the source being used. The table below highlights the supported catalogs. Refer to the source documentation for more details on how to configure each catalog.
Source | AWS Glue | Object Storage | Unity Catalog |
---|---|---|---|
Amazon S3 | 🟢 Generally available | 🟢 Generally available | 🟢 Generally available |
Azure Data Lake Storage Gen2 (Azure Blob Storage) | 🔴 Not available | 🟢 Generally available | 🟢 Generally available |
Google Cloud Storage | 🔴 Not available | 🟢 Generally available | 🔴 Not available |
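When scripting checks around source setup, the catalog matrix above can be encoded as a small lookup. The following is a minimal sketch in plain Python. The names ICEBERG_CATALOGS and supports_catalog are hypothetical and not part of any Foundry API, and the values only transcribe the table above.

```python
# Transcription of the Iceberg catalog support table above.
# ICEBERG_CATALOGS and supports_catalog are illustrative names, not a Foundry API.
ICEBERG_CATALOGS = {
    "Amazon S3": {"AWS Glue", "Object Storage", "Unity Catalog"},
    "Azure Data Lake Storage Gen2": {"Object Storage", "Unity Catalog"},
    "Google Cloud Storage": {"Object Storage"},
}


def supports_catalog(source: str, catalog: str) -> bool:
    """Return True if the table above lists `catalog` as available for `source`."""
    return catalog in ICEBERG_CATALOGS.get(source, set())


print(supports_catalog("Amazon S3", "AWS Glue"))
print(supports_catalog("Google Cloud Storage", "Unity Catalog"))
```

Unknown sources return False rather than raising, so the helper can be used as a simple guard before attempting to configure a catalog.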
Virtual tables are supported as inputs in the applications and workflows below, and as outputs in Pipeline Builder and Code Repositories.
Supported application | Supported workflow | Not supported |
---|---|---|
Data Connection | Configure source, register virtual tables | Agent-based connections |
Contour | Analyze in Contour | Save as dataset |
Ontology | Object creation via Pipeline Builder | Object creation via Ontology Manager |
Data Lineage | View Foundry lineage | |
Pipeline Builder [Beta] | Pipeline input, pipeline output, snapshot builds, incremental builds (append-only) | Streaming builds |
Code Repositories | Python Transforms, Java Transforms, SQL Transforms, snapshot builds, incremental builds (append-only) | |
Note that some source types may not support all these capabilities. Refer to the source-specific documentation for more details. Learn more about how to configure a source when using virtual tables in Code Repositories.
In general, virtual tables can be used to back most common Foundry workflows by either:
Sources supporting virtual tables are set up in the Data Connection application. Select the source that you want to use, then navigate to the Virtual tables tab in the source configuration. Follow the source documentation and any requirements listed there for using virtual tables.
All supported sources allow you to register individual tables from the source system in Foundry. Tabular source types also support bulk registration of multiple virtual tables at once. Some sources additionally support automatic registration, which periodically registers, in a designated project, all tables in the source that are accessible to the configured credentials.
To register a virtual table, select Create virtual table in the Virtual tables tab in the source. Browse available tables and select the table to register. Unless you choose a different location, the virtual table will be created in the default output folder of the source.
Virtual table bulk registration is in the beta phase of development and may not be available on your enrollment. Functionality may change during active development.
When working with tabular source types such as Databricks, BigQuery, and Snowflake, you will be able to bulk register multiple virtual tables at once. To begin, select one or more external tables from the left panel. Use the right panel to change where your new virtual tables will be saved, or update their names. Note that changing the name of a virtual table in Foundry does not change the table name in the source.
When enabling auto-registration, you create a new Foundry project where virtual tables will be created automatically. The folder hierarchy in this project mirrors the structure of the source system and is periodically updated as new tables are created in the source. When source tables are deleted, the related virtual tables are not automatically deleted from the project, but accessing them will not load any data.
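The refresh behavior described above can be pictured with a toy model. This is only a sketch in plain Python of the described semantics, not Foundry's implementation. The class AutoRegisteredProject, its sync method, the project root, and the (database, schema, table) tuples are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AutoRegisteredProject:
    """Toy model of the auto-registration behavior described above (not Foundry code)."""

    root: str
    # Maps project path -> True if the backing source table still exists.
    tables: dict = field(default_factory=dict)

    def sync(self, source_tables):
        """Apply one periodic refresh given the tables currently in the source.

        `source_tables` is an iterable of tuples such as (database, schema, table).
        """
        current = {"/".join((self.root, *parts)) for parts in source_tables}
        # New source tables are registered at a path mirroring the source hierarchy.
        for path in current:
            self.tables[path] = True
        # Virtual tables whose source table was deleted are kept, but load no data.
        for path in self.tables:
            if path not in current:
                self.tables[path] = False


project = AutoRegisteredProject(root="/Auto-registered source")
project.sync([("sales", "public", "orders"), ("sales", "public", "customers")])
project.sync([("sales", "public", "orders")])  # `customers` was dropped in the source

for path, backed in sorted(project.tables.items()):
    print(path, "->", "loads data" if backed else "registered, but source table is gone")
```

The point of the model is that a refresh only adds registrations. Nothing is removed, so stale virtual tables need to be cleaned up or ignored by downstream consumers.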
To enable auto-registration, you must have project creation permissions in Foundry.
The project is managed by Foundry, and users cannot manually create or update resources in it. Virtual tables registered in this project can be imported into other projects for use in workflow development.
Enabling auto-registration allows setting permissions and access to the project, which can later be managed by the project owner using the access sidebar.
When virtual tables are used in Code Repositories, the transforms consuming them automatically obtain network egress based on the egress policies configured on the source. The credentials configured on the source are also made available to those transforms so that they can connect to the source. This behavior is similar to that of External Transforms.
The following settings must be enabled on the source:
Once a source has been configured and imported into a code repository, virtual tables can be used as inputs to Python Transforms in the same way as a dataset, using transforms.api.Input. Incremental computation uses the same API as for datasets and is supported by a subset of sources. Refer to the source-specific documentation for more information.
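As a rough illustration of a virtual table used as a Python transform input, the sketch below declares it with transforms.api.Input exactly as a dataset would be declared. The resource paths are hypothetical placeholders. The try/except fallback defines minimal stand-ins only so that the snippet can be read and run outside Foundry. Those stand-ins are not the real API, and inside a Foundry code repository the real transforms.api import is used instead.

```python
try:
    # Available inside a Foundry Python transforms repository.
    from transforms.api import transform_df, Input, Output
except ImportError:
    # Stand-ins for running this sketch outside Foundry; NOT the real API.
    class Input:
        def __init__(self, path):
            self.path = path

    class Output(Input):
        pass

    def transform_df(output, **inputs):
        def wrap(fn):
            fn.output, fn.inputs = output, inputs
            return fn
        return wrap


@transform_df(
    Output("/Project/folder/clean_orders"),                 # hypothetical output path
    orders=Input("/Project/folder/orders_virtual_table"),   # hypothetical virtual table path
)
def compute(orders):
    # `orders` is expected to arrive as a Spark DataFrame, as a dataset input would.
    return orders.dropDuplicates()
```

Whether a given source supports this pattern, including incremental reads, depends on the source type, so check the source-specific documentation before relying on it.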
In general, virtual tables are supported as inputs to Python, SQL, and Java Transforms. Only Python Transforms support creating a new virtual table as a transform output, while SQL and Java Transforms support writing to existing virtual tables.
Learn more about creating new virtual tables via Python Transforms.
The decision to use virtual tables vs. sync to Foundry datasets depends on your architecture goals and the target workflow to be supported. We recommend considering the appropriate integration pattern on a workflow-by-workflow basis. The two approaches can be used in conjunction to complement one another.
Below are some considerations to keep in mind about the potential benefits, drawbacks, and limitations of using virtual tables vs. syncing data to datasets.
Virtual tables provide a number of benefits, including:
Virtual tables may not be the best choice in all circumstances. Some considerations include:
Limitations of virtual tables include:
Transforms that use the use_external_systems decorator are currently not compatible with virtual tables. Switch to source-based external_transforms or split your transform into multiple transforms: one that uses virtual tables as input and one that uses the use_external_systems decorator.
For queries run directly on virtual tables, compute may be split between Foundry and the source system. The specific behavior depends on the query and the degree of pushdown computation supported by the source system. Refer to the source-specific documentation for more information.