Branch isolation in Foundry's Iceberg catalog

Foundry's Iceberg catalog extends standard Iceberg so that each branch can independently track its own current schema, default partition spec, and default sort order. These operations are isolated to the branch they run against and do not affect the table's main (master) branch or any other branch. This means Iceberg tables match the behavior Foundry users already expect from catalog datasets, where work on a branch is fully isolated, while still remaining compliant with the Iceberg REST catalog specification ↗.

This page explains how branch isolation works, how it extends default Iceberg behavior, and how it is automatically applied when using Iceberg inside Foundry. It also covers how to opt into this behavior when using external Iceberg clients.

The branch isolation described on this page applies when running jobs through Foundry's build system. When writing to Foundry's Iceberg catalog from an external client, standard Iceberg branching behavior applies by default. Alternatively, you can opt into Foundry's branch isolation. See Using branches with external clients for details.

Branch isolation in Foundry

When running a build in Foundry, Foundry provides branch isolation for Iceberg tables. This means Foundry automatically isolates schema, partition spec, and sort order changes to the associated branch, ensuring these branch-scoped changes do not unintentionally modify other branches. Users do not need to take any action to configure this; it is the default behavior for writing to Iceberg tables using Foundry's build system.

Note that the branch-scoped behavior described here for schemas also applies to partition spec and sort order, which (like schemas) can be evolved independently on a branch in Foundry. The branch-scoped metadata is stored as additional properties in your table's Iceberg metadata file. For example: "foundry.branch.feature-branch.schema-id" : "3".

Standard Iceberg behavior

In standard Iceberg, the schema is a table-level concept shared across all branches, where there is a single current schema tracked for the table across all branches. See Apache Iceberg documentation ↗ for details. This means that a modification to the table's schema on any branch will change the schema for main and every other branch.

Having a single schema across all branches makes it difficult to develop or validate schema changes in isolation on a branch before merging into production. To test schema changes on a branch, you would need to modify your table's global schema. Any such change, regardless of which branch it originates from, would immediately affect all consumers of the table on main.

Foundry's behavior

Foundry's Iceberg catalog solves this by automatically tracking a branch-scoped schema, partition spec, and sort order. A job running on a branch (for example, my-branch) sees the schema that that branch is tracking, not the schema on main. Schema changes made within that job on my-branch are stored for the associated branch (in this case, my-branch) and leave the current schema on main entirely unchanged.

The following table summarizes what is branch-scoped versus table-scoped and shared across all branches:

MetadataFoundry-extended branchingDefault Iceberg branching
Schema historyTable-scopedTable-scoped
Table propertiesTable-scopedTable-scoped
Current snapshotBranch-scopedBranch-scoped
Current schemaBranch-scopedTable-scoped
Default partition specBranch-scopedTable-scoped
Default sort orderBranch-scopedTable-scoped

Note that in Foundry-extended branching while the current schema is branch-scoped, the history of schemas is table-scoped. This means that when a branch adds a new column, that schema is added to the table's shared schema list, but is not set as the current schema on main.

How branch context is established

When a job runs on a branch in Foundry's build system, Foundry automatically injects the branch context into all catalog operations for that job. You do not need to configure anything as a user. Your pipelines read inputs and write outputs against the branch without any additional code.

Examples

The examples below illustrate the two most common scenarios: creating a new table on a branch, and evolving the schema of an existing table on a branch.

Example: Creating a new table on a branch

Suppose you are building a new table on a development branch feature/customer-scores. Your transform creates an output Iceberg table and writes initial data to it.

Copied!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 import pyarrow as pa from transforms.api import transform from transforms.tables import TableOutput, TableLightweightOutput @transform.using(customer_scores=TableOutput("/<path>/customer_scores")) def compute(customer_scores: TableLightweightOutput): data = pa.table( { "customer_id": pa.array([1, 2, 3], type=pa.int64()), "score": pa.array([0.91, 0.74, 0.88], type=pa.float64()), } ) customer_scores.iceberg().write(data)

Here is the state of the customer_scores table after writing to it on the feature branch:

Foundry-extended branchingDefault Iceberg branching
Schema on feature/customer-scoresNew schemaNew schema
Schema on mainEmptyNew schema
Data on feature/customer-scoresExistsExists
Data on mainNoneNone

With Foundry's branch isolation: The branch owns the initial schema and snapshot. The table exists on main, but has an empty schema with no data.

With default Iceberg branches: The branch's schema appears on main.

Example: Evolving the schema of an existing table on a branch

Suppose you now have the table customer_scores already in production. You are testing some potential changes to the pipeline and want to add a new column and modify an existing column type, as part of a job running on branch feature/scoring-v2, without affecting what downstream production consumers see on main.

Copied!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 import pyarrow as pa from transforms.api import transform from transforms.tables import TableOutput, TableLightweightOutput @transform.using( customer_scores=TableOutput("/<path>/customer_scores") ) def compute(customer_scores: TableLightweightOutput): data = pa.table( { "customer_id": pa.array(["c1", "c2", "c3"], type=pa.string()), "score": pa.array([0.91, 0.74, 0.88], type=pa.float64()), "score_v2": pa.array([0.95, 0.80, 0.92], type=pa.float64()), } ) customer_scores.iceberg().write(data)

Here is the state of the table after the job runs on feature/scoring-v2:

Foundry-extended branchingDefault Iceberg branching
Schema on feature/scoring-v2New schemaNew schema
Schema on mainOld schemaNew schema
Data on feature/scoring-v2ExistsExists
Data on mainUnchangedUnchanged

With Foundry's branch isolation: The new column and type change are isolated to the branch. main retains its original schema, so downstream consumers are unaffected.

With default Iceberg branches: The branch and main share a single current schema, so the write either fails schema validation or forces you to update main's schema, exposing the change to downstream consumers.

Using branches with external clients

Foundry's Iceberg catalog is fully compliant with the Iceberg REST catalog specification ↗.

When accessing Foundry's Iceberg catalog from an external tool, the catalog's behavior depends on whether the X-PLTR-Branch header is set. Set the X-PLTR-Branch header when working with branches from external clients to ensure consistent Foundry branching behavior.

When the X-PLTR-Branch header is set:

  • All read and write operations are automatically scoped to the branch.
  • The branch set in the X-PLTR-Branch header is used by default when no branch qualifier is specified (or if main is referenced). If a specific (non-main) branch is provided in the request, then that branch will be used rather than the header branch. For example, if the header is set to featurebranch1, then SELECT * FROM table and SELECT * FROM table.branch_main will reference featurebranch1, but SELECT * FROM table.branch_featurebranch2 will reference featurebranch2.

When the X-PLTR-Branch header is not set, default Iceberg behavior applies:

  • Snapshot reads will use the schema associated with the referenced snapshot. For example, SELECT * FROM table VERSION AS OF <snapshot_id>.
  • Branch reads will use main's schema (table-scoped). For example, SELECT * FROM table.branch_my_branch.
  • Writes are validated against main's schema (table-scoped); writes that do not conform fail rather than adapting to the branch.
  • Schema changes apply to the table-scoped schema on main, and are not isolated to the branch.

Setting the X-PLTR-Branch header externally

To set the branch for a PyIceberg client, add the header to your catalog properties:

Copied!
1 2 3 4 5 6 7 8 9 10 from pyiceberg.catalog import load_rest catalog = load_rest( "foundry", { "uri": "https://your.foundry/iceberg", "token": "eyJwb...", "header.X-PLTR-Branch": "my-branch" }, )

For Spark, pass the header as a catalog configuration property:

Copied!
1 2 3 4 5 6 7 spark = ( SparkSession.builder ... .config("spark.sql.catalog.foundry.header.X-PLTR-Branch", "my-branch") ... .getOrCreate() )

For more information on connecting external clients to Foundry's catalog, see Authenticating Iceberg clients and Example: Local Jupyter.

Limitations and considerations

Note the following considerations when working with branch isolation.

A single branch applies to all tables in a job

When a job runs on a branch, that branch context applies uniformly to all tables the job reads from and writes to. It is not currently possible to configure different branches for different input or output tables within the same job. Foundry validates that all resolved input branches match the job's branch, and raises an error if they do not.

Schemas and partition specs referenced by a branch are protected

If a branch is tracking a particular schema or partition spec, that schema or spec cannot be dropped from the table even by operations running on main. This prevents branches from being silently broken by cleanup or maintenance operations running concurrently. The protection is lifted automatically when the branch is deleted.

Iceberg branch merge procedures do not respect branch isolation

Iceberg's native branch merge procedures, such as cherry-pick and fast-forward, operate on snapshot references only and are not aware of Foundry's branch-scoped schema properties. If you use these procedures to merge a branch into main, the branch's schema will not be carried over, and main will retain its existing schema. To ensure schema changes on a branch are reflected on main, merge your code and rebuild on main through Foundry's build system.

Excluded REST endpoints

Some privileged REST endpoints cannot be called from Foundry branches, notably:

  • dropNamespace
  • updateNamespace
  • dropTable
  • renameTable
  • commitTransaction
  • createView
  • dropView
  • renameView

Version compatibility

Branch isolation is available from transforms-tables library version 0.1211.0 and greater.