Foundry transforms let you build robust, scalable Change Data Capture (CDC) pipelines on top of Apache Iceberg’s changelog and snapshot features. With CDC in transforms, a pipeline processes only the records that were inserted, updated, or deleted since its last run, which enables incremental, low-latency data movement and processing.
In addition to existing support for append-only incremental transforms on datasets, Foundry now offers full CDC processing support for Iceberg tables as part of the transforms-tables library. This capability leverages Iceberg’s changelog views to retrieve inserts, updates, and deletes between Iceberg table snapshots.
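To make the idea of a changelog between two snapshots concrete, here is a minimal pure-Python sketch. It is a conceptual model only: Iceberg derives changelogs from the files added and removed by each snapshot rather than by comparing full table contents, and the function name `diff_snapshots` and the sample rows are hypothetical. The `_change_type` values mirror the ones Iceberg's changelog views use (`INSERT`, `DELETE`, `UPDATE_BEFORE`, `UPDATE_AFTER`).

```python
from typing import Any

Row = dict[str, Any]


def diff_snapshots(before: list[Row], after: list[Row], key: str) -> list[Row]:
    """Derive change rows between two table states, matching rows on `key`.

    Hypothetical helper for illustration; not part of any Foundry or Iceberg API.
    """
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}
    changes: list[Row] = []
    for k in sorted(old.keys() | new.keys()):
        if k not in old:
            # Key exists only in the newer state: the row was inserted.
            changes.append({**new[k], "_change_type": "INSERT"})
        elif k not in new:
            # Key exists only in the older state: the row was deleted.
            changes.append({**old[k], "_change_type": "DELETE"})
        elif old[k] != new[k]:
            # Same key, different values: emit the old and new row images.
            changes.append({**old[k], "_change_type": "UPDATE_BEFORE"})
            changes.append({**new[k], "_change_type": "UPDATE_AFTER"})
    return changes


snapshot_1 = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
snapshot_2 = [{"id": 1, "name": "a"}, {"id": 2, "name": "B"}, {"id": 4, "name": "d"}]

changelog = diff_snapshots(snapshot_1, snapshot_2, "id")
for change in changelog:
    print(change)
```

The unchanged row (`id` 1) produces no change rows at all, which is why a CDC pipeline can do far less work than a full recompute when only a small part of the table changes.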
Using CDC with Iceberg tables offers a number of benefits including:
You can use the Palantir transforms API to read and write changelogs from Iceberg tables:
    from transforms.api import incremental, transform
    from transforms.tables import TableInput, TableOutput


    @incremental(v2_semantics=True)
    @transform(
        source=TableInput("<PATH>/your_iceberg_input_table"),
        output=TableOutput("<PATH>/your_iceberg_output_table"),
    )
    def cdc_transform(ctx, source, output):
        # Read only the changes since the last run
        changelog_df = source.changelog(["your_primary_key"])

        # Apply your business logic to the changelog
        output.apply_changelog(changelog_df, ["your_primary_key"])
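As a rough mental model of what applying a changelog keyed on a primary key means, the following pure-Python sketch merges change rows into an in-memory target. It is an assumption-laden illustration, not the implementation of `apply_changelog`: the helper name `apply_changes`, the dictionary-based target, and the sample rows are all hypothetical, and the `_change_type` values follow Iceberg's changelog view conventions.

```python
from typing import Any

Row = dict[str, Any]


def apply_changes(target: dict[Any, Row], changelog: list[Row], key: str) -> dict[Any, Row]:
    """Return a copy of `target` with the change rows merged in, matched on `key`.

    Hypothetical helper for illustration; not part of the transforms API.
    """
    result = dict(target)
    for change in changelog:
        change_type = change["_change_type"]
        # Drop the metadata column so only data columns reach the target.
        row = {col: val for col, val in change.items() if col != "_change_type"}
        if change_type in ("INSERT", "UPDATE_AFTER"):
            # Upsert: the new row image replaces whatever is stored for this key.
            result[row[key]] = row
        elif change_type == "DELETE":
            result.pop(row[key], None)
        # UPDATE_BEFORE rows hold the old image; the paired UPDATE_AFTER
        # row overwrites it, so nothing needs to happen here.
    return result


target = {
    1: {"id": 1, "name": "a"},
    2: {"id": 2, "name": "b"},
    3: {"id": 3, "name": "c"},
}
changes = [
    {"id": 2, "name": "b", "_change_type": "UPDATE_BEFORE"},
    {"id": 2, "name": "B", "_change_type": "UPDATE_AFTER"},
    {"id": 3, "name": "c", "_change_type": "DELETE"},
    {"id": 4, "name": "d", "_change_type": "INSERT"},
]

merged = apply_changes(target, changes, "id")
print(sorted(merged))
```

Matching on the key is what lets updates and deletes land on the right row, which is why both `changelog` and `apply_changelog` in the transform above take the primary key columns as an argument.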
For more detailed guides and examples, see the following sections, which cover changelog code examples and a technical primer, including a walkthrough of an example with no primary keys in the input.