TableTransformInput with added functionality for incremental computation.
The configuration for an incremental input that will be read in batches.
Returns: BatchIncrementalConfiguration
The branch of the dataset.
Returns the name of the table’s Spark catalog, intended for use in Spark procedures, if supported by the underlying table type.
Throws: ValueError: If the underlying table type does not expose a Spark catalog.
Creates a changelog view for the given table from the last processed snapshot ID.
Note: Only supported for Iceberg tables.
If identifier columns are provided, this creates an identifier-based changelog. This changelog type gives you the last changes performed on rows uniquely identified by the given identifier columns. It is more performant and allows greater flexibility when performing row edits.
Without identifier columns, it creates a net-changes changelog. This changelog type gives you the coalesced changes performed on the rows by cancelling out DELETEs and INSERTs over the snapshot range. This causes a large amount of data shuffling and is slower than an identifier-based changelog.
If this changelog will be used to update an output table, the identifier columns used when creating it should match the identifier columns used to update the output table.
See the Iceberg create_changelog_view ↗ documentation for more information.
Returns: DataFrame ↗ with _change_type, _change_ordinal, and _commit_snapshot_id columns. For an identifier-based changelog, _change_type can be INSERT, DELETE, UPDATE_AFTER, or UPDATE_BEFORE. For a net-changes changelog, _change_type can be INSERT or DELETE.
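For illustration, here is a minimal sketch of consuming an identifier-based changelog inside a Foundry incremental transform. The dataset paths, the key column id, and the keyword identifier_columns are assumptions made for the sketch, not guarantees of this API; the decorator configuration may also differ for table inputs in your release.

```python
from transforms.api import transform, incremental, Input, Output
from pyspark.sql import functions as F

@incremental()
@transform(
    out=Output("/examples/output_table"),     # placeholder path
    source=Input("/examples/iceberg_table"),  # placeholder path
)
def compute(source, out):
    # Identifier-based changelog: the last change per row keyed by "id".
    # The keyword name is an assumption; check the signature in your release.
    changes = source.changelog(identifier_columns=["id"])

    # Keep rows whose final state is a live row; drop pure deletes.
    upserts = changes.filter(F.col("_change_type").isin("INSERT", "UPDATE_AFTER"))

    # The identifier columns used here should match those used when
    # updating the output table (see the note above).
    out.write_dataframe(
        upserts.drop("_change_type", "_change_ordinal", "_commit_snapshot_id")
    )
```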
The column descriptions of the dataset.
Returns: Dict[str, str]
The column typeclasses of the dataset.
Returns: Dict[str, str]
Return a pyspark.sql.DataFrame ↗ for the given read mode.
The changelog read mode for Iceberg tables returns the changelog view for the given table from the last processed snapshot ID. Unlike the changelog() method, this always creates a net-changes changelog, which is less performant but supports tables without identifier columns (a list of columns that uniquely identify each row). Note that this read mode is deprecated; use changelog() instead.
Returns: DataFrame ↗
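As an illustration, a minimal sketch of reading a specific mode inside a Foundry incremental transform follows. The dataset paths are placeholders; "added" is the standard Foundry incremental mode for rows appended since the last build, and other modes such as "current" and "previous" behave as described for the FileSystem entry below.

```python
from transforms.api import transform, incremental, Input, Output

@incremental()
@transform(
    out=Output("/examples/events_processed"),  # placeholder path
    source=Input("/examples/events"),          # placeholder path
)
def compute(source, out):
    # "added" returns only the rows appended since the last successful
    # build; "current" would return the full current view instead.
    new_rows = source.dataframe("added")
    out.write_dataframe(new_rows)
```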
The ending transaction of the input dataset.
Construct a FileSystem object for reading from FoundryFS for the given read mode.
Only current, previous and added modes are supported.
Returns: FileSystem
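A short sketch of file-based incremental processing with the modes listed above; the paths are placeholders, and FileSystem.ls() is the standard Foundry listing call. Real logic would open and parse each file rather than just indexing the paths.

```python
from transforms.api import transform, incremental, Input, Output

@incremental()
@transform(
    out=Output("/examples/file_index"),  # placeholder path
    raw=Input("/examples/raw_files"),    # placeholder path
)
def compute(ctx, raw, out):
    # "added": list only the files written since the last successful build.
    fs = raw.filesystem("added")
    new_paths = [status.path for status in fs.ls()]

    # Record the newly seen paths; the explicit schema keeps this working
    # even when no new files have arrived.
    df = ctx.spark_session.createDataFrame([(p,) for p in new_paths], "path: string")
    out.write_dataframe(df)
```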
Returns the fully-qualified, catalog-prefixed Spark V2 identifier of the table, if supported by the underlying table type.
Throws: ValueError: If the underlying table type does not expose a Spark V2 identifier.
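To show how the Spark catalog and the Spark V2 identifier above fit together, here is a hedged sketch of running standard Iceberg SQL against them. The property names catalog and identifier are assumptions based on the descriptions in this reference, and expire_snapshots is a standard Iceberg procedure rather than part of this API.

```python
def run_iceberg_maintenance(spark, table_input):
    # Assumed property names for the values described above; confirm the
    # real attribute names in your release before relying on this sketch.
    catalog = table_input.catalog
    identifier = table_input.identifier  # fully-qualified, catalog-prefixed

    # Inspect snapshot history via Iceberg's standard metadata table.
    snapshots = spark.sql(
        f"SELECT snapshot_id, committed_at FROM {identifier}.snapshots"
    )

    # Invoke a standard Iceberg maintenance procedure through the catalog;
    # whether the table argument accepts a catalog-prefixed identifier may
    # depend on your Iceberg version.
    spark.sql(f"CALL {catalog}.system.expire_snapshots(table => '{identifier}')")
    return snapshots
```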
Returns: pandas.DataFrame ↗: A pandas DataFrame containing the full view of the dataset.
The Compass path of the dataset.
The resource identifier of the dataset.
The starting transaction of the input dataset.