The output object passed into user code at runtime for incremental ContainerTransform objects.
The aim is to mimic a subset of the transforms.api.IncrementalTransformOutput API, while providing access to the underlying foundry.transforms.Dataset.
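A hedged sketch of where this object appears (dataset paths are illustrative, and decorator order may vary between transforms versions; only the accessors documented on this page are assumed):

```python
from transforms.api import Input, Output, incremental, lightweight, transform

@incremental()   # decorator order may differ in your transforms version
@lightweight()
@transform(
    out=Output("/Project/folder/processed"),  # hypothetical paths
    src=Input("/Project/folder/raw"),
)
def compute(out, src):
    # `out` is the incremental container output documented here; it mimics
    # a subset of transforms.api.IncrementalTransformOutput.
    out.write_pandas(src.pandas())
```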
The alias of the dataset this parameter is associated with.
A PyArrow table containing the view of the dataset.
Parameters: mode (str) – current, previous, or added. Defaults to current.

The branch of the dataset this parameter is associated with.
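For instance, continuing the sketch above, the mode values mirror IncrementalTransformOutput semantics:

```python
current = out.arrow(mode="current")    # full view of the output
previous = out.arrow(mode="previous")  # view as of the last committed build
added = out.arrow(mode="added")        # rows written during this build
print(out.branch)                      # e.g. "master"
```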
A pandas DataFrame containing the view of the dataset.
Parameters: mode (str) – current, previous, or added. Defaults to current.

Access the filesystem.
Construct a FoundryDataSidecarFileSystem object for accessing the dataset’s files directly.
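A minimal sketch, assuming the returned object exposes a file-like open(); the exact FoundryDataSidecarFileSystem interface should be verified against the generated API:

```python
fs = out.filesystem()
with fs.open("marker.txt", "w") as f:  # file name is illustrative
    f.write("files can be written directly, bypassing write_table\n")
```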
A pandas DataFrame containing the view of the dataset.
Parameters: mode (str) – current, previous, or added. Defaults to current.

Download the dataset’s underlying files and return a path to them.
Parameters: mode (str) – current, previous, or added. Defaults to current. This argument is only applicable when the @incremental decorator is applied and v2_semantics is True.

Returns a virtual object store path to a bucket that will be mapped into the output transaction. This does not point directly at a bucket in cloud storage, but rather at a local S3 proxy that lets query engines perform asynchronous, optimized IO against the data.
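A sketch of both file-level access paths; the accessor names used here (path, object_store_path) are assumptions, not confirmed by this page:

```python
local_dir = out.path()         # download the files; mode="previous"/"added" only
                               # with @incremental and v2_semantics=True
uri = out.object_store_path()  # S3-style URI served by the local proxy
# A query engine such as DuckDB could then COPY results to f"{uri}/part-0.parquet".
```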
Return the path for the dataset’s files to be used with write_table.
A Polars DataFrame or LazyFrame containing the view of the dataset.
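For example, assuming a lazy flag that mirrors the input-side polars() accessor:

```python
import polars as pl

lf = out.polars(lazy=True)  # LazyFrame over the current view
top = lf.sort("value", descending=True).head(10).collect()
```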
Finalize a dataset after uploading raw Parquet files. This will infer a Foundry schema from the uploaded Parquet files, upload it (overwriting any existing schema), and update column description metadata on the dataset.

This method must be called after one or more Parquet files have been uploaded to the output dataset so that a schema can be inferred; it raises an error if called before a successful file upload.
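A sketch of the intended flow. The put_metadata name comes from the write_table description below, but its column_descriptions keyword and the path_for_write_table accessor name are assumptions:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"id": [1, 2], "value": [0.5, 1.5]})
pq.write_table(table, f"{out.path_for_write_table()}/part-0.parquet")  # accessor name assumed
out.put_metadata(column_descriptions={"id": "Primary key"})            # keyword assumed
```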
Read the local version of the dataset as a Polars LazyFrame.
This method is used when computing expectations on the dataset. It must run before the dataset is committed, since failed expectations can abort the build.
The unique resource identifier of the dataset this parameter is associated with.
Set the mode for the output dataset.
Parameters: mode (str) – The write mode, one of replace, modify, or append. In modify mode, anything written is appended to the dataset; this may also overwrite existing files. In append mode, anything written is appended to the dataset and will not overwrite existing files. In replace mode, anything written replaces the dataset.
The write mode cannot be changed after data is written.
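For example:

```python
import pandas as pd

out.set_mode("replace")  # must happen before any data is written
out.write_table(pd.DataFrame({"id": [1, 2]}))  # replaces previous contents
```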
The transaction on the output dataset.
Write a DataFrame of any supported type to the dataset.
For compatibility reasons, both column_description and column_descriptions are accepted. However, only one of them can be provided at the same time.
Parameters:
df – pd.DataFrame, pa.Table, pl.DataFrame, pl.LazyFrame, or pathlib.Path with the data to upload.
column_description – Deprecated. Use column_descriptions instead.
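For example:

```python
import polars as pl

out.write_table(
    pl.DataFrame({"id": [1, 2], "value": [0.1, 0.2]}),
    column_descriptions={"id": "Primary key", "value": "Measurement"},
)
```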
Write the given pandas.DataFrame to the dataset.

For compatibility reasons, both column_description and column_descriptions are accepted. However, only one of them can be provided at the same time.
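For example:

```python
import pandas as pd

out.write_pandas(pd.DataFrame({"id": [1, 2], "value": [0.1, 0.2]}))
```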
Write a pandas DataFrame, Arrow Table, Polars DataFrame, or LazyFrame to a Foundry dataset.
This performs three operations: uploading the df itself to the dataset, inferring a schema and uploading it to the dataset (overwriting any existing schema), and updating column description metadata. To update only the metadata without uploading data, use put_metadata() instead.
For compatibility reasons, both column_description and column_descriptions are accepted. However, only one of them can be provided at the same time.
Parameters:
df – pd.DataFrame, pa.Table, pl.DataFrame, pl.LazyFrame, duckdb.DuckDBPyRelation, or pathlib.Path with the data to upload, or None to just infer a schema from data previously written in the transaction.
column_description – Deprecated. Use column_descriptions instead.
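A sketch of the None flow: write Parquet files within the transaction first (here via the filesystem accessor; its binary open() semantics are assumed), then call write_table(None) to infer and attach the schema:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"id": [1, 2]})
fs = out.filesystem()
with fs.open("part-0.parquet", "wb") as f:  # binary open() assumed
    pq.write_table(table, f)
out.write_table(None)  # infer schema from the files written above
```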