The output object passed into user code at runtime for incremental ContainerTransform objects.
The aim is to mimic a subset of the transforms.api.IncrementalTransformOutput API, while providing access to the underlying foundry.transforms.Dataset.
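A hedged sketch of where this object appears (dataset paths are illustrative, and decorator order may vary between transforms versions; only the accessors documented on this page are assumed):

```python
from transforms.api import Input, Output, incremental, lightweight, transform

@incremental()   # decorator order may differ in your transforms version
@lightweight()
@transform(
    out=Output("/Project/folder/processed"),  # hypothetical paths
    src=Input("/Project/folder/raw"),
)
def compute(out, src):
    # `out` is the incremental container output documented here; it mimics
    # a subset of transforms.api.IncrementalTransformOutput.
    out.write_pandas(src.pandas())
```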
The alias of the dataset this parameter is associated with.
A PyArrow table containing the view of the dataset.
Parameters: mode (str) – current, previous, or added. Defaults to current.

The branch of the dataset this parameter is associated with.
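For instance, continuing the sketch above, the mode values mirror IncrementalTransformOutput semantics:

```python
current = out.arrow(mode="current")    # full view of the output
previous = out.arrow(mode="previous")  # view as of the last committed build
added = out.arrow(mode="added")        # rows written during this build
print(out.branch)                      # e.g. "master"
```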
A pandas DataFrame containing the view of the dataset.
Parameters: mode (str) – current, previous, or added. Defaults to current.

Access the filesystem.
Construct a FoundryDataSidecarFileSystem object for accessing the dataset’s files directly.
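A minimal sketch, assuming the returned object exposes a file-like open(); the exact FoundryDataSidecarFileSystem interface should be verified against the generated API:

```python
fs = out.filesystem()
with fs.open("marker.txt", "w") as f:  # file name is illustrative
    f.write("files can be written directly, bypassing write_table\n")
```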
A pandas DataFrame containing the view of the dataset.
Parameters: mode (str) – current, previous, or added. Defaults to current.

Download the dataset’s underlying files and return a path to them.
Parameters: mode (str) – current, previous, or added. Defaults to current. This argument is only applicable when the @incremental decorator is applied and v2_semantics is True.

Returns a virtual object store path to a bucket that will be mapped into the output transaction. This does not point directly at a bucket in cloud storage, but rather at a local S3 proxy that lets query engines perform asynchronous, optimized IO against the data.
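A sketch of both file-level access paths; the accessor names used here (path, object_store_path) are assumptions, not confirmed by this page:

```python
local_dir = out.path()         # download the files; mode="previous"/"added" only
                               # with @incremental and v2_semantics=True
uri = out.object_store_path()  # S3-style URI served by the local proxy
# A query engine such as DuckDB could then COPY results to f"{uri}/part-0.parquet".
```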
Return the path for the dataset’s files to be used with write_table.
A Polars DataFrame or LazyFrame containing the view of the dataset.
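For example, assuming a lazy flag that mirrors the input-side polars() accessor:

```python
import polars as pl

lf = out.polars(lazy=True)  # LazyFrame over the current view
top = lf.sort("value", descending=True).head(10).collect()
```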
Finalize a dataset after uploading raw Parquet files. This will infer a Foundry schema from the uploaded Parquet files, upload it (overwriting any existing schema), and update column description metadata on the dataset.

This method must be called after one or more Parquet files have been uploaded to the output dataset so that a schema can be inferred; it raises an error if called before a successful file upload.
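A sketch of the intended flow. The put_metadata name comes from the write_table description below, but its column_descriptions keyword and the path_for_write_table accessor name are assumptions:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"id": [1, 2], "value": [0.5, 1.5]})
pq.write_table(table, f"{out.path_for_write_table()}/part-0.parquet")  # accessor name assumed
out.put_metadata(column_descriptions={"id": "Primary key"})            # keyword assumed
```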
Read the local version of the dataset as a Polars LazyFrame.
This method is used when computing expectations on the dataset. It must run before the dataset is committed, since failed expectations can abort the build.
The unique resource identifier of the dataset this parameter is associated with.
Set the mode for the output dataset.
Parameters: mode (str) – The write mode, one of replace, modify, or append. In modify mode, anything written is appended to the dataset; this may also overwrite existing files. In append mode, anything written is appended to the dataset and will not overwrite existing files. In replace mode, anything written replaces the dataset.
The write mode cannot be changed after data is written.
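For example:

```python
import pandas as pd

out.set_mode("replace")  # must happen before any data is written
out.write_table(pd.DataFrame({"id": [1, 2]}))  # replaces previous contents
```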
The transaction on the output dataset.
Write a DataFrame of any supported type to the dataset.
For compatibility reasons, both column_description and column_descriptions are accepted. However, only one of them can be provided at the same time.
Parameters:
df – pd.DataFrame, pa.Table, pl.DataFrame, pl.LazyFrame, or pathlib.Path with the data to upload.
column_description – Deprecated. Use column_descriptions instead.
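For example:

```python
import polars as pl

out.write_table(
    pl.DataFrame({"id": [1, 2], "value": [0.1, 0.2]}),
    column_descriptions={"id": "Primary key", "value": "Measurement"},
)
```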
Write the given pandas.DataFrame to the dataset.

For compatibility reasons, both column_description and column_descriptions are accepted. However, only one of them can be provided at the same time.
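For example:

```python
import pandas as pd

out.write_pandas(pd.DataFrame({"id": [1, 2], "value": [0.1, 0.2]}))
```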
Write a pandas DataFrame, Arrow Table, Polars DataFrame, or LazyFrame to a Foundry dataset.
This performs three operations: uploading the df itself to the dataset, inferring a schema and uploading it to the dataset (overwriting any existing schema), and updating column description metadata. To update only the metadata without uploading data, use put_metadata() instead.
For compatibility reasons, both column_description and column_descriptions are accepted. However, only one of them can be provided at the same time.
Parameters:
df – pd.DataFrame, pa.Table, pl.DataFrame, pl.LazyFrame, duckdb.DuckDBPyRelation, or pathlib.Path with the data to upload, or None to just infer a schema from data previously written in the transaction.
column_description – Deprecated. Use column_descriptions instead.
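A sketch of the None flow: write Parquet files within the transaction first (here via the filesystem accessor; its binary open() semantics are assumed), then call write_table(None) to infer and attach the schema:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"id": [1, 2]})
fs = out.filesystem()
with fs.open("part-0.parquet", "wb") as f:  # binary open() assumed
    pq.write_table(table, f)
out.write_table(None)  # infer schema from the files written above
```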