
transforms.api.IncrementalTransformOutput

class transforms.api.IncrementalTransformOutput(toutput, prev_txrid=None, mode='replace')

TransformOutput with added functionality for incremental computation.
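
For orientation, a minimal sketch of where this class typically appears, assuming the standard @incremental and @transform decorators from transforms.api; the dataset paths are hypothetical:

    from transforms.api import transform, incremental, Input, Output

    @incremental()
    @transform(
        out=Output("/examples/output_dataset"),    # hypothetical path
        source=Input("/examples/input_dataset"),   # hypothetical path
    )
    def compute(source, out):
        # Under @incremental, `out` is an IncrementalTransformOutput and the
        # default write mode is 'modify', so this appends only the new rows.
        out.write_dataframe(source.dataframe())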

abort()

Abort all work on this output. Any work done on writers from this output before or after calling this method will be ignored.

History

  • Added in version 1.7.0.
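
A hedged sketch of one use: bailing out when an incremental run has nothing to process, so that no work on the output is kept. The decorators, paths, and empty-input check are illustrative assumptions:

    from transforms.api import transform, incremental, Input, Output

    @incremental()
    @transform(
        out=Output("/examples/output_dataset"),
        source=Input("/examples/input_dataset"),
    )
    def compute(source, out):
        new_rows = source.dataframe()
        if new_rows.count() == 0:
            out.abort()  # discard all work on this output
            return
        out.write_dataframe(new_rows)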

property batch_incremental_configuration

The configuration for an incremental input that will be read in batches.

  • Type: BatchIncrementalConfiguration

property branch

The branch of the dataset.

property column_descriptions

The column descriptions of the dataset.

  • Type: Dict[str, str]

property column_typeclasses

The column typeclasses of the dataset.

  • Type: Dict[str, str]

dataframe(mode='current', schema=None)

Return a pyspark.sql.DataFrame for the given read mode.

  • Parameters:
    • mode (str, optional) – The read mode, one of added, current, or previous. Defaults to current.
    • schema (pyspark.sql.types.StructType, optional) – A PySpark schema to use when constructing an empty DataFrame. Required when using the previous read mode if there is no previous transaction.
  • Returns: The DataFrame for the dataset.
  • Return type: DataFrame
  • Raises: ValueError – If no schema is passed when using previous mode, and there is no previous transaction.
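
A brief sketch of the three read modes on an output, assuming the usual incremental semantics (previous is the output as of the last build, added is what has been written during this build, and current is their union). The schema below is an illustrative guess that lets the previous read succeed on a first run:

    from pyspark.sql import types as T

    def merge(out):
        # `out` is an IncrementalTransformOutput; schema is illustrative
        schema = T.StructType([T.StructField("id", T.LongType())])
        previous = out.dataframe(mode='previous', schema=schema)  # empty on a first run
        current = out.dataframe(mode='current')
        added = out.dataframe(mode='added')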

property end_transaction_rid

The ending transaction of the output dataset.

filesystem(mode='current')

Construct a FileSystem object for writing to FoundryFS.

  • Parameters: mode (str, optional) – The read mode, one of added, current, or previous. Defaults to current. Only the current filesystem is writable.
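
A hedged sketch of combining a read-only view of the previous build with the writable current filesystem; the file name and contents are invented:

    def copy_forward(out):
        # `out` is an IncrementalTransformOutput
        writable = out.filesystem()              # mode='current', the only writable one
        prior = out.filesystem(mode='previous')  # read-only view of the last build
        with writable.open('notes.txt', 'w') as f:  # hypothetical file name
            f.write('built incrementally\n')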

classmethod from_transform_output(instance, delegate)

Set fields in a TransformOutput instance to the values from the delegate TransformOutput.

pandas(mode='current', schema=None)

Return a pandas.DataFrame for the given read mode.

  • Returns: The pandas DataFrame for the dataset.
  • Return type: pandas.DataFrame

property path

The Compass path of the dataset.

property rid

The resource identifier of the dataset.

set_mode(mode)

Change the write mode of the dataset.

  • Parameters: mode (str) – The write mode, one of replace, modify, or append. In replace mode, anything written replaces the dataset. In modify mode, anything written is appended to the dataset. In append mode, anything written is appended to the dataset and cannot overwrite existing files.

The write mode cannot be changed after data has been written.

History

  • Added in version 1.61.0.
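
A common pattern, sketched under the assumption that ctx is an IncrementalTransformContext: fall back to a full snapshot write when the build cannot run incrementally. Note that set_mode is called before any data is written, as required above:

    def compute(ctx, source, out):
        # `ctx` is assumed to be an IncrementalTransformContext
        if ctx.is_incremental:
            out.set_mode('modify')    # append only the newly arrived rows
            df = source.dataframe()   # reads the added rows on incremental runs
        else:
            out.set_mode('replace')   # rewrite the dataset from scratch
            df = source.dataframe(mode='current')
        out.write_dataframe(df)       # mode can no longer change after this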

property start_transaction_rid

The starting transaction of the output dataset.

write_dataframe(df, partition_cols=None, bucket_cols=None, bucket_count=None, sort_by=None, output_format=None, options=None, column_descriptions=None, column_typeclasses=None)

Write the given DataFrame to the dataset.

  • Parameters:
    • df (pyspark.sql.DataFrame) – The PySpark DataFrame to write.
    • partition_cols (List[str], optional) – Column partitioning to use when writing data.
    • bucket_cols (List[str], optional) – The columns by which to bucket the data. Must be specified if bucket_count is given.
    • bucket_count (int, optional) – The number of buckets. Must be specified if bucket_cols is given.
    • sort_by (List[str], optional) – The columns by which to sort the bucketed data.
    • output_format (str, optional) – The output file format, defaults to parquet.
    • options (dict, optional) – Extra options to pass through to org.apache.spark.sql.DataFrameWriter#option(String, String).
    • column_descriptions (Dict[str, str], optional) – Map of column names to their string descriptions. This map is intersected with the columns of the DataFrame; each description must be no longer than 800 characters.
    • column_typeclasses (Dict[str, List[Dict[str, str]]], optional) – Map of column names to their column typeclasses. Each typeclass in the list is a Dict[str, str] in which only two keys are valid: name and kind. Each key maps to the string the user wants, up to a maximum of 100 characters. An example column_typeclasses value would be {"my_column": [{"name": "my_typeclass_name", "kind": "my_typeclass_kind"}]}.
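
A hedged sketch of a typical call with partitioning and column metadata; the column name, description text, and typeclass values are all illustrative:

    out.write_dataframe(
        df,
        partition_cols=['event_date'],  # hypothetical column
        column_descriptions={'event_date': 'Calendar date of the event'},
        column_typeclasses={'event_date': [{'name': 'date_key', 'kind': 'illustrative'}]},
    )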

write_pandas(pandas_df)

Write the given pandas.DataFrame to the dataset.
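
For completeness, a minimal sketch; the frame here is invented:

    import pandas as pd

    out.write_pandas(pd.DataFrame({'id': [1, 2, 3]}))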