
transforms.api.transform_df

transforms.api.transform_df(output, **inputs)

Register the wrapped compute function as a DataFrame transform.

The transform_df decorator is used to construct a Transform object from a compute function that accepts and returns pyspark.sql.DataFrame objects. Similar to the transform() decorator, the input names become the compute function’s parameter names. However, a transform_df accepts only a single Output spec as a positional argument. The return value of the compute function is also a DataFrame that is automatically written out to the single output dataset.

>>> @transform_df(
...     Output('/path/to/output/dataset'),  # An unnamed Output spec
...     first_input=Input('/path/to/first/input/dataset'),
...     second_input=Input('/path/to/second/input/dataset'),
... )
... def my_compute_function(first_input, second_input):
...     # type: (pyspark.sql.DataFrame, pyspark.sql.DataFrame) -> pyspark.sql.DataFrame
...     return first_input.union(second_input)
  • Parameters:
    • output (Output) – The single Output spec for the transform.
    • **inputs (Input) – kwargs comprised of named Input specs.
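To illustrate the decorator mechanics described above, here is a toy, plain-Python sketch. The Input, Output, and Transform classes below are simplified stand-ins, not the real transforms.api implementations; the sketch only shows how a transform_df-style decorator can capture one positional output spec and keyword Input specs whose names become the compute function's parameters:

```python
# Toy stand-ins for the transforms.api classes (NOT the real implementations).
class Input:
    def __init__(self, path):
        self.path = path

class Output:
    def __init__(self, path):
        self.path = path

class Transform:
    def __init__(self, compute, output, inputs):
        self.compute = compute   # the wrapped compute function
        self.output = output     # the single Output spec
        self.inputs = inputs     # dict of name -> Input spec

    def run(self, datasets):
        # Resolve each named Input spec to its data, call the compute
        # function with input names as parameter names, then "write"
        # the return value to the single output path.
        kwargs = {name: datasets[spec.path] for name, spec in self.inputs.items()}
        result = self.compute(**kwargs)
        datasets[self.output.path] = result
        return result

def transform_df(output, **inputs):
    # One positional Output spec; all inputs are keyword arguments.
    def decorator(compute):
        return Transform(compute, output, inputs)
    return decorator

@transform_df(
    Output('/out'),
    first_input=Input('/in/a'),
    second_input=Input('/in/b'),
)
def my_compute_function(first_input, second_input):
    # With real pyspark DataFrames this would be first_input.union(second_input).
    return first_input + second_input

datasets = {'/in/a': [1, 2], '/in/b': [3]}
my_compute_function.run(datasets)
# datasets['/out'] is now [1, 2, 3]
```

The key points the sketch captures: decorating the function returns a Transform object (not the bare function), and the keyword names given to the decorator are exactly the parameter names the compute function is called with.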