Media set transforms API

The media set transforms API provides methods for transforming media sets in Python transforms. This API enables various operations on media sets, such as extracting text using optical character recognition (OCR), resizing images, converting documents to images, and more.

Methods by schema type

Schema type	Available methods
Image	`resize` • `crop` • `binarize` • `rotate` • `grayscale` • `equalize` • `rayleigh` • `convert_image_to_document` • `generate_image_embeddings` • `tile` • `ocr` • `encrypt` • `decrypt`
Document	`ocr` • `extract_raw_text` • `convert_document_to_images` • `slice_document` • `extract_form_fields` • `extract_table_of_contents` • `get_pdf_page_dimensions`
Video	`extract_audio` • `extract_scene_frames` • `chunk` • `extract_first_frame` • `extract_frames_at_timestamp` • `transcode` • `get_scene_frame_timestamps`
Audio	`transcribe` • `chunk` • `transcode` • `get_waveform`
DICOM	`render_dicom_layer`
Spreadsheet	`extract_content_from_spreadsheets`
Multimodal	`filter_to`

Getting started

To use the media set transforms API, access the transform functionality from a media set input as shown in the example below.

from transforms.mediasets.inputs import MediaSetInputParam
from transforms.api import transform, Output, TransformOutput
from transforms.mediasets import MediaSetInput

@transform(
    media_input=MediaSetInput("/path/to/media_set"),
    dataset_output=Output("/path/to/output")
)
def compute(ctx, media_input: MediaSetInputParam, dataset_output: TransformOutput):
    # Create a MediaSetInputTransform instance.
    # Named media_transform to avoid shadowing the imported `transform` decorator.
    media_transform = media_input.transform()

    # Apply transformations
    result = media_transform.ocr()

    # Write the result to output
    dataset_output.write_dataframe(result)

transform()

def transform(self, deduplicate_by_path=True):

Returns a MediaSetInputTransform instance. This class enables fluent method chaining for media transformations on a media set input.

Parameters:

deduplicate_by_path (bool, optional): If True, only the most recent item at each path will be included. Defaults to True.

Returns:

MediaSetInputTransform: A user-facing class that provides methods for media set transformations.

Example:

df = media_set.transform().ocr()
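
To make the `deduplicate_by_path` behavior more concrete, the following is a minimal plain-Python sketch of "keep only the most recent item at each path". The `MediaItem` record and its `uploaded_at` field are hypothetical stand-ins used for illustration only; how the API actually determines which item is the most recent is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaItem:
    """Hypothetical stand-in for a media item; not a class from the transforms API."""
    path: str
    media_item_rid: str
    uploaded_at: int  # assumed ordering key, for example an upload timestamp


def deduplicate_by_path(items):
    """Keep only the most recent item for each path."""
    latest = {}
    for item in items:
        current = latest.get(item.path)
        if current is None or item.uploaded_at > current.uploaded_at:
            latest[item.path] = item
    return list(latest.values())


items = [
    MediaItem("scans/a.pdf", "rid-1", uploaded_at=1),
    MediaItem("scans/a.pdf", "rid-2", uploaded_at=5),
    MediaItem("scans/b.pdf", "rid-3", uploaded_at=3),
]
deduplicated = deduplicate_by_path(items)
```

With `deduplicate_by_path=False`, all three items in this sketch would be kept, including both versions stored at `scans/a.pdf`.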

API reference

ocr()

Extracts text from PDFs or images using optical character recognition (OCR). Recommended for images and scanned documents.

Parameters:

languages (list[str]): List of languages to be used for OCR. Defaults to English. All valid codes can be found in the Tesseract documentation ↗ under languages.
scripts (Optional[list[str]]): List of scripts to be used for OCR. Defaults to None. All valid codes can be found in the Tesseract documentation ↗ under scripts.
media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
start_page (Optional[int]): The zero-indexed start page for OCR. Only applicable for PDF media sets. Defaults to 0 (the first page).
end_page (Optional[int]): The zero-indexed end page for OCR (exclusive). Only applicable for PDF media sets. Defaults to None (the final page).
return_structure (str): item_per_row or page_per_row. Only applicable to transformations on an entire PDF media set. Defaults to item_per_row.
suppress_errors (bool): Specifies error handling behavior. If True, errors are caught and the error message will be returned in the output. If False, any errors will not be caught and the build will fail. Defaults to True. Only applicable to transformations on the entire media set.

Returns:

A DataFrame, list of strings, or a single string.
- str: Transformations on a single image.
- list[str]: Transformations on a single PDF.
- DataFrame: Transformations on the entire media set.
  - For PDF (item_per_row): Columns are media_item_rid, path, media_reference, extracted_text (list[str]).
  - For PDF (page_per_row) or image sets: Columns are media_item_rid, path, media_reference, page_number, extracted_text (str).

Example:

df = media_set.transform().ocr()
dataset_output.write_dataframe(df)
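
The two `return_structure` options differ only in row granularity. The sketch below uses plain dictionaries in place of DataFrame rows to show how an `item_per_row` result relates to a `page_per_row` result. The sample values are invented, and treating `page_number` as zero-indexed is an assumption based on the zero-indexed page parameters above.

```python
def explode_pages(item_rows):
    """Turn item_per_row style rows into page_per_row style rows."""
    page_rows = []
    for row in item_rows:
        for page_number, text in enumerate(row["extracted_text"]):
            page_rows.append({
                "media_item_rid": row["media_item_rid"],
                "path": row["path"],
                "media_reference": row["media_reference"],
                "page_number": page_number,  # assumed to be zero-indexed
                "extracted_text": text,
            })
    return page_rows


# Hypothetical item_per_row row: one row per PDF, text stored as a list of pages.
item_rows = [{
    "media_item_rid": "rid-1",
    "path": "report.pdf",
    "media_reference": "ref-1",
    "extracted_text": ["first page text", "second page text"],
}]
page_rows = explode_pages(item_rows)
```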

extract_raw_text()

Extracts raw text from PDFs. Recommended for documents that have been electronically generated.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
start_page (Optional[int]): The zero-indexed start page for text extraction. Defaults to 0 (the first page).
end_page (Optional[int]): The zero-indexed end page for text extraction (exclusive). Defaults to None (the final page).
return_structure (str): item_per_row or page_per_row. Only applicable to transformations on an entire PDF media set. Defaults to item_per_row.
suppress_errors (bool): Specifies error handling behavior. If True, errors are caught and the error message will be returned in the output. If False, any errors will not be caught and the build will fail. Defaults to True. Only applicable to transformations on the entire media set.

Returns:

A DataFrame or list of strings.
- str: Applicable to transformations on a single image.
- list[str]: Applicable to transformations on a single PDF.
- DataFrame: Applicable to transformations on the entire media set.
  - PDF (item_per_row): Columns are media_item_rid, path, media_reference, extracted_text (list[str]).
  - PDF (page_per_row) or image sets: Columns are media_item_rid, path, media_reference, page_number, extracted_text (str).

Example:

df = media_set.transform().extract_raw_text()
dataset_output.write_dataframe(df)

resize()

Resizes images to the specified dimensions.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
width (Optional[int]): The target width for the resized images. Defaults to 1024. Must be provided if height is not provided.
height (Optional[int]): The target height for the resized images. Defaults to 1024. Must be provided if width is not provided.
maintain_aspect_ratio (bool): Specifies whether to maintain the original aspect ratio of the images. If True, images will be resized to fit within the specified dimensions while preserving the aspect ratio. Defaults to True.
output_format (str): The format of the output images, for example PNG or JPEG. Defaults to PNG.

Returns:

An instance of MediaSetInputTransform containing the resize transformation, allowing for further transformations.

Example:

transform = image_input.transform().resize()
image_output.write(transform)
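
As an illustration of what "fit within the specified dimensions while preserving the aspect ratio" means, the sketch below computes output dimensions in plain Python. It is not the service's implementation; the rounding behavior and the fact that smaller images are scaled up are assumptions.

```python
def fit_within(orig_width, orig_height, max_width=1024, max_height=1024):
    """Return (width, height) scaled to fit inside the bounding box, keeping the aspect ratio."""
    if orig_width <= 0 or orig_height <= 0:
        raise ValueError("image dimensions must be positive")
    # The limiting dimension determines the scale factor.
    scale = min(max_width / orig_width, max_height / orig_height)
    return max(1, round(orig_width * scale)), max(1, round(orig_height * scale))


landscape = fit_within(2000, 1000)  # width is the limiting dimension
portrait = fit_within(500, 1000)    # height is the limiting dimension
```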

convert_document_to_images()

Converts document pages to images with the specified dimensions.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
start_page (Optional[int]): The zero-indexed start page for conversion. Defaults to 0 (the first page).
end_page (Optional[int]): The zero-indexed end page for conversion (exclusive). Defaults to None (the last page).
width (Optional[int]): The width of the output images. Defaults to 1024.
height (Optional[int]): The height of the output images. Defaults to 1024.
output_format (str): The format of the output images, for example PNG or JPEG. Defaults to PNG.

Returns:

An instance of MediaSetInputTransform containing the document to image transformation, allowing for further transformations.

Example:

transform = pdf_input.transform().convert_document_to_images()
image_output.write(transform)

slice_document()

Slices documents to a specified range of pages.

Parameters:

start_page (int): The zero-indexed start page for the slice operation.
end_page (int): The zero-indexed end page for the slice operation (exclusive).
media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
strictly_enforce_end_page (bool): Specifies behavior if the end_page exceeds the number of pages in the document. If True, an error will be raised. If False, the last page of the document is used instead. Defaults to True.

Returns:

An instance of MediaSetInputTransform containing the slice transformation, allowing for further transformations.

Example:

transform = pdf_input.transform().slice_document(0, 5)
pdf_output.write(transform)
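
The page range semantics (zero-indexed start, exclusive end, and the `strictly_enforce_end_page` flag) can be sketched in plain Python as follows. The function name `resolve_page_range` is hypothetical, and the validation of `start_page` is an assumption; only the end-page behavior is taken from the parameter descriptions above.

```python
def resolve_page_range(start_page, end_page, page_count, strictly_enforce_end_page=True):
    """Return the zero-indexed page numbers selected by [start_page, end_page)."""
    if start_page < 0 or start_page >= end_page:
        raise ValueError("start_page must be non-negative and less than end_page")
    if end_page > page_count:
        if strictly_enforce_end_page:
            raise ValueError(f"end_page {end_page} exceeds page count {page_count}")
        # Lenient mode: fall back to the last page of the document.
        end_page = page_count
    return list(range(start_page, end_page))


first_five = resolve_page_range(0, 5, page_count=10)
clamped = resolve_page_range(0, 5, page_count=3, strictly_enforce_end_page=False)
```

Under these assumptions, `slice_document(0, 5)` selects pages 0 through 4, which are the first five pages of each document.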

tile()

Generates Slippy map tiles (EPSG 3857) from images. Only supports geo-embedded images in TIFF or NITF format, with a maximum size of 100 million square pixels.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
zoom (Union[int, Column]): Zoom level of the tile. Must be a non-negative integer. At zoom level 0, the entire world fits into a single tile. Each increment doubles the spatial resolution and quadruples the number of tiles. Defaults to 0.
x (Union[int, Column]): Tile column index at the specified zoom level. Increases from west to east. Valid range: 0 <= x < 2**zoom. Defaults to 0.
y (Union[int, Column]): Tile row index at the specified zoom level. Increases from north to south. Valid range: 0 <= y < 2**zoom. Defaults to 0.
df (Optional[DataFrame]): Specifies the DataFrame to join on when passing column inputs. Tiling will only be applied to media items present in both the input media set and provided DataFrame. Must be provided together with the on parameter.
on (Optional[Literal["media_item_rid", "media_reference"]]): The column name to join on when df is specified. This aligns the tiling operation with the correct media item.
output_format (str): The format of the output images, for example PNG or JPEG. Defaults to PNG.

Returns:

An instance of MediaSetInputTransform containing the tile transformation, allowing for further transformations.

Example:

# Dynamically select tiling parameters from the input_df columns.
# Only tiles media items in both the media set and provided DataFrame.
transform1 = image_input.transform().tile(input_df.zoom, input_df.x, input_df.y, df=input_df, on="media_item_rid")

# All tiles will be generated with the same parameters.
# Generates a tile for all media items in the media set.
transform2 = image_input.transform().tile(zoom=2, x=1, y=1)

# Write the transformation to output media set
image_output.write(transform1)
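
The `zoom`, `x`, and `y` constraints follow the common Slippy map tile scheme. The sketch below validates a tile address and converts a longitude and latitude to a tile index using the widely published Web Mercator tile formula. The helper names are hypothetical and are not part of the transforms API.

```python
import math


def tile_count(zoom):
    """Number of tiles covering the world at a zoom level (quadruples per level)."""
    return 4 ** zoom


def is_valid_tile(zoom, x, y):
    """Check zoom >= 0 and 0 <= x, y < 2**zoom."""
    if zoom < 0:
        return False
    n = 2 ** zoom
    return 0 <= x < n and 0 <= y < n


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Convert WGS84 longitude/latitude to Slippy map tile indices (x, y)."""
    n = 2 ** zoom
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that edge coordinates still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


origin_tile = lonlat_to_tile(0.0, 0.0, zoom=1)
```

The resulting indices could then be passed as the `x` and `y` arguments, either as constants or as DataFrame columns.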

encrypt()

Encrypts specified regions of images using the provided cipher image key.

Parameters:

polygons (Union[list[api.Polygon], Column]): The regions to encrypt, specified as polygons.
cipher_license_rid (Union[str, Column]): The cipher license RID to use for encryption.
media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
df (Optional[DataFrame]): Specifies the DataFrame to join on when passing column inputs. Encryption will only be applied to media items present in both the input media set and provided DataFrame. Must be provided together with the on parameter.
on (Optional[Literal["media_item_rid", "media_reference"]]): The column name to join on when df is specified. This aligns the encryption operation with the correct media item.
output_format (str): The format of the output images, for example PNG or JPEG. Defaults to PNG.

Returns:

An instance of MediaSetInputTransform containing the encrypt transformation, allowing for further transformations.

Example:

polygon = [api.Coordinate(x=10, y=10), api.Coordinate(x=100, y=10),
           api.Coordinate(x=100, y=100), api.Coordinate(x=10, y=100)]
transform = image_input.transform().encrypt([polygon], "cipher_license_rid_123", media_item_rid="rid1")
image_output.write(transform)

decrypt()

Decrypts specified regions of images using the provided cipher image key.

Parameters:

polygons (Union[list[api.Polygon], Column]): The regions to decrypt, specified as a list of polygons.
cipher_license_rid (Union[str, Column]): The resource identifier for the cipher license to use for decryption.
media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
df (Optional[DataFrame]): Specifies the DataFrame to join on when passing column inputs. Decryption will only be applied to media items present in both the input media set and provided DataFrame. Must be provided together with the on parameter.
on (Optional[Literal["media_item_rid", "media_reference"]]): The column name to join on when df is specified. This aligns the decryption operation with the correct media item.
output_format (str): The format of the output images, for example PNG or JPEG. Defaults to PNG.

Returns:

An instance of MediaSetInputTransform containing the decrypt transformation, allowing for further transformations.

Example:

polygon = [api.Coordinate(x=10, y=10), api.Coordinate(x=100, y=10),
           api.Coordinate(x=100, y=100), api.Coordinate(x=10, y=100)]
transform = image_input.transform().decrypt([polygon], "cipher_license_rid_123", media_item_rid="rid1")
image_output.write(transform)

crop()

Crops images using specified dimensions and offsets.

Parameters:

width (Union[int, Column]): The width of the cropped image.
height (Union[int, Column]): The height of the cropped image.
x_offset (Union[int, Column]): The x-coordinate of the top-left corner of the crop area. Defaults to 0.
y_offset (Union[int, Column]): The y-coordinate of the top-left corner of the crop area. Defaults to 0.
media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
df (Optional[DataFrame]): Specifies the DataFrame to join on when passing column inputs. Cropping will only be applied to media items present in both the input media set and provided DataFrame. Must be provided together with the on parameter.
on (Optional[Literal["media_item_rid", "media_reference"]]): The column name to join on when df is specified. This aligns the cropping operation with the correct media item.
output_format (str): The format of the output images, for example PNG or JPEG. Defaults to PNG.

Returns:

An instance of MediaSetInputTransform containing the crop transformation, allowing for further transformations.

Examples:

# All media items will be cropped with the same parameters.
# Crops all media items in the media set.
transform = image_input.transform().crop(100, 100, 10, 10)
image_output.write(transform)

# Dynamically select cropping parameters from the input_df columns.
# Only crops media items in both the media set and provided DataFrame.
transform1 = image_input.transform().crop(input_df.x2 - input_df.x1, input_df.y2 - input_df.y1,
    input_df.x1, input_df.y1, df=input_df, on="media_item_rid")

# Width and height are static, while the offsets are dynamically selected from the input_df columns.
# Only crops media items in both the media set and provided DataFrame.
transform2 = image_input.transform().crop(30, 50,
    input_df.x1, input_df.y1, df=input_df, on="media_item_rid")

# All media items will be cropped with the same parameters.
# Only crops media items in both the media set and provided DataFrame.
transform3 = image_input.transform().crop(30, 50,
    20, 60, df=input_df, on="media_item_rid")

# All media items will be cropped with the same parameters.
# Crops all media items in the media set.
transform4 = image_input.transform().crop(30, 50, 20, 60)

# Write the transformation to output media set
image_output.write(transform1)
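
The dynamic example above derives the crop size from bounding-box corners (`x2 - x1`, `y2 - y1`). The same arithmetic for a single box can be sketched in plain Python as follows; `bbox_to_crop` is a hypothetical helper, and the normalization of swapped corners is an added convenience rather than documented API behavior.

```python
def bbox_to_crop(x1, y1, x2, y2):
    """Convert two opposite bounding-box corners into crop() style arguments."""
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    width, height = right - left, bottom - top
    if width <= 0 or height <= 0:
        raise ValueError("bounding box must have a positive area")
    return {"width": width, "height": height, "x_offset": left, "y_offset": top}


crop_args = bbox_to_crop(10, 20, 110, 70)
```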

binarize()

Converts images to binary using the specified threshold.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
threshold (Optional[int]): Values above or equal to the threshold will be assigned a value of 255 and values below will be assigned a value of 0. Defaults to computing the threshold based on the input image.
output_format (str): The format of the output images, for example PNG or JPEG. Defaults to PNG.

Returns:

An instance of MediaSetInputTransform containing the binarize transformation, allowing for further transformations.

Example:

transform = image_input.transform().binarize(threshold=150)
image_output.write(transform)
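
The threshold rule can be illustrated on a small grid of grayscale values. This is a plain-Python sketch of the documented rule (values at or above the threshold become 255, values below become 0), not the service's implementation, and it does not cover the automatic threshold selection used when `threshold` is omitted.

```python
def binarize_pixels(pixels, threshold):
    """Map each grayscale value to 255 if it is >= threshold, otherwise to 0."""
    return [[255 if value >= threshold else 0 for value in row] for row in pixels]


sample = [
    [0, 149, 150],
    [151, 200, 255],
]
binary = binarize_pixels(sample, threshold=150)
```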

rotate()

Rotates images by the specified angle.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
angle (Literal["DEGREE_90", "DEGREE_180", "DEGREE_270"]): The angle to rotate the images. Defaults to DEGREE_90.
output_format (str): The format of the output images, for example PNG or JPEG. Defaults to PNG.

Returns:

An instance of MediaSetInputTransform containing the rotate transformation, allowing for further transformations.

Example:

transform = image_input.transform().rotate(angle="DEGREE_180")
image_output.write(transform)

grayscale()

Converts images to grayscale.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
output_format (str): The format of the output images, for example PNG or JPEG. Defaults to PNG.

Returns:

An instance of MediaSetInputTransform containing the grayscale transformation, allowing for further transformations.

Example:

transform = image_input.transform().grayscale()
image_output.write(transform)

equalize()

Improves the clarity of low-contrast images by performing histogram equalization on grayscale images.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
output_format (str): The format of the output images, for example PNG or JPEG. Defaults to PNG.

Returns:

An instance of MediaSetInputTransform containing the equalize transformation, allowing for further transformations.

Example:

transform = image_input.transform().equalize()
image_output.write(transform)
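
For readers unfamiliar with histogram equalization, the sketch below shows the textbook version of the technique on a small grid of 8-bit grayscale values. It is included only to illustrate the idea of stretching intensities across the full range; the exact algorithm used by `equalize()` is not documented here and may differ.

```python
def equalize_pixels(pixels, levels=256):
    """Textbook histogram equalization for a 2D list of integer grayscale values."""
    flat = [value for row in pixels for value in row]
    if not flat:
        return []

    # Count how often each intensity occurs.
    histogram = [0] * levels
    for value in flat:
        histogram[value] += 1

    # Cumulative distribution of intensities.
    cdf, running = [], 0
    for count in histogram:
        running += count
        cdf.append(running)

    total = len(flat)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Only one intensity is present, so there is nothing to stretch.
        return [list(row) for row in pixels]

    # Remap each intensity so the cumulative distribution becomes roughly linear.
    lookup = [max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min))) for c in cdf]
    return [[lookup[value] for value in row] for row in pixels]


low_contrast = [[100, 110], [120, 130]]
stretched = equalize_pixels(low_contrast)
```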

rayleigh()

Adjusts the grayscale intensity values so the image's histogram (the distribution of pixel brightness) matches the Rayleigh distribution (roughly a skewed bell curve that is never negative). This can improve clarity in low-contrast images.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
sigma (float): The scaling parameter for the Rayleigh distribution. Must be a floating-point number between 0 and 1. Defaults to 0.5.
output_format (str): The format of the output images, for example PNG or JPEG. Defaults to PNG.

Returns:

An instance of MediaSetInputTransform containing the rayleigh transformation, allowing for further transformations.

Example:

transform = image_input.transform().rayleigh(sigma=0.7)
image_output.write(transform)
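
To give a sense of how `sigma` shapes the target histogram, the sketch below evaluates the standard Rayleigh probability density function, which is zero for negative inputs and peaks at `x = sigma`. How the service maps pixel intensities onto this curve is not documented here, so this only illustrates the distribution itself.

```python
import math


def rayleigh_pdf(x, sigma=0.5):
    """Rayleigh probability density: (x / sigma^2) * exp(-x^2 / (2 * sigma^2)) for x >= 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if x < 0:
        return 0.0
    return (x / sigma ** 2) * math.exp(-(x ** 2) / (2 * sigma ** 2))


# Density at a few normalized intensities for the default sigma.
densities = {x: rayleigh_pdf(x) for x in (0.0, 0.25, 0.5, 1.0)}
```

A larger `sigma` moves the peak toward higher intensities, which in this reading would correspond to a brighter output histogram.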

convert_image_to_document()

Converts images to PDFs.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.

Returns:

An instance of MediaSetInputTransform containing the convert image to document transformation, allowing for further transformations.

Example:

transform = image_input.transform().convert_image_to_document()
pdf_output.write(transform)

transcode()

Transcodes audio or video to the specified format.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
encode_format (Optional[str]): Specifies the format in which the output media will be encoded. Defaults to MP4 for video and MP3 for audio.

Returns:

An instance of MediaSetInputTransform containing the transcode transformation, allowing for further transformations.

Example:

transform = video_input.transform().transcode(encode_format="mov")
video_output.write(transform)

extract_audio()

Extracts audio from video files.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
output_format (str): The format of the output audio (e.g., mp3, wav). Defaults to mp3.

Returns:

An instance of MediaSetInputTransform containing the audio extraction transformation, allowing for further transformations.

Example:

transform = video_input.transform().extract_audio(output_format="wav")
audio_output.write(transform)

extract_scene_frames()

Extracts all scene frames from videos as images. A scene frame is a video frame that marks the beginning of a new scene or a significant visual transition in the video content.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
scene_sensitivity (Literal["MORE_SENSITIVE", "STANDARD", "LESS_SENSITIVE"]): The sensitivity level for scene detection. Defaults to STANDARD.
output_format (str): Specifies the encoding format for extracted frames. Defaults to PNG.

Returns:

An instance of MediaSetInputTransform containing the scene frame extraction transformation. The output images for each video will be stored in a TAR archive file.

Example:

transform = video_input.transform().extract_scene_frames(scene_sensitivity="MORE_SENSITIVE")
multimodal_output.write(transform)

chunk()

Chunks audio or video files into smaller segments of the specified duration.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
chunk_duration_milliseconds (int): The duration of each chunk in milliseconds. Defaults to 10000 (10 seconds). Must be a positive integer.
output_format (Optional[str]): The format of the output chunks. Defaults to MP4 for video and TS for audio. Note that audio only supports TS as an output format.

Returns:

An instance of MediaSetInputTransform containing the chunking transformation, allowing for further transformations.

Example:

transform = video_input.transform().chunk(chunk_duration_milliseconds=5000)
video_output.write(transform)
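
To estimate how many segments a given `chunk_duration_milliseconds` produces, the sketch below computes idealized chunk boundaries in plain Python. It assumes that chunks are cut at exact multiples of the duration and that the final chunk may be shorter; actual boundaries may differ, for example if the encoder aligns cuts to keyframes.

```python
def chunk_boundaries(total_duration_ms, chunk_duration_milliseconds=10000):
    """Return idealized (start_ms, end_ms) pairs covering the whole media item."""
    if chunk_duration_milliseconds <= 0:
        raise ValueError("chunk_duration_milliseconds must be a positive integer")
    boundaries = []
    start = 0
    while start < total_duration_ms:
        end = min(start + chunk_duration_milliseconds, total_duration_ms)
        boundaries.append((start, end))
        start = end
    return boundaries


# A 25-second clip split with the default 10-second chunk duration.
boundaries = chunk_boundaries(25000)
```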

extract_first_frame()

Extracts the first full scene frame from videos as an image with the specified dimensions, or the original dimensions if not provided.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
width (Optional[int]): The target width for the extracted frames. If None, width will be scaled based on the provided height. Defaults to None.
height (Optional[int]): The target height for the extracted frames. If None, height will be scaled based on the provided width. Defaults to None.
output_format (str): The format of the output images. Defaults to PNG.

Returns:

An instance of MediaSetInputTransform containing the first frame extraction transformation, allowing for further transformations.

Example:

transform = video_input.transform().extract_first_frame(width=800, height=600)
image_output.write(transform)

extract_frames_at_timestamp()

Extracts frames from videos at a specified timestamp, using the specified dimensions, or the original dimensions if not provided.

Parameters:

timestamps (Union[float, Column]): The timestamp in seconds at which to extract frames.
media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
df (Optional[DataFrame]): Specifies the DataFrame to join on when passing column inputs. Frame extraction will only be applied to media items present in both the input media set and provided DataFrame. Must be provided together with the on parameter.
on (Optional[Literal["media_item_rid", "media_reference"]]): The column name to join on when df is specified. This aligns the frame extraction operation with the correct media item.
width (Optional[int]): The target width for the extracted frames. If None, scales width based on the provided height. Defaults to None.
height (Optional[int]): The target height for the extracted frames. If None, scales height based on provided width. Defaults to None.
output_format (str): The format of the output images. Defaults to PNG.

Returns:

An instance of MediaSetInputTransform containing the frame extraction transformation, allowing for further transformations.

Example:

# Dynamically select timestamp parameters from the input_df columns.
# Only extracts frames from media items in both the media set and provided DataFrame.
transform1 = video_input.transform().extract_frames_at_timestamp(input_df.timestamp, df=input_df, on="media_item_rid")

# All frames will be extracted at the same timestamp for all items in the media set.
transform2 = video_input.transform().extract_frames_at_timestamp(30)

# Write the transformation to output media set
image_output.write(transform1)

render_dicom_layer()

Renders a frame of a DICOM file as an image, using the specified dimensions, or the original dimensions if not provided.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
layer_number (Optional[int]): The layer number to render from the DICOM image. Defaults to the middle layer.
width (Optional[int]): The target width for the rendered image. If None, and height is provided, the aspect ratio will be preserved. Must be provided if height is not provided.
height (Optional[int]): The target height for the rendered image. If None, and width is provided, the aspect ratio will be preserved. Must be provided if width is not provided.
output_format (str): The format of the output images, for example PNG or JPEG. Defaults to PNG.

Returns:

An instance of MediaSetInputTransform containing the render DICOM layer transformation, allowing for further transformations.

Example:

transform = dicom_input.transform().render_dicom_layer(layer_number=2)
image_output.write(transform)

extract_form_fields()

Extracts form fields from documents.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
suppress_errors (bool): Specifies error handling behavior. If True, errors are caught and the error message will be returned in the output. If False, any errors will not be caught and the build will fail. Defaults to True. Only applicable to transformations on the entire media set.

Returns:

A DataFrame or a single string.
- str: JSON object containing form fields. Applicable for transformations on a single item.
- DataFrame: Columns are media_item_rid, path, media_reference, form_fields (str). Applicable for transformations on the entire media set.

Example:

df = media_set.transform().extract_form_fields()
dataset_output.write_dataframe(df)

extract_table_of_contents()

Extracts the table of contents from documents.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
suppress_errors (bool): Specifies error handling behavior. If True, errors are caught and the error message will be returned in the output. If False, any errors will not be caught and the build will fail. Defaults to True. Only applicable to transformations on the entire media set.

Returns:

A DataFrame or a single string.
- str: JSON object containing table of contents. Applicable for transformations on a single item.
- DataFrame: Columns are media_item_rid, path, media_reference, table_of_contents (str). Applicable for transformations on the entire media set.

Example:

df = media_set.transform().extract_table_of_contents()
dataset_output.write_dataframe(df)

get_pdf_page_dimensions()

Returns PDF page dimensions.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
suppress_errors (bool): Specifies error handling behavior. If True, errors are caught and the error message will be returned in the output. If False, any errors will not be caught and the build will fail. Defaults to True. Only applicable to transformations on the entire media set.

Returns:

A DataFrame or a single string.
- str: JSON object containing a list of dictionaries with keys width and height. Applicable for transformations on a single item.
- DataFrame: Columns are media_item_rid, path, media_reference, page_dimensions (str). Applicable for transformations on the entire media set.

Example:

df = media_set.transform().get_pdf_page_dimensions()
dataset_output.write_dataframe(df)
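
Because the page dimensions are returned as a JSON string, they typically need to be parsed before use. The sketch below classifies each page as portrait or landscape. The sample payload is invented, and the assumption that the top-level JSON value is a bare list of `{"width", "height"}` objects should be checked against real output.

```python
import json


def page_orientations(page_dimensions_json):
    """Parse a page-dimensions JSON string and label each page by orientation."""
    pages = json.loads(page_dimensions_json)
    orientations = []
    for page in pages:
        if page["width"] > page["height"]:
            orientations.append("landscape")
        else:
            orientations.append("portrait")
    return orientations


# Hypothetical payload for a two-page PDF; units are not specified on this page.
sample_dimensions = '[{"width": 612.0, "height": 792.0}, {"width": 792.0, "height": 612.0}]'
orientations = page_orientations(sample_dimensions)
```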

generate_image_embeddings()

Generates embeddings for images.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
model_id (Optional[str]): The model to use to generate image embeddings. Defaults to GOOGLE_SIGLIP_2.
suppress_errors (bool): Specifies error handling behavior. If True, errors are caught and the error message will be returned in the output. If False, any errors will not be caught and the build will fail. Defaults to True. Only applicable to transformations on the entire media set.

Returns:

A DataFrame or a single string.
- str: JSON object containing vector embeddings. Applicable for transformations on a single item.
- DataFrame: Columns are media_item_rid, path, media_reference, embeddings (str). Applicable for transformations on the entire media set.

Example:

df = media_set.transform().generate_image_embeddings()
dataset_output.write_dataframe(df)

get_waveform()

Returns waveform amplitudes for audio files.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
peaks_per_second (Optional[int]): Number of peaks per second to return. Defaults to 100. Must be a positive integer no greater than 1000.
suppress_errors (bool): Specifies error handling behavior. If True, errors are caught and the error message will be returned in the output. If False, any errors will not be caught and the build will fail. Defaults to True. Only applicable to transformations on the entire media set.

Returns:

A DataFrame or a single string.
- str: JSON object containing a list of doubles representing audio amplitudes, normalized between 0 and 1. Applicable for transformations on a single item.
- DataFrame: Columns are media_item_rid, path, media_reference, waveform (str). Applicable for transformations on the entire media set.

Example:

df = media_set.transform().get_waveform()
dataset_output.write_dataframe(df)
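
The waveform is returned as a JSON string, so a common next step is to attach a time offset to each amplitude. The sketch below assumes that the JSON value is a bare list of amplitudes and that peaks are evenly spaced at `1 / peaks_per_second` seconds; both are assumptions, and the sample payload is invented.

```python
import json


def peak_times(waveform_json, peaks_per_second=100):
    """Pair each amplitude with its assumed time offset in seconds."""
    amplitudes = json.loads(waveform_json)
    return [(index / peaks_per_second, amplitude) for index, amplitude in enumerate(amplitudes)]


def loudest_moment(waveform_json, peaks_per_second=100):
    """Return the time offset (in seconds) of the largest amplitude."""
    timed = peak_times(waveform_json, peaks_per_second)
    return max(timed, key=lambda pair: pair[1])[0]


# Hypothetical two-second waveform sampled at 2 peaks per second.
sample_waveform = "[0.0, 0.5, 1.0, 0.25]"
timed_peaks = peak_times(sample_waveform, peaks_per_second=2)
loudest = loudest_moment(sample_waveform, peaks_per_second=2)
```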

transcribe()

Transcribes audio.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
language (Optional[str]): The language to use for transcription. Defaults to None, in which case it will be auto-detected. Valid languages can be found in the Whisper GitHub repo ↗ under LANGUAGES.
performance_mode (Literal["more_economical", "more_performant"]): The performance mode to use for transcription. Defaults to more_economical.
output_format (Literal["text", "segments"]): The format of the output. Defaults to text.
add_timestamps (Optional[bool]): Controls whether timestamps are added to the transcription. Defaults to False. Only applicable when output_format is text.
suppress_errors (bool): Specifies error handling behavior. If True, errors are caught and the error message will be returned in the output. If False, any errors will not be caught and the build will fail. Defaults to True. Only applicable to transformations on the entire media set.

Returns:

A DataFrame or a single string.
- str: Applicable for transformations on a single item.
  - text: The transcribed text.
  - segments: JSON object containing the transcribed segments including timestamps, segment confidence and more details.
- DataFrame: Columns are media_item_rid, path, media_reference, transcription (str). Applicable for transformations on the entire media set.

Example:

df = media_set.transform().transcribe(output_format="segments")
dataset_output.write_dataframe(df)

get_scene_frame_timestamps()

Returns timestamps for scene frames from video files.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
scene_sensitivity (Literal["MORE_SENSITIVE", "STANDARD", "LESS_SENSITIVE"]): The sensitivity level for scene detection. Defaults to STANDARD.
suppress_errors (bool): Specifies error handling behavior. If True, errors are caught and the error message will be returned in the output. If False, any errors will not be caught and the build will fail. Defaults to True. Only applicable to transformations on the entire media set.

Returns:

A DataFrame or a single string.
- str: JSON object containing a list of scene frames with keys timestamp and sceneScore in the frames field. Applicable for transformations on a single item.
- DataFrame: Columns are media_item_rid, path, media_reference, scene_frames (str). Applicable for transformations on the entire media set.

Example:

df = media_set.transform().get_scene_frame_timestamps()
dataset_output.write_dataframe(df)
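
The scene frame result is a JSON string with a `frames` field whose entries carry `timestamp` and `sceneScore` keys. The sketch below parses such a string and keeps only the timestamps whose score meets a cutoff. The sample payload, the score scale, and the `min_score` parameter are assumptions made for illustration.

```python
import json


def scene_timestamps(scene_frames_json, min_score=0.0):
    """Return timestamps of scene frames whose sceneScore is at least min_score."""
    payload = json.loads(scene_frames_json)
    return [
        frame["timestamp"]
        for frame in payload.get("frames", [])
        if frame["sceneScore"] >= min_score
    ]


# Hypothetical payload; real scores and timestamps will differ.
sample_scene_frames = (
    '{"frames": ['
    '{"timestamp": 0.0, "sceneScore": 0.9}, '
    '{"timestamp": 12.5, "sceneScore": 0.4}, '
    '{"timestamp": 40.0, "sceneScore": 0.7}]}'
)
all_timestamps = scene_timestamps(sample_scene_frames)
strong_timestamps = scene_timestamps(sample_scene_frames, min_score=0.5)
```

Timestamps selected this way could then be supplied to `extract_frames_at_timestamp()`, for example through a DataFrame column.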

filter_to()

Filters the media set to only include items of the specified schema type.

Parameters:

schema_type (Literal["audio", "image", "video", "document", "spreadsheet", "dicom"]): The schema type to filter to.

Returns:

An instance of MediaSetInputTransform containing the filter transformation, allowing for further transformations.

Example:

transform = media_set.transform().filter_to("document").slice_document(0, 5)
pdf_output.write(transform)

extract_content_from_spreadsheets()

Extracts content from spreadsheet files.

Parameters:

media_item_rid (Optional[str]): If specified, will run the transformation on the specified item instead of the entire media set. Defaults to None.
suppress_errors (bool): Specifies error handling behavior. If True, errors are caught and the error message will be returned in the output. If False, any errors will not be caught and the build will fail. Defaults to True. Only applicable to transformations on the entire media set.

Returns:

A DataFrame or a single string.
- str: JSON object containing a key for each sheet name, and its value having fields table and merged_cells. Applicable for transformations on a single item.
- DataFrame: Columns are media_item_rid, path, media_reference, extracted_content (str). Applicable for transformations on the entire media set.

Example:

df = media_set.transform().extract_content_from_spreadsheets()
dataset_output.write_dataframe(df)
