The media set transforms API provides methods for transforming media sets in Python transforms. This API enables various operations on media sets, such as extracting text using optical character recognition (OCR), resizing images, converting documents to images, and more.
Schema type | Available methods |
---|---|
Image | resize • crop • binarize • rotate • grayscale • equalize • rayleigh • convert_image_to_document • generate_image_embeddings • tile • ocr • encrypt • decrypt |
Document | ocr • extract_raw_text • convert_document_to_images • slice_document • extract_form_fields • extract_table_of_contents • get_pdf_page_dimensions |
Video | extract_audio • extract_scene_frames • chunk • extract_first_frame • extract_frames_at_timestamp • transcode • get_scene_frame_timestamps |
Audio | transcribe • chunk • transcode • get_waveform |
DICOM | render_dicom_layer |
Spreadsheet | extract_content_from_spreadsheets |
Multimodal | filter_to |
To use the media set transforms API, access the transform functionality from a media set input as shown in the example below.

    from transforms.api import transform, Output
    from transforms.mediasets import MediaSetInput

    @transform(
        media_input=MediaSetInput("/path/to/media_set"),
        dataset_output=Output("/path/to/output")
    )
    def compute(ctx, media_input, dataset_output):
        # Create a MediaSetInputTransform instance
        transform = media_input.transform()
        # Apply transformations
        result = transform.ocr()
        # Write the result to output
        dataset_output.write_dataframe(result)
def transform(self, deduplicate_by_path=True):
Returns a `MediaSetInputTransform` instance. This class enables fluent method chaining for media transformations on a media set input.
Parameters:
- `deduplicate_by_path`: If `True`, only the most recent item at each path will be included. Defaults to `True`.
Returns:
- `MediaSetInputTransform`: A user-facing class that provides methods for media set transformations.
Example:

    df = media_set.transform().ocr()
Extracts text from PDFs or images using OCR and returns the extracted text as a string. Recommended for images and scanned documents.
Parameters:
- [...] `None`. All valid codes can be found in the Tesseract documentation under scripts.
- [...] `None`.
- [...] `0` (the first page).
- [...] `None` (the final page).
- [...] `item_per_row` or `page_per_row`. Only applicable to transformations on an entire PDF media set. Defaults to `item_per_row`.
- [...] If `True`, errors are caught and the error message will be returned in the output. If `False`, any errors will not be caught and the build will fail. Defaults to `True`. Only applicable to transformations on the entire media set.
Returns:
- `str`: Transformations on a single image.
- `list[str]`: Transformations on a single PDF.
- `DataFrame`: Transformations on the entire media set.
  - [...] `item_per_row`): Columns are `media_item_rid`, `path`, `media_reference`, `extracted_text` (list[str]).
  - [...] `page_per_row`) or image sets: Columns are `media_item_rid`, `path`, `media_reference`, `page_number`, `extracted_text` (str).
Example:

    df = media_set.transform().ocr()
    dataset_output.write_dataframe(df)
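The `item_per_row` and `page_per_row` layouts carry the same text at different granularity. The sketch below illustrates that relationship with plain Python dictionaries standing in for DataFrame rows. The helper name `to_page_rows`, the sample values, and the zero-based page numbering are assumptions made for illustration and are not taken from the API.

```python
def to_page_rows(item_row, first_page=0):
    """Unroll one item_per_row record into page_per_row records.

    `first_page` is an assumed numbering origin, used for illustration only.
    """
    shared = {k: item_row[k] for k in ("media_item_rid", "path", "media_reference")}
    return [
        {**shared, "page_number": first_page + i, "extracted_text": text}
        for i, text in enumerate(item_row["extracted_text"])
    ]


# Hypothetical record: one PDF with two pages of extracted text.
item = {
    "media_item_rid": "rid1",
    "path": "report.pdf",
    "media_reference": "ref1",
    "extracted_text": ["Page one text", "Page two text"],
}
page_rows = to_page_rows(item)
```

Each resulting record holds a single string in `extracted_text`, whereas the source record holds a list with one entry per page.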
Extracts raw text from PDFs. Recommended for documents that have been electronically generated.
Parameters:
- [...] `None`.
- [...] `None` (the final page).
- [...] `item_per_row` or `page_per_row`. Only applicable to transformations on an entire PDF media set. Defaults to `item_per_row`.
- [...] If `True`, errors are caught and the error message will be returned in the output. If `False`, any errors will not be caught and the build will fail. Defaults to `True`. Only applicable to transformations on the entire media set.
Returns:
- `str`: Applicable to transformations on a single image.
- `list[str]`: Applicable to transformations on a single PDF.
- `DataFrame`: Applicable to transformations on the entire media set.
  - [...] `item_per_row`): Columns are `media_item_rid`, `path`, `media_reference`, `extracted_text` (list[str]).
  - [...] `page_per_row`) or image sets: Columns are `media_item_rid`, `path`, `media_reference`, `page_number`, `extracted_text` (str).
Example:

    df = media_set.transform().extract_raw_text()
    dataset_output.write_dataframe(df)
Resizes images to the specified dimensions.
Parameters:
- [...] `None`.
- [...] `1024`. Must be provided if height is not provided.
- [...] `1024`. Must be provided if width is not provided.
- [...] If `True`, images will be resized to fit within the specified dimensions while preserving the aspect ratio. Defaults to `True`.
Returns:
- `MediaSetInputTransform` containing the resize transformation, allowing for further transformations.
Example:

    transform = image_input.transform().resize()
    image_output.write(transform)
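To make "fit within the specified dimensions while preserving the aspect ratio" concrete, here is a minimal arithmetic sketch. The function `fit_within` is hypothetical, and the rounding rule is an assumption; the service may round or clamp differently.

```python
def fit_within(width, height, max_width, max_height):
    """Largest size with the original aspect ratio that fits inside the box.

    Rounding to the nearest pixel is an assumption for illustration.
    """
    scale = min(max_width / width, max_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


# A 4000x3000 image fitted into a 1024x1024 box keeps its 4:3 shape.
fitted = fit_within(4000, 3000, 1024, 1024)
```

The limiting dimension determines the scale factor, so the other dimension ends up smaller than its maximum.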
Converts document pages to images with the specified dimensions.
Parameters:
- [...] `None`.
- [...] `None` (the last page).
Returns:
- `MediaSetInputTransform` containing the document to image transformation, allowing for further transformations.
Example:

    transform = pdf_input.transform().convert_document_to_images()
    image_output.write(transform)
Slices documents to a specified range of pages.
Parameters:
- [...] `None`.
- [...] `end_page` exceeds the number of pages in the document. If `True`, an error will be raised. If `False`, the last page of the document is used instead. Defaults to `True`.
Returns:
- `MediaSetInputTransform` containing the slice transformation, allowing for further transformations.
Example:

    transform = pdf_input.transform().slice_document(0, 5)
    pdf_output.write(transform)
Generates Slippy map tiles (EPSG 3857) from images. Only supports geo-embedded images in TIFF or NITF format, with a maximum size of 100 million square pixels.
Parameters:
- [...] `None`.
- [...] `0`, the entire world fits into a single tile. Each increment doubles the spatial resolution and quadruples the number of tiles. Defaults to `0`.
- [...] `0 <= x < 2**zoom`. Defaults to `0`.
- [...] `0 <= y < 2**zoom`. Defaults to `0`.
- [...] `on` parameter.
- [...] `df` is specified. This aligns the tiling operation with the correct media item.
Returns:
- `MediaSetInputTransform` containing the tile transformation, allowing for further transformations.
Example:

    # Dynamically select tiling parameters from the input_df columns.
    # Only tiles media items in both the media set and provided DataFrame.
    transform1 = image_input.transform().tile(input_df.zoom, input_df.x, input_df.y, df=input_df, on="media_item_rid")

    # All tiles will be generated with the same parameters.
    # Generates a tile for all media items in the media set.
    transform2 = image_input.transform().tile(zoom=2, x=1, y=1)

    # Write the transformation to output media set
    image_output.write(transform1)
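The zoom, x, and y bounds follow the standard Slippy map tiling scheme. The sketch below shows the arithmetic behind those bounds using only the Python standard library. The helper names are hypothetical and are not part of the media set transforms API.

```python
import math


def tile_count(zoom):
    """Number of tiles covering the world at a zoom level (2**zoom per axis)."""
    return 4 ** zoom


def is_valid_tile(zoom, x, y):
    """Check the 0 <= x < 2**zoom and 0 <= y < 2**zoom bounds."""
    n = 2 ** zoom
    return 0 <= x < n and 0 <= y < n


def tile_northwest_corner(zoom, x, y):
    """Longitude and latitude (degrees) of a tile's north-west corner."""
    n = 2 ** zoom
    lon = x / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    return lon, lat
```

At zoom 0 the single tile starts at longitude -180 and at the Web Mercator latitude limit of roughly 85.05 degrees. Validating `x` and `y` against `2**zoom` before calling `tile` can catch out-of-range parameters early.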
Encrypts specified regions of images using the provided cipher image key.
Parameters:
- [...] `None`.
- [...] `on` parameter.
- [...] `df` is specified. This aligns the encryption operation with the correct media item.
Returns:
- `MediaSetInputTransform` containing the encrypt transformation, allowing for further transformations.
Example:

    polygon = [api.Coordinate(x=10, y=10), api.Coordinate(x=100, y=10),
               api.Coordinate(x=100, y=100), api.Coordinate(x=10, y=100)]
    transform = image_input.transform().encrypt([polygon], "cipher_license_rid_123", media_item_rid="rid1")
    image_output.write(transform)
Decrypts specified regions of images using the provided cipher image key.
Parameters:
- [...] `None`.
- [...] `on` parameter.
- [...] `df` is specified. This aligns the decryption operation with the correct media item.
Returns:
- `MediaSetInputTransform` containing the decrypt transformation, allowing for further transformations.
Example:

    polygon = [api.Coordinate(x=10, y=10), api.Coordinate(x=100, y=10),
               api.Coordinate(x=100, y=100), api.Coordinate(x=10, y=100)]
    transform = image_input.transform().decrypt([polygon], "cipher_license_rid_123", media_item_rid="rid1")
    image_output.write(transform)
Crops images using specified dimensions and offsets.
Parameters:
- [...] `0`.
- [...] `0`.
- [...] `None`.
- [...] `on` parameter.
- [...] `df` is specified. This aligns the cropping operation with the correct media item.
Returns:
- `MediaSetInputTransform` containing the crop transformation, allowing for further transformations.
Examples:

    # All media items will be cropped with the same parameters.
    # Crops all media items in the media set.
    transform = image_input.transform().crop(100, 100, 10, 10)
    image_output.write(transform)

    # Dynamically select cropping parameters from the input_df columns.
    # Only crops media items in both the media set and provided DataFrame.
    transform1 = image_input.transform().crop(input_df.x2 - input_df.x1,
                                              input_df.y2 - input_df.y1,
                                              input_df.x1,
                                              input_df.y1,
                                              df=input_df,
                                              on="media_item_rid")

    # Width and height are static, while the offsets are dynamically selected from the input_df columns.
    # Only crops media items in both the media set and provided DataFrame.
    transform2 = image_input.transform().crop(30, 50, input_df.x1, input_df.y1, df=input_df, on="media_item_rid")

    # All media items will be cropped with the same parameters.
    # Only crops media items in both the media set and provided DataFrame.
    transform3 = image_input.transform().crop(30, 50, 20, 60, df=input_df, on="media_item_rid")

    # All media items will be cropped with the same parameters.
    # Crops all media items in the media set.
    transform4 = image_input.transform().crop(30, 50, 20, 60)

    # Write the transformation to output media set
    image_output.write(transform1)
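The dynamic example derives the crop size from two corner coordinates (`x2 - x1`, `y2 - y1`) and uses the first corner as the offset. The sketch below mirrors that arithmetic on a small in-memory grid. The argument order (width, height, x offset, y offset) is inferred from the examples above, the top-left origin is an assumption, and both helper functions are hypothetical.

```python
def crop(pixels, width, height, x_offset, y_offset):
    """Crop a row-major 2D list of pixels (origin assumed at the top left)."""
    return [row[x_offset:x_offset + width]
            for row in pixels[y_offset:y_offset + height]]


def box_to_crop_args(x1, y1, x2, y2):
    """Convert corner coordinates into (width, height, x_offset, y_offset)."""
    return x2 - x1, y2 - y1, x1, y1


# A 5-wide, 4-tall grid where each value encodes its row and column.
image = [[10 * r + c for c in range(5)] for r in range(4)]
args = box_to_crop_args(1, 1, 4, 3)
patch = crop(image, *args)
```

Here the box from (1, 1) to (4, 3) becomes a crop that is 3 pixels wide and 2 pixels tall.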
Converts images to binary using the specified threshold.
Parameters:
- [...] `None`.
- [...] `255` and values below will be assigned a value of `0`. Defaults to computing the threshold based on the input image.
Returns:
- `MediaSetInputTransform` containing the binarize transformation, allowing for further transformations.
Example:

    transform = image_input.transform().binarize(threshold=150)
    image_output.write(transform)
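As an illustration of fixed-threshold binarization, the sketch below maps each grayscale value to `255` or `0`. It is a plain-Python stand-in, not the service's implementation, and sending values exactly equal to the threshold to `0` is an assumption.

```python
def binarize(pixels, threshold):
    """Map grayscale values above the threshold to 255 and the rest to 0.

    Values equal to the threshold go to 0 here; that choice is an assumption.
    """
    return [[255 if value > threshold else 0 for value in row] for row in pixels]


binary = binarize([[12, 150, 151], [200, 0, 255]], threshold=150)
```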
Rotates images by the specified angle.
Parameters:
- [...] `None`.
- [...] `DEGREE_90`.
Returns:
- `MediaSetInputTransform` containing the rotate transformation, allowing for further transformations.
Example:

    transform = image_input.transform().rotate(angle="DEGREE_180")
    image_output.write(transform)
Converts images to grayscale.
Parameters:
- [...] `None`.
Returns:
- `MediaSetInputTransform` containing the grayscale transformation, allowing for further transformations.
Example:

    transform = image_input.transform().grayscale()
    image_output.write(transform)
Improves the clarity of low-contrast images by performing histogram equalization on grayscale images.
Parameters:
- [...] `None`.
Returns:
- `MediaSetInputTransform` containing the equalize transformation, allowing for further transformations.
Example:

    transform = image_input.transform().equalize()
    image_output.write(transform)
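For readers unfamiliar with histogram equalization, the following is a minimal sketch of the textbook technique on a 2D list of integer grayscale values. It is not the service's implementation, and the function name and 256-level assumption are illustrative.

```python
def equalize(pixels, levels=256):
    """Histogram-equalize a 2D list of integer grayscale values."""
    flat = [v for row in pixels for v in row]
    hist = [0] * levels
    for v in flat:
        hist[v] += 1

    # Cumulative distribution of pixel values.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:  # constant image: nothing to spread out
        return [row[:] for row in pixels]

    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[v] - cdf_min) * scale) for v in row] for row in pixels]


# Four nearly identical gray values get spread across the full range.
low_contrast = [[100, 101], [102, 103]]
stretched = equalize(low_contrast)
```

The mapping stretches a narrow band of brightness values across the full 0 to 255 range, which is why it helps low-contrast images.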
Adjusts the grayscale intensity values so the image's histogram (the distribution of pixel brightness) matches the Rayleigh distribution (roughly a skewed bell curve that is never negative). This can improve clarity in low-contrast images.
Parameters:
- [...] `None`.
Returns:
- `MediaSetInputTransform` containing the rayleigh transformation, allowing for further transformations.
Example:

    transform = image_input.transform().rayleigh(sigma=0.7)
    image_output.write(transform)
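The shape of the target distribution can be explored with a few lines of standard-library Python. The sketch below implements the Rayleigh density and its inverse cumulative distribution for a scale parameter `sigma`. How the transformation maps `sigma` onto pixel intensities is not stated here, so this only illustrates the distribution itself.

```python
import math


def rayleigh_pdf(x, sigma):
    """Rayleigh density: zero for negative x, with its peak at x == sigma."""
    if x < 0:
        return 0.0
    return (x / sigma ** 2) * math.exp(-x ** 2 / (2 * sigma ** 2))


def rayleigh_inverse_cdf(u, sigma):
    """Value below which a fraction u (0 <= u < 1) of the distribution lies."""
    return sigma * math.sqrt(-2.0 * math.log(1.0 - u))


sigma = 0.7
median = rayleigh_inverse_cdf(0.5, sigma)
```

Histogram matching in general assigns each pixel the target value at the same quantile as its original brightness, which is the role an inverse cumulative distribution plays.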
Converts images to PDFs.
Parameters:
- [...] `None`.
Returns:
- `MediaSetInputTransform` containing the convert image to document transformation, allowing for further transformations.
Example:

    transform = image_input.transform().convert_image_to_document()
    pdf_output.write(transform)
Transcodes audio or video to the specified format.
Parameters:
- [...] `None`.
Returns:
- `MediaSetInputTransform` containing the transcode transformation, allowing for further transformations.
Example:

    transform = video_input.transform().transcode(encode_format="mov")
    video_output.write(transform)
Extracts audio from video files.
Parameters:
- [...] `None`.
- [...] `mp3`, `wav`). Defaults to `mp3`.
Returns:
- `MediaSetInputTransform` containing the audio extraction transformation, allowing for further transformations.
Example:

    transform = video_input.transform().extract_audio(output_format="wav")
    audio_output.write(transform)
Extracts all scene frames from videos as images. A scene frame is a video frame that marks the beginning of a new scene or a significant visual transition in the video content.
Parameters:
- [...] `None`.
- [...] `STANDARD`.
Returns:
- `MediaSetInputTransform` containing the scene frame extraction transformation. The output images for each video will be stored in a TAR archive file.
Example:

    transform = video_input.transform().extract_scene_frames(scene_sensitivity="MORE_SENSITIVE")
    multimodal_output.write(transform)
Chunks audio or video files into smaller segments of the specified duration.
Parameters:
- [...] `None`.
- [...] `10000` (10 seconds). Must be a positive integer.
Returns:
- `MediaSetInputTransform` containing the chunking transformation, allowing for further transformations.
Example:

    transform = video_input.transform().chunk(chunk_duration_milliseconds=5000)
    video_output.write(transform)
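To estimate how many segments a given chunk duration produces, the boundaries can be computed ahead of time. The helper below is a hypothetical sketch. It assumes the final segment is simply shorter when the total length is not a multiple of the chunk duration, which this page does not confirm.

```python
def chunk_boundaries(total_ms, chunk_duration_ms=10000):
    """Return (start, end) millisecond pairs covering a file of total_ms.

    Assumes a shorter final chunk when the duration does not divide evenly.
    """
    if chunk_duration_ms <= 0:
        raise ValueError("chunk duration must be a positive integer")
    return [
        (start, min(start + chunk_duration_ms, total_ms))
        for start in range(0, total_ms, chunk_duration_ms)
    ]


# A 12-second file split into 5-second chunks.
segments = chunk_boundaries(12_000, chunk_duration_ms=5000)
```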
Extracts the first full scene frame from videos as an image with the specified dimensions, or the original dimensions if not provided.
Parameters:
- [...] `None`.
- [...] `None`, width will be scaled based on the provided height. Defaults to `None`.
- [...] `None`, height will be scaled based on the provided width. Defaults to `None`.
Returns:
- `MediaSetInputTransform` containing the first frame extraction transformation, allowing for further transformations.
Example:

    transform = video_input.transform().extract_first_frame(width=800, height=600)
    image_output.write(transform)
Extracts frames from videos at a specified timestamp, using the specified dimensions, or the original dimensions if not provided.
Parameters:
- [...] `None`.
- [...] `on` parameter.
- [...] `df` is specified. This aligns the frame extraction operation with the correct media item.
- [...] `None`, scales width based on the provided height. Defaults to `None`.
- [...] `None`, scales height based on the provided width. Defaults to `None`.
Returns:
- `MediaSetInputTransform` containing the frame extraction transformation, allowing for further transformations.
Example:

    # Dynamically select timestamp parameters from the input_df columns.
    # Only extracts frames from media items in both the media set and provided DataFrame.
    transform1 = video_input.transform().extract_frames_at_timestamp(input_df.timestamp, df=input_df, on="media_item_rid")

    # All frames will be extracted at the same timestamp for all items in the media set.
    transform2 = video_input.transform().extract_frames_at_timestamp(30)

    # Write the transformation to output media set
    image_output.write(transform1)
Renders a frame of a DICOM file as an image, using the specified dimensions, or the original dimensions if not provided.
Parameters:
- [...] `None`.
- [...] `None`, and height is provided, the aspect ratio will be preserved. Must be provided if height is not provided.
- [...] `None`, and width is provided, the aspect ratio will be preserved. Must be provided if width is not provided.
Returns:
- `MediaSetInputTransform` containing the render DICOM layer transformation, allowing for further transformations.
Example:

    transform = dicom_input.transform().render_dicom_layer(layer_number=2)
    image_output.write(transform)
Extracts form fields from documents.
Parameters:
- [...] `None`.
- [...] If `True`, errors are caught and the error message will be returned in the output. If `False`, any errors will not be caught and the build will fail. Defaults to `True`. Only applicable to transformations on the entire media set.
Returns:
- `str`: JSON object containing form fields. Applicable for transformations on a single item.
- `DataFrame`: Columns are `media_item_rid`, `path`, `media_reference`, `form_fields` (str). Applicable for transformations on the entire media set.
Example:

    df = media_set.transform().extract_form_fields()
    dataset_output.write_dataframe(df)
Extracts the table of contents from documents.
Parameters:
- [...] `None`.
- [...] If `True`, errors are caught and the error message will be returned in the output. If `False`, any errors will not be caught and the build will fail. Defaults to `True`. Only applicable to transformations on the entire media set.
Returns:
- `str`: JSON object containing table of contents. Applicable for transformations on a single item.
- `DataFrame`: Columns are `media_item_rid`, `path`, `media_reference`, `table_of_contents` (str). Applicable for transformations on the entire media set.
Example:

    df = media_set.transform().extract_table_of_contents()
    dataset_output.write_dataframe(df)
Returns PDF page dimensions.
Parameters:
- [...] `None`.
- [...] If `True`, errors are caught and the error message will be returned in the output. If `False`, any errors will not be caught and the build will fail. Defaults to `True`. Only applicable to transformations on the entire media set.
Returns:
- `str`: JSON object containing a list of dictionaries with keys `width` and `height`. Applicable for transformations on a single item.
- `DataFrame`: Columns are `media_item_rid`, `path`, `media_reference`, `page_dimensions` (str). Applicable for transformations on the entire media set.
Example:

    df = media_set.transform().get_pdf_page_dimensions()
    dataset_output.write_dataframe(df)
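Because the dimensions come back as a JSON string, they need to be parsed before use. The sketch below parses a hypothetical single-item result shaped as described (a list of dictionaries with `width` and `height` keys) and classifies each page. The sample numbers are made up, and no unit is assumed.

```python
import json

# Hypothetical single-item result; the numbers are made up for illustration.
raw = '[{"width": 612.0, "height": 792.0}, {"width": 792.0, "height": 612.0}]'

pages = json.loads(raw)
orientations = [
    "landscape" if page["width"] > page["height"] else "portrait"
    for page in pages
]
```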
Generates embeddings for images.
Parameters:
- [...] `None`.
- [...] `GOOGLE_SIGLIP_2`.
- [...] If `True`, errors are caught and the error message will be returned in the output. If `False`, any errors will not be caught and the build will fail. Defaults to `True`. Only applicable to transformations on the entire media set.
Returns:
- `str`: JSON object containing vector embeddings. Applicable for transformations on a single item.
- `DataFrame`: Columns are `media_item_rid`, `path`, `media_reference`, `embeddings` (str). Applicable for transformations on the entire media set.
Example:

    df = media_set.transform().generate_image_embeddings()
    dataset_output.write_dataframe(df)
Returns waveform amplitudes for audio files.
Parameters:
- [...] `None`.
- [...] `100`. Must be a positive integer up to `1000`.
- [...] If `True`, errors are caught and the error message will be returned in the output. If `False`, any errors will not be caught and the build will fail. Defaults to `True`. Only applicable to transformations on the entire media set.
Returns:
- `str`: JSON object containing a list of doubles representing audio amplitudes, normalized between 0 and 1. Applicable for transformations on a single item.
- `DataFrame`: Columns are `media_item_rid`, `path`, `media_reference`, `waveform` (str). Applicable for transformations on the entire media set.
Example:

    df = media_set.transform().get_waveform()
    dataset_output.write_dataframe(df)
Transcribes audio.
Parameters:
- [...] `None`.
- [...] `None`, in which case it will be auto-detected. Valid languages can be found in the Whisper GitHub repo under `LANGUAGES`.
- [...] `more_economical`.
- [...] `text`.
- [...] `False`. Only applicable when output_format is `text`.
- [...] If `True`, errors are caught and the error message will be returned in the output. If `False`, any errors will not be caught and the build will fail. Defaults to `True`. Only applicable to transformations on the entire media set.
Returns:
- `str`: Applicable for transformations on a single item.
  - `text`: The transcribed text.
  - `segments`: JSON object containing the transcribed segments including timestamps, segment confidence and more details.
- `DataFrame`: Columns are `media_item_rid`, `path`, `media_reference`, `transcription` (str). Applicable for transformations on the entire media set.
Example:

    df = media_set.transform().transcribe(output_format="segments")
    dataset_output.write_dataframe(df)
Returns timestamps for scene frames from video files.
Parameters:
- [...] `None`.
- [...] `STANDARD`.
- [...] If `True`, errors are caught and the error message will be returned in the output. If `False`, any errors will not be caught and the build will fail. Defaults to `True`. Only applicable to transformations on the entire media set.
Returns:
- `str`: JSON object containing a list of scene frames with keys `timestamp` and `sceneScore` in the `frames` field. Applicable for transformations on a single item.
- `DataFrame`: Columns are `media_item_rid`, `path`, `media_reference`, `scene_frames` (str). Applicable for transformations on the entire media set.
Example:

    df = media_set.transform().get_scene_frame_timestamps()
    dataset_output.write_dataframe(df)
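A typical follow-up step is to keep only the strongest scene changes. The sketch below parses a hypothetical result with the documented shape (a `frames` list whose entries have `timestamp` and `sceneScore` keys). The sample values, the timestamp unit, the score scale, and the helper name are assumptions for illustration.

```python
import json

# Hypothetical single-item result; values and units are illustrative only.
raw = json.dumps({
    "frames": [
        {"timestamp": 0.0, "sceneScore": 0.91},
        {"timestamp": 12.4, "sceneScore": 0.35},
        {"timestamp": 47.9, "sceneScore": 0.78},
    ]
})


def strong_scene_timestamps(result_json, min_score):
    """Timestamps of frames whose sceneScore is at least min_score."""
    frames = json.loads(result_json)["frames"]
    return [f["timestamp"] for f in frames if f["sceneScore"] >= min_score]


strong = strong_scene_timestamps(raw, min_score=0.5)
```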
Filters the media set to only include items of the specified schema type.
Parameters:
Returns:
- `MediaSetInputTransform` containing the filter transformation, allowing for further transformations.
Example:

    transform = media_set.transform().filter_to("document").slice_document(0, 5)
    pdf_output.write(transform)
Extracts content from spreadsheet files.
Parameters:
- [...] `None`.
- [...] If `True`, errors are caught and the error message will be returned in the output. If `False`, any errors will not be caught and the build will fail. Defaults to `True`. Only applicable to transformations on the entire media set.
Returns:
- `str`: JSON object containing a key for each sheet name, and its value having fields `table` and `merged_cells`. Applicable for transformations on a single item.
- `DataFrame`: Columns are `media_item_rid`, `path`, `media_reference`, `extracted_content` (str). Applicable for transformations on the entire media set.
Example:

    df = media_set.transform().extract_content_from_spreadsheets()
    dataset_output.write_dataframe(df)