Media set compute usage

Media sets bring a number of advanced, out-of-the-box transformations to the platform. In addition to being triggered via transforms and pipelines, media transformations are also triggered by interacting with media items in the platform, for example, by previewing a media item. Additionally, there is a cost to download or stream the full contents of a media item.

Usage is tracked in units of Foundry compute-seconds. The tables below list each available transformation by media type, with its usage rate in compute-seconds per gigabyte processed.

If you have an enterprise contract with Palantir, contact your Palantir representative before proceeding with usage calculations.

Transformations

Usage rate is measured in compute-seconds per GB.
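As a rough illustration of how a rate translates into usage, the sketch below multiplies a rate by the volume of data processed. The function name, the dictionary keys, and the assumption that usage scales linearly with data size are illustrative only. The rates are copied from the tables in this section, and actual metering may differ.

```python
# Rates in compute-seconds per GB, copied from the tables in this section.
# The dictionary keys are hypothetical labels, not platform identifiers.
USAGE_RATES = {
    "download_stream": 2,
    "image_resize": 40,
    "image_ocr": 275,
    "audio_transcription": 275,
}


def estimate_compute_seconds(transformation: str, gigabytes: float) -> float:
    """Estimate usage as rate * data volume (assumes linear scaling)."""
    if gigabytes < 0:
        raise ValueError("gigabytes must be non-negative")
    return USAGE_RATES[transformation] * gigabytes


# Example: run OCR over 12 GB of images, then download the same 12 GB.
ocr = estimate_compute_seconds("image_ocr", 12)             # 275 * 12 = 3300
download = estimate_compute_seconds("download_stream", 12)  # 2 * 12 = 24
total = ocr + download                                      # 3324
```

Under these assumptions, the OCR step dominates the estimate, so reducing the volume sent through high-rate transformations has the largest effect.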

All

Transformation       Usage rate (compute-seconds per GB)
Download / stream    2

Images

Transformation              Usage rate (compute-seconds per GB)
Rotate                      40
Resize                      40
Generate PDF                40
Adjust contrast             75
Crop / chip                 75
Grayscale                   75
Geo tile                    75
Render DICOM image layer    75
Extract text (OCR)          275
Encryption / decryption     75

Audio

Transformation         Usage rate (compute-seconds per GB)
Transcode              75
Waveform generation    75
Stream with HLS        75
Transcription          275

Video

Transformation                    Usage rate (compute-seconds per GB)
Get timestamps for scene frames   40
Extract audio                     75
Extract frames at timestamp       75
Extract all scene frames          275
Stream with HLS                   275
Transcode                         275

Documents

Transformation                              Usage rate (compute-seconds per GB)
Render page as image                        40
Render page as image within bounding box    40
Get PDF page dimensions                     40
Slice PDF range                             75
Extract form fields                         75
Extract table of contents                   75
Extract text on page (raw)                  75
Extract all text (raw)                      75
Extract text (OCR)                          275

Using media at scale

Media set limits

  • There is no limit to how many media items can be uploaded to a media set.
  • Transactional media sets have an upload limit of 10,000 items per transaction.
  • Transactionless media sets do not have an item upload limit (since there are no transactions).
  • Paths of items in media sets cannot exceed 256 characters. Attempting to add an item with a path longer than 256 characters to a media set will result in a MediaSet:MediaItemPathInvalid error.
  • Each media item can have a maximum file size of 50 GB.
  • For incremental transforms, limiting the batch size of media set inputs is not supported.
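The limits above can be checked on the client side before uploading. The following is a minimal sketch, not part of any Foundry library: the helper names are hypothetical, 1 GB is assumed to mean 10^9 bytes, and path length is counted with Python's `len`.

```python
from typing import Iterator, List, Optional

MAX_PATH_LENGTH = 256                 # characters, per the limits above
MAX_ITEM_BYTES = 50 * 10**9           # 50 GB, assuming 1 GB = 10^9 bytes
MAX_ITEMS_PER_TRANSACTION = 10_000    # applies to transactional media sets


def check_item(path: str, size_bytes: int) -> Optional[str]:
    """Return a description of the violated limit, or None if the item passes."""
    if len(path) > MAX_PATH_LENGTH:
        return f"path is {len(path)} characters (limit {MAX_PATH_LENGTH})"
    if size_bytes > MAX_ITEM_BYTES:
        return f"size is {size_bytes} bytes (limit {MAX_ITEM_BYTES})"
    return None


def batches(items: List[str], size: int = MAX_ITEMS_PER_TRANSACTION) -> Iterator[List[str]]:
    """Split items into chunks small enough for one transaction each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Rejecting an over-long path locally avoids a round trip that would otherwise end in a MediaSet:MediaItemPathInvalid error, and batching keeps each transaction on a transactional media set within the 10,000-item limit.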

Throttling

Media sets support interaction at scale, such as uploading or transforming millions of media items, but clients should implement error handling to keep pipelines stable. Under heavy load, the media set service may throttle requests by responding with Quality of Service (QoS) errors, which have a status code of either 429 or 503. A QoS error indicates that the service cannot handle the current storage or compute load.

Rather than failing on a QoS error when the service throttles, the client should implement an appropriate retry mechanism.

For example, if a client is uploading millions of media items by submitting PUT requests at /{mediaSetRid}/items, the client should retry any media item upload requests that receive a QoS error.
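One common way to build such a mechanism is exponential backoff with jitter. The sketch below is a generic illustration and not the media set service's own client: `send` is a hypothetical callable that issues a single request and returns its HTTP status code, and the attempt count and delays are arbitrary defaults.

```python
import random
import time
from typing import Callable

QOS_STATUS_CODES = {429, 503}


def send_with_retry(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-QoS status or attempts run out.

    Returns the last status code received, so the caller can decide how to
    handle a request that is still throttled after max_attempts tries.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 0
    for attempt in range(max_attempts):
        status = send()
        if status not in QOS_STATUS_CODES:
            return status
        if attempt < max_attempts - 1:
            # Exponential backoff with full jitter, so that many clients
            # throttled at the same time do not all retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    return status
```

Only QoS status codes are retried here; any other response, including other errors, is returned to the caller immediately.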

When using the transforms-media library in Python transforms, a retrying mechanism is generally already in place. For example, when uploading media files from a catalog dataset to a media set, the library method provided already has retrying logic to handle QoS errors.

    from transforms.api import transform, Input, Output
    from transforms.mediasets import MediaSetOutput


    @transform(
        input_files=Input('/examples/image_files'),
        output_media_set=MediaSetOutput('/examples/image_media_set'),
        uploaded_media_record=Output('/examples/uploaded_media_record'),
    )
    def compute(input_files, output_media_set, uploaded_media_record):
        # Upload every file in the input dataset to the media set. The library
        # method retries QoS errors internally.
        uploaded_media_items = output_media_set.put_dataset_files_and_get_dataframe_of_uploads(
            input_files,
            ignore_items_not_matching_schema=False,
            ignore_items_failing_to_convert=False,
        )
        # Record which media items were uploaded.
        uploaded_media_record.write_dataframe(uploaded_media_items)