Common media set transformations are available in Pipeline Builder. Learn how to build a pipeline with media sets in Pipeline Builder.
Here is an example of the Text Extraction (OCR option) board used on a PDF:
Contact Palantir Support if you are interested in a transformation that is not currently available.
Media sets can also be configured as outputs of your pipeline.
Media sets also support specialized transformations like PDF text extraction, optical character recognition (OCR), image tiling, and metadata parsing that can be leveraged in Python transforms by importing the transforms-media library.
Common transformations can be found in the documentation on using media sets with Python transforms.
Here is an example of how to get started with media sets in Code Repositories:
```python
from transforms.api import transform
from transforms.mediasets import MediaSetInput, MediaSetOutput


@transform(
    images=MediaSetInput('/examples/images'),
    output_images=MediaSetOutput('/examples/output_images')
)
def translate_images(images, output_images):
    ...
```
Media sets can be read from and written to incrementally within Python transforms. To learn how, see the documentation.
Advanced users and developers can take advantage of media set access patterns, which are pre-configured transformations that can be performed on demand on the media items in a media set. Each access pattern has a persistence policy for tuning storage and performance: the output can be recomputed on every request, persisted indefinitely after the first request, or cached for a limited time.
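To make the three persistence policies concrete, here is a minimal conceptual sketch in plain Python. It is not the Palantir API: the names `PersistencePolicy`, `AccessPattern`, `ttl_seconds`, and `compute_count` are hypothetical and exist only for illustration, and a real access pattern is configured on the media set rather than implemented by hand like this.

```python
import time
from enum import Enum


class PersistencePolicy(Enum):
    """Hypothetical names for the three policies described above."""
    RECOMPUTE = "recompute"  # run the transformation on every request
    PERSIST = "persist"      # keep the first result indefinitely
    CACHE_TTL = "cache_ttl"  # keep the result for a limited time


class AccessPattern:
    """Illustrative on-demand transformation with a persistence policy.

    This is a conceptual model only, not a Foundry class.
    """

    def __init__(self, transformation, policy, ttl_seconds=None,
                 clock=time.monotonic):
        if policy is PersistencePolicy.CACHE_TTL and ttl_seconds is None:
            raise ValueError("CACHE_TTL requires ttl_seconds")
        self._transformation = transformation
        self._policy = policy
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be simulated
        self._store = {}             # item_id -> (output, stored_at)
        self.compute_count = 0       # how many times the transformation ran

    def get(self, item_id, item):
        """Return the transformed item, reusing a stored output if allowed."""
        if self._policy is not PersistencePolicy.RECOMPUTE:
            hit = self._store.get(item_id)
            if hit is not None:
                output, stored_at = hit
                expired = (
                    self._policy is PersistencePolicy.CACHE_TTL
                    and self._clock() - stored_at >= self._ttl
                )
                if not expired:
                    return output
        # No usable stored output: compute on demand.
        output = self._transformation(item)
        self.compute_count += 1
        if self._policy is not PersistencePolicy.RECOMPUTE:
            self._store[item_id] = (output, self._clock())
        return output


# Usage: a stand-in "transformation" that upper-cases the item's bytes.
pattern = AccessPattern(lambda data: data.upper(), PersistencePolicy.PERSIST)
pattern.get("item-1", b"scan")
pattern.get("item-1", b"scan")  # served from the persisted output
```

In this sketch, the trade-off is between compute and storage: `RECOMPUTE` stores nothing but pays the transformation cost on every request, `PERSIST` pays it once per item, and `CACHE_TTL` sits in between.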
Access patterns are leveraged in the Palantir platform to optimally process or render media set items. For example:
The set of access patterns available by default depends on the configured media set schema. Additional transformations can be registered as access patterns on a media set only through an API call.