Media references are not yet supported in Python and TypeScript v2 functions.
Functions are executed in an environment that has strict memory limits. Exceeding these memory limits can happen quickly when dealing with file data; we recommend only interacting with media under 20MB.
If an object has a media reference property, you can use functions to interact with the associated media item. A media item exposes a number of built-in methods for working with the underlying media, letting you handle different kinds of media without external libraries. The following documentation explains what functionality is available and how to use it.
If you need any operations that don't currently exist out-of-the-box, you will likely need to use external libraries or write your own custom code. Learn more about adding dependencies to functions repositories.
Some operations are supported by all media types.
You can access a media item's contents by selecting the media reference property on the object and calling readAsync(). The signature for the method is as follows:
```typescript
// Blob is a standard JavaScript type, representing a file-like object of immutable, raw data.
// https://developer.mozilla.org/en-US/docs/Web/API/Blob
readAsync(): Promise<Blob>;
```
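For example, a function could read a media item's contents and report its size in bytes. The sketch below is illustrative only: it assumes an ArxivPaper object type with a mediaReference property, matching the OCR example further down.

```typescript
import { Function } from "@foundry/functions-api";
import { ArxivPaper } from "@foundry/ontology-api";

@Function()
public async mediaSizeInBytes(paper: ArxivPaper): Promise<number | undefined> {
    // readAsync() is available on every media item, regardless of media type.
    const blob = await paper.mediaReference?.readAsync();
    // Blob.size is the length of the underlying data in bytes.
    return blob?.size;
}
```

Keep the memory limits described above in mind when reading large media items into memory.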
You can access a media item's metadata by calling getMetadataAsync() on the media reference property. The signature for the method is as follows:
```typescript
getMetadataAsync(): Promise<IMediaMetadata>;
```
Type guards in TypeScript v1 allow you to access functionality that is specific to certain media types. The following type guards can be used on media item metadata:
isAudioMetadata()
isDicomMetadata()
isDocumentMetadata()
isImageryMetadata()
isSpreadsheetMetadata()
isUntypedMetadata()
isVideoMetadata()
As an example, you could use the imagery type guard to pull out image-specific metadata fields:
```typescript
const metadata = await myObject.mediaReference?.getMetadataAsync();
if (isImageryMetadata(metadata)) {
    const imageWidth = metadata.dimensions?.width;
    // ...
}
```
You can also use the type guards on the MediaItem namespace, which give you access to additional methods on the type-specific media item. The type guards you can use here are listed below, followed by a short example:
MediaItem.isAudio()
MediaItem.isDicom()
MediaItem.isDocument()
MediaItem.isImagery()
MediaItem.isSpreadsheet()
MediaItem.isVideo()
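For example, a single function could branch on the media type and call a type-specific method in each branch. The sketch below is only an illustration: it assumes a hypothetical MyMediaObject ontology type with a mediaReference property, and it assumes the transcription options described later can all be omitted.

```typescript
import { Function, MediaItem } from "@foundry/functions-api";
import { MyMediaObject } from "@foundry/ontology-api"; // Hypothetical object type.

@Function()
public async extractAnyText(obj: MyMediaObject): Promise<string | undefined> {
    const media = obj.mediaReference;
    if (MediaItem.isDocument(media)) {
        // Document-specific method: run OCR on the first page.
        return (await media.ocrAsync({ endPage: 1, languages: [], scripts: [], outputType: "text" }))[0];
    }
    if (MediaItem.isAudio(media)) {
        // Audio-specific method: transcribe the recording with default options.
        return media.transcribeAsync({});
    }
    // For any other media type, only the generic operations above are available.
    return undefined;
}
```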
To extract text from a document, you can either use optical character recognition (OCR) or extract the text embedded in the media item.
For machine-generated PDFs, it may be faster and/or more accurate to extract the digitally embedded text rather than using OCR. The signature for the method is as follows:
```typescript
extractTextAsync(options: IDocumentExtractTextOptions): Promise<string[]>;
```
When using TypeScript v1, the following can optionally be provided as an object:
startPage: The zero-indexed start page (inclusive).
endPage: The zero-indexed end page (exclusive).
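As an illustrative sketch (assuming the same ArxivPaper object type used in the OCR example below), a function might extract the embedded text of the first page like this:

```typescript
import { Function, MediaItem } from "@foundry/functions-api";
import { ArxivPaper } from "@foundry/ontology-api";

@Function()
public async firstPageEmbeddedText(paper: ArxivPaper): Promise<string | undefined> {
    if (MediaItem.isDocument(paper.mediaReference)) {
        // Pages are zero-indexed; this requests only the first page.
        const pages = await paper.mediaReference.extractTextAsync({ startPage: 0, endPage: 1 });
        return pages[0];
    }
    return undefined;
}
```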
For non-machine-generated PDFs, such as scanned documents, it is best to use the OCR method for extracting text. The signature is as follows:

```typescript
ocrAsync(options: IDocumentOcrOptions): Promise<string[]>;
```
The following can optionally be provided as a TypeScript object:
startPage: The zero-indexed start page (inclusive).
endPage: The zero-indexed end page (exclusive).
languages: A list of languages to recognize (can be empty).
scripts: A list of scripts to recognize (can be empty).
outputType: Specifies the output type as text or hocr.

Remember that you need to use type guards in order to access media-type-specific operations. Here's an example of using the isDocument() type guard to perform OCR text extraction:
```typescript
import { Function, MediaItem } from "@foundry/functions-api";
import { ArxivPaper } from "@foundry/ontology-api";

@Function()
public async firstPageText(paper: ArxivPaper): Promise<string | undefined> {
    if (MediaItem.isDocument(paper.mediaReference)) {
        const text = (await paper.mediaReference.ocrAsync({ endPage: 1, languages: [], scripts: [], outputType: 'text' }))[0];
        return text;
    }

    return undefined;
}
```
Audio media items support transcription using the transcribeAsync method. The signature is as follows:
```typescript
transcribeAsync(options: IAudioTranscriptionOptions): Promise<string>;
```
The following can optionally be passed in to specify how the transcription should run. The available options are:
language: The language to transcribe, passed using the TranscriptionLanguage enum.
performanceMode: Runs transcription in More Economical or More Performant mode, passed using the TranscriptionPerformanceMode enum.
outputFormat: Specifies the output format by passing an object with type plainTextNoSegmentData (plain text) or pttml (a TTML-like format). When the type is plainTextNoSegmentData, the object also takes a boolean addTimestamps parameter.

Here's an example of providing options for transcription:
```typescript
import { Function, MediaItem, TranscriptionLanguage, TranscriptionPerformanceMode } from "@foundry/functions-api";
import { AudioFile } from "@foundry/ontology-api";

@Function()
public async transcribeAudioFile(file: AudioFile): Promise<string | undefined> {
    if (MediaItem.isAudio(file.mediaReference)) {
        return await file.mediaReference.transcribeAsync({
            language: TranscriptionLanguage.ENGLISH,
            performanceMode: TranscriptionPerformanceMode.MORE_ECONOMICAL,
            outputFormat: { type: "plainTextNoSegmentData", addTimestamps: true }
        });
    }

    return undefined;
}
```