Extracts content from the specified document while preserving the document's layout.
Expression categories: Media
Declared arguments
Languages to detect - Languages to detect in the input files. Set<Enum<Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Azerbaijani - Cyrillic, Basque, Belarusian, and more ...>>
Media reference - The PDF to extract content from. Expression<Media reference>
Output format - The desired format of the output. Choose either a simple text-based output or a structured output with all details, including the bounding boxes. Enum<Full extract, Text and tables>
End page (optional) - The end of the page range (inclusive). If no value is provided, it defaults to the last page. Expression<Integer>
Error handling (optional) - Determines the behavior of the pipeline for inputs that fail to be processed. Enum<FAIL, NULL>
Start page (optional) - The start of the page range. If no value is provided, it defaults to the first page. Expression<Integer>
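The defaults for Start page and End page can be illustrated with the small sketch below. It is a hypothetical helper, not part of this expression's API. It assumes pages are numbered from 1 and that the start page is inclusive like the end page; the entry above only states that the end page is inclusive. Rejecting an out-of-bounds or reversed range is also an assumption made for the sketch.

```python
from typing import List, Optional


def resolve_page_range(
    page_count: int,
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
) -> List[int]:
    """Return the 1-based page numbers selected by the optional page arguments.

    Hypothetical helper: assumes 1-based page numbers and an inclusive start
    page, which the reference entry does not state explicitly.
    """
    # No start page provided: default to the first page.
    first = 1 if start_page is None else start_page
    # No end page provided: default to the last page of the document.
    last = page_count if end_page is None else end_page
    # Assumption: an out-of-bounds or reversed range is treated as an error.
    if first < 1 or last > page_count or first > last:
        raise ValueError(f"invalid page range {first}-{last} for {page_count} pages")
    # The end page is inclusive, hence the + 1.
    return list(range(first, last + 1))


print(resolve_page_range(10))                # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(resolve_page_range(10, start_page=3))  # [3, 4, 5, 6, 7, 8, 9, 10]
print(resolve_page_range(10, 2, 4))          # [2, 3, 4]
```

Leaving both arguments empty selects the whole document, and setting only one of them bounds the range on that side.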