Lightweight pipelines in Pipeline Builder [Beta]

Beta

Lightweight pipelines are in the beta phase of development and must be enabled for your enrollment. Functionality may change during active development. Contact Palantir Support to enable lightweight pipelines.

If you are unfamiliar with creating pipelines in Pipeline Builder, review the documentation on how to create a batch pipeline in Pipeline Builder before proceeding.

Pipeline Builder now supports lightweight pipelines, which can provide faster execution for batch and incremental pipelines. Lightweight pipelines use a backend powered by DataFusion, an open-source query engine written in Rust. Compared to traditional Spark-based pipelines, lightweight pipelines can substantially accelerate compute for small to medium-sized datasets.

Lightweight pipelines are designed to reduce build times and run low-latency operations efficiently. In particular, "quick" pipelines that run in under 15 minutes benefit most from the lightweight configuration.

We encourage you to experiment with different pipeline configurations to improve performance. You can explore the capabilities of lightweight pipelines by testing them on a branch or making a copy of an existing pipeline to compare lightweight performance with your original configuration.

Create a new lightweight pipeline

  1. Open Pipeline Builder and select Create new pipeline.
  2. After entering a name for your pipeline and the desired location, choose Lightweight pipeline under Pipeline type.
  3. Select Create pipeline.

Screenshot of Pipeline selection

Convert between lightweight and batch pipelines

You can convert a standard batch pipeline to a lightweight pipeline, or a lightweight pipeline to a batch pipeline, by following the steps below. The conversion can be reversed at any time by repeating the process and selecting the other option.

  1. To convert a batch pipeline to a lightweight pipeline, go to Settings and select Convert to Lightweight pipeline.

Screenshot of pipeline settings "Convert to Lightweight pipeline" option

To convert a lightweight pipeline to a batch pipeline, go to Settings and select Convert to Batch pipeline.

Screenshot of pipeline settings "Convert to Batch pipeline" option

  2. If the pipeline is compatible with the new pipeline type, you will see a dialog box where you can confirm the conversion.

Screenshot of a successful convert to Lightweight pipeline dialog

  3. If the pipeline is not compatible with the new pipeline type, a warning will appear when you try to convert your pipeline. The warning will list any expressions or transforms that are incompatible with lightweight pipelines.

Screenshot of an unsuccessful convert to Lightweight pipeline dialog

Known limitations

Lightweight pipelines do not currently support the same set of transforms and expressions as standard batch pipelines. Most notably, unsupported transforms and expressions include LLM features, media set operations, and split nodes.

Due to the differences between lightweight and batch pipelines, you should always verify results using Preview or by examining build outputs.
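One way to make this verification repeatable is to export the same output from a batch build and a lightweight build and compare the rows outside Pipeline Builder. The sketch below is only an illustration under stated assumptions: the rows are assumed to be already loaded as lists of Python dictionaries, and the function name `find_mismatches`, the `id` key column, and the tolerance value are all hypothetical. Floats are compared with a relative tolerance because floating point results may differ in the last digits.

```python
import math


def find_mismatches(batch_rows, lightweight_rows, key, rel_tol=1e-9):
    """Return (key, column, batch_value, lightweight_value) tuples that differ.

    Hypothetical helper: rows are plain dicts, and `key` names a column
    that uniquely identifies each row in both outputs.
    """
    batch_by_key = {row[key]: row for row in batch_rows}
    light_by_key = {row[key]: row for row in lightweight_rows}
    mismatches = []
    for k in sorted(set(batch_by_key) | set(light_by_key), key=str):
        b_row = batch_by_key.get(k)
        l_row = light_by_key.get(k)
        if b_row is None or l_row is None:
            # Row is present in only one of the two outputs.
            mismatches.append((k, None, b_row, l_row))
            continue
        for col in sorted(set(b_row) | set(l_row)):
            bv, lv = b_row.get(col), l_row.get(col)
            if isinstance(bv, float) and isinstance(lv, float):
                if math.isnan(bv) and math.isnan(lv):
                    continue
                # Tolerate differences in the last digits of float results.
                if math.isclose(bv, lv, rel_tol=rel_tol):
                    continue
            elif bv == lv:
                continue
            mismatches.append((k, col, bv, lv))
    return mismatches


# Made-up sample rows for illustration only.
batch = [{"id": 1, "total": 0.1 + 0.2}, {"id": 2, "total": 5.0}]
lightweight = [{"id": 1, "total": 0.3}, {"id": 2, "total": 5.5}]
mismatches = find_mismatches(batch, lightweight, key="id")
print(mismatches)
```

With the sample rows above, only the row with `id` 2 is reported, because the difference on row 1 falls within the tolerance. A comparison like this complements Preview rather than replacing it.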

Most supported expressions in lightweight pipelines behave the same as their batch equivalents. Known differences and limitations in lightweight pipelines include:

  • Floating point results may vary in the last digits.
  • Decimal overflow will throw an error instead of outputting a NULL value.
  • Structs cannot be compared with comparison operators such as <, >, and ==.
  • pow overflow returns NULL instead of inf.
  • Regular expression (regex) functions do not support the full range of Java regular expressions, such as lookarounds and backreferences. Learn more about the regular expression implementation.
  • Cast functionality may have differences for complex types, such as structs, arrays, maps, and their conversions into strings. For example, nulls may be rendered differently when these types are converted to strings.
  • Limited format support for TimestampToString, DateToString, StringToTimestamp, and StringToDate.
  • Min and max are not supported for complex types, such as structs, arrays, and maps.
  • Empty outputs will result in 0 files rather than an empty file.
  • Stats, other than row count, are not supported on build.
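The regular expression limitation in the list above can often be worked around by rewriting a pattern so that it no longer needs a lookaround. The sketch below uses Python's `re` module purely to illustrate the rewrite; it does not run inside Pipeline Builder, and whether a given Pipeline Builder expression exposes capture groups is an assumption to verify in Preview. The helper `uses_lookaround_or_backreference` is a hypothetical, rough heuristic for spotting patterns that may need attention before converting a pipeline.

```python
import re

# Rough heuristic (hypothetical helper): flags lookahead/lookbehind syntax
# and numbered or \k<name> backreferences. It can produce false positives,
# for example on an escaped backslash followed by a digit.
_SUSPECT = re.compile(r"\(\?<?[=!]|\\[1-9]|\\k<")


def uses_lookaround_or_backreference(pattern: str) -> bool:
    return _SUSPECT.search(pattern) is not None


text = "price: 120 USD, fee: 15 EUR"

# Lookahead form: relies on (?=...), which a lookaround-free engine rejects.
lookahead_pattern = r"\d+(?= USD)"
with_lookahead = re.findall(lookahead_pattern, text)

# Lookaround-free form: match the suffix as well and capture only the digits.
group_pattern = r"(\d+) USD"
with_group = re.findall(group_pattern, text)

print(with_lookahead == with_group)
print(uses_lookaround_or_backreference(lookahead_pattern))
print(uses_lookaround_or_backreference(group_pattern))
```

Both patterns extract the same digits from the sample text, but only the second avoids lookaround syntax. Backreferences have no general regex-only rewrite, so patterns that depend on them may need to be restructured in a different way.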