The Polars lazy API only evaluates a query when it is collected. This allows the engine to apply optimizations that improve performance in most cases, so lazy execution is the recommended mode of compute when using Polars, particularly for pipelines that process large amounts of data. However, different queries will benefit to varying degrees, and the stability of individual pipelines should always be verified before deploying to production systems.
To access the Polars lazy API in Foundry, set the lazy flag to True in your transform as shown below:
from transforms.api import transform, Input, Output


@transform.lightweight(
    output=Output("/Users/jsmith/output"),
    input=Input("/Users/jsmith/input"),
)
def compute(output, input):
    df = input.polars(lazy=True)
    # your data transformation logic
    output.write_table(df)
By default, lazy execution is disabled.
The Polars streaming engine is used during lazy computation. It allows larger-than-memory data to be processed, because the query is executed in batches instead of all at once. Queries also tend to execute faster when streaming is enabled. Streaming is always enabled when Polars is used in lazy mode.
Lazy execution is especially beneficial when filter pushdown, also known as predicate pushdown, can be used. Filter pushdown means that filtering operations such as .filter(), .where(), or boolean indexing are not executed immediately. Instead, they are recorded as part of the query plan. When the query is executed, Polars attempts to move these filters as early in the execution plan as possible. In some cases, they can be pushed all the way down to the data scan itself, so that data excluded by the filter is never read. The smaller the fraction of data used in the pipeline, the greater the impact filter pushdown will have. The full set of optimizations used during lazy compute can be found in the Polars lazy optimization documentation.
If your query resembles the example below, enabling lazy execution is strongly recommended, as it will significantly improve performance.
import polars as pl
from transforms.api import transform, Input, Output


@transform.lightweight(
    output=Output("/Users/jsmith/2025_06_sale_data"),
    sales_data=Input("/Users/jsmith/sales_data"),
)
def june_2025_sales(output, sales_data):
    # Note: this compares against string literals, so it assumes "date" holds
    # ISO-8601 strings. For a Date column, compare against date values instead.
    df = (
        sales_data.polars(lazy=True)
        .filter(
            (pl.col("date") >= pl.lit("2025-06-01")) &
            (pl.col("date") <= pl.lit("2025-06-30"))
        )
    )
    output.write_table(df)
In lazy execution mode, Polars will generate a query plan. Use .explain() or .describe_plan() on a LazyFrame to view the planned execution steps and applied optimizations.
from transforms.api import transform, Input, Output


@transform.lightweight(
    output=Output("/Users/jsmith/output"),
    input=Input("/Users/jsmith/input"),
)
def compute(output, input):
    df = input.polars(lazy=True)
    # your data transformation logic
    print(df.explain())
    output.write_table(df)
If a query is failing and lazy execution makes it challenging to identify the issue, you can materialize intermediate results with .collect().
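from transforms.api import transform, Input, Output


@transform.lightweight(
    output=Output("/Users/jsmith/output"),
    input=Input("/Users/jsmith/input"),
)
def compute(output, input):
    df = input.polars(lazy=True)
    # part of a query
    # Materialize here so errors from the steps above surface at this point;
    # printing the result makes the intermediate data visible in the logs.
    print(df.collect())
    # remainder of a query
    output.write_table(df)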
To fully leverage Polars optimizations in lazy compute, your Foundry dataset is exposed through an internal object store proxy, so queries load only the necessary data. Because Polars in eager mode does not have the same optimizations, your entire dataset is prefetched to disk when lazy=False.
Consider the following points to decide when to enable lazy execution.
Enable lazy execution (lazy=True) in the following situations:
Disable lazy execution (lazy=False) in the following situations: