Foundry caches time series data on disk in a highly optimized format for time series queries. Querying this data requires using compute-seconds.
Time series queries use compute in the following ways:
Queries on larger time series indexes will read more points. The following sections offer a description of how this is calculated.
Note that when you are paying for Foundry usage, a fixed minimum number of compute-seconds is consumed per query. The default amount is 20 compute-seconds. This is the base compute usage required to serve a query. If you have an enterprise contract with Palantir, contact your Palantir representative before proceeding with compute usage calculations.
Storing time series data is measured under Foundry storage. Storing time series data does not use any compute; only indexing time series and actively querying the data will use compute.
Time series query compute is used exclusively when querying time series data stored in Foundry. Time series queries use compute in two ways:
The following formula derives compute-seconds from a query:
compute-seconds = 20 + points_scanned / 50000
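As a rough illustration, the formula can be expressed as a small Python helper. The function and parameter names are hypothetical and are not part of any Foundry API. The defaults are the 20 compute-second minimum and the 50,000 points per compute-second rate quoted in this document, and may differ under your contract.

```python
def estimate_query_compute_seconds(
    points_scanned: int,
    minimum_query_usage: float = 20.0,            # default minimum per query
    points_per_compute_second: float = 50_000.0,  # scan rate used in the formula
) -> float:
    """Estimate compute-seconds for one query: minimum + points_scanned / rate."""
    if points_scanned < 0:
        raise ValueError("points_scanned must be non-negative")
    return minimum_query_usage + points_scanned / points_per_compute_second


# A single query that scans 100,000 points.
print(estimate_query_compute_seconds(100_000))  # 22.0
```

Because of the fixed minimum, a query that scans no points is still estimated at 20 compute-seconds.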
The Resource Management application allows you to drill down to dataset usage information and should be a starting point for investigating usage in the Foundry platform.
Users have access to multiple tools for querying time series data in Foundry. Time series query usage is always attached to the resource that each tool produces or modifies.
When paying for Foundry on a usage contract, there are three main drivers of compute with time series queries:
Managing the total number of queries is important for managing total compute usage from time series querying. Consider the following practices when using time series in Foundry:
To predict the cost of a time series query, be sure to always understand the size of the series being queried.
Consider the following example, where there are three queries against a series of 100,000 points with each query scanning all points in the series:
series_size: 100,000 points
minimum_query_usage: 20 compute-seconds
points_per_compute_second: 50,000 points
total_queries: 3
compute-seconds = num_queries * minimum_query_usage + total_points / points_per_compute_second
compute-seconds = 3 queries * 20 compute-seconds + 100,000 points * 3 queries / 50,000 points per compute-second
compute-seconds = 3 * 20 + 300,000 / 50,000
compute-seconds = 66 compute-seconds
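The same arithmetic can be sketched in Python to show how the number of queries affects the total. The helper below is hypothetical. It takes the number of points scanned by each query and applies the per-query minimum and scan rate from the example. It then compares the three full scans with a case where the caller issues only one query and reuses the result.

```python
MINIMUM_QUERY_USAGE = 20            # compute-seconds charged per query
POINTS_PER_COMPUTE_SECOND = 50_000  # points scanned per compute-second


def total_compute_seconds(points_scanned_per_query: list[int]) -> float:
    """Sum the per-query minimum and the scan cost over a batch of queries."""
    num_queries = len(points_scanned_per_query)
    total_points = sum(points_scanned_per_query)
    return num_queries * MINIMUM_QUERY_USAGE + total_points / POINTS_PER_COMPUTE_SECOND


# Three queries, each scanning the full 100,000-point series.
three_full_scans = total_compute_seconds([100_000, 100_000, 100_000])

# One query scanning the series once, with the result reused by the caller.
one_full_scan = total_compute_seconds([100_000])

print(three_full_scans)  # 66.0
print(one_full_scan)     # 22.0
```

In this example the per-query minimum dominates the total, so reducing the number of queries saves more compute than reducing the number of points scanned.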
The complexity of a time series query increases as more nested operations are applied to the queried time series.
As an example, consider the following FoundryTS code which adds two time series together and returns all the points of the new series for a 1-year time range as a Pandas dataframe:
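# Reference the two input series by ID.
series_1 = N.TimeseriesNode('series_1')
series_2 = N.TimeseriesNode('series_2')
# Add the series together and restrict the result to a 1-year time range.
result = F.dsl(program='a+b', return_type=float)([series_1, series_2]).time_range(start='2022-01-01', end='2023-01-01')
# Evaluate the query and load the resulting points.
result.to_pandas()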
This code will make a query to Codex with the shape:
{
  id: dsl-formula
  children: [
    { id: timeseries },
    { id: timeseries }
  ]
}
Evaluating the to_pandas call will incur cost for scanning the points in the result time series in the requested 1-year time range, as well as the points in the two component series required to compute the result (in this case, a 1-year range from each).
Now, consider the following FoundryTS code which applies more nested operations. First, we define a series that is the sum of two other series. Then, we compare that series against its 7-day rolling average, and load one year of points from the result as a Pandas dataframe:
series_1 = N.TimeseriesNode('series_1')
series_2 = N.TimeseriesNode('series_2')
# Sum of the two input series.
intermediate_1 = F.dsl(program='a+b', return_type=float)([series_1, series_2])
# 7-day rolling average of the summed series.
intermediate_2 = intermediate_1.rolling_aggregate('mean', '7d')
# Difference between the summed series and its rolling average, over a 1-year time range.
result = F.dsl(program='a-b', return_type=float)([intermediate_1, intermediate_2]).time_range(start='2022-01-01', end='2023-01-01')
result.to_pandas()
This code will make a query to Codex with the shape:
{
  id: dsl-formula
  children: [
    {
      id: dsl-formula
      children: [
        { id: timeseries },
        { id: timeseries }
      ]
    },
    {
      id: rolling-aggregate
      children: [
        {
          id: dsl-formula
          children: [
            { id: timeseries },
            { id: timeseries }
          ]
        }
      ]
    }
  ]
}
Each node in this query tree will incur cost for scanning the 1-year range of points to produce the final result.
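To make the effect of nesting concrete, the sketch below counts the nodes in each query tree and turns that count into a rough estimate. It rests on simplifying assumptions that the text above does not confirm: every node scans the same number of points, the minimum is charged once per query, and the series holds one point per hour (8,760 points in the 1-year range). The dictionaries only mirror the tree shapes shown above and are not the actual request format.

```python
MINIMUM_QUERY_USAGE = 20            # compute-seconds per query
POINTS_PER_COMPUTE_SECOND = 50_000  # points scanned per compute-second
POINTS_PER_NODE = 8_760             # hypothetical: hourly points over one year


def count_nodes(node: dict) -> int:
    """Count a node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.get("children", []))


def estimate_compute_seconds(tree: dict, points_per_node: int = POINTS_PER_NODE) -> float:
    """Assume each node scans points_per_node points, then apply the formula."""
    points_scanned = count_nodes(tree) * points_per_node
    return MINIMUM_QUERY_USAGE + points_scanned / POINTS_PER_COMPUTE_SECOND


def sum_of_two_series() -> dict:
    """Tree shape for a formula applied to two raw series."""
    return {"id": "dsl-formula", "children": [{"id": "timeseries"}, {"id": "timeseries"}]}


simple_tree = sum_of_two_series()
nested_tree = {
    "id": "dsl-formula",
    "children": [
        sum_of_two_series(),
        {"id": "rolling-aggregate", "children": [sum_of_two_series()]},
    ],
}

print(count_nodes(simple_tree))  # 3
print(count_nodes(nested_tree))  # 8
print(estimate_compute_seconds(simple_tree) < estimate_compute_seconds(nested_tree))  # True
```

Under these assumptions the nested query scans 8 nodes' worth of points rather than 3, even though both queries return one year of results. The summed series appears twice in the tree and is counted twice.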