AIP compute usage involves large language models (LLMs). Fundamentally, LLMs take text as input and respond with text as output. The amount of text input and output is measured in tokens. Compute usage for LLMs is measured in compute-seconds per a fixed number of tokens (10,000 tokens in the rates below). Different models may have different rates for compute usage, as described below.
Tokens are the basic units of text that LLMs use to process and understand input. A token can be as short as a single character or as long as a whole word, depending on the language and the specific model.
Importantly, tokens do not map one-to-one with words. For example, common words might be a single token, but longer or less common words may be split into multiple tokens. Even punctuation marks and spaces can be considered tokens.
Different model providers define tokens differently; see, for instance, OpenAI ↗ and Anthropic ↗. On average, tokens are around four characters long, with a character being a single letter or punctuation mark.
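To get a rough sense of how a piece of text splits into tokens, you can run a tokenizer locally. The sketch below is an illustration only; it assumes OpenAI's open-source tiktoken package and its cl100k_base encoding, neither of which is part of AIP, and other models and providers will split the same text differently.

```python
# Rough illustration only: uses OpenAI's open-source `tiktoken` package
# (an assumption; not part of AIP) to split text into tokens locally.
# Other models and providers tokenize the same text differently.
import tiktoken

text = "Even punctuation marks and spaces can be considered tokens."

enc = tiktoken.get_encoding("cl100k_base")  # one of OpenAI's public encodings
token_ids = enc.encode(text)

print(f"characters: {len(text)}")
print(f"tokens: {len(token_ids)}")
print(f"characters per token: {len(text) / len(token_ids):.1f}")  # typically around 4
```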
In AIP, tokens are consumed by applications that send prompts to and receive responses from LLMs. Each of these prompts and responses consists of a measurable number of tokens. These tokens can be sent to multiple LLM providers; due to pricing differences between providers, token counts are converted into compute-seconds to match the price of the underlying model provider.
All applications that provide LLM-backed capabilities consume tokens when being used. See the following list for the set of applications that may use tokens when you interact with their LLM-backed capabilities.
| Model | Foundry cloud provider | Foundry region | Compute-seconds per 10k input tokens | Compute-seconds per 10k output tokens |
|---|---|---|---|---|
| Grok-2 ↗ | AWS | North America | 36 | 182 |
| | AWS | EU / UK | 31 | 154 |
| | AWS | South America / APAC / Middle East | 25 | 125 |
| Grok-2-Vision ↗ | AWS | North America | 36 | 182 |
| | AWS | EU / UK | 31 | 154 |
| | AWS | South America / APAC / Middle East | 25 | 125 |
| Grok-3 ↗ | AWS | North America | 55 | 273 |
| | AWS | EU / UK | 46 | 231 |
| | AWS | South America / APAC / Middle East | 38 | 188 |
| Grok-3-Mini-Reasoning ↗ | AWS | North America | 5.5 | 9.1 |
| | AWS | EU / UK | 4.6 | 7.7 |
| | AWS | South America / APAC / Middle East | 3.8 | 6.3 |
| GPT-4o ↗ | AWS | North America | 43 | 172 |
| | AWS | EU / UK | 36 | 145 |
| | AWS | South America / APAC / Middle East | 30 | 118 |
| GPT-4o mini ↗ | AWS | North America | 2.6 | 10.3 |
| | AWS | EU / UK | 2.2 | 8.7 |
| | AWS | South America / APAC / Middle East | 1.8 | 7.1 |
| GPT-4.1 ↗ | AWS | North America | 31 | 124 |
| | AWS | EU / UK | 26 | 105 |
| | AWS | South America / APAC / Middle East | 21 | 85 |
| GPT-4.1-mini ↗ | AWS | North America | 6.2 | 24.7 |
| | AWS | EU / UK | 5.2 | 20.9 |
| | AWS | South America / APAC / Middle East | 4.3 | 17 |
| GPT-4.1-nano ↗ | AWS | North America | 1.5 | 6.2 |
| | AWS | EU / UK | 1.3 | 5.2 |
| | AWS | South America / APAC / Middle East | 1.1 | 4.3 |
| o1 ↗ | AWS | North America | 232 | 927 |
| | AWS | EU / UK | 196 | 785 |
| | AWS | South America / APAC / Middle East | 159 | 638 |
| o1-mini ↗ | AWS | North America | 17 | 68 |
| | AWS | EU / UK | 14 | 58 |
| | AWS | South America / APAC / Middle East | 12 | 47 |
| o3 ↗ | AWS | North America | 31 | 124 |
| | AWS | EU / UK | 26 | 105 |
| | AWS | South America / APAC / Middle East | 21 | 85 |
| o3-mini ↗ | AWS | North America | 17 | 68 |
| | AWS | EU / UK | 14 | 58 |
| | AWS | South America / APAC / Middle East | 12 | 47 |
| o4-mini ↗ | AWS | North America | 17 | 68 |
| | AWS | EU / UK | 14 | 58 |
| | AWS | South America / APAC / Middle East | 12 | 47 |
| ada embedding ↗ | AWS | North America | 1.68 | N/A |
| | AWS | EU / UK | 1.42 | N/A |
| | AWS | South America / APAC / Middle East | 1.16 | N/A |
| text-embedding-3-large ↗ | AWS | North America | 2.24 | N/A |
| | AWS | EU / UK | 1.89 | N/A |
| | AWS | South America / APAC / Middle East | 1.54 | N/A |
| text-embedding-3-small ↗ | AWS | North America | 0.34 | N/A |
| | AWS | EU / UK | 0.29 | N/A |
| | AWS | South America / APAC / Middle East | 0.24 | N/A |
| Anthropic Claude 3 ↗ | AWS | North America | 52 | 258 |
| | AWS | EU / UK | 44 | 218 |
| | AWS | South America / APAC / Middle East | 35 | 177 |
| Anthropic Claude 3 Haiku ↗ | AWS | North America | 4.3 | 21.5 |
| | AWS | EU / UK | 3.6 | 18.2 |
| | AWS | South America / APAC / Middle East | 3.0 | 14.8 |
| Anthropic Claude 3.5 Haiku ↗ | AWS | North America | 12 | 62 |
| | AWS | EU / UK | 10 | 52 |
| | AWS | South America / APAC / Middle East | 9 | 43 |
| Anthropic Claude 3.5 Sonnet ↗ | AWS | North America | 52 | 258 |
| | AWS | EU / UK | 44 | 218 |
| | AWS | South America / APAC / Middle East | 35 | 177 |
| Anthropic Claude 3.5 Sonnet v2 ↗ | AWS | North America | 46 | 232 |
| | AWS | EU / UK | 39 | 196 |
| | AWS | South America / APAC / Middle East | 32 | 159 |
| Anthropic Claude 3.7 Sonnet ↗ | AWS | North America | 46 | 232 |
| | AWS | EU / UK | 39 | 196 |
| | AWS | South America / APAC / Middle East | 32 | 159 |
| Anthropic Claude 4 Sonnet ↗ | AWS | North America | 46 | 232 |
| | AWS | EU / UK | 39 | 196 |
| | AWS | South America / APAC / Middle East | 32 | 159 |
| Anthropic Claude 4 Opus ↗ | AWS | North America | 232 | 1159 |
| | AWS | EU / UK | 196 | 981 |
| | AWS | South America / APAC / Middle East | 159 | 797 |
| Mistral Small 24B ↗ | AWS | North America | 158 | 525 |
| | AWS | EU / UK | 133 | 444 |
| | AWS | South America / APAC / Middle East | 108 | 361 |
| Llama 3.1_8B ↗ | AWS | North America | 158 | 525 |
| | AWS | EU / UK | 133 | 444 |
| | AWS | South America / APAC / Middle East | 108 | 361 |
| Llama 3.3_70B ↗ | AWS | North America | 158 | 525 |
| | AWS | EU / UK | 133 | 444 |
| | AWS | South America / APAC / Middle East | 108 | 361 |
| Snowflake Arctic Embed ↗ | AWS | North America | 38 | 38 |
| | AWS | EU / UK | 32 | 32 |
| | AWS | South America / APAC / Middle East | 26 | 26 |
| Gemini 1.5 Flash ↗ | AWS | North America | 1.3 | 5.2 |
| | AWS | EU / UK | 1.1 | 4.4 |
| | AWS | South America / APAC / Middle East | 0.9 | 3.5 |
| Gemini 1.5 Pro ↗ | AWS | North America | 21 | 86 |
| | AWS | EU / UK | 18 | 73 |
| | AWS | South America / APAC / Middle East | 15 | 59 |
| Gemini 2.0 Flash ↗ | AWS | North America | 1.5 | 6.2 |
| | AWS | EU / UK | 1.3 | 5.2 |
| | AWS | South America / APAC / Middle East | 1.1 | 4.3 |
| Document Information Extraction | AWS | North America | 182 | N/A |
| | AWS | EU / UK | 154 | N/A |
| | AWS | South America / APAC / Middle East | 125 | N/A |
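As a rough illustration of how the rates above translate token counts into compute-seconds, the sketch below hard-codes two rows of the table (GPT-4o and GPT-4o mini, AWS, North America) and applies the per-10,000-token rates. The helper function and the hard-coded rates are illustrative assumptions only; actual metering is performed by the platform.

```python
# Illustrative only: rates copied from the table above (AWS, North America).
# Actual metering is performed by the platform.
RATES_PER_10K_TOKENS = {
    # model: (compute-seconds per 10k input tokens, per 10k output tokens)
    "GPT-4o": (43, 172),
    "GPT-4o mini": (2.6, 10.3),
}

def compute_seconds(model: str, input_tokens: int, output_tokens: int) -> float:
    """Convert token counts to compute-seconds using per-10k-token rates."""
    in_rate, out_rate = RATES_PER_10K_TOKENS[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 10_000

# Example: a 500-token prompt and a 200-token response sent to GPT-4o.
print(compute_seconds("GPT-4o", 500, 200))  # (500*43 + 200*172) / 10,000 = 5.59
```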
AIP routes text directly to the backing LLMs, which run the tokenization themselves. The size of the text dictates the amount of compute used by the backing model to serve the response.
Take the following example sentence that is sent to the GPT-4o model:
AIP incorporates all of Palantir's advanced security measures for the protection of sensitive data in compliance with industry regulations.
This sentence contains 140 characters and will tokenize in the following way, with a | character separating each token. Note that a token is not always equivalent to a word; some words are broken into multiple tokens, like AIP and Palantir in the example below.
A|IP| incorporates| all| of| Pal|ant|ir|'s| advanced| security| measures| for| the| protection| of| sensitive| data| in| compliance| with| industry| regulations|.
This sentence contains 24 tokens. Using GPT-4o's North America rate of 43 compute-seconds per 10,000 input tokens, it will use the following number of compute-seconds:
compute-seconds = 24 tokens * 43 compute-seconds / 10,000 tokens
compute-seconds = 24 * 43 / 10,000
compute-seconds = 0.1032
The number of tokens and characters in the above sentence was verified with OpenAI's Tokenizer feature ↗.
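As a quick check, the same arithmetic can be expressed in a couple of lines of code; the 24-token count and the 43 compute-seconds per 10,000 input tokens (GPT-4o, AWS, North America) are taken from the example and the table above.

```python
# Reproduces the worked example above: 24 input tokens at GPT-4o's
# North America rate of 43 compute-seconds per 10,000 input tokens.
input_tokens = 24
rate_per_10k_input_tokens = 43

compute_seconds = input_tokens * rate_per_10k_input_tokens / 10_000
print(compute_seconds)  # 0.1032
```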
Usage of compute-seconds resulting from LLM tokens is attached directly to the individual application resource that requests the usage. For example, if you use AIP to automatically explain a pipeline in Pipeline Builder, the compute-seconds used by the LLM to generate that explanation will be attributed to that specific pipeline. This is true across the platform; keeping this in mind will help you track where you are using tokens.
In some cases, compute usage is not attributable to a single resource in the platform; examples include AIP Assist and Error Explainer, among others. When usage is not attributable to a single resource, the tokens will be attributed to the user folder initiating the use of tokens.
We recommend staying aware of the tokens that are sent to LLMs on your behalf. Generally, the more information that you include when using LLMs, the more compute-seconds will be used. For example, the following scenarios describe different ways of using compute-seconds.