AIP compute usage involves large language models (LLMs). Fundamentally, LLMs take text as input and respond with text as output. The amount of text input and output is measured in tokens. Compute usage for LLMs is measured in compute-seconds per a fixed number of tokens (10,000 tokens in the rates below). Different models may have different rates for compute usage, as described below.
Tokens are the basic units of text that LLMs use to process and understand input. A token can be as short as a single character or as long as a whole word, depending on the language and the specific model.
Importantly, tokens do not map one-to-one with words. For example, common words might be a single token, but longer or less common words may be split into multiple tokens. Even punctuation marks and spaces can be considered tokens.
Different model providers define tokens differently; see, for instance, OpenAI ↗ and Anthropic ↗. On average, tokens are around four characters long, with a character being a single letter or punctuation mark.
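To get a rough sense of how a piece of text splits into tokens, you can run a tokenizer locally. The sketch below is an illustration only; it assumes OpenAI's open-source tiktoken package and its cl100k_base encoding, neither of which is part of AIP, and other models and providers will split the same text differently.

```python
# Rough illustration only: uses OpenAI's open-source `tiktoken` package
# (an assumption; not part of AIP) to split text into tokens locally.
# Other models and providers tokenize the same text differently.
import tiktoken

text = "Even punctuation marks and spaces can be considered tokens."

enc = tiktoken.get_encoding("cl100k_base")  # one of OpenAI's public encodings
token_ids = enc.encode(text)

print(f"characters: {len(text)}")
print(f"tokens: {len(token_ids)}")
print(f"characters per token: {len(text) / len(token_ids):.1f}")  # typically around 4
```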
In AIP, tokens are consumed by applications that send prompts to and receive responses from LLMs. Each of these prompts and responses consists of a measurable number of tokens. These tokens can be sent to multiple LLM providers; due to pricing differences between providers, token counts are converted into compute-seconds to match the price of the underlying model provider.
All applications that provide LLM-backed capabilities consume tokens when being used. See the following list for the set of applications that may use tokens when you interact with their LLM-backed capabilities.
| Model | Foundry cloud provider | Foundry region | Compute-seconds per 10k input tokens | Compute-seconds per 10k output tokens |
|---|---|---|---|---|
| Grok-2 ↗ | AWS | North America | 36 | 182 |
| | AWS | EU / UK | 31 | 154 |
| | AWS | South America / APAC / Middle East | 25 | 125 |
| Grok-2-Vision ↗ | AWS | North America | 36 | 182 |
| | AWS | EU / UK | 31 | 154 |
| | AWS | South America / APAC / Middle East | 25 | 125 |
| Grok-3 ↗ | AWS | North America | 55 | 273 |
| | AWS | EU / UK | 46 | 231 |
| | AWS | South America / APAC / Middle East | 38 | 188 |
| Grok-3-Mini-Reasoning ↗ | AWS | North America | 5.5 | 9.1 |
| | AWS | EU / UK | 4.6 | 7.7 |
| | AWS | South America / APAC / Middle East | 3.8 | 6.3 |
| GPT-4o ↗ | AWS | North America | 43 | 172 |
| | AWS | EU / UK | 36 | 145 |
| | AWS | South America / APAC / Middle East | 30 | 118 |
| GPT-4o mini ↗ | AWS | North America | 2.6 | 10.3 |
| | AWS | EU / UK | 2.2 | 8.7 |
| | AWS | South America / APAC / Middle East | 1.8 | 7.1 |
| GPT-4.1 ↗ | AWS | North America | 31 | 124 |
| | AWS | EU / UK | 26 | 105 |
| | AWS | South America / APAC / Middle East | 21 | 85 |
| GPT-4.1-mini ↗ | AWS | North America | 6.2 | 24.7 |
| | AWS | EU / UK | 5.2 | 20.9 |
| | AWS | South America / APAC / Middle East | 4.3 | 17 |
| GPT-4.1-nano ↗ | AWS | North America | 1.5 | 6.2 |
| | AWS | EU / UK | 1.3 | 5.2 |
| | AWS | South America / APAC / Middle East | 1.1 | 4.3 |
| o1 ↗ | AWS | North America | 232 | 927 |
| | AWS | EU / UK | 196 | 785 |
| | AWS | South America / APAC / Middle East | 159 | 638 |
| o1-mini ↗ | AWS | North America | 17 | 68 |
| | AWS | EU / UK | 14 | 58 |
| | AWS | South America / APAC / Middle East | 12 | 47 |
| o3 ↗ | AWS | North America | 31 | 124 |
| | AWS | EU / UK | 26 | 105 |
| | AWS | South America / APAC / Middle East | 21 | 85 |
| o3-mini ↗ | AWS | North America | 17 | 68 |
| | AWS | EU / UK | 14 | 58 |
| | AWS | South America / APAC / Middle East | 12 | 47 |
| o4-mini ↗ | AWS | North America | 17 | 68 |
| | AWS | EU / UK | 14 | 58 |
| | AWS | South America / APAC / Middle East | 12 | 47 |
| ada embedding ↗ | AWS | North America | 1.68 | N/A |
| | AWS | EU / UK | 1.42 | N/A |
| | AWS | South America / APAC / Middle East | 1.16 | N/A |
| text-embedding-3-large ↗ | AWS | North America | 2.24 | N/A |
| | AWS | EU / UK | 1.89 | N/A |
| | AWS | South America / APAC / Middle East | 1.54 | N/A |
| text-embedding-3-small ↗ | AWS | North America | 0.34 | N/A |
| | AWS | EU / UK | 0.29 | N/A |
| | AWS | South America / APAC / Middle East | 0.24 | N/A |
| Anthropic Claude 3 ↗ | AWS | North America | 52 | 258 |
| | AWS | EU / UK | 44 | 218 |
| | AWS | South America / APAC / Middle East | 35 | 177 |
| Anthropic Claude 3 Haiku ↗ | AWS | North America | 4.3 | 21.5 |
| | AWS | EU / UK | 3.6 | 18.2 |
| | AWS | South America / APAC / Middle East | 3.0 | 14.8 |
| Anthropic Claude 3.5 Haiku ↗ | AWS | North America | 12 | 62 |
| | AWS | EU / UK | 10 | 52 |
| | AWS | South America / APAC / Middle East | 9 | 43 |
| Anthropic Claude 3.5 Sonnet ↗ | AWS | North America | 52 | 258 |
| | AWS | EU / UK | 44 | 218 |
| | AWS | South America / APAC / Middle East | 35 | 177 |
| Anthropic Claude 3.5 Sonnet v2 ↗ | AWS | North America | 46 | 232 |
| | AWS | EU / UK | 39 | 196 |
| | AWS | South America / APAC / Middle East | 32 | 159 |
| Anthropic Claude 3.7 Sonnet ↗ | AWS | North America | 46 | 232 |
| | AWS | EU / UK | 39 | 196 |
| | AWS | South America / APAC / Middle East | 32 | 159 |
| Anthropic Claude 4 Sonnet ↗ | AWS | North America | 46 | 232 |
| | AWS | EU / UK | 39 | 196 |
| | AWS | South America / APAC / Middle East | 32 | 159 |
| Anthropic Claude 4 Opus ↗ | AWS | North America | 232 | 1159 |
| | AWS | EU / UK | 196 | 981 |
| | AWS | South America / APAC / Middle East | 159 | 797 |
| Mistral Small 24B ↗ | AWS | North America | 158 | 525 |
| | AWS | EU / UK | 133 | 444 |
| | AWS | South America / APAC / Middle East | 108 | 361 |
| Llama 3.1_8B ↗ | AWS | North America | 158 | 525 |
| | AWS | EU / UK | 133 | 444 |
| | AWS | South America / APAC / Middle East | 108 | 361 |
| Llama 3.3_70B ↗ | AWS | North America | 158 | 525 |
| | AWS | EU / UK | 133 | 444 |
| | AWS | South America / APAC / Middle East | 108 | 361 |
| Snowflake Arctic Embed ↗ | AWS | North America | 38 | 38 |
| | AWS | EU / UK | 32 | 32 |
| | AWS | South America / APAC / Middle East | 26 | 26 |
| Gemini 1.5 Flash ↗ | AWS | North America | 1.3 | 5.2 |
| | AWS | EU / UK | 1.1 | 4.4 |
| | AWS | South America / APAC / Middle East | 0.9 | 3.5 |
| Gemini 1.5 Pro ↗ | AWS | North America | 21 | 86 |
| | AWS | EU / UK | 18 | 73 |
| | AWS | South America / APAC / Middle East | 15 | 59 |
| Gemini 2.0 Flash ↗ | AWS | North America | 1.5 | 6.2 |
| | AWS | EU / UK | 1.3 | 5.2 |
| | AWS | South America / APAC / Middle East | 1.1 | 4.3 |
| Document Information Extraction | AWS | North America | 182 | N/A |
| | AWS | EU / UK | 154 | N/A |
| | AWS | South America / APAC / Middle East | 125 | N/A |
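As a rough illustration of how the rates above translate token counts into compute-seconds, the sketch below hard-codes two rows of the table (GPT-4o and GPT-4o mini, AWS, North America) and applies the per-10,000-token rates. The helper function and the hard-coded rates are illustrative assumptions only; actual metering is performed by the platform.

```python
# Illustrative only: rates copied from the table above (AWS, North America).
# Actual metering is performed by the platform.
RATES_PER_10K_TOKENS = {
    # model: (compute-seconds per 10k input tokens, per 10k output tokens)
    "GPT-4o": (43, 172),
    "GPT-4o mini": (2.6, 10.3),
}

def compute_seconds(model: str, input_tokens: int, output_tokens: int) -> float:
    """Convert token counts to compute-seconds using per-10k-token rates."""
    in_rate, out_rate = RATES_PER_10K_TOKENS[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 10_000

# Example: a 500-token prompt and a 200-token response sent to GPT-4o.
print(compute_seconds("GPT-4o", 500, 200))  # (500*43 + 200*172) / 10,000 = 5.59
```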
AIP routes text directly to the backing LLMs, which run the tokenization themselves. The size of the text dictates the amount of compute used by the backing model to serve the response.
Take the following example sentence that is sent to the GPT-4o model:
AIP incorporates all of Palantir's advanced security measures for the protection of sensitive data in compliance with industry regulations.
This sentence contains 140 characters and will tokenize in the following way, with a | character separating each token. Note that a token is not always equivalent to a word; some words are broken into multiple tokens, like AIP and Palantir in the example below.
A|IP| incorporates| all| of| Pal|ant|ir|'s| advanced| security| measures| for| the| protection| of| sensitive| data| in| compliance| with| industry| regulations|.
This sentence contains 24 tokens. Using GPT-4o's North America rate of 43 compute-seconds per 10,000 input tokens, it will use the following number of compute-seconds:
compute-seconds = 24 tokens * 43 compute-seconds / 10,000 tokens
compute-seconds = 24 * 43 / 10,000
compute-seconds = 0.1032
The number of tokens and characters in the above sentence was verified with OpenAI's Tokenizer feature ↗.
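As a quick check, the same arithmetic can be expressed in a couple of lines of code; the 24-token count and the 43 compute-seconds per 10,000 input tokens (GPT-4o, AWS, North America) are taken from the example and the table above.

```python
# Reproduces the worked example above: 24 input tokens at GPT-4o's
# North America rate of 43 compute-seconds per 10,000 input tokens.
input_tokens = 24
rate_per_10k_input_tokens = 43

compute_seconds = input_tokens * rate_per_10k_input_tokens / 10_000
print(compute_seconds)  # 0.1032
```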
Usage of compute-seconds resulting from LLM tokens is attached directly to the individual application resource that requests the usage. For example, if you use AIP to automatically explain a pipeline in Pipeline Builder, the compute-seconds used by the LLM to generate that explanation will be attributed to that specific pipeline. This is true across the platform; keeping this in mind will help you track where you are using tokens.
In some cases, compute usage is not attributable to a single resource in the platform; examples include AIP Assist and Error Explainer, among others. When usage is not attributable to a single resource, the tokens will be attributed to the user folder initiating the use of tokens.
We recommend staying aware of the tokens that are sent to LLMs on your behalf. Generally, the more information that you include when using LLMs, the more compute-seconds will be used. For example, the following scenarios describe different ways of using compute-seconds.