Compute usage with AIP

AIP compute usage involves large language models (LLMs). Fundamentally, LLMs take text as input and respond with text as output. The amount of text input and output is measured in tokens, and compute usage for LLMs is measured in compute-seconds per 10,000 tokens. Different models may have different rates for compute usage, as described below.

Tokens in AIP

Tokens are the basic units of text that LLMs use to process and understand input. A token can be as short as a single character or as long as a whole word, depending on the language and the specific model.

Importantly, tokens do not map one-to-one with words. For example, common words might be a single token, but longer or less common words may be split into multiple tokens. Even punctuation marks and spaces can be considered tokens.

Different model providers have distinct definitions of what constitutes a token; see, for example, the documentation from OpenAI ↗ and Anthropic ↗. On average, tokens are around four characters long, with a character being a single letter, space, or punctuation mark.
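As a rough illustration of that four-characters-per-token average, you can ballpark a token count from a character count. The sketch below is a heuristic only (the helper name is ours); exact counts always come from the provider's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Ballpark token count using the ~4 characters/token average.

    Heuristic only: real counts depend on the provider's tokenizer,
    and long or uncommon words can skew the average either way.
    """
    return max(1, round(len(text) / 4))

sentence = (
    "AIP incorporates all of Palantir's advanced security measures for "
    "the protection of sensitive data in compliance with industry regulations."
)
# 140 characters -> estimate of 35; the actual GPT-4o count, shown later
# on this page, is 24, which illustrates how rough the heuristic is.
print(estimate_tokens(sentence))
```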

In AIP, tokens are consumed by applications that send prompts to and receive responses from LLMs. Each prompt and response consists of a measurable number of tokens. These tokens can be sent to multiple LLM providers; because providers meter and price tokens differently, token counts are converted into compute-seconds to match the price of the underlying model provider.

All applications that provide LLM-backed capabilities consume tokens when used. The following applications may use tokens when you interact with their LLM-backed capabilities:

  • AIP Assist
  • AIP Logic
  • AIP Error Enhancer
  • AIP Code Assist
  • Workshop LLM-backed tools
  • Quiver LLM-backed tools
  • Pipeline Builder LLM-backed tools
  • Direct calls to the Language Model Service (including both Python and TypeScript libraries)

Measuring compute with AIP

| Model | Foundry cloud provider | Foundry region | Compute seconds per 10k input tokens | Compute seconds per 10k output tokens |
| --- | --- | --- | --- | --- |
| Grok-2 ↗ | AWS | North America | 36 | 182 |
|  | AWS | EU / UK | 31 | 154 |
|  | AWS | South America / APAC / Middle East | 25 | 125 |
| Grok-2-Vision ↗ | AWS | North America | 36 | 182 |
|  | AWS | EU / UK | 31 | 154 |
|  | AWS | South America / APAC / Middle East | 25 | 125 |
| Grok-3 ↗ | AWS | North America | 55 | 273 |
|  | AWS | EU / UK | 46 | 231 |
|  | AWS | South America / APAC / Middle East | 38 | 188 |
| Grok-3-Mini-Reasoning ↗ | AWS | North America | 5.5 | 9.1 |
|  | AWS | EU / UK | 4.6 | 7.7 |
|  | AWS | South America / APAC / Middle East | 3.8 | 6.3 |
| GPT-4o ↗ | AWS | North America | 43 | 172 |
|  | AWS | EU / UK | 36 | 145 |
|  | AWS | South America / APAC / Middle East | 30 | 118 |
| GPT-4o mini ↗ | AWS | North America | 2.6 | 10.3 |
|  | AWS | EU / UK | 2.2 | 8.7 |
|  | AWS | South America / APAC / Middle East | 1.8 | 7.1 |
| GPT-4.1 ↗ | AWS | North America | 31 | 124 |
|  | AWS | EU / UK | 26 | 105 |
|  | AWS | South America / APAC / Middle East | 21 | 85 |
| GPT-4.1-mini ↗ | AWS | North America | 6.2 | 24.7 |
|  | AWS | EU / UK | 5.2 | 20.9 |
|  | AWS | South America / APAC / Middle East | 4.3 | 17 |
| GPT-4.1-nano ↗ | AWS | North America | 1.5 | 6.2 |
|  | AWS | EU / UK | 1.3 | 5.2 |
|  | AWS | South America / APAC / Middle East | 1.1 | 4.3 |
| o1 ↗ | AWS | North America | 232 | 927 |
|  | AWS | EU / UK | 196 | 785 |
|  | AWS | South America / APAC / Middle East | 159 | 638 |
| o1-mini ↗ | AWS | North America | 17 | 68 |
|  | AWS | EU / UK | 14 | 58 |
|  | AWS | South America / APAC / Middle East | 12 | 47 |
| o3 ↗ | AWS | North America | 31 | 124 |
|  | AWS | EU / UK | 26 | 105 |
|  | AWS | South America / APAC / Middle East | 21 | 85 |
| o3-mini ↗ | AWS | North America | 17 | 68 |
|  | AWS | EU / UK | 14 | 58 |
|  | AWS | South America / APAC / Middle East | 12 | 47 |
| o4-mini ↗ | AWS | North America | 17 | 68 |
|  | AWS | EU / UK | 14 | 58 |
|  | AWS | South America / APAC / Middle East | 12 | 47 |
| ada embedding ↗ | AWS | North America | 1.68 | N/A |
|  | AWS | EU / UK | 1.42 | N/A |
|  | AWS | South America / APAC / Middle East | 1.16 | N/A |
| text-embedding-3-large ↗ | AWS | North America | 2.24 | N/A |
|  | AWS | EU / UK | 1.89 | N/A |
|  | AWS | South America / APAC / Middle East | 1.54 | N/A |
| text-embedding-3-small ↗ | AWS | North America | 0.34 | N/A |
|  | AWS | EU / UK | 0.29 | N/A |
|  | AWS | South America / APAC / Middle East | 0.24 | N/A |
| Anthropic Claude 3 ↗ | AWS | North America | 52 | 258 |
|  | AWS | EU / UK | 44 | 218 |
|  | AWS | South America / APAC / Middle East | 35 | 177 |
| Anthropic Claude 3 Haiku ↗ | AWS | North America | 4.3 | 21.5 |
|  | AWS | EU / UK | 3.6 | 18.2 |
|  | AWS | South America / APAC / Middle East | 3.0 | 14.8 |
| Anthropic Claude 3.5 Haiku ↗ | AWS | North America | 12 | 62 |
|  | AWS | EU / UK | 10 | 52 |
|  | AWS | South America / APAC / Middle East | 9 | 43 |
| Anthropic Claude 3.5 Sonnet ↗ | AWS | North America | 52 | 258 |
|  | AWS | EU / UK | 44 | 218 |
|  | AWS | South America / APAC / Middle East | 35 | 177 |
| Anthropic Claude 3.5 Sonnet v2 ↗ | AWS | North America | 46 | 232 |
|  | AWS | EU / UK | 39 | 196 |
|  | AWS | South America / APAC / Middle East | 32 | 159 |
| Anthropic Claude 3.7 Sonnet ↗ | AWS | North America | 46 | 232 |
|  | AWS | EU / UK | 39 | 196 |
|  | AWS | South America / APAC / Middle East | 32 | 159 |
| Anthropic Claude 4 Sonnet ↗ | AWS | North America | 46 | 232 |
|  | AWS | EU / UK | 39 | 196 |
|  | AWS | South America / APAC / Middle East | 32 | 159 |
| Anthropic Claude 4 Opus ↗ | AWS | North America | 232 | 1159 |
|  | AWS | EU / UK | 196 | 981 |
|  | AWS | South America / APAC / Middle East | 159 | 797 |
| Mistral Small 24B ↗ | AWS | North America | 158 | 525 |
|  | AWS | EU / UK | 133 | 444 |
|  | AWS | South America / APAC / Middle East | 108 | 361 |
| Llama 3.1_8B ↗ | AWS | North America | 158 | 525 |
|  | AWS | EU / UK | 133 | 444 |
|  | AWS | South America / APAC / Middle East | 108 | 361 |
| Llama 3.3_70B ↗ | AWS | North America | 158 | 525 |
|  | AWS | EU / UK | 133 | 444 |
|  | AWS | South America / APAC / Middle East | 108 | 361 |
| Snowflake Arctic Embed ↗ | AWS | North America | 38 | 38 |
|  | AWS | EU / UK | 32 | 32 |
|  | AWS | South America / APAC / Middle East | 26 | 26 |
| Gemini 1.5 Flash ↗ | AWS | North America | 1.3 | 5.2 |
|  | AWS | EU / UK | 1.1 | 4.4 |
|  | AWS | South America / APAC / Middle East | 0.9 | 3.5 |
| Gemini 1.5 Pro ↗ | AWS | North America | 21 | 86 |
|  | AWS | EU / UK | 18 | 73 |
|  | AWS | South America / APAC / Middle East | 15 | 59 |
| Gemini 2.0 Flash ↗ | AWS | North America | 1.5 | 6.2 |
|  | AWS | EU / UK | 1.3 | 5.2 |
|  | AWS | South America / APAC / Middle East | 1.1 | 4.3 |
| Document Information Extraction | AWS | North America | 182 | N/A |
|  | AWS | EU / UK | 154 | N/A |
|  | AWS | South America / APAC / Middle East | 125 | N/A |
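The rates in the table convert token counts into compute-seconds. As a minimal sketch of that conversion, the snippet below hard-codes two North America rows from the table; the dictionary and helper function are illustrative only and not part of any AIP API.

```python
# Rates copied from the table above (AWS, North America), expressed as
# (compute-seconds per 10k input tokens, compute-seconds per 10k output tokens).
RATES = {
    "GPT-4o": (43, 172),
    "Anthropic Claude 3.5 Sonnet v2": (46, 232),
}

def compute_seconds(model: str, input_tokens: int, output_tokens: int) -> float:
    """Convert token counts to compute-seconds at the published rates."""
    input_rate, output_rate = RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 10_000

# A request that sends 24 tokens to GPT-4o and receives a 100-token response:
print(compute_seconds("GPT-4o", 24, 100))  # 0.1032 + 1.72 = 1.8232
```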

AIP routes text directly to the backing LLMs, which run tokenization themselves. The size of the text dictates the amount of compute used by the backing model to serve the response.

Take the following example sentence that is sent to the GPT-4o model.

AIP incorporates all of Palantir's advanced security measures for the protection of sensitive data in compliance with industry regulations.

This sentence contains 140 characters and will tokenize in the following way, with a | character separating each token. Note that a token is not always equivalent to a word; some words are broken into multiple tokens, like AIP and Palantir in the example below.

A|IP| incorporates| all| of| Pal|ant|ir|'s| advanced| security| measures| for| the| protection| of| sensitive| data| in| compliance| with| industry| regulations|.

This sentence contains 24 tokens and, as input to GPT-4o in North America, will use the following number of compute-seconds:

compute-seconds = 24 tokens * 43 compute-seconds / 10,000 tokens
compute-seconds = 24 * 43 / 10,000
compute-seconds = 0.1032

The token and character counts in the sentence above were verified with OpenAI's Tokenizer feature ↗.
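The same check can be scripted with OpenAI's open-source tiktoken library. This is a sketch and assumes a tiktoken version whose model table includes gpt-4o; it applies to OpenAI models only, as other providers tokenize differently.

```python
import tiktoken

# Resolve the tokenizer that tiktoken associates with GPT-4o.
enc = tiktoken.encoding_for_model("gpt-4o")

sentence = (
    "AIP incorporates all of Palantir's advanced security measures for "
    "the protection of sensitive data in compliance with industry regulations."
)
token_ids = enc.encode(sentence)
print(len(token_ids))                                # 24, matching the count above
print("|".join(enc.decode([t]) for t in token_ids))  # the token split shown above
```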

Understanding drivers of compute usage with AIP

Compute-second usage resulting from LLM tokens is attributed directly to the individual application resource that requests the usage. For example, if you use AIP to automatically explain a pipeline in Pipeline Builder, the compute-seconds used by the LLM to generate that explanation are attributed to that specific pipeline. This is true across the platform; keeping this in mind will help you track where you are using tokens.

In some cases, compute usage is not attributable to a single resource in the platform; examples include AIP Assist and AIP Error Enhancer, among others. When usage is not attributable to a single resource, the tokens are attributed to the user folder of the user who initiated the request.

We recommend staying aware of the tokens that are sent to LLMs on your behalf. Generally, the more information you include when using LLMs, the more compute-seconds are used. For example, the following scenarios describe different ways of using compute-seconds; a numeric sketch follows the list.

  • In Pipeline Builder, you can ask AIP to explain your transformation nodes; the number of selected nodes affects the number of tokens used by the LLM to generate a response, and thus compute-second usage. This is because as the number of nodes increases, so does the amount of text the LLM must process regarding the configuration of those nodes.
  • In AIP Assist, asking the LLM to generate large blocks of code requires more output tokens. Shorter responses use fewer tokens and thus less compute.
  • In AIP Logic, sending large amounts of text with your prompts requires more tokens and thus more compute-seconds.
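To make that scaling concrete, here is a back-of-the-envelope comparison at the GPT-4o North America rates from the table above; the token counts are made-up round numbers, not measurements.

```python
# Illustrative only: explaining 1 node versus 10 nodes in Pipeline Builder,
# priced at the GPT-4o North America rates (43 in / 172 out per 10k tokens).
one_node = (500 * 43 + 200 * 172) / 10_000       # 5.59 compute-seconds
ten_nodes = (5_000 * 43 + 2_000 * 172) / 10_000  # 55.9 compute-seconds
print(one_node, ten_nodes)
```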