LLM capacity is a limited resource at the industry level, and all providers (Azure, OpenAI, AWS Bedrock, Google Cloud Vertex, etc.) limit the maximum capacity available per account. Palantir AIP consequently follows the market-level constraints set by LLM providers. The standard units of measure across the industry are tokens per minute (TPM) and requests per minute (RPM).
Palantir has set a certain maximum capacity for each enrollment, referred to as "enrollment-level rate limits". This capacity is measured per model using TPM and RPM, and includes all models of all providers enabled on your enrollment, including GPT, Claude, Gemini, Llama, Mixtral, and more. In this way, each model has a separate, independent capacity not affected by the usage of other models.
LLM capacity in AIP is managed at three levels: enrollment-level limits set the overall ceiling, project rate limits control how much of that capacity each project can use, and user rate limits govern individual user consumption for traffic not attributed to a project.
By default, all customers are on the medium tier, which is large enough to build prototypes and scale to a few use cases, even with hundreds of users and large datasets such as millions of documents.
Additionally, AIP offers the option to upgrade the enrollment capacity from the medium tier to a large or XL tier if you require additional capacity. If you constantly hit enrollment rate limits that block you from expanding your AIP usage, or if you expect to increase the volume of your pipelines or your total number of users, contact Palantir Support.
Enrollment limits are now displayed on the AIP rate limits tab in the Resource Management application, along with the enrollment tier.

AIP offers enough capacity to build large scale workflows with enrollment tiers, particularly the XL tier. These tiers have provided enough capacity for hundreds of Palantir customers using LLMs at scale, and we continue to increase these limits.
For a full breakdown of enrollment rate limits per model and tier, see LLM enrollment rate limits.
Enrollment administrators can navigate to the AIP usage & limits page in the Resource Management application to:
View usage: View LLM token and request usage of all Palantir-provided models for all projects and resources in your enrollment.
Manage rate limits: Manage project and user rate limits for your enrollment.
The View usage tab provides visibility into LLM token and request usage of all Palantir-provided models for all projects, resources, and users in your enrollment. Administrators can use this view to better manage LLM capacity and handle rate limits.

This view allows you to:
Note that this view is not optimized to address cost management for LLM usage. Learn how to review LLM cost on AIP-enabled enrollments via the Analysis tab.
If you are hitting rate limits at the enrollment or project level, you may consider taking any of the following actions:
LLM requests are attributed in one of two ways, and the two are mutually exclusive — every request is governed by exactly one of these limit types:

On the Manage rate limits tab under the AIP usage & limits page in Resource Management, administrators can prioritize LLM capacity for production use cases and limit or prevent experimental projects from saturating the entire enrollment capacity. Enrollment administrators can configure, per model, the maximum percentage of enrollment TPM and RPM that all resources within a given project can use combined in any given minute.

By default, all projects are given a specific limit at which to operate. An admin can create additional project limits, define which projects are included in each limit, and what percent of enrollment capacity can be used.
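To make the percentage arithmetic concrete, the following Python sketch converts a project limit percentage into absolute per-minute ceilings for one model. The `ModelCapacity` type, the `project_ceiling` function, and the capacity figures are hypothetical and only illustrate the calculation; your actual enrollment limits are shown in Resource Management.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCapacity:
    """Per-minute capacity for a single model (illustrative values only)."""
    tpm: int  # tokens per minute
    rpm: int  # requests per minute


def project_ceiling(enrollment: ModelCapacity, percent: int) -> ModelCapacity:
    """Return the combined ceiling shared by all resources in a project limit."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return ModelCapacity(
        tpm=enrollment.tpm * percent // 100,
        rpm=enrollment.rpm * percent // 100,
    )


# Hypothetical enrollment capacity for one model.
enrollment = ModelCapacity(tpm=1_000_000, rpm=5_000)

# A project limit of 60% allows 600,000 TPM and 3,000 RPM for that model.
production = project_ceiling(enrollment, 60)
print(production)
```

Because the ceiling is shared, every resource in every project assigned to the same project limit draws from this one allowance for the model.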
Within each project limit, you can configure model-specific overrides to further control capacity allocation at the model level. Model overrides allow you to set different percentage limits for individual models, overriding the base project limit. These overrides only apply to the projects included in that specific project limit (or for the default limit, all projects not assigned to any other manually created project limit).
Model overrides enable more granular capacity management and allow you to create model "allowlists"; you can set the base project limit to 0%, and then add model overrides with specific percentages for approved models only. You can also explicitly disallow certain models by setting their override limit percentage to 0%.
For example, the steps below explain how to restrict projects in a project limit to only use Claude Sonnet 4 and GPT-4.1:
Users in all projects included in this project limit will only be able to access the specified models within their allocated capacity limits.
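The allowlist pattern can be sketched as a simple lookup in which a model override, when present, replaces the base project limit. This is an illustrative model of the behavior described above, not AIP's implementation; the function name and the percentages are assumptions.

```python
from typing import Dict


def effective_project_percent(
    base_percent: int, model_overrides: Dict[str, int], model: str
) -> int:
    """Return the percent of enrollment capacity a project limit grants a model.

    A model override replaces the base project limit for that model.
    """
    return model_overrides.get(model, base_percent)


# Allowlist: the base limit is 0%, so only models with an override are usable.
base_percent = 0
model_overrides = {
    "Claude Sonnet 4": 30,  # hypothetical percentage
    "GPT-4.1": 50,          # hypothetical percentage
}

for model in ("Claude Sonnet 4", "GPT-4.1", "some-other-model"):
    print(model, effective_project_percent(base_percent, model_overrides, model))
```

Setting an override to 0 while the base limit is above 0 gives the opposite effect: the model is disallowed while all other models keep the base limit.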

Per-user rate limits govern the capacity available to a single user for user-attributed requests. They ensure that no single user can exhaust the enrollment's entire capacity for a model through interactive workflows.
User rate limits are managed on the Manage rate limits tab, under the AIP usage & limits page in Resource Management. Enrollment administrators can view default per-user limits, create user-group overrides, and configure per-model overrides.

The default per-user limit for each model is shown in the User rate limits tab in Resource Management, or in the Per-user Limits column of the enrollment rate limit table. These defaults are set by Palantir and apply to all users unless overridden by an administrator in the User rate limits tab.
We recommend using the Palantir default user rate limits. We defined them to balance (a) protecting the enrollment limits from being saturated by a single user; and (b) enabling users to maximize their productivity working on the latest AIP tools in Foundry effectively. If an administrator chooses to set a new custom limit for all models, they should revisit it as new models are released to make sure they are not limiting their users unintentionally.
Enrollment administrators can override the per-user defaults to grant specific groups of users a different per-user limit. This is useful for power users, service accounts behind interactive applications, or teams whose user-attributed workflows require sustained high throughput, without raising the enrollment's overall capacity tier. It is also a way to protect the capacity of a model used in a production workflow from being accidentally saturated by users.
A per-user override can be expressed in one of two ways:
When an override is configured, it replaces the published per-user default; it is not blended with or floored by the default.
Setting per-user limits below 50,000 TPM or 10 RPM for a model may break some AIP features for affected users. The override form displays a warning when a configured limit falls below these recommended minimums.
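If you plan overrides outside the UI (for example, in a spreadsheet or script), a check like the following sketch can flag values below the recommended minimums before you enter them. The thresholds come from the text above; the function and constant names are hypothetical and are not part of any AIP API.

```python
from typing import List

# Recommended minimums stated in this documentation.
RECOMMENDED_MIN_TPM = 50_000
RECOMMENDED_MIN_RPM = 10


def below_recommended_minimum(tpm: int, rpm: int) -> List[str]:
    """Return a warning for each per-user limit below its recommended minimum."""
    warnings = []
    if tpm < RECOMMENDED_MIN_TPM:
        warnings.append(f"TPM {tpm} is below the recommended minimum of {RECOMMENDED_MIN_TPM}")
    if rpm < RECOMMENDED_MIN_RPM:
        warnings.append(f"RPM {rpm} is below the recommended minimum of {RECOMMENDED_MIN_RPM}")
    return warnings


# A planned per-user limit of 40,000 TPM and 20 RPM triggers one warning (TPM).
for warning in below_recommended_minimum(tpm=40_000, rpm=20):
    print(warning)
```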
Per-user overrides can be configured at three levels, applied in order of specificity:
If no override is configured at any level, the published per-user default for the model is used.

A user-group override applies to a named set of Foundry user groups. Each override has a name, an optional description, an optional default percentage, and an optional set of per-model limits. A user is matched against an override when they belong to any group listed in that override.
If a user belongs to groups covered by multiple overrides, the highest resulting limit among those overrides wins for that user and model. This makes overrides additive and predictable: granting a user a higher limit through one group will not be silently undone by their membership in another group with a lower configuration. For example, take an enrollment where the default user limit is 40%. User A belongs to two groups that are covered by two different user limit overrides, one set to 10% of the enrollment capacity and the other set to 35%. User A will have a user limit of 35%, the highest among the matching overrides.
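The "highest matching override wins" rule can be sketched in a few lines of Python. The data layout, function name, and group names below are assumptions made for illustration; the percentages mirror the example above.

```python
from typing import Dict, Set, Tuple


def resolve_user_percent(
    user_groups: Set[str],
    overrides: Dict[str, Tuple[Set[str], int]],
    fallback_percent: int,
) -> int:
    """Return the per-user limit (percent of enrollment capacity) for one model.

    `overrides` maps an override name to (groups it covers, limit percent).
    A user matches an override when they belong to any of its groups.
    """
    matching = [
        percent
        for groups, percent in overrides.values()
        if user_groups & groups  # non-empty intersection means a match
    ]
    # The highest matching override wins; with no match, the fallback applies.
    return max(matching) if matching else fallback_percent


overrides = {
    "restricted": ({"group-a"}, 10),
    "elevated": ({"group-b"}, 35),
}

# User A belongs to both groups, so the higher override (35%) applies.
print(resolve_user_percent({"group-a", "group-b"}, overrides, fallback_percent=40))

# A user in neither group keeps the 40% default.
print(resolve_user_percent({"group-c"}, overrides, fallback_percent=40))
```

Note that User A ends up at 35% rather than the 40% default: a matching override replaces the default and is not floored by it.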
When a user-group override is removed, members of the affected groups fall back to the enrollment-wide per-user configuration. If the enrollment has no per-user configuration of its own, they fall back to the published per-user defaults.
Reserved capacity is an AIP LLM capacity management tool in Resource Management. Reserved capacity can secure tokens per minute (TPM) and requests per minute (RPM) for production workflows in addition to existing enrollment capacity. This aims to secure critical production workflows that should not be limited by project rate limits, enrollment limits, and other resources that compete over the same pool of tokens and RPM.

We cannot guarantee the availability of reserved capacity for all models at all times. This depends on the availability and offerings of model providers such as Azure, AWS, GCP, xAI, and others. We aim to offer reserved capacity on all industry-leading flagship models.
Reserved capacity has been sufficient for 99.9% uptime based on the performance of AIP in the past year. We cannot guarantee 100% capacity availability, but based on usage patterns in the past year, over 99% of LLM request failures were due to enrollment and project rate limits. These issues can be addressed and solved with the reserved capacity tool.
There is no extra cost for reserved capacity as a service; added costs will depend on additional token usage, as with all other LLM usage in AIP. This is subject to change in the future for new use cases or specific models. If this policy changes, we will not retroactively charge existing workflows for using reserved capacity; these workflows will continue to only incur charges based on additional token usage.
Palantir provides default reserved capacity on the latest LLMs in standard environments. Users with resource management administrator permissions can distribute this reserved capacity across specific projects.
Consider the following example to further understand reserved capacity usage:
Use the Analysis page to view the cost of LLM usage on your AIP-enabled enrollment.
From the Analysis page, select Filter by source: All LLMs and Group by source. This will generate a chart of daily LLM cost, segmented by model.

Generally, AIP prioritizes interactive requests over pipelines with batch requests. Interactive queries are defined as any real-time interaction that a user has with an LLM, such as Workshop, Chatbot Studio, preview in the AIP Logic LLM board, and preview in the Pipeline Builder LLM node. Batch queries are defined as a large set of requests sent without a user expecting an immediate response, for example Transforms pipelines, Pipeline Builder, Automate (for Logic).
This principle currently guarantees that 20% of capacity at the enrollment and project level will always be reserved for interactive queries. This means that for a 100,000 TPM capacity for a certain model, only a maximum of 80,000 TPM can be used for pipelines at any given minute, while at least 20,000 TPM (and up to 100,000 TPM) is available for interactive queries.
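The split can be expressed as a small calculation. The sketch below uses the 20% reservation and the 100,000 TPM figure from the paragraph above; the function and constant names are hypothetical.

```python
from typing import Tuple

# Share of capacity currently reserved for interactive queries, per the text above.
INTERACTIVE_RESERVED_PERCENT = 20


def split_capacity(capacity_tpm: int) -> Tuple[int, int]:
    """Return (maximum TPM usable by batch, minimum TPM kept for interactive)."""
    batch_max = capacity_tpm * (100 - INTERACTIVE_RESERVED_PERCENT) // 100
    interactive_min = capacity_tpm - batch_max
    return batch_max, interactive_min


batch_max, interactive_min = split_capacity(100_000)
print(batch_max)        # 80000
print(interactive_min)  # 20000
```

The reservation is asymmetric: batch traffic is capped at `batch_max`, while interactive traffic can use anything from `interactive_min` up to the full capacity, depending on how much batch traffic is running in that minute.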
Consider the following example:
First, there is significant variance in the offering of different providers in terms of TPM, RPM, and regional availability. While AIP does leverage the capacity of all providers, Palantir cannot bypass limitations imposed by the various cloud service providers.
On top of that, LLM capacity provided to a customer by Palantir has a high bar of compliance requirements compared to the common offering from most providers. Palantir guarantees zero data retention (ZDR) and control over routing of data to specific regions (geo-restriction).
Most providers, including Azure OpenAI, AWS Bedrock, GCP Vertex, and Palantir-hosted models, support geo-restrictions but have smaller LLM capacity guarantees for geo-restricted requests. Other providers, such as OpenAI direct, Anthropic direct, and xAI, offer their models in fewer regions.
As mentioned above, our medium to XL tiers are enough for large scale production workflows. Contact Palantir Support to change your tier.