This guide will walk you through the process of connecting your organization's data to Foundry.
Before starting, note that the first step in connecting your organizational data to Foundry is fundamentally a networking task. The initial setup is best done by someone who is familiar with network engineering and knows the organization's network topology and configuration, such as firewall rules.
Connecting data to Foundry requires configuring three components in the following order: a networking path, a source, and a capability.
You must first ensure that there is a valid networking path between Foundry and the external system.
For external systems hosted on the same network as Foundry (for cloud-hosted Foundry instances, this typically means systems accessible over the Internet), define direct connection egress policies to route the traffic. Make sure that the external system allows inbound connections from Foundry.
For external systems hosted on a separate network from Foundry's network (for cloud-hosted Foundry instances, this is typically on-premise systems), an agent is required. An agent is Palantir software that runs within your organization’s network and functions as a secure intermediary between your organization’s data sources and your Foundry instance. Make sure the external system allows inbound connections from the agent and that the agent can establish outbound connections to the external system as well as to Foundry.
The agent can then be used to define agent proxy egress policies to route traffic through the agent when using a Foundry worker.
Additionally, an agent can be used as an agent worker itself, where data connection capabilities are executed on the agent directly. However, we generally do not recommend this method as executing capabilities on agents comes with many known limitations.
Agents can be shared by agent worker sources and sources using agent proxy egress policies, though we recommend always having multiple agents assigned to a given source to maximize availability.
Learn more about various architecture patterns.
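Before creating a source, it can help to confirm that the required network paths are actually open. The following is a minimal sketch, not a Foundry tool: it uses only the Python standard library to test whether a TCP connection can be opened from the machine it runs on (for example, an agent host). The function names `can_reach` and `check_paths` and the hostnames in the usage note are hypothetical.

```python
import socket


def can_reach(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the hostname and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_paths(targets: dict[str, tuple[str, int]], timeout_s: float = 3.0) -> dict[str, bool]:
    """Check each labeled (host, port) target and report reachability."""
    return {
        label: can_reach(host, port, timeout_s)
        for label, (host, port) in targets.items()
    }
```

For example, running `check_paths({"external system": ("db.internal.example.com", 5432), "Foundry": ("foundry.example.com", 443)})` on an agent host would report whether both outbound paths are open; the hostnames and ports here are placeholders for your own systems. A successful TCP handshake only shows that the firewall path exists. It does not validate credentials or TLS configuration.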
You must configure a source, or connection, to connect your external system to Foundry before executing any capability. An external system could be, for example, a Postgres database, an S3 bucket, a filesystem on a Linux server, an SAP instance, or a REST API accessible over the Internet.
Once a source is configured to connect to the external system, you must configure the capability to execute on that source. Capabilities include batch syncs of data, streaming syncs, webhooks, exports, and more.
A batch sync, for example, reads specific data from an external system and ingests it into Foundry. If you have a PostgreSQL database that contains multiple tables, you might configure a sync to ingest one specific table into Foundry. Once a sync has successfully run, the result in Foundry will be a dataset to use across all of Foundry's data pipelining, model development, and analytical tools.
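Conceptually, a batch sync of a single table amounts to reading that table and writing its rows out as a dataset. The sketch below illustrates the idea only and is not how Foundry implements syncs: it assumes SQLite (from the Python standard library) as a stand-in for PostgreSQL and CSV text as a stand-in for a Foundry dataset. The function name `batch_sync` is hypothetical.

```python
import csv
import io
import sqlite3


def batch_sync(conn: sqlite3.Connection, table: str) -> str:
    """Read every row of one table and return it as CSV text."""
    # Table names cannot be bound as SQL parameters, so validate the
    # requested name against the database catalog before using it.
    known = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    if table not in known:
        raise ValueError(f"unknown table: {table}")

    cursor = conn.execute(f'SELECT * FROM "{table}"')
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])  # header row
    writer.writerows(cursor)  # data rows
    return out.getvalue()
```

Only the named table is read, even if the database contains others, which mirrors the idea of configuring a sync to ingest one specific table.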
Most Foundry users will never need to set up a new agent themselves. Agent setup requires an IT-focused skill set, though the same agent can be reused to support multiple sources and syncs. Some organizations can operate long-term with agents set up during the first week of a Foundry deployment. New agents are only needed to access data that your existing agents cannot access (due to network segmentation or data scale, for example) or to set up an additional agent to allow for high availability.
The table below summarizes the configuration frequency and skill set required for maintaining the resources required for connecting to data:
| Resource | Frequency of configuration | Typical user role | Knowledge required |
|---|---|---|---|
| Agent | Rare | IT / Network Engineer | Network and firewall policies; Linux VMs; SSH |
| Source | Occasional | IT / Network Engineer; Data Engineer | Debugging network access; credential management |
| Sync | Frequent | Data Engineer; Data Scientist | Writing SQL queries; managing files |
We recommend setting up redundant hardware to establish a high availability (HA) architecture. High availability increases resiliency and allows no-downtime maintenance during operating hours.
Foundry offers HA at the source level, meaning that if a source is assigned to multiple agents, Foundry will dispatch ingestions to one of the healthy agents. We strongly recommend configuring agents in a high availability setup at the start of source creation; adding extra agents to a created source requires re-entering the credentials for that source.
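The idea of source-level HA can be illustrated with a small sketch. This is not Foundry's dispatch logic, which is not described here. The `Agent` type, the `pick_agent` function, and the "first healthy agent" selection rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    healthy: bool


def pick_agent(assigned: list[Agent]) -> Agent:
    """Return the first healthy agent assigned to a source."""
    for agent in assigned:
        if agent.healthy:
            return agent
    # With a single agent, any outage lands here; with several, work
    # continues as long as at least one agent is healthy.
    raise RuntimeError("no healthy agent is assigned to this source")
```

With two agents assigned, `pick_agent([Agent("agent-1", False), Agent("agent-2", True)])` returns `agent-2`, so an ingestion can still run while `agent-1` is down for maintenance.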
The following best practices are recommended when setting up high availability:
agent-1 and agent-2.
To get started, move on to setting up a source.