External transforms

Data Connection sources can now be imported directly into code repositories. This is the preferred method for interacting with external systems and supersedes credentials-based legacy external transforms.

External transforms allow connections to external systems from Python transforms repositories.

External transforms are primarily used to perform batch sync, export, and media sync workflows when one of the following is true:

  • An existing Data Connection source type is not available.
  • The desired capability is not available for the target source type.
  • The capability offered through the Data Connection user interface does not have the desired features.

Solutions to these situations may include the following:

  • Connecting to REST APIs, both over the Internet and within a private network.
  • Connecting to databases to arrange customized query logic not currently possible in the Data Connection user interface.
  • Transforming data as needed during sync or export. This could include batching files together before writing to Foundry, handling custom encryption/decryption of data during transfer, and more.
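Batching records during a sync can be illustrated with a short sketch. The code below groups any iterable of records into fixed-size batches before they are written out. The helper name `batched_records` and the batch size are hypothetical. How each batch is actually written to Foundry depends on your transform's output type.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched_records(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size records (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        # islice pulls up to batch_size items without materializing the whole input
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Example: group five fetched records into batches of two.
for batch in batched_records(range(5), 2):
    print(batch)  # prints [0, 1], then [2, 3], then [4]
```

The generator never holds more than one batch in memory. This matters when the external system returns more records than comfortably fit in a single list.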

Any transforms that use virtual tables are also considered external transforms, because the transforms job must be able to reach the external system that contains the virtualized data. To use virtual tables in Python transforms, follow the instructions below to set up the source.

Setup guide

In this setup guide, we will walk through creating a Python transforms repository that connects to the free public dictionary API. The examples then use this API to explain various features of external transforms and how they can be used with the API.

The dictionary API used in this setup guide is unaffiliated with Palantir and may change at any time. This tutorial is not an endorsement, recommendation, or suggestion to use this API for production use cases.

Prerequisite: Create a Python transforms repository

Before following this guide, be sure to first create a Python transforms repository and review how to author Python transforms as described in our tutorial. All features of Python transforms are compatible with external transforms.

Prerequisite: Create a Data Connection source

Before you can connect to an external system from your Python repository, you must create a Data Connection source that you can import into code. For this tutorial, we will create a REST API source that connects to the dictionary API mentioned above.

Option 1: Create source in the external systems sidebar

The quickest way to create a source for use in external transforms is from a Python transforms code repository. Once you have initialized a repository, complete the following steps to set up a generic source:

  1. From the left side panel, open the External systems tab.
  2. Select Add > Create new.

Select "Create new" to create a new generic connector from Code Repositories.

  3. Choose a name for your source and a Project in which to store it. Once created, the source appears in the left side panel, where you can directly configure its egress policies, secrets, and exportable markings.

Newly created generic connector from Code Repositories

  4. For this tutorial, add an egress policy for the dictionary API: api.dictionaryapi.dev. You will not need any secrets because this API does not require authentication, and you can skip export controls for now. However, export controls are required if you want to use Foundry data inputs with this source.

  5. Since this connection is to a REST API, you will be automatically prompted to convert your generic connector to a REST API source so that you can use the built-in Python requests client.

Option 2: Create a source in Data Connection

You may also create a source from the Data Connection application or use an existing source you have already configured. To use this option, follow the steps below:

  1. Navigate to the Data Connection application within Foundry and choose New Source. From the list of options, select REST API.

Data connection new source page with a red box around the REST API card

  2. Review the Overview page, then select Continue in the bottom right. You will be prompted to choose the connection runtime: a direct connection, an agent worker, or an agent proxy. Since agent worker connections are not supported for external transforms, choose a direct connection to the Internet or an agent proxy to connect to the dictionary API.

  3. Choose a name for your source, and select a Project to which it should be saved.

  4. Fill out the Domains section with the connection information of the API source. The configuration for the dictionary API example is shown below:

REST API source creation page showing configuration to connect to api.dictionaryapi.dev without any authentication

  5. For this example, we also need to create the necessary egress policy. The policy will be automatically suggested in the Network Connectivity section if you completed the previous step:

Suggested egress panel showing a suggested policy for api.dictionaryapi.dev on port 443

  6. Select Save, then Save and continue to complete the source setup.

Prerequisite: Import the transforms-external-systems library in your repository

To use external transforms, you must first install the transforms-external-systems library in your repository. To install a library, open the Libraries tab in the left side panel, search for the desired library, then select Install.

Code repository showing the transforms-external-systems library installed.

Learn more about installing and managing libraries.

Prerequisite: Import a source into code

REST API sources with multiple domains cannot be imported into code. If you need multiple domains in the same external transform, create a separate REST API source for each domain.

  1. First, you must allow the REST API source to be imported into code. To configure this setting, navigate to the source in Data Connection, then to the Connection settings > Code import configuration tab.

  2. Toggle on the option to Allow this source to be imported into code repositories. Any code repositories that import this source will be displayed on this page.

Dictionary API source configuration options in data connection, showing the panel for code import configuration with code imports toggled on.

  3. You are now ready to return to your code repository and import the source. In the repository, navigate to the left side panel and select the External Systems tab represented by the globe icon. Within the side panel, select Add, then search for the Dictionary API source that you previously created. Select this source, then Confirm selection to import.

Dialog for importing the Dictionary API source into a Python transforms repository.

You must have at least Editor access to the source to import it into the repository. Read more about permissions.

Write external transforms

Once you set up a Python transforms repository that imports your Dictionary API source, you are ready to start writing Python transforms code that uses the source to connect externally.

Review our external transforms examples to find fully configured examples of typical read or write workflows on top of common systems.

Import and configure the @external_systems decorator

To use external transforms, you must import the external_systems decorator and the Source object from the transforms.external.systems module:

```python
from transforms.external.systems import external_systems, Source
```

You should then specify the sources that should be included in a transform by using the external_systems decorator:

```python
@external_systems(
    dictionary_api_source=Source("ri.magritte..source.e301d738-b532-431a-8bda-fa211228bba6")
)
```

Sources will automatically be rendered as links to open in Data Connection and will display the source name instead of the resource identifier.

Access source attributes and credentials

Once a source is imported into your transform, you can access its attributes through the connection object returned by the get_https_connection() method. The example below shows how to retrieve the base URL of the Dictionary API source configured in the previous step.

```python
dictionary_api_url = dictionary_api_source.get_https_connection().url
```

Any additional secrets or credentials stored on the source can also be accessed through the source object. To find the names of the available secrets, open the source details in the left side panel of your repository.

Left panel showing the Dictionary API source details.

Use the following syntax to access secrets in code:

```python
dictionary_api_source.get_secret("additionalSecretFoo")
```

Currently, it is not possible to access source attributes that are not credentials unless the source provides an HTTPS client. For example, on a PostgreSQL source you will not be able to access the hostname or other non-secret attributes.

Use the built-in HTTP client

For sources that provide a RESTful API, the source object allows you to interact with a built-in HTTPS client. This client will be pre-configured with all of the details specified on the source, including any server or client certificates, and you can simply start making requests to the external system.

```python
dictionary_api_url = dictionary_api_source.get_https_connection().url
dictionary_api_client = dictionary_api_source.get_https_connection().get_client()
# dictionary_api_client is a pre-configured Session object from the Python `requests` library.

# Example of a GET request for a single word:
word = "apple"
response = dictionary_api_client.get(dictionary_api_url + "/api/v2/entries/en/" + word, timeout=10)
```
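The snippet above builds the request URL by string concatenation. If a word can contain spaces or other reserved characters, it is safer to percent-encode the path segment. Below is a minimal sketch. `entries_url` is a hypothetical helper, and the `/api/v2/entries/<lang>/<word>` path is taken from the example above.

```python
from urllib.parse import quote


def entries_url(base_url: str, word: str, lang: str = "en") -> str:
    """Build a Dictionary API entries URL (hypothetical helper)."""
    # Strip a trailing slash so the joined path never contains "//"
    base = base_url.rstrip("/")
    # safe="" also encodes "/" so a word cannot add extra path segments
    return f"{base}/api/v2/entries/{quote(lang, safe='')}/{quote(word, safe='')}"


print(entries_url("https://api.dictionaryapi.dev/", "ice cream"))
# https://api.dictionaryapi.dev/api/v2/entries/en/ice%20cream
```

Inside the transform, you might call `dictionary_api_client.get(entries_url(dictionary_api_url, word), timeout=10)`.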

Alternatively, you can use your own client or source-specific Python libraries, and use the source object to retrieve attributes and credentials.
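Some third-party clients expect a host and port rather than a full URL. The sketch below derives both from the source's base URL, such as the value returned by `get_https_connection().url`. The helper name `host_and_port` and the fallback to the scheme's default port are assumptions.

```python
from typing import Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"https": 443, "http": 80}


def host_and_port(base_url: str) -> Tuple[str, int]:
    """Split a source base URL into (host, port) for clients that need them separately."""
    parts = urlsplit(base_url)
    if not parts.hostname:
        raise ValueError(f"No host in URL: {base_url!r}")
    # Fall back to the scheme's default port when the URL omits it
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    if port is None:
        raise ValueError(f"Cannot infer port for scheme {parts.scheme!r}")
    return parts.hostname, port


print(host_and_port("https://api.dictionaryapi.dev"))  # ('api.dictionaryapi.dev', 443)
```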

When connecting to an on-premises system through an agent proxy, you must use the built-in client, since it is automatically configured with the necessary agent proxy settings.

Example: Import data from the Dictionary API

The following example shows a complete transform that iterates over a list of words and retrieves the phonetic transcription of each one from the Dictionary API.

```python
import logging

import pandas as pd
from pandas import DataFrame
from transforms.api import Output, transform_pandas
from transforms.external.systems import external_systems, Source

logger = logging.getLogger(__name__)


@external_systems(
    dictionary_api_source=Source("<source_rid>")
)
@transform_pandas(Output("<output_dataset_rid>"))
def compute(dictionary_api_source) -> DataFrame:
    # Base URL and pre-configured requests Session from the imported source
    dictionary_api_url = dictionary_api_source.get_https_connection().url
    dictionary_api_client = dictionary_api_source.get_https_connection().get_client()

    words = ["apple", "dog", "cat"]
    phonetics = []
    for word in words:
        logger.info("Fetching word from api.dictionaryapi.dev: " + word)
        response = dictionary_api_client.get(
            dictionary_api_url + "/api/v2/entries/en/" + word,
            timeout=10,
        ).json()
        # Take the phonetic transcription from the first returned entry
        phonetics += [{"word": word, "phonetic": response[0]["phonetic"]}]

    return pd.DataFrame(phonetics)
```
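The example above reads `response[0]["phonetic"]` directly. That raises a `KeyError` if an entry has no top-level `phonetic` field. A more defensive parser is sketched below. It assumes the public API's response shape at the time of writing: a list of entries, each optionally carrying `phonetic` and a `phonetics` list of objects with a `text` field. Because the API may change, treat this structure, the fallback, and the helper name `extract_phonetic` as assumptions. The `sample` value is hand-written test input, not real API output.

```python
from typing import Any, Optional


def extract_phonetic(payload: Any) -> Optional[str]:
    """Return a phonetic transcription from a Dictionary API response, or None."""
    if not isinstance(payload, list) or not payload:
        return None
    entry = payload[0]
    if not isinstance(entry, dict):
        return None
    # Prefer the top-level "phonetic" field used by the example above
    if entry.get("phonetic"):
        return entry["phonetic"]
    # Assumed fallback: first non-empty "text" in the "phonetics" list
    for item in entry.get("phonetics") or []:
        if isinstance(item, dict) and item.get("text"):
            return item["text"]
    return None


sample = [{"word": "cat", "phonetics": [{"audio": ""}, {"text": "/kæt/"}]}]
print(extract_phonetic(sample))  # /kæt/
```

In the loop, you would then write `phonetics += [{"word": word, "phonetic": extract_phonetic(response)}]`.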

Use Foundry inputs in external transforms

External transforms often need to use Foundry input data. For example, you might want to query an API to gather additional metadata for each row in a tabular dataset. Alternatively, you might have a workflow where you need to export Foundry data into an external software system.

Such cases are considered export-controlled workflows, because they open the possibility of exporting secure Foundry data into another system with unknown security guarantees and severed data provenance. When configuring a source connection, the source owner must specify whether data from Foundry may be exported, and provide the set of security markings and organizations that may be exported. Foundry provides governance controls so that developers can clearly encode security intent, and Information Security Officers can audit the scope and intent of workflows interacting with external systems.

Configure export controls on the source

Exports are controlled using security markings. When configuring a source, the export configuration specifies which security markings and organizations are safe to export to the external system. To set it, open the source in the Data Connection application and go to the Connection settings > Export configuration tab. Then toggle on the option to Enable exports to this source and select the markings and organizations that may potentially be exported.

Doing this requires permission to remove markings on the relevant data and Organizations, since exporting is considered equivalent to removing markings on data within Foundry.
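Conceptually, the export configuration acts as an allowlist. Data should only leave Foundry if every marking and organization on it appears in the source's exportable set. The sketch below models that rule with plain sets. It illustrates the idea only and is not how Foundry implements the check. The labels and function names are hypothetical.

```python
from typing import Iterable, Set


def missing_export_permissions(input_labels: Iterable[str], exportable_labels: Iterable[str]) -> Set[str]:
    """Return input markings/organizations not covered by the source's export configuration."""
    # Exporting is treated like removing markings, so every label on the input must be allowed
    return set(input_labels) - set(exportable_labels)


def can_export(input_labels: Iterable[str], exportable_labels: Iterable[str]) -> bool:
    return not missing_export_permissions(input_labels, exportable_labels)


exportable = {"Palantir"}  # e.g. the organization allowed in the export configuration
print(can_export({"Palantir"}, exportable))  # True
print(missing_export_permissions({"Palantir", "PII"}, exportable))  # {'PII'}
```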

The setting to Enable exports to this source must be toggled on to allow the following:

  • Use datasets, media sets, and streams as an input to Python transforms code importing this source.
  • Use virtual tables registered on this source in Python transforms.

Below you can see an example export configuration for the Dictionary API source, allowing data from the Palantir organization with no additional security markings to be exported to the Dictionary API:

Data connection settings showing the export configuration for Dictionary API source with enable exports to this source toggled on

Note that Enable exports to this source must be toggled on even if you are not actually exporting data to this system, since allowing Foundry data inputs into the same compute job with an open connection to this system means that data could be exported.

Example: Use Foundry imports alongside data from the Dictionary API

This example uses an input dataset of words instead of a static hard-coded list. It also illustrates basic error handling based on the status code of the response.

```python
import logging

import pandas as pd
from pandas import DataFrame
from transforms.api import Input, Output, transform_pandas
from transforms.external.systems import external_systems, Source, ResolvedSource

logger = logging.getLogger(__name__)


@external_systems(
    dictionary_api_source=Source("<source_rid>")
)
@transform_pandas(
    Output("<output_dataset_rid>"),
    words_df=Input("<input_dataset_rid>"),
)
def compute(dictionary_api_source: ResolvedSource, words_df: DataFrame) -> DataFrame:
    dictionary_api_url = dictionary_api_source.get_https_connection().url
    dictionary_api_client = dictionary_api_source.get_https_connection().get_client()

    # Read the words to look up from the Foundry input dataset
    words = words_df["word"].tolist()
    phonetics = []
    for word in words:
        logger.info("Fetching word from api.dictionaryapi.dev: " + word)
        response = dictionary_api_client.get(
            dictionary_api_url + "/api/v2/entries/en/" + word,
            timeout=10,
        )

        if response.status_code == 200:
            data = response.json()[0]
            if "phonetic" in data:
                phonetic_transcription = data["phonetic"]
            else:
                logger.warning(f"No phonetic transcription found for {word}.")
                phonetic_transcription = None
        else:
            # Log the single word whose request failed
            logger.warning(f"Request for {word} failed with status code {response.status_code}.")
            phonetic_transcription = None

        phonetics += [{"word": word, "phonetic": phonetic_transcription}]

    return pd.DataFrame(phonetics)
```
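The example above logs a warning and moves on whenever the status code is not 200. For rate limits (HTTP 429) or transient server errors, you may prefer to retry with exponential backoff. The sketch below is one way to do that. `get_with_retries`, the set of retryable status codes, and the delay values are hypothetical choices. `fetch` is any zero-argument callable that returns an object with a `status_code` attribute, for example `lambda: dictionary_api_client.get(url, timeout=10)`. The `sleep` parameter is injectable so the logic can be tested without waiting.

```python
import time
from typing import Any, Callable

RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}


def get_with_retries(
    fetch: Callable[[], Any],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = fetch()
        if response.status_code not in RETRYABLE_STATUS_CODES:
            return response
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    # Out of attempts: return the last (retryable) response so the caller can log it
    return response
```

The existing `status_code == 200` check still applies to whatever response this returns. Keep the total retry time well within your build's expected duration.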

End-to-end examples

Review the documentation below to find complex end-to-end examples for common systems:

Permissions

Before using external transforms, make sure to familiarize yourself with the Data Connection - Permissions reference page.

Comparison of external transforms and legacy external transforms

The following are some key workflow differences between external transforms and legacy external transforms:

  • The tab for importing sources is always shown for external transforms. Previously, the tabs for adding egress policies and credentials only appeared after an Information Security Officer had toggled on the ability to use external systems in repository settings.
  • Settings to allow external connections and the use of inputs are no longer located in repository settings. Instead, these are controlled on each individual source.
  • Credentials, egress policies, and exportable markings are no longer specified in code. Instead, these settings are taken from the sources that are imported into the transform and applied automatically to the job.
    • If this configuration is changed at the source level, it will automatically be picked up by transforms that import the source without any code change or version bump required. This allows centralized governance of credentials, egress, and exportable Markings which will propagate immediately to downstream workflows.
    • Changes will take effect as of the start of a build and will not affect running builds.
  • The decorator has changed from @use_external_systems() to @external_systems().

Key advantages of external transforms include the following:

  • Support for connecting to systems not accessible from the Internet
  • Support for rotating/updating credentials without requiring code changes
  • Support for sharing connection configuration across multiple repositories
  • Out-of-the-box Python clients for selected source types
  • Improved and simplified governance workflows for enabling and managing external transform repositories
  • Visualization of external transforms connected to external sources in Data Lineage

Migrate to external transforms

There is currently no automatic migration path from legacy external transforms to external transforms. However, the manual work required is expected to be minimal for most workflows.

The following are the main steps to manually migrate to external transforms:

  1. Identify the set of credentials, egress policies, and export control Markings used in your existing legacy external transforms code.
  2. Identify or configure Data Connection sources that connect to the systems you wish to connect to from your external transforms. Ensure these sources are configured to allow imports into code.
  3. Import the relevant sources from step 2 into your existing Python transforms repository.
  4. Change your code to import and use the new @external_systems() decorator with source references, then remove any instances of the @use_external_systems() decorator. This will likely involve updating any references to credentials in your transforms logic to instead reference credentials retrieved from the sources you are now importing.
  5. Test your changes on a branch to ensure that your transforms continue to build successfully.
  6. After merging your updated transforms code, toggle off the repository settings that enabled legacy external transforms.

A single transform cannot use both external transforms and legacy external transforms. To resolve this, either migrate all legacy external transforms to source-based external transforms (preferred), or split the transform into multiple transforms: one that uses the use_external_systems decorator and another that uses the external_systems decorator.

Capabilities

Lightweight external transforms

External transforms are compatible with the @lightweight decorator. Using this decorator can dramatically increase the execution speed for transforms operating on small and medium-sized data.

The example below shows how the @lightweight decorator can be added to a transform along with the @external_systems decorator. For more information on the options for configuring lightweight transforms, see the lightweight transforms documentation.

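```python
from transforms.api import lightweight
from transforms.external.systems import external_systems, Source

@lightweight
@external_systems(
    dictionary_api_source=Source("<source_rid>")
)
# ...followed by your transform decorator and compute function
```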

For more in-depth examples, refer to sources in Python.