Train a model in Code Repositories

Models can be trained in the Code Repositories application using the Model Training template.

To train a model, complete the following steps:

  1. Author a model adapter
  2. Write a Python transform to train a model
  3. Preview your Python transform to test your logic
  4. Build your Python transform to publish the trained model
  5. Consume the model

1. Author a model adapter

A model uses a model adapter to ensure that Foundry can correctly initialize, serialize, deserialize, and perform inference on the model. You can author model adapters in Code Repositories from either the Model Adapter Library template or the Model Training template.

Model adapters cannot be authored directly in Python transforms repositories. To produce a model from an existing transforms repository, author the adapter in a Model Adapter Library and import that library into the transforms repository, or migrate the repository to the Model Training template.

Read more about when to use each type of model adapter repository, and how to create one, in the Model Adapter creation documentation. The steps below assume use of the Model Training template.

Model adapter implementation

Model Training Template Default Structure in Code Repositories

The model adapter and model training code should be in separate Python modules to ensure the trained model can be used in downstream transforms. In the template, we have separate model_adapters and model_training modules for this purpose. Author your model adapter in the adapter.py file.

The model adapter definition will depend on the model being trained. To learn more, consult the ModelAdapter API reference, review the example sklearn model adapter, or read through the supervised machine learning tutorial.
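
As a reference point, the sketch below shows the general shape of an adapter for a scikit-learn-style model. The class name, column names, and serializer choice are illustrative; the full contract is defined in the ModelAdapter API reference.

import palantir_models as pm
from palantir_models_serializers import DillSerializer


class ExampleModelAdapter(pm.ModelAdapter):
    @pm.auto_serialize(model=DillSerializer())  # handles save/load for the listed fields
    def __init__(self, model):
        self.model = model

    @classmethod
    def api(cls):
        # Declare the tabular input and output this model expects; column names are illustrative.
        inputs = {"df_in": pm.Pandas([("feature", float)])}
        outputs = {"df_out": pm.Pandas([("feature", float), ("prediction", float)])}
        return inputs, outputs

    def predict(self, df_in):
        df_in["prediction"] = self.model.predict(df_in[["feature"]])
        return df_in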

2. Write a Python transform to train a model

Next, in your code repository, create a new Python file to house your training logic.

from transforms.api import transform, Input, lightweight
from palantir_models.transforms import ModelOutput
from model_adapter.example_adapter import ExampleModelAdapter  # This is your custom ModelAdapter


def train_model(training_data):
    '''
    This function contains your custom training logic.
    '''
    pass


# Use the lightweight decorator unless your model natively supports Spark inputs
# (i.e. uses Spark ML). Without it, the input data will first be loaded in Spark
# and will require conversion to Pandas for use in the model, which is a slow and
# memory-intensive operation.

# Lightweight requires the foundry-transforms-lib-python package
# and for the repository to be up-to-date.
# https://palantir.com/docs/foundry/code-repositories/repository-upgrades
@lightweight()
@transform(
    training_data=Input("/path/to/training_data"),
    model_output=ModelOutput("/path/to/model")  # This is the path to the model
)
def compute(training_data, model_output):
    '''
    This function contains logic to read and write to Foundry.
    '''
    trained_model = train_model(training_data)  # 1. Train the model in a Python transform
    wrapped_model = ExampleModelAdapter(trained_model)  # 2. Wrap the trained model in your custom ModelAdapter
    model_output.publish(  # 3. Save the wrapped model to Foundry
        model_adapter=wrapped_model  # Foundry will call ModelAdapter.save to produce a model
    )

This logic publishes to a ModelOutput. Foundry will automatically create a model resource at the provided path after you commit your changes. You can also configure the resources required for model training, such as CPU, memory, and GPU requirements, with the @configure annotation, as sketched below.
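
For example, a Spark-based training transform might request a larger driver. The profile name below is illustrative, and the profiles available depend on your enrollment's configuration. For lightweight transforms, resource requests are instead passed to the @lightweight decorator, for example @lightweight(cpu_cores=4, memory_gb=16).

from transforms.api import configure, transform, Input
from palantir_models.transforms import ModelOutput


@configure(profile=["DRIVER_MEMORY_MEDIUM"])  # illustrative profile name
@transform(
    training_data=Input("/path/to/training_data"),
    model_output=ModelOutput("/path/to/model"),
)
def compute(training_data, model_output):
    ...  # training and publish logic as above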

(Optional) Log metrics and hyperparameters to a model experiment

Model experiments provide a lightweight framework for logging the metrics and hyperparameters produced during a model training run; these can be published alongside a model and persisted on the model page.

Learn more about creating and writing to experiments.
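
As a rough sketch of how this fits into the training transform above: the method names below (create_experiment, log_param, log_metric, and the experiment argument to publish) follow the experiments documentation, but treat the signatures there as authoritative.

def compute(training_data, model_output):
    # Create an experiment tied to this model output, then log values during training.
    experiment = model_output.create_experiment("example-run")
    experiment.log_param("n_estimators", 100)

    trained_model = train_model(training_data)
    experiment.log_metric("train_accuracy", 0.93)  # illustrative value

    model_output.publish(
        model_adapter=ExampleModelAdapter(trained_model),
        experiment=experiment,  # persist the experiment alongside the model version
    )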

3. Preview your Python transform to test your logic

In the Code Repositories application, you can select Preview to test your transform logic without running a full build. Note that preview runs on a smaller resource profile than the one you may have configured with the @configure annotation.

ModelOutput preview

ModelOutput preview allows you to validate your model training logic as well as your model serialization, deserialization, and API implementation.

Model output preview in the Code Repositories application.

ModelInput preview

ModelInput preview allows you to validate your inference logic against an existing model. Note that in Code Repositories preview, each ModelInput is subject to a 5 GB size limit.

Model input preview in the Code Repositories application.

4. Build your Python transform to publish the trained model

In your code repository, select Build to run your transform. Foundry will resolve both the repository's Python dependencies and your model's dependencies before executing your training logic.

Build a model in Code Repositories.

Calling ModelOutput.publish() will publish a version of your model to Foundry. Foundry will call the ModelAdapter.save() function, giving your ModelAdapter the opportunity to serialize all fields required for execution.
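
If you do not use @auto_serialize, you can implement this serialization yourself by overriding save() and load(). A minimal sketch, assuming a pickleable model; the file name is illustrative, the signatures are simplified relative to the ModelAdapter API reference, and api()/predict() are omitted for brevity:

import pickle

import palantir_models as pm


class PickledModelAdapter(pm.ModelAdapter):
    def __init__(self, model):
        self.model = model

    def save(self, state_writer):
        # Called by Foundry during ModelOutput.publish()
        with state_writer.open("model.pkl", "wb") as f:
            pickle.dump(self.model, f)

    @classmethod
    def load(cls, state_reader):
        # Called by Foundry when the model is consumed as a ModelInput
        with state_reader.open("model.pkl", "rb") as f:
            return cls(pickle.load(f))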

5. Consume the model

Submit to a Modeling Objective

A model can be submitted to a modeling objective for evaluation, review, and release management.

Run inference in Python Transforms

You can also use the model in a transforms pipeline, as detailed below.

We recommend using lightweight transforms for modeling jobs that do not require Spark, such as models built with the scikit-learn, xgboost, or keras libraries, for the following reasons:

  • The default transform profile includes executors and overhead that will not be used unless your code explicitly distributes inference over Spark. Consider using a DistributedInferenceWrapper if you want to leverage Spark executors for distributed inference.
  • Foundry transforms load data into Spark DataFrames by default, which requires a slow and memory-intensive conversion to the data representation expected by your model, such as a pandas DataFrame.

Lightweight transforms address these issues by executing without Spark and reading data directly into popular formats such as pandas, pyarrow, or polars. To use lightweight transforms with models, decorate your transform with @lightweight and import the foundry-transforms-lib-python package into your repository. Additionally, make sure your repository is upgraded and that your palantir_models version is at least 0.1640.0.
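
Inside a lightweight transform, inputs expose accessors for these formats directly. A minimal sketch, with illustrative paths:

from transforms.api import transform, Input, Output, lightweight


@lightweight()
@transform(
    output=Output("/path/to/output"),
    source=Input("/path/to/input"),
)
def compute(output, source):
    df = source.pandas()  # also available: source.polars(), source.arrow()
    output.write_pandas(df)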

To run a model within a transforms repository in which the model was not defined, set use_sidecar=True in ModelInput, as shown in the sketch below. This will automatically import the model adapter and its dependencies while running them in a separate environment to prevent dependency conflicts. Note that use_sidecar is unavailable for lightweight transforms. Review the ModelInput class reference for more details.

If use_sidecar is not set to True, the model adapter and its dependencies must be imported into or defined within the current code repository.
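
A minimal sketch, assuming use_sidecar is passed as a keyword argument to ModelInput (confirm against the ModelInput class reference); paths are illustrative:

from transforms.api import transform, Input, Output
from palantir_models.transforms import ModelInput


@transform(
    inference_output=Output("/path/to/inference_output"),
    inference_input=Input("/path/to/inference_input"),
    # Run the adapter and its dependencies in a separate sidecar environment.
    model=ModelInput("/path/to/model", use_sidecar=True),
)
def compute(inference_output, inference_input, model):
    results = model.transform(inference_input)
    inference_output.write_pandas(results.output_data)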

If you are using a model in a Python transforms repository other than the one in which the model was created, you must add the model adapter Python library to your authoring Python environment. This brings the Python packages required to load the model into the consuming repository's environment. The code authoring interface will detect when a model's dependencies are not present in the repository and will offer to import the library when you hover over the warning. The adapter library and the version corresponding to a specific model version can be found on the model page in Foundry.

Import dependencies if the model is from another repository.

from transforms.api import transform, Input, Output, lightweight
from palantir_models.transforms import ModelInput


# Use the lightweight decorator unless your model natively supports Spark inputs
# (i.e. uses Spark ML). Without it, the input data will first be loaded in Spark
# and will require conversion to Pandas for use in the model, which is a slow and
# memory-intensive operation.

# Lightweight requires the foundry-transforms-lib-python package
# and for the repository to be up-to-date.
# https://palantir.com/docs/foundry/code-repositories/repository-upgrades
@lightweight()
@transform(
    inference_output=Output("/path/to/inference_output"),
    inference_input=Input("/path/to/inference_input"),
    model=ModelInput("/path/to/model"),
)
def compute(inference_output, inference_input, model):  # model will be an instance of ExampleModelAdapter
    inference_results = model.transform(inference_input)  # the return value of the adapter's predict() or run_inference() method
    # Replace "output_data" with an output specified in the model version's API,
    # viewable on the model version's page. For example, inference_results.output_data
    # is the appropriate output for Hugging Face adapters.
    inference = inference_results.output_data
    inference_output.write_pandas(inference)

ModelInput and ModelOutput APIs

The ModelInput and ModelOutput objects used in the above transforms are directly responsible for interacting with models in Code Repositories. A summary of each follows:

ModelInput

from typing import Optional

# ModelInput can be imported from palantir_models.transforms

class ModelInput:
    def __init__(self, model_rid_or_path: str, model_version: Optional[str] = None):
        '''
        The `ModelInput` retrieves an existing model for use in a transform. It takes up to two arguments:
        1. A path to a model (or the model's RID).
        2. (Optional) A version RID of the model to retrieve.
           If this is not provided, the most recently published model will be used.

        For example: ModelInput("/path/to/model/asset")
        '''
        pass

In the transform function, the model specified by the ModelInput will be instantiated as an instance of the model adapter associated with the retrieved model version. Foundry will call ModelAdapter.load() or use the defined @auto_serialize instance to set up the model before initiating the transform build. The ModelInput instance therefore has access to its loaded model state and to all methods defined in the model adapter.
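
For example, to pin a transform to a specific model version rather than the most recently published one (the version RID below is a placeholder):

from palantir_models.transforms import ModelInput

model = ModelInput(
    "/path/to/model",
    model_version="ri.models.main.model-version.<id>",  # placeholder version RID
)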

ModelOutput

# ModelOutput can be imported from palantir_models.transforms

class ModelOutput:
    def __init__(self, model_rid_or_path: str):
        '''
        The `ModelOutput` is used to publish new versions to a model.
        `ModelOutput` takes one argument, which is the path to a model (or the model's RID).
        If the asset does not yet exist, the ModelOutput will create it when a user
        selects Commit or Build and transforms checks (CI) are executed.
        '''
        pass

In the transform function, the object bound to a ModelOutput argument is a WritableModel capable of publishing a new model version through its publish() method. This method takes a model adapter as a parameter and creates a new model version associated with it. During publish(), the platform uses the defined @auto_serialize instance or executes the implemented save() method, which allows the model adapter to serialize model files or checkpoints to the state_writer object.