The following documentation provides an example of how to train a scikit-learn binary classification model on the open source UCI ML Breast Cancer Wisconsin (Diagnostic) dataset in the Code Repositories application using the Model Training Template.
For a detailed walkthrough of the following steps, including how to author a model adapter and write Python transforms for model training, refer to our documentation on how to train a model in Code Repositories.
First, author a model adapter using the Model Training Template in Code Repositories.
The example logic below assumes the following:
- The model output contains the input columns plus prediction, probability_0, and probability_1, where:
  - prediction is 0 or 1. In the scikit-learn copy of this dataset, 0 is malignant (cancer detected) and 1 is benign (no cancer detected).
  - probability_0 is the predicted probability of class 0 (malignant).
  - probability_1 is the predicted probability of class 1 (benign).
- Dependencies: foundry-transforms-lib-python, pandas 1.5.3, scikit-learn 1.3.2, and dill 0.3.7.

import palantir_models as pm
from palantir_models.serializers import DillSerializer


class SklearnClassificationAdapter(pm.ModelAdapter):

    # Serialize the wrapped scikit-learn model with dill.
    @pm.auto_serialize(
        model=DillSerializer()
    )
    def __init__(self, model):
        self.model = model

    @classmethod
    def api(cls):
        # The 30 feature columns, with spaces replaced by underscores.
        columns = [
            'mean_radius',
            'mean_texture',
            'mean_perimeter',
            'mean_area',
            'mean_smoothness',
            'mean_compactness',
            'mean_concavity',
            'mean_concave_points',
            'mean_symmetry',
            'mean_fractal_dimension',
            'radius_error',
            'texture_error',
            'perimeter_error',
            'area_error',
            'smoothness_error',
            'compactness_error',
            'concavity_error',
            'concave_points_error',
            'symmetry_error',
            'fractal_dimension_error',
            'worst_radius',
            'worst_texture',
            'worst_perimeter',
            'worst_area',
            'worst_smoothness',
            'worst_compactness',
            'worst_concavity',
            'worst_concave_points',
            'worst_symmetry',
            'worst_fractal_dimension'
        ]
        inputs = {"df_in": pm.Pandas(columns=columns)}
        # The output echoes the inputs and appends the prediction columns.
        outputs = {"df_out": pm.Pandas(columns=columns + [
            ("prediction", int),
            ("probability_0", float),
            ("probability_1", float)
        ])}
        return inputs, outputs

    def predict(self, df_in):
        X = df_in.copy()
        predictions = self.model.predict(X)
        probabilities = self.model.predict_proba(X)
        df_in['prediction'] = predictions
        # classes_ gives the column order of predict_proba, so each class
        # label is written to its matching probability_<label> column.
        for idx, label in enumerate(self.model.classes_):
            df_in[f"probability_{label}"] = probabilities[:, idx]
        return df_in
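The adapter's predict logic can be tried outside Foundry before publishing. The following is a minimal sketch, assuming scikit-learn and pandas are installed locally. `predict_with_probabilities` is a hypothetical helper written for this illustration and is not part of palantir_models; `random_state=0` is added only to make the run repeatable.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier


def predict_with_probabilities(model, df_in):
    """Mirror the adapter's predict logic on a plain pandas DataFrame."""
    features = df_in.copy()
    df_out = df_in.copy()
    df_out["prediction"] = model.predict(features)
    probabilities = model.predict_proba(features)
    # classes_ fixes the column order of predict_proba, so label 0 maps to
    # probability_0 and label 1 maps to probability_1.
    for idx, label in enumerate(model.classes_):
        df_out[f"probability_{label}"] = probabilities[:, idx]
    return df_out


# Load the dataset and normalize column names as the training code does.
X, y = load_breast_cancer(as_frame=True, return_X_y=True)
X.columns = X.columns.str.replace(" ", "_")

model = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=0)
model.fit(X, y)

result = predict_with_probabilities(model, X.head(5))
print(result[["prediction", "probability_0", "probability_1"]])
```

Because the column names are derived from `classes_`, the same loop would produce one probability column per class if the model were trained on more than two labels; the adapter's `api()` would then need matching output columns.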
In the same repository in model_training/model_training.py, author the model training logic.
This example uses the open source UCI ML Breast Cancer Wisconsin (Diagnostic) dataset provided in the scikit-learn library.
from transforms.api import transform
from palantir_models.transforms import ModelOutput
from main.model_adapters.adapter import SklearnClassificationAdapter
from sklearn.datasets import load_breast_cancer
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier


@transform.using(
    model_output=ModelOutput("/path/to/model_asset"),
)
def compute(model_output):
    X_train, y_train = load_breast_cancer(as_frame=True, return_X_y=True)
    # Replace spaces so the column names match the adapter's api().
    X_train.columns = X_train.columns.str.replace(' ', '_')
    columns = X_train.columns

    # Impute missing values with the median, then standardize.
    numeric_transformer = Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="median")),
            ("scaler", StandardScaler())
        ]
    )

    preprocessor = make_column_transformer(
        (numeric_transformer, columns),
        remainder="passthrough"
    )

    model = Pipeline(
        steps=[
            ("preprocessor", preprocessor),
            ("classifier", RandomForestClassifier(n_estimators=50, max_depth=3))
        ]
    )

    model.fit(X_train, y_train)

    # Wrap the fitted pipeline in the adapter and publish it as a model.
    foundry_model = SklearnClassificationAdapter(model)
    model_output.publish(model_adapter=foundry_model)
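The training logic above fits on the full dataset. To get a rough sense of model quality before publishing, a hold-out check can be run locally. This is a sketch under stated assumptions, not part of the template: the 80/20 split, the `stratify` and `random_state` settings, and the use of accuracy as the metric are illustrative choices, and the column transformer is dropped because every column in this dataset is numeric.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(as_frame=True, return_X_y=True)
X.columns = X.columns.str.replace(" ", "_")

# Hold out 20% of the rows, keeping the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Same steps as the training transform, applied to all (numeric) columns.
model = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
        ("classifier", RandomForestClassifier(
            n_estimators=50, max_depth=3, random_state=0
        )),
    ]
)
model.fit(X_train, y_train)

holdout_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"hold-out accuracy: {holdout_accuracy:.3f}")
```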
You can run inference with your model in a Python transform. For example, once your model has been trained, copy the inference logic below into the model_training/run_inference.py file and select Build.
To run a model within a transform repository in which the model was not defined, set use_sidecar = True in ModelInput. This will automatically import the model adapter and its dependencies, while running them in a separate environment to prevent dependency conflicts. Review the ModelInput class reference for more details.
If use_sidecar is not set to True, the model adapter and its dependencies must be imported into or defined within the current code repository.
from transforms.api import transform, Output, LightweightOutput
from palantir_models.transforms import ModelInput
from palantir_models import ModelAdapter
from sklearn.datasets import load_breast_cancer


@transform.using(
    model=ModelInput("ri.models.main.model.cfc11519-28be-4f3e-9176-9afe91ecf3e1"),
    inference_output=Output("ri.foundry.main.dataset.5dd9907f-79bc-4ae9-a106-1fa87ff021c3"),
)
def compute(model: ModelAdapter, inference_output: LightweightOutput):
    # Only the features are needed for inference; the labels are discarded.
    X, _ = load_breast_cancer(as_frame=True, return_X_y=True)
    X.columns = X.columns.str.replace(' ', '_')

    # transform() returns an object with one attribute per output in api().
    inference_results = model.transform(X)
    inference_output.write_pandas(inference_results.df_out)
A Palantir model can be submitted to a modeling objective for the following:
After submitting this model to a modeling objective, you can create a release to host this model for live inference. Once the deployment is ready, you can perform live inference and connect this model to an operational application.
The example below shows input for the binary classification model using the single I/O endpoint:
[
{
"mean_radius": 15.09,
"mean_texture": 23.71,
"mean_perimeter": 92.65,
"mean_area": 944.07,
"mean_smoothness": 0.53,
"mean_compactness": 0.21,
"mean_concavity": 0.76,
"mean_concave_points": 0.39,
"mean_symmetry": 0.08,
"mean_fractal_dimension": 0.14,
"radius_error": 0.49,
"texture_error": 0.82,
"perimeter_error": 2.51,
"area_error": 17.22,
"smoothness_error": 0.07,
"compactness_error": 0.01,
"concavity_error": 0.05,
"concave_points_error": 0.05,
"symmetry_error": 0.01,
"fractal_dimension_error": 0.08,
"worst_radius": 12.95,
"worst_texture": 20.66,
"worst_perimeter": 185.41,
"worst_area": 624.87,
"worst_smoothness": 0.18,
"worst_compactness": 0.26,
"worst_concavity": 0.01,
"worst_concave_points": 0.05,
"worst_symmetry": 0.29,
"worst_fractal_dimension": 0.05
}
]
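Before sending a payload like the one above to a live deployment, it can help to check that every row carries exactly the 30 feature columns declared in the adapter's `api()`. The sketch below uses only the Python standard library. `validate_rows` and `EXPECTED_COLUMNS` are hypothetical names introduced for this illustration, and rejecting unexpected keys is an assumed strictness choice rather than documented endpoint behavior.

```python
import json

BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]

# Rebuild the adapter's column list: mean_*, *_error, and worst_* groups.
EXPECTED_COLUMNS = (
    [f"mean_{name}" for name in BASE_FEATURES]
    + [f"{name}_error" for name in BASE_FEATURES]
    + [f"worst_{name}" for name in BASE_FEATURES]
)


def validate_rows(payload):
    """Parse a JSON payload and check each row against EXPECTED_COLUMNS."""
    rows = json.loads(payload)
    if not isinstance(rows, list):
        raise ValueError("payload must be a JSON array of row objects")
    expected = set(EXPECTED_COLUMNS)
    for i, row in enumerate(rows):
        if not isinstance(row, dict):
            raise ValueError(f"row {i}: expected a JSON object")
        missing = sorted(expected - set(row))
        unexpected = sorted(set(row) - expected)
        if missing or unexpected:
            raise ValueError(
                f"row {i}: missing={missing}, unexpected={unexpected}"
            )
        for name, value in row.items():
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"row {i}: {name} must be numeric")
    return rows


# Usage: a single placeholder row with every feature set to 0.5.
example_payload = json.dumps([{name: 0.5 for name in EXPECTED_COLUMNS}])
validated = validate_rows(example_payload)
print(f"{len(validated)} row(s) validated")
```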