The Use LLM node in Pipeline Builder offers a convenient method for executing Large Language Models (LLMs) on your data at scale. The integration of this node within Pipeline Builder allows you to seamlessly incorporate LLM processing logic between various data transformations, simplifying the integration of LLMs into your pipeline with no coding required.
The Use LLM node includes pre-engineered prompt templates. These templates provide a beginner-friendly start to using LLMs that leverages the expertise of experienced prompt engineers. You can also run trials over a few rows of your input dataset to iterate on your prompt before running your model on an entire dataset. This preview functionality computes in seconds, speeding up the feedback loop and enhancing the overall development process.
To use the Use LLM node, a platform administrator must first grant you permission for AIP capabilities for custom workflows.
To apply an LLM to a dataset, select a dataset node in your workspace and select Use LLM.
Below are different examples of the available template prompts. To create your own, select Empty prompt.
Use the classification prompt when you want to sort your data into a set of categories.
The example below demonstrates how the prompt would be filled out for our notional objective of classifying restaurant reviews into three categories: Service, Food, and Atmosphere.
The Multiplicity field allows you to choose whether you want the output column to have one category, multiple categories, or an exact number of categories. In our example, we want to include all the categories a review could fall into, so we will choose the One or more categories option.
In the Context field, enter a description for your data. In our example, we will input Restaurant Review.
In the Categories field, input the distinct categories to which you want to assign your data. In our example, we specify the three categories Food, Service, and Atmosphere because we want to categorize our restaurant reviews into any of these three categories.
In the Column to classify field, choose the column that contains the data you want to classify. In our example, we choose the review column because that is the column containing our restaurant reviews.
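The way the Multiplicity and Categories fields constrain the output column can be pictured with a short Python sketch. This is only an illustration, not how Pipeline Builder implements the template: the function `check_classification` and its parameters are hypothetical names chosen for this example.

```python
from typing import List, Optional

# Categories from the restaurant review example.
CATEGORIES = ["Food", "Service", "Atmosphere"]


def check_classification(labels: List[str],
                         categories: List[str],
                         min_count: int = 1,
                         max_count: Optional[int] = None) -> bool:
    """Return True if labels are distinct, drawn from categories,
    and their number falls within the given bounds (hypothetical helper)."""
    if len(set(labels)) != len(labels):
        return False  # the same category was assigned twice
    if any(label not in categories for label in labels):
        return False  # a label outside the configured categories
    if len(labels) < min_count:
        return False
    if max_count is not None and len(labels) > max_count:
        return False
    return True
```

In this sketch, the One or more categories option corresponds to `min_count=1` with no upper bound, a single category to `min_count=1, max_count=1`, and an exact number of categories to equal bounds.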
You can use the summarization template to summarize your data to a given length.
In this template, you can specify the length of the summarization. You can choose the number of words, sentences, or paragraphs and specify the size in the Summarization size field.
In our example, we want a one-sentence summary of the restaurant review, so we specify 1 as the summarization size, and we choose Sentences from the dropdown.
To translate your data into a different language, use the translation prompt. Specify the language you want to translate the data to in the Language field. In our example below, we want to translate the restaurant reviews to Spanish, so we specify Spanish under the Language field.
Use the sentiment analysis prompt when you want to assign a numeric score to your data based on its positive or negative sentiment.
In this template, you can configure the scale of the output score. For our example below, we want a number from zero to five where five denotes a review being the most positive and zero being the most negative.
Use the entity extraction prompt when there are specific elements you want to extract from your data. In our example, we want to extract all the food, service, and times visited elements in our restaurant reviews.
In particular, we want to extract all food elements in a String Array, the service quality as a String, and an Integer denoting the number of times that person has visited the restaurant.
To obtain those results, we update the Entities to extract field. Enter food, service, and number visited under Entity name with the following properties:
For food, specify an Array for the Type and select String as the type for that array.
For service, select String as the type.
For number visited, select Integer.
The LLM output is now configured to conform to our specified types for this example.
You can also adjust the types of the extracted entities within the struct under the Output type on the prompt page.
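To make the typed output concrete, here is a minimal Python sketch of a record shaped like the struct in this example: an array of strings, a string, and an integer. The class `ExtractedEntities`, the helper `parse_entities`, and the field name `number_visited` are assumptions made for illustration; they are not part of Pipeline Builder.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ExtractedEntities:
    """Hypothetical mirror of the example's output struct."""
    food: List[str]       # Array of String
    service: str          # String
    number_visited: int   # Integer


def parse_entities(raw: Dict[str, Any]) -> ExtractedEntities:
    """Check that a raw record matches the configured entity types."""
    food = raw["food"]
    service = raw["service"]
    visits = raw["number visited"]
    if not isinstance(food, list) or not all(isinstance(f, str) for f in food):
        raise TypeError("food must be an array of strings")
    if not isinstance(service, str):
        raise TypeError("service must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(visits, bool) or not isinstance(visits, int):
        raise TypeError("number visited must be an integer")
    return ExtractedEntities(food=food, service=service, number_visited=visits)
```

A record such as `{"food": ["tacos"], "service": "friendly", "number visited": 3}` (made-up values) passes this check, while a record whose `number visited` is the string `"3"` raises a `TypeError`.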
If none of the prompt templates fit your use case, you can create your own by selecting Empty prompt.
Currently, Pipeline Builder's GPT-4o model supports prompts that require vision capabilities. This means the model can take in images, analyze them, and answer questions based on the visual input.
To use this vision functionality, enter the media reference column in the Provide input data section of an empty prompt template and select GPT-4o as the model.
Currently, the vision prompt does not support media sets as a direct input. Use the Convert Media Set to Table Rows transform to get the mediaReference column that you can feed into the Use LLM node.
On the prompt page, you can designate the desired output type for your LLM output to conform to. Select the Output type option located near the bottom of the screen, then choose the preferred type from the dropdown menu.
Also on the prompt page, you can configure your output to show the LLM errors alongside your output value. This configuration will change your output type to a struct consisting of your original output type and the error string. To include the LLM error, tick the box next to Include errors.
To change your output back to the original output without errors, untick the Include errors box.
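The struct produced by Include errors can be pictured with the following Python sketch. It assumes the struct has two fields, named `output` and `error` here, and that the output is empty when an error occurs; the field names, the null behavior, and the helper `run_with_errors` are all assumptions for illustration and are not taken from the product.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class OutputWithError(Generic[T]):
    """Hypothetical shape: the original output type plus an error string."""
    output: Optional[T]
    error: Optional[str]


def run_with_errors(fn: Callable[[str], T], row: str) -> OutputWithError[T]:
    """Call fn on one row and capture any failure as an error string."""
    try:
        return OutputWithError(output=fn(row), error=None)
    except Exception as exc:
        return OutputWithError(output=None, error=str(exc))
```

Downstream logic can then branch on whether `error` is set instead of losing the failed rows.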
This is a new feature that may not yet be available on all enrollments.
To save on compute costs and time, you can skip computing already processed rows by toggling Skip recomputing rows.
When Skip recomputing rows is enabled, rows will be compared with previously processed rows based on the columns and parameters passed into the input prompt. Matching rows with the same column and parameter values will get the cached output value without reprocessing in future deployments.
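The matching behavior described above resembles a cache keyed on the column and parameter values passed into the prompt. The Python sketch below illustrates that idea under stated assumptions: the SHA-256 key over a JSON payload, the in-memory dictionary, and the names `row_key` and `process_rows` are hypothetical and do not describe how Pipeline Builder stores its cache.

```python
import hashlib
import json
from typing import Callable, Dict, List


def row_key(values: Dict[str, str], params: Dict[str, str]) -> str:
    """Build a stable key from the column values and prompt parameters."""
    payload = json.dumps({"values": values, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def process_rows(rows: List[Dict[str, str]],
                 params: Dict[str, str],
                 llm: Callable[[Dict[str, str]], str],
                 cache: Dict[str, str]) -> List[str]:
    """Call llm only for rows whose key is not already in the cache."""
    outputs = []
    for row in rows:
        key = row_key(row, params)
        if key not in cache:
            cache[key] = llm(row)  # new or changed row: compute and store
        outputs.append(cache[key])  # matching row: reuse the cached value
    return outputs
```

In this sketch, rerunning with the same rows and parameters makes no further model calls, adding a row triggers one call, and changing a parameter produces new keys so every row is recomputed.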
The cache can be cleared if changes that require all rows to be recomputed are made to the prompt. A warning banner will appear over the LLM view.
To clear the cache, select the red wastebasket icon. If the cache is cleared, all rows will be reprocessed in the next deployment.
The cache will automatically be cleared if the output type is changed. When this happens, a warning banner will appear. If this was a mistake, you can select undo change in the banner.
Any changes to the cache's state will show up in the Changes page when merging a branch.
If a Use LLM node with multiple downstream outputs has Skip recomputing rows enabled, you must put these outputs in the same job group. Otherwise, you will get an error when attempting to deploy.
To fix this error, create a new job group outside of the default job group.
For every prompt, you can configure the model being used for that Use LLM node, such as GPT 3.5 or 4. The Use LLM node also supports open source models like Mistral AI's Mixtral 8x7b.
At the bottom of each Use LLM board, you have the option to test out your specific LLM with examples. Select the Trial run tab and enter the value you want to test on the left-hand side. Then select Run.
To test out more examples, you can select Add trial run.
To add examples directly from your input data, navigate to the Input table tab and select the rows you want to use in your trial run. Select Use rows for trial run, then you will automatically be brought back to the Trial run tab with the rows that you selected, populated as trial runs.
If you use one of the five templates, you can preview the LLM prompt instructions before creating the prompt by selecting the Preview tab. The Preview tab is read-only: you can view the instructions there but not edit them. If you want to edit the template, go back to the Configure tab.
Select Create prompt to edit the name of the new output column and preview the results in the Output column.
Once you select Create prompt, you will not be able to go back to the template for that particular board.
To change the output column name, edit the Output column section. To view your changes applied to the output table preview, select Applied. To preview the output table, select the Output table tab. This preview will only show the first 50 rows.
Finally, when you are finished configuring your Use LLM node, select Apply in the top right. This allows you to add transform logic to the output of your LLM board, and to view the preview of the first 50 rows when you select the LLM board in the main Pipeline Builder workspace.