Observability

As your workflow increases in complexity, it can become more difficult to ensure that every step is working as intended. Setting up observability tooling is an important step in creating a stable production workflow and detecting issues before they affect users.

The Palantir platform provides built-in observability tools that give you visibility into the status and health of many resource types. These tools can help you debug issues during development, as well as monitor pipeline and workflow stability in production.

Logs

You can view logs emitted by your own code, as well as logs from the third-party libraries used to run it (such as Kafka when using streams). Logs are available on several types of jobs.

Exporting organization logs to a stream

Log exporting may not be available in your enrollment. Contact Palantir Support for more information.

To allow for arbitrary processing outside the current capabilities of in-platform tools, you can create a stream in a specified folder containing all telemetry for an organization. This includes logs, metrics, and traces. Data in this stream can be analyzed using Foundry’s suite of data analysis tools or exported to third-party systems.

Log exporting through Control Panel.

Learn more about exporting logs to a stream on the Configure logging page.
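As an illustration of the kind of processing an exported telemetry stream makes possible, the following minimal sketch counts error-level log records per service. The record fields (`type`, `level`, `service`) and the sample data are hypothetical and are not the actual schema of the exported stream; check the schema of your own stream before adapting it.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_errors_by_service(records: Iterable[Mapping[str, str]]) -> Counter:
    """Count ERROR-level log records per service.

    Assumes each record is a mapping with hypothetical keys
    "type", "level", and "service".
    """
    counts: Counter = Counter()
    for record in records:
        # Skip metrics and traces; keep only ERROR-level log records.
        if record.get("type") == "log" and record.get("level") == "ERROR":
            counts[record.get("service", "unknown")] += 1
    return counts


# Hypothetical sample of mixed telemetry records (logs and metrics).
sample_records = [
    {"type": "log", "level": "ERROR", "service": "ingest"},
    {"type": "log", "level": "INFO", "service": "ingest"},
    {"type": "metric", "name": "lag", "service": "ingest"},
    {"type": "log", "level": "ERROR", "service": "transform"},
    {"type": "log", "level": "ERROR", "service": "ingest"},
]

error_counts = count_errors_by_service(sample_records)
print(error_counts.most_common())
```

The same filtering could equally be done with Foundry's data analysis tools or in a third-party system after export; the sketch only shows the shape of the logic.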

Metrics

Metrics for long-running compute workloads allow you to monitor the health and stability of a stream or compute module over time.

Access metrics in the Metrics tab of both streams and compute modules.

Metrics tab.

Tracing

You can trace the execution of actions, functions, and other AIP tools using AIP trace views.

Alerting

Monitors in Foundry can alert you to a wide variety of preconfigured events and metric thresholds for Foundry resources.

Configuring monitors

There are two ways to configure monitors in Foundry:

  • Health checks: Configured directly on a resource and tied to that resource's specific behavior (such as validating a dataset schema or checking a job's duration).
  • Monitoring rules: Configured as part of monitoring views that monitor all resources within a configured scope.
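To make the two health check examples above concrete, here is a minimal sketch of what a schema validation and a job duration check evaluate conceptually. This is plain Python for illustration only, not a Foundry API; the function names, the column-to-type mapping, and the threshold are all assumptions.

```python
from typing import List, Mapping


def schema_mismatches(expected: Mapping[str, str], actual: Mapping[str, str]) -> List[str]:
    """Return a description of each expected column that is missing or has the wrong type."""
    problems: List[str] = []
    for column, expected_type in expected.items():
        if column not in actual:
            problems.append(f"missing column: {column}")
        elif actual[column] != expected_type:
            problems.append(
                f"type mismatch for {column}: expected {expected_type}, got {actual[column]}"
            )
    return problems


def job_duration_ok(duration_seconds: float, max_seconds: float) -> bool:
    """Return True when the job finished within the allowed duration."""
    return duration_seconds <= max_seconds


# Hypothetical dataset schema and job timing.
expected_schema = {"id": "string", "amount": "double"}
actual_schema = {"id": "string", "amount": "integer"}

print(schema_mismatches(expected_schema, actual_schema))
print(job_duration_ok(duration_seconds=420.0, max_seconds=600.0))
```

A check of this kind would raise an alert when the list of mismatches is non-empty or when the duration check returns False.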

Receiving alerts

You can subscribe directly to health checks to receive a notification when an alert is raised.

When you watch alerts, you can choose to be notified of all failures, to be notified of critical failures only, or to turn alerts off.
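The three watch options can be thought of as a simple filter on incoming alerts. The sketch below models that decision; the `WatchLevel` enum and `should_notify` function are hypothetical names used for illustration and do not correspond to a Foundry API.

```python
from enum import Enum


class WatchLevel(Enum):
    """Hypothetical names for the three watch options."""
    ALL_FAILURES = "all"
    CRITICAL_ONLY = "critical"
    OFF = "off"


def should_notify(watch_level: WatchLevel, is_critical: bool) -> bool:
    """Decide whether a failure alert results in a notification."""
    if watch_level is WatchLevel.OFF:
        return False
    if watch_level is WatchLevel.CRITICAL_ONLY:
        return is_critical
    # ALL_FAILURES: notify regardless of severity.
    return True


print(should_notify(WatchLevel.CRITICAL_ONLY, is_critical=False))
```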

You can also create a monitoring view to group and review alerts from both health checks and monitoring rules.

Alerts from this monitoring view can be sent in the following ways:

  • Foundry notifications: Users and groups can directly subscribe to a monitoring view to receive all alerts as Foundry notifications.
  • External systems: External systems such as PagerDuty and Slack can be integrated directly into a monitoring view.