Helm chart rollout strategies and timeouts

This page outlines rollout strategies and timeouts for Helm charts. Details on how these work and how they can be configured are outlined below.

Rollout strategies

Apollo supports two rollout strategies for Helm charts:

manageRollout (default): Apollo helm-chart-operator will apply Kubernetes resource changes and then wait for all resources to become ready before succeeding a Plan. If either the Kubernetes resource changes fail to apply successfully or resources fail to become ready within the resource readiness timeout (for example, all pods for a deployment are not ready), helm-chart-operator will rollback the changes and fail the Plan.
- We recommend using this rollout strategy in production because it will rollback bad changes impacting service availability as quickly as possible.
applyChangesNoWait: Apollo helm-chart-operator will apply Kubernetes resource changes and then immediately succeed the Plan, regardless of the "readiness" of the created resources. For example, with this strategy helm-chart-operator will not fail a Plan if pods do not become ready within the timeout. If the Kubernetes resource change fails to apply, helm-chart-operator will still rollback the changes and fail the Plan.
- This strategy can be useful when iterating to get a new service working, where availability of the service is not yet a concern. As long as the Kubernetes resources apply successfully, it will leave them in place, making it easier to debug why a change failed.
- Note that there is a five minute timeout for Helm install and upgrade, so long-running Helm hooks may still cause a Plan to time out if they run longer than this.

You can configure these strategies in two ways:

On a global basis, override the capabilities.defaultRolloutStrategy field in helm-chart-operator configuration.
On a per-chart basis, set rollout-strategy in a Helm chart's manifest extensions. For example, you could set rollout-strategy: applyChangesNoWait. Using the Apollo CLI, this can be set with the --rollout-strategy flag when initializing a new Helm chart Product.

You may find it useful to globally override the default rollout strategy when first iterating on a new Environment. However, we recommend reverting to the default managed rollout strategy before moving to production. After that, override on a per-chart basis as needed.

Resource readiness timeouts during managed rollouts

The resource readiness timeout refers to how long helm-chart-operator will wait for resources to become ready after applying the resource changes to Kubernetes when the rollout strategy is managed. This timeout is distinct from the time it takes for Helm to apply changes, including any Helm hooks. Helm operations have their own 5-minute timeout.

You can configure the default timeout for resource readiness based on Plan type, either install or upgrade, for the helm-chart-operator Entity using the following overrides:

For install Plans, use the capabilities.scopedResourceReadinessTimeouts.install override to specify the default resource readiness timeout.
For upgrade Plans, use the capabilities.scopedResourceReadinessTimeouts.upgrade to specify the default resource readiness timeout on change version or configuration Plans.

You can also override the default resource readiness timeout on an individual Helm chart Product by setting resource-readiness-timeout in the Product manifest extensions. For example, resource-readiness-timeout: 1h. When set, this will always override helm-chart-operator's own configuration. Using the Apollo CLI, this can also be set with the --resource-readiness-timeout flag when initializing a new Helm chart Product.

Image pre-pulling

Apollo's helm-chart-operator pre-pulls container images before applying Helm charts. This helps prevent service downtime caused by slow image pulls at Pod startup. You can configure the pre-pulling mechanism in the helm-chart-operator Entity configuration under capabilities.imagePrePullEnsurer by setting the following fields:

enabled: Boolean controlling whether pre-pulling is active.
timeout: Maximum duration helm-chart-operator will spend attempting to pre-pull images before failing the Plan.

By default, the pre-pulling capability is enabled with a timeout of 15 minutes. An example configuration for the pre-pulling mechanism is below:

Copied!1
2
3
4
capabilities:
  imagePrePullEnsurer:
    enabled: true
    timeout: 15m

You can also override the pre-pull timeout on a per-product basis by adding an image-pre-pull-config extension to the Helm chart manifest and setting the image-pre-pull-config.timeout field. The per-product timeout must be at least 1 minute and at most 90 minutes. This flexibility allows you to adapt pre-pull behavior to the specific needs of each product.

An example of the Helm extension for image pre-pulling is below:

Copied!1
2
3
manifest-extensions:
  image-pre-pull-config:
    timeout: "3m"

←

PREVIOUSPublishing versioned Helm charts

NEXTTroubleshooting in Kubernetes clusters

→