
Best Practice

In this section you will learn how to configure cert-manager to comply with popular security standards such as the CIS Kubernetes Benchmark, the NSA Kubernetes Hardening Guide, or the BSI Kubernetes Security Recommendations.

You will also learn about best practices for deploying cert-manager in production, such as those enforced by tools like Datree and its built-in rules, and those documented by Learnk8s in their "Kubernetes production best practices" checklist.

Overview

The default cert-manager resources in the Helm chart or YAML manifests (Deployment, Pod, ServiceAccount, etc.) are designed for backwards compatibility rather than for best practice or maximum security. You may find that the default resources do not comply with the security policy on your Kubernetes cluster. In that case, you can override the defaults using Helm chart values.

Use Liveness Probes

An example of this recommendation is found in the Datree Documentation: Ensure each container has a configured liveness probe:

Liveness probes allow Kubernetes to determine when a pod should be replaced. They are fundamental in configuring a resilient cluster architecture.

The cert-manager webhook and controller Pods both have liveness probes, but only the webhook liveness probe is enabled by default. The cainjector Pod does not yet have a liveness probe. See below for more information.

webhook

The cert-manager webhook has a liveness probe which is enabled by default, and its timings and thresholds can be configured using Helm values.
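For example, a values file fragment along these lines could be used to adjust the webhook probe. This is a sketch that assumes the webhook.livenessProbe keys found in recent versions of the cert-manager Helm chart; the numbers are illustrative rather than recommendations, so check the values.yaml of your chart version for the actual keys and defaults.

```yaml
# Sketch: tune the webhook liveness probe via Helm values.
# Assumes the webhook.livenessProbe keys of recent cert-manager charts;
# the numbers below are illustrative, not recommendations.
webhook:
  livenessProbe:
    initialDelaySeconds: 60  # wait this long after start before the first probe
    periodSeconds: 10        # probe interval
    timeoutSeconds: 1        # per-probe timeout
    successThreshold: 1      # Kubernetes requires 1 for liveness probes
    failureThreshold: 3      # consecutive failures before the container is restarted
```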

controller

ℹī¸ The cert-manager controller liveness probe was introduced in cert-manager release 1.12.

The cert-manager controller has a liveness probe, but it is disabled by default. You can enable it using the Helm chart value livenessProbe.enabled=true, but first read the background information below.
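A minimal values file fragment that enables the probe might look like this. It uses only the livenessProbe.enabled value named above and leaves the timings at the chart defaults; passing --set livenessProbe.enabled=true on the helm command line has the same effect.

```yaml
# Enable the cert-manager controller liveness probe (cert-manager 1.12 or later).
# The top-level livenessProbe key applies to the controller, not the webhook.
livenessProbe:
  enabled: true
```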

📢 The controller liveness probe is a new feature in cert-manager release 1.12, and it is disabled by default as a precaution, in case it causes problems in the field. Please get in touch and tell us if you have enabled the controller liveness probe in production, and whether you would like it to be turned on by default. Also tell us about any circumstances where the controller has become stuck and the liveness probe has been necessary to restart the process automatically.

The liveness probe for the cert-manager controller is an HTTP probe which connects to the /livez endpoint of a healthz server that listens on port 9443 and runs in its own thread. The /livez endpoint currently reports the combined status of the following sub-systems, each of which also has its own /livez endpoint:

  • /livez/leaderElection: Returns an error if the leader election record has not been renewed or if the leader election thread has exited without also crashing the parent process.

ℹī¸ In future more sub-systems could be checked by the /livez endpoint, similar to how Kubernetes ensure logging is not blocked and have health checks for each controller.

📖 Read about how to access individual health checks and verbose status information (cert-manager uses the same healthz endpoint multiplexer as Kubernetes).

cainjector

The cainjector Pod does not have a liveness probe or a /livez healthz endpoint, but the case for adding one is made in the GitHub issue: cainjector in a zombie state after attempting to shut down. If you have also experienced this specific problem, please add your remarks to that issue. If you have a general request for a liveness probe in cainjector, add your remarks to Helm: Allow configuration of readiness, liveness and startup probes for all created Pods.

Background Information

The cert-manager controller process and the cainjector process both use the Kubernetes leader election library to ensure that only one replica of each process can be active at any one time. The Kubernetes control-plane components also use this library.

The leader election code runs in a loop in a separate thread (goroutine). If it wins the leader election race but later fails to renew its leader election lease, it exits. When the leader election thread exits, all the other threads are gracefully shut down and then the process exits. Similarly, if any of the other main threads exits unexpectedly, that triggers an orderly shutdown of the remaining threads and the process exits.

This adheres to the principle that Containers should crash when there's a fatal error. Kubernetes will restart the crashed container, and if it crashes repeatedly, there will be increasing time delays between successive restarts.

For this reason, the liveness probe should only be needed if there is a bug in this orderly shutdown process, or if there is a bug in one of the other threads which causes the process to deadlock and not shutdown.

You may want to enable the liveness probe anyway, as a defense against unforeseen bugs and deadlocks, but you will need to monitor the processes closely and tweak the various liveness probe timings and thresholds if necessary.
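As a sketch, the controller probe timings could be adjusted with values like the following. The keys other than enabled are assumed to mirror the standard Kubernetes Probe fields that recent chart versions expose under livenessProbe, and the numbers are placeholders to tune against what you observe, not recommendations. Because Kubernetes restarts a container only after failureThreshold consecutive failed probes, roughly periodSeconds × failureThreshold (here 10 × 8 = 80 seconds) of failures must accumulate before a restart; raising either value gives the controller more time to recover on its own.

```yaml
# Sketch: enable and tune the controller liveness probe.
# Keys other than "enabled" are assumed to mirror Kubernetes Probe fields;
# check your chart version's values.yaml before relying on them.
livenessProbe:
  enabled: true
  initialDelaySeconds: 10  # grace period after container start
  periodSeconds: 10        # probe interval
  timeoutSeconds: 5        # per-probe timeout (placeholder)
  successThreshold: 1      # Kubernetes requires 1 for liveness probes
  failureThreshold: 8      # about 10s x 8 = 80s of failed probes before a restart
```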

📖 Read Configure Liveness, Readiness and Startup Probes in the Kubernetes documentation, paying particular attention to the notes and cautions in that document.

📖 Read Shooting Yourself in the Foot with Liveness Probes for more cautionary information about liveness probes.

Restrict Auto-Mount of Service Account Tokens

This recommendation is described in the Kyverno Policy Catalogue as follows:

Kubernetes automatically mounts ServiceAccount credentials in each Pod. The ServiceAccount may be assigned roles allowing Pods to access API resources. Blocking this ability is an extension of the least privilege best practice and should be followed if Pods do not need to speak to the API server to function. This policy ensures that mounting of these ServiceAccount tokens is blocked.

The cert-manager components do need to speak to the API server, but we still recommend setting automountServiceAccountToken: false for the following reasons:

  1. Setting automountServiceAccountToken: false will allow cert-manager to be installed on clusters where Kyverno (or some other policy system) is configured to deny Pods that have this field set to true. The Kubernetes default value is true.
  2. With automountServiceAccountToken: true, all the containers in the Pod will mount the ServiceAccount token, including sidecar and init containers that might have been injected into the cert-manager Pod resources by Kubernetes admission controllers. The principle of least privilege suggests that it is better to explicitly mount the ServiceAccount token only into the cert-manager containers.

We therefore recommend setting automountServiceAccountToken: false, manually adding a projected Volume to each of the cert-manager Deployment resources containing the ServiceAccount token, CA certificate, and namespace files that would normally be added automatically by the Kubernetes ServiceAccount admission controller, and explicitly adding a read-only VolumeMount to each of the cert-manager containers.
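To make the recommendation concrete, this is roughly where those settings end up in a Pod template. It is a hand-written sketch using plain Kubernetes Pod spec fields, not the exact output of the Helm chart; the container name and image are hypothetical placeholders. The three projected sources recreate the token, ca.crt, and namespace files that Kubernetes client libraries expect under /var/run/secrets/kubernetes.io/serviceaccount, but only the container that declares the VolumeMount can see them.

```yaml
# Sketch of a Pod template spec (not exact chart output).
# Container name and image are hypothetical placeholders.
spec:
  automountServiceAccountToken: false  # no implicit token in any container
  containers:
  - name: example-cert-manager-container             # hypothetical
    image: example.invalid/cert-manager:placeholder  # hypothetical
    volumeMounts:
    # Only this container receives the token, at the standard in-cluster path.
    - name: serviceaccount-token
      mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      readOnly: true
  volumes:
  - name: serviceaccount-token
    projected:
      defaultMode: 0444
      sources:
      - serviceAccountToken:   # short-lived token, refreshed by the kubelet
          expirationSeconds: 3607
          path: token
      - configMap:             # cluster CA bundle published in every namespace
          name: kube-root-ca.crt
          items:
          - key: ca.crt
            path: ca.crt
      - downwardAPI:           # the Pod's own namespace
          items:
          - path: namespace
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
```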

An example of this configuration is included in the Helm Chart Values file below.

Best Practice Helm Chart Values

Download the following Helm chart values file and supply it to helm install, helm upgrade, or helm template using the --values flag:

🔗 values.best-practice.yaml

```yaml
# Helm chart values which make cert-manager comply with CIS, BSI and NSA
# security benchmarks.
automountServiceAccountToken: false
serviceAccount:
  automountServiceAccountToken: false
volumes:
- name: serviceaccount-token
  projected:
    defaultMode: 0444
    sources:
    - serviceAccountToken:
        expirationSeconds: 3607
        path: token
    - configMap:
        name: kube-root-ca.crt
        items:
        - key: ca.crt
          path: ca.crt
    - downwardAPI:
        items:
        - path: namespace
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.namespace
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
  name: serviceaccount-token
  readOnly: true
webhook:
  automountServiceAccountToken: false
  serviceAccount:
    automountServiceAccountToken: false
  volumes:
  - name: serviceaccount-token
    projected:
      defaultMode: 0444
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          name: kube-root-ca.crt
          items:
          - key: ca.crt
            path: ca.crt
      - downwardAPI:
          items:
          - path: namespace
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
  volumeMounts:
  - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
    name: serviceaccount-token
    readOnly: true
cainjector:
  automountServiceAccountToken: false
  serviceAccount:
    automountServiceAccountToken: false
  volumes:
  - name: serviceaccount-token
    projected:
      defaultMode: 0444
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          name: kube-root-ca.crt
          items:
          - key: ca.crt
            path: ca.crt
      - downwardAPI:
          items:
          - path: namespace
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
  volumeMounts:
  - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
    name: serviceaccount-token
    readOnly: true
startupapicheck:
  automountServiceAccountToken: false
  serviceAccount:
    automountServiceAccountToken: false
  volumes:
  - name: serviceaccount-token
    projected:
      defaultMode: 0444
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          name: kube-root-ca.crt
          items:
          - key: ca.crt
            path: ca.crt
      - downwardAPI:
          items:
          - path: namespace
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
  volumeMounts:
  - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
    name: serviceaccount-token
    readOnly: true
```

Other

This list of recommendations is a work-in-progress. If you have other best practice recommendations please contribute to this page.