Renewing certs with zero downtime on K8s

September 7, 2020

Government agencies and large enterprises require periodic SSL certificate renewals, at least once a year, to comply with NIST’s Risk Management Framework (RMF). Renewing a certificate typically involves a brief downtime, so to be on the safe side the process is usually done outside business hours. In this post we show how certificates can be renewed with zero downtime in a Kubernetes microservice environment with Ambassador as the gateway.

Ambassador is a Kubernetes API gateway that provides an Ingress controller for routing traffic to Kubernetes clusters. It supports a broad range of protocols and TLS termination, and it also provides traffic management controls for resource availability. TLS installation is covered in Ambassador Installation.

Challenge

How do you renew certificates during normal business hours with zero downtime on any of the pods running in your Kubernetes cluster?

Solution

We run Ambassador (version 1.0.0) as a Deployment with a NodePort configured on port 30043. The kubectl client is connected to the cluster with admin permissions. (Note: in this configuration Ambassador is an internal API gateway; updating certificates on the external edge device is not covered.)

$ kubectl get deployments | grep ambassador
ambassador   2/2     2            2           336d

$ kubectl get svc | grep NodePort
ambassador        NodePort   10.100.4.46  <none>  443:30043/TCP  397d
ambassador-admin  NodePort  10.100.13.98  <none>  8877:30001/TCP 397d

The TLS certificate is installed in the default namespace, the same namespace as Ambassador.

$ kubectl get secret tls-cert
NAME       TYPE                DATA   AGE
tls-cert   kubernetes.io/tls   2      336d

Check the expiry date of the new certificate

$ openssl x509 -enddate -noout -in newcert-domain.crt 
notAfter=Jul 29 19:13:48 2021 GMT
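
The same check can be turned into a pass/fail test. The snippet below is a minimal sketch, assuming openssl's -checkend option is available; the function name cert_valid_for_days and the 30-day default are illustrative and not part of the original procedure.

```shell
# Hypothetical helper: succeed only if the certificate in $1 is still valid
# $2 days from now (defaults to 30 days, an arbitrary illustrative threshold).
cert_valid_for_days() {
  cert_file=$1
  days=${2:-30}
  # -checkend takes seconds and exits 0 if the certificate will NOT expire
  # within that window, non-zero otherwise.
  openssl x509 -checkend "$((days * 86400))" -noout -in "$cert_file" >/dev/null
}
```

For example, cert_valid_for_days newcert-domain.crt 30 && echo "ok to install" prints the message only when the new certificate has more than 30 days left.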

Save a copy of the existing certificate

$ kubectl get secret tls-cert -oyaml > existing.crt

$ cat existing.crt
apiVersion: v1
data:
  tls.crt: XXX  # PEM data
  tls.key: XXXX
kind: Secret
metadata:
  creationTimestamp: "2020-08-14T13:17:24Z"
  name: tls-cert
  namespace: default
  resourceVersion: "100743823"
  selfLink: /api/v1/namespaces/default/secrets/tls-cert
  uid: 06b1113b-ebda-4ef3-9628-174d873758c6
type: kubernetes.io/tls
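
Before deleting anything, it can help to confirm which certificate the secret currently holds. The following is a sketch under a few assumptions: the secret stores the certificate under the tls.crt key (as in the YAML above), base64 -d is available (GNU coreutils syntax), and the function name secret_cert_enddate is hypothetical.

```shell
# Hypothetical helper: print the notAfter date of the certificate stored in a
# kubernetes.io/tls secret ($1 = secret name, $2 = namespace, default "default").
secret_cert_enddate() {
  secret=$1
  namespace=${2:-default}
  # Secret data is base64-encoded, so decode tls.crt before handing it to openssl.
  kubectl get secret "$secret" -n "$namespace" -o jsonpath='{.data.tls\.crt}' \
    | base64 -d \
    | openssl x509 -noout -enddate
}
```

Running secret_cert_enddate tls-cert before and after the replacement gives a quick way to confirm that the secret really changed.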

Delete the currently installed certificate. Note that deleting the secret does not remove the certificate from the running Ambassador pods.

$ kubectl delete secret tls-cert 
secret "tls-cert" deleted

Install the new certificate. Do not rely on Kubernetes to validate the certificate data; instead, use openssl to verify that the certificate is in valid PEM format.

$ kubectl create secret tls tls-cert --cert=newcert-domain.crt --key=newcert-domain.key 
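
One way to do the openssl verification before running the command above is to check that the certificate parses and that it belongs to the private key. This is a sketch, not the exact check used in this post: it assumes an unencrypted key file, and the function name cert_matches_key is hypothetical.

```shell
# Hypothetical helper: succeed only if $1 parses as a certificate and its
# public key matches the private key in $2.
cert_matches_key() {
  cert_file=$1
  key_file=$2
  # Each openssl call fails if it cannot parse its input file.
  cert_pub=$(openssl x509 -noout -pubkey -in "$cert_file" 2>/dev/null) || return 1
  key_pub=$(openssl pkey -pubout -in "$key_file" 2>/dev/null) || return 1
  # Compare the public key extracted from the certificate with the one
  # derived from the private key.
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}
```

For example, cert_matches_key newcert-domain.crt newcert-domain.key || echo "certificate and key do not match" flags a mismatched pair before it reaches the cluster.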

In our example we have two Ambassador pods running.

$ kubectl get pods | grep ambassador
ambassador-79d4dcd47f-8n4ts    1/1     Running   0  69d     
ambassador-79d4dcd47f-ftdnd    1/1     Running   0  70d

Delete the pods one at a time, waiting for the new Ambassador pod to be healthy before deleting the next one.

$ kubectl delete pod ambassador-79d4dcd47f-8n4ts
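
The delete-and-wait cycle can also be scripted. The following is a minimal sketch rather than the exact procedure used here: it assumes the pods are named ambassador-<hash> as in the listing above, that the Deployment is called ambassador, and that kubectl rollout status is an acceptable health gate. The function name and the 180-second timeout are illustrative.

```shell
# Hypothetical helper: delete Ambassador pods one at a time, waiting for the
# Deployment to report all replicas available before touching the next pod.
restart_ambassador_pods() {
  # Assumes pod names start with "ambassador-"; fails if none are found.
  pods=$(kubectl get pods -o name | grep '^pod/ambassador-') || return 1
  for pod in $pods; do
    echo "Deleting $pod"
    # kubectl delete waits for the pod to be removed before returning.
    kubectl delete "$pod" || return 1
    # Block until the replacement pod is available (timeout is an arbitrary choice).
    kubectl rollout status deployment/ambassador --timeout=180s || return 1
  done
}
```

The Deployment status can lag briefly behind a deletion, so a manual look at the new pod with kubectl get pods is still a sensible check between iterations.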

We run the Ambassador Deployment with a scale factor of two, so while one pod is being replaced the other keeps serving traffic and the Deployment brings the count back to two. As an additional step, verify that the default application pod configured on port 80/443 is running. Also check all endpoints in the ambassador-admin interface (typically on port 30001) before moving on to the next Ambassador pod. This ensures that applications see zero downtime.
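
To confirm that a restarted pod is serving the new certificate, the expiry date can be read from the live endpoint. This is a sketch under stated assumptions: the host name passed in is a placeholder for one of your node addresses, port 30043 is the NodePort from this setup, and the function name served_cert_enddate is hypothetical.

```shell
# Hypothetical helper: print the notAfter date of the certificate presented
# by the TLS endpoint at $1:$2 ($1 is also sent as the SNI server name).
served_cert_enddate() {
  host=$1
  port=$2
  # s_client reads stdin until EOF, so redirect from /dev/null to make it
  # exit right after the handshake; x509 then parses the presented certificate.
  openssl s_client -connect "$host:$port" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate
}
```

Calling served_cert_enddate with a node address and 30043 should print the same notAfter value as the new certificate file once the pods have been replaced. Because a NodePort may route to either pod, repeat the call a few times.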

Conclusion

In this post we showed how an Ambassador gateway running with a scale factor of two can be used to renew certificates with zero downtime. Since the old certificate is not removed from the running pods, we can replace the certificate and then start new pods sequentially.

If you’re interested in learning more about our best practices for zero downtime, reach out to us at [email protected].
