Upgrade Overview

To enable interoperability from one Kubernetes installation to the next, you must upgrade your Kubernetes clusters before the maintenance period ends.

When a new Kubernetes version becomes available in CCE, CCE describes the changes in that version.

You can use the CCE console to upgrade the Kubernetes version of a cluster.

An upgrade flag is displayed in the cluster card view when a new version is available for the cluster.

How to check:

Log in to the CCE console and check whether the message "New version available" is displayed in the lower left corner of the cluster. If yes, the cluster can be upgraded. If no, the cluster cannot be upgraded.

Figure 1 Cluster with the upgrade flag

Cluster Upgrade

The following table describes the target version to which each cluster version can be upgraded, the supported upgrade modes, and upgrade impacts.

Table 1 Cluster upgrade paths and impacts
Source Version | Target Version | Upgrade Modes | Impacts
v1.19 | v1.21 | In-place upgrade | You need to learn about the differences between versions. For details, see Precautions for Major Version Upgrade <cce_10_0197__section191131551162610>.
v1.17, v1.15 | v1.19 | In-place upgrade | You need to learn about the differences between versions. For details, see Precautions for Major Version Upgrade <cce_10_0197__section191131551162610>.
v1.13 | v1.15 | Rolling upgrade, Replace upgrade | proxy in the coredns add-on cannot be configured and needs to be replaced with forward; the storage add-on is changed from storage-driver to everest.

Upgrade Modes

The upgrade process for master nodes is the same in all upgrade modes. The modes differ in how worker nodes are upgraded:

Table 2 Differences between upgrade modes and their advantages and disadvantages
Upgrade Mode | Method | Advantage | Disadvantage
In-place upgrade | Kubernetes components, network components, and CCE management components are upgraded on the node. During the upgrade, service pods and networks are not affected. The SchedulingDisabled label will be added to all existing nodes. After the upgrade is complete, you can properly use existing nodes. | You do not need to migrate services, ensuring service continuity. | In-place upgrade does not upgrade the OS of a node. If you want to upgrade the OS, clear the corresponding node data after the node upgrade is complete and reset the node to upgrade the OS to a new version.
Rolling upgrade | Only the Kubernetes components and certain network components are upgraded on the node. The SchedulingDisabled label will be added to all existing nodes to ensure that the running applications are not affected. NOTICE: After the upgrade is complete, you need to manually create nodes and gradually release the old nodes, thereby migrating your applications to the new nodes. In this mode, you can control the upgrade process. | Services are not interrupted. | After the upgrade is complete, you need to manually create nodes and gradually release the old nodes. The new nodes are billed additionally. After services are migrated to the new nodes, the old nodes can be deleted. After the rolling upgrade is complete, if you want to continue the upgrade to a later version, you need to reset the old nodes first. Otherwise, the pre-upgrade check cannot be passed. Services may be interrupted during the upgrade.
Replace upgrade | The latest worker node image is used to reset the node OS. | This is the fastest upgrade mode and requires few manual interventions. | Data or configurations on the node will be lost, and services will be interrupted for a period of time.

Precautions for Major Version Upgrade

Upgrade Path Difference Self-Check
v1.19 to v1.21 The exec probe timeout bug is fixed in Kubernetes 1.21. Before the fix, exec probes ignored the timeoutSeconds field: a probe kept running until it returned a result, even past its configured deadline. After the upgrade, the field takes effect, and if it is not specified, the default value of 1 second is used. If a probe then runs for more than 1 second, the application health check may fail and the application may restart frequently. Before the upgrade, check whether the timeout is properly set for the exec probe.
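The exec probe self-check can be sketched as a scan over pod specifications. This minimal example assumes the pod spec is already available as a Python dictionary (for example, the spec part of a pod exported as JSON). The function name exec_probes_without_timeout and the sample pod are hypothetical.

```python
PROBE_KINDS = ("livenessProbe", "readinessProbe", "startupProbe")


def exec_probes_without_timeout(pod_spec: dict) -> list:
    """List (container name, probe kind) pairs whose exec probe omits timeoutSeconds."""
    findings = []
    for container in pod_spec.get("containers", []):
        for kind in PROBE_KINDS:
            probe = container.get(kind)
            # Only exec probes are affected; an omitted timeoutSeconds means 1 second.
            if probe and "exec" in probe and "timeoutSeconds" not in probe:
                findings.append((container["name"], kind))
    return findings


# Hypothetical pod spec used only for illustration.
sample_spec = {
    "containers": [
        {
            "name": "app",
            "livenessProbe": {"exec": {"command": ["cat", "/tmp/healthy"]}, "periodSeconds": 5},
            "readinessProbe": {"httpGet": {"path": "/", "port": 80}},
        },
        {
            "name": "sidecar",
            "livenessProbe": {"exec": {"command": ["true"]}, "timeoutSeconds": 5},
        },
    ]
}

print(exec_probes_without_timeout(sample_spec))
```

Probes reported by such a scan will run with the 1-second default after the upgrade, so they are the ones to review first.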

kube-apiserver of CCE 1.19 or later requires that the Subject Alternative Names (SANs) field be configured for the certificate of your webhook server. Otherwise, kube-apiserver fails to call the webhook server after the upgrade, and containers cannot be started properly.

Root cause: Go 1.15 no longer treats the X.509 CommonName field as a host name by default, and kube-apiserver of CCE 1.19 is compiled using Go 1.15. If your webhook certificate does not have SANs, kube-apiserver does not process the CommonName field of the X.509 certificate as the host name. As a result, the authentication fails.

Before the upgrade, check whether the SAN field is configured in the certificate of your webhook server.

  • If you do not have your own webhook server, you can skip this check.
  • If the field is not set, you are advised to use the SAN field to specify the IP address and domain name supported by the certificate.
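The SAN self-check can be sketched as follows. This example assumes the webhook certificate has already been decoded into the dictionary form that Python's ssl.SSLSocket.getpeercert() returns. The function names and the sample certificates are hypothetical.

```python
def san_entries(cert: dict) -> dict:
    """Group subjectAltName entries by type, e.g. {"DNS": [...], "IP Address": [...]}."""
    grouped = {}
    for kind, value in cert.get("subjectAltName", ()):
        grouped.setdefault(kind, []).append(value)
    return grouped


def has_san(cert: dict) -> bool:
    """Return True if the certificate carries at least one SAN entry."""
    return bool(san_entries(cert))


# Hypothetical decoded certificates used only for illustration.
cert_with_san = {
    "subject": ((("commonName", "webhook.example.svc"),),),
    "subjectAltName": (("DNS", "webhook.example.svc"), ("IP Address", "10.0.0.10")),
}
cert_cn_only = {"subject": ((("commonName", "webhook.example.svc"),),)}

print(has_san(cert_with_san), san_entries(cert_with_san))
print(has_san(cert_cn_only))
```

A certificate like cert_cn_only, which relies on CommonName alone, is the case that needs to be reissued with SANs.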
v1.15 to v1.19

The control plane of CCE 1.19 is incompatible with kubelet 1.15. If the master node fails to be upgraded or the node to be upgraded restarts after the master node is successfully upgraded, there is a high probability that the node is in the NotReady status.

There is a high probability that kubelet restarts on the node that fails to be upgraded, triggering the node registration process. The default registration labels of kubelet 1.15 (failure-domain.beta.kubernetes.io/is-baremetal and kubernetes.io/availablezone) are regarded as invalid labels by kube-apiserver 1.19.

The valid labels in v1.19 are node.kubernetes.io/baremetal and failure-domain.beta.kubernetes.io/zone.

  1. In normal cases, this scenario is not triggered.
  2. After the master node is upgraded, do not suspend the upgrade. Upgrade the nodes promptly.
  3. If a node fails to be upgraded and cannot be restored, evict applications on the node as soon as possible. Contact technical support and skip the node upgrade. After the upgrade is complete, reset the node.
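To illustrate the label incompatibility described above, the following minimal sketch flags the kubelet 1.15 registration labels that kube-apiserver 1.19 rejects. It assumes node labels are available as a dictionary. The function name and the sample labels are hypothetical.

```python
# Default registration labels of kubelet 1.15 that kube-apiserver 1.19 treats as invalid.
INVALID_IN_1_19 = {
    "failure-domain.beta.kubernetes.io/is-baremetal",
    "kubernetes.io/availablezone",
}


def invalid_registration_labels(node_labels: dict) -> list:
    """Return the label keys on a node that kube-apiserver 1.19 would reject."""
    return sorted(INVALID_IN_1_19.intersection(node_labels))


# Hypothetical labels of a node that still runs kubelet 1.15.
old_node_labels = {
    "kubernetes.io/hostname": "node-1",
    "kubernetes.io/availablezone": "az1",
    "failure-domain.beta.kubernetes.io/is-baremetal": "false",
}

print(invalid_registration_labels(old_node_labels))
```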
In CCE 1.15 and 1.19 clusters, the Docker storage driver file system is switched from XFS to Ext4. As a result, the order in which packages are imported in the pods of an upgraded Java application may change, causing pod exceptions.

Before the upgrade, check the Docker configuration file /etc/docker/daemon.json on the node. Check whether the value of dm.fs is xfs.

  • If the value is ext4 or the storage driver is Overlay, you can skip the next steps.
  • If the value is xfs, you are advised to deploy applications in the cluster of the new version in advance to test whether the applications are compatible with the new cluster version.
{
    "storage-driver": "devicemapper",
    "storage-opts": [
        "dm.thinpooldev=/dev/mapper/vgpaas-thinpool",
        "dm.use_deferred_removal=true",
        "dm.fs=xfs",
        "dm.use_deferred_deletion=true"
    ]
}
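The dm.fs check can be sketched as a short script. This minimal example parses the content of a daemon.json file like the one above and reports the Device Mapper file system. The function name devicemapper_fs is hypothetical. On a node, the text would be read from /etc/docker/daemon.json instead of the inline sample.

```python
import json


def devicemapper_fs(daemon_json_text: str):
    """Return the dm.fs value, or None if devicemapper is not used or dm.fs is not set."""
    config = json.loads(daemon_json_text)
    if config.get("storage-driver") != "devicemapper":
        return None  # for example Overlay: the check can be skipped
    for opt in config.get("storage-opts", []):
        key, _, value = opt.partition("=")
        if key == "dm.fs":
            return value
    return None


sample = """
{
    "storage-driver": "devicemapper",
    "storage-opts": [
        "dm.thinpooldev=/dev/mapper/vgpaas-thinpool",
        "dm.use_deferred_removal=true",
        "dm.fs=xfs",
        "dm.use_deferred_deletion=true"
    ]
}
"""

print(devicemapper_fs(sample))
```

A result of xfs indicates that the applications on this node should be tested in a cluster of the new version before the upgrade.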

kube-apiserver of CCE 1.19 or later requires that the Subject Alternative Names (SANs) field be configured for the certificate of your webhook server. Otherwise, kube-apiserver fails to call the webhook server after the upgrade, and containers cannot be started properly.

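Root cause: Go 1.15 no longer treats the X.509 CommonName field as a host name by default, and kube-apiserver of CCE 1.19 is compiled using Go 1.15. If your webhook certificate does not have SANs, kube-apiserver does not process the CommonName field of the X.509 certificate as the host name. As a result, the authentication fails.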

Before the upgrade, check whether the SAN field is configured in the certificate of your webhook server.

  • If you do not have your own webhook server, you can skip this check.
  • If the field is not set, you are advised to use the SAN field to specify the IP address and domain name supported by the certificate.

NOTICE: To mitigate the impact of version differences on cluster upgrade, CCE performs special processing during the upgrade from 1.15 to 1.19 and still supports certificates without SANs. However, no special processing is performed for subsequent upgrades. You are advised to rectify your certificate as soon as possible.

In clusters of v1.17.17 and later, CCE automatically creates pod security policies (PSPs) for you. These policies restrict the creation of pods with unsafe configurations, for example, pods whose security context sets the net.core.somaxconn sysctl. After an upgrade, you can allow insecure system configurations as required. For details, see Configuring a Pod Security Policy <cce_10_0275>.
v1.13 to v1.15 After a VPC network cluster is upgraded, each master node occupies an extra CIDR block due to the upgrade of network components. If no container CIDR block is available for a new node, pods scheduled to that node cannot run. Generally, this problem occurs when the nodes in the cluster have almost used up the container CIDR block. For example, suppose the container CIDR block is 10.0.0.0/16, which provides 65,536 IP addresses. In the VPC network model, each node is allocated a CIDR block of a fixed size (the mask determines the maximum number of container IP addresses allocated to each node). If the upper limit is 128, the cluster supports a maximum of 512 (65536/128) nodes, including the three master nodes, which leaves 509 blocks for worker nodes. After the cluster is upgraded, each of the three master nodes occupies one more CIDR block. As a result, only 506 worker nodes are supported.
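The arithmetic in this example can be reproduced with a short calculation. The sketch below rests on one reading of the example, which is an assumption: the 512 blocks include one block per master node before the upgrade, and the figure of 506 counts worker nodes after each master node takes one extra block. The variable names are illustrative.

```python
import ipaddress

container_cidr = ipaddress.ip_network("10.0.0.0/16")
ips_per_node = 128   # fixed per-node block size from the example
master_nodes = 3

# Number of fixed-size node blocks that fit in the container CIDR block.
total_blocks = container_cidr.num_addresses // ips_per_node

# Assumption: before the upgrade, each master node holds one block.
workers_before = total_blocks - master_nodes

# After the upgrade, each master node occupies one extra block.
workers_after = workers_before - master_nodes

print(total_blocks, workers_before, workers_after)
```

If the number of worker nodes in a cluster is close to the value of workers_before, the extra blocks taken by the master nodes can leave new nodes without a container CIDR block.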