diff --git a/umn/source/_static/images/en-us_image_0000001499725826.png b/umn/source/_static/images/en-us_image_0000001499725826.png deleted file mode 100644 index e2ce488..0000000 Binary files a/umn/source/_static/images/en-us_image_0000001499725826.png and /dev/null differ diff --git a/umn/source/_static/images/en-us_image_0000001504661902.png b/umn/source/_static/images/en-us_image_0000001504661902.png new file mode 100644 index 0000000..f1ae631 Binary files /dev/null and b/umn/source/_static/images/en-us_image_0000001504661902.png differ diff --git a/umn/source/_static/images/en-us_image_0000001504821802.png b/umn/source/_static/images/en-us_image_0000001504821802.png new file mode 100644 index 0000000..4c012a5 Binary files /dev/null and b/umn/source/_static/images/en-us_image_0000001504821802.png differ diff --git a/umn/source/_static/images/en-us_image_0000001517903020.png b/umn/source/_static/images/en-us_image_0000001517903020.png new file mode 100644 index 0000000..6887683 Binary files /dev/null and b/umn/source/_static/images/en-us_image_0000001517903020.png differ diff --git a/umn/source/_static/images/en-us_image_0000001517903052.png b/umn/source/_static/images/en-us_image_0000001517903052.png new file mode 100644 index 0000000..865e37b Binary files /dev/null and b/umn/source/_static/images/en-us_image_0000001517903052.png differ diff --git a/umn/source/_static/images/en-us_image_0000001517903056.png b/umn/source/_static/images/en-us_image_0000001517903056.png new file mode 100644 index 0000000..c301af9 Binary files /dev/null and b/umn/source/_static/images/en-us_image_0000001517903056.png differ diff --git a/umn/source/_static/images/en-us_image_0000001517903128.png b/umn/source/_static/images/en-us_image_0000001517903128.png new file mode 100644 index 0000000..60b60d5 Binary files /dev/null and b/umn/source/_static/images/en-us_image_0000001517903128.png differ diff --git a/umn/source/_static/images/en-us_image_0000001518062524.png b/umn/source/_static/images/en-us_image_0000001518062524.png new file mode 100644 index 0000000..65e1184 Binary files /dev/null and b/umn/source/_static/images/en-us_image_0000001518062524.png differ diff --git a/umn/source/_static/images/en-us_image_0000001518062540.png b/umn/source/_static/images/en-us_image_0000001518062540.png new file mode 100644 index 0000000..4cff8a7 Binary files /dev/null and b/umn/source/_static/images/en-us_image_0000001518062540.png differ diff --git a/umn/source/_static/images/en-us_image_0000001518062624.png b/umn/source/_static/images/en-us_image_0000001518062624.png new file mode 100644 index 0000000..e970982 Binary files /dev/null and b/umn/source/_static/images/en-us_image_0000001518062624.png differ diff --git a/umn/source/_static/images/en-us_image_0000001518062716.png b/umn/source/_static/images/en-us_image_0000001518062716.png new file mode 100644 index 0000000..ce83481 Binary files /dev/null and b/umn/source/_static/images/en-us_image_0000001518062716.png differ diff --git a/umn/source/_static/images/en-us_image_0000001518222492.png b/umn/source/_static/images/en-us_image_0000001518222492.png new file mode 100644 index 0000000..d09e52e Binary files /dev/null and b/umn/source/_static/images/en-us_image_0000001518222492.png differ diff --git a/umn/source/_static/images/en-us_image_0000001519063542.png b/umn/source/_static/images/en-us_image_0000001519063542.png deleted file mode 100644 index 261859f..0000000 Binary files a/umn/source/_static/images/en-us_image_0000001519063542.png and /dev/null differ diff --git 
a/umn/source/_static/images/en-us_image_0000001519067438.png b/umn/source/_static/images/en-us_image_0000001519067438.png deleted file mode 100644 index 93a63c8..0000000 Binary files a/umn/source/_static/images/en-us_image_0000001519067438.png and /dev/null differ diff --git a/umn/source/_static/images/en-us_image_0000001519544422.png b/umn/source/_static/images/en-us_image_0000001519544422.png deleted file mode 100644 index 1a4d76d..0000000 Binary files a/umn/source/_static/images/en-us_image_0000001519544422.png and /dev/null differ diff --git a/umn/source/_static/images/en-us_image_0000001520080400.png b/umn/source/_static/images/en-us_image_0000001520080400.png deleted file mode 100644 index f3a9ba2..0000000 Binary files a/umn/source/_static/images/en-us_image_0000001520080400.png and /dev/null differ diff --git a/umn/source/_static/images/en-us_image_0000001550245869.png b/umn/source/_static/images/en-us_image_0000001550245869.png index 2c1d419..80fd8a4 100644 Binary files a/umn/source/_static/images/en-us_image_0000001550245869.png and b/umn/source/_static/images/en-us_image_0000001550245869.png differ diff --git a/umn/source/_static/images/en-us_image_0000001550365693.png b/umn/source/_static/images/en-us_image_0000001550365693.png deleted file mode 100644 index 43eba23..0000000 Binary files a/umn/source/_static/images/en-us_image_0000001550365693.png and /dev/null differ diff --git a/umn/source/_static/images/en-us_image_0000001568822861.png b/umn/source/_static/images/en-us_image_0000001568822861.png deleted file mode 100644 index d74a10f..0000000 Binary files a/umn/source/_static/images/en-us_image_0000001568822861.png and /dev/null differ diff --git a/umn/source/_static/images/en-us_image_0000001568902509.png b/umn/source/_static/images/en-us_image_0000001568902509.png new file mode 100644 index 0000000..30a7eb2 Binary files /dev/null and b/umn/source/_static/images/en-us_image_0000001568902509.png differ diff --git a/umn/source/_static/images/en-us_image_0000001568902601.png b/umn/source/_static/images/en-us_image_0000001568902601.png new file mode 100644 index 0000000..0e6fa61 Binary files /dev/null and b/umn/source/_static/images/en-us_image_0000001568902601.png differ diff --git a/umn/source/_static/images/en-us_image_0000001569022901.png b/umn/source/_static/images/en-us_image_0000001569022901.png new file mode 100644 index 0000000..827836e Binary files /dev/null and b/umn/source/_static/images/en-us_image_0000001569022901.png differ diff --git a/umn/source/_static/images/en-us_image_0000001570344789.png b/umn/source/_static/images/en-us_image_0000001570344789.png deleted file mode 100644 index 75f3cb3..0000000 Binary files a/umn/source/_static/images/en-us_image_0000001570344789.png and /dev/null differ diff --git a/umn/source/_static/images/en-us_image_0000001571360421.png b/umn/source/_static/images/en-us_image_0000001571360421.png deleted file mode 100644 index 78c561f..0000000 Binary files a/umn/source/_static/images/en-us_image_0000001571360421.png and /dev/null differ diff --git a/umn/source/_static/images/en-us_image_0000001578443828.png b/umn/source/_static/images/en-us_image_0000001578443828.png new file mode 100644 index 0000000..0819970 Binary files /dev/null and b/umn/source/_static/images/en-us_image_0000001578443828.png differ diff --git a/umn/source/_static/images/en-us_image_0000001579008782.png b/umn/source/_static/images/en-us_image_0000001579008782.png new file mode 100644 index 0000000..bbd7fce Binary files /dev/null and 
b/umn/source/_static/images/en-us_image_0000001579008782.png differ diff --git a/umn/source/_static/images/en-us_image_0000001626725269.png b/umn/source/_static/images/en-us_image_0000001626725269.png new file mode 100644 index 0000000..6dd1e58 Binary files /dev/null and b/umn/source/_static/images/en-us_image_0000001626725269.png differ diff --git a/umn/source/_static/images/en-us_image_0000001628843805.png b/umn/source/_static/images/en-us_image_0000001628843805.png new file mode 100644 index 0000000..6bc88c9 Binary files /dev/null and b/umn/source/_static/images/en-us_image_0000001628843805.png differ diff --git a/umn/source/add-ons/metrics-server.rst b/umn/source/add-ons/metrics-server.rst index 429925b..5212d15 100644 --- a/umn/source/add-ons/metrics-server.rst +++ b/umn/source/add-ons/metrics-server.rst @@ -20,7 +20,7 @@ Installing the Add-on #. Select **Single**, **Custom**, or **HA** for **Add-on Specifications**. - **Pods**: Set the number of pods based on service requirements. - - Multi AZ + - **Multi AZ**: - **Preferred**: Deployment pods of the add-on are preferentially scheduled to nodes in different AZs. If the nodes in the cluster do not meet the requirements of multiple AZs, the pods are scheduled to a single AZ. - **Required**: Deployment pods of the add-on are forcibly scheduled to nodes in different AZs. If the nodes in the cluster do not meet the requirements of multiple AZs, not all pods can run. diff --git a/umn/source/add-ons/npd.rst b/umn/source/add-ons/npd.rst index 5b6f47b..d4d1401 100644 --- a/umn/source/add-ons/npd.rst +++ b/umn/source/add-ons/npd.rst @@ -36,7 +36,6 @@ Installing the Add-on #. On the **Install Add-on** page, select the add-on specifications and set related parameters. - **Pods**: Set the number of pods based on service requirements. - - **Multi AZ**: - **Containers**: Select a proper container quota based on service requirements. #. Set the npd parameters and click **Install**. diff --git a/umn/source/best_practice/containerization/containerizing_an_enterprise_application_erp/procedure/containerizing_an_entire_application.rst b/umn/source/best_practice/containerization/containerizing_an_enterprise_application_erp/procedure/containerizing_an_entire_application.rst index 1c8c341..de6e36b 100644 --- a/umn/source/best_practice/containerization/containerizing_an_enterprise_application_erp/procedure/containerizing_an_entire_application.rst +++ b/umn/source/best_practice/containerization/containerizing_an_enterprise_application_erp/procedure/containerizing_an_entire_application.rst @@ -23,7 +23,7 @@ When a third-party enterprise needs to use this application, a suit of **Tomcat **Figure 1** Application architecture -As shown in :ref:`Figure 1 `, the application is a standard Tomcat application, and its backend interconnects with MongoDB and MySQL databases. For this type of applications, there is no need to split its architecture. The entire application is packed as an image, and the mongoDB database is deployed in the same image as the Tomcat application. In this way, the application can be deployed or upgraded through the image. +As shown in :ref:`Figure 1 `, the application is a standard Tomcat application, and its backend interconnects with MongoDB and MySQL databases. For this type of applications, there is no need to split the architecture. The entire application is built as an image, and the MongoDB database is deployed in the same image as the Tomcat application. In this way, the application can be deployed or upgraded through the image. 
- Interconnecting with the MongoDB database for storing user files. - Interconnecting with the MySQL database for storing third-party enterprise data. The MySQL database is an external cloud database. @@ -35,7 +35,7 @@ In this example, the application was deployed on a VM. During application deploy By using containers, you can easily pack application code, configurations, and dependencies and convert them into easy-to-use building blocks. This achieves the environmental consistency and version management, as well as improves the development and operation efficiency. Containers ensure quick, reliable, and consistent deployment of applications and prevent applications from being affected by deployment environment. -.. table:: **Table 1** Comparison between the tow deployment modes +.. table:: **Table 1** Comparison between the two deployment modes +---------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Item | Before: Application Deployment on VM | After: Application Deployment Using Containers | diff --git a/umn/source/best_practice/containerization/containerizing_an_enterprise_application_erp/procedure/creating_a_container_workload.rst b/umn/source/best_practice/containerization/containerizing_an_enterprise_application_erp/procedure/creating_a_container_workload.rst index 17edd28..ef923b6 100644 --- a/umn/source/best_practice/containerization/containerizing_an_enterprise_application_erp/procedure/creating_a_container_workload.rst +++ b/umn/source/best_practice/containerization/containerizing_an_enterprise_application_erp/procedure/creating_a_container_workload.rst @@ -55,7 +55,7 @@ Procedure | | | | | | | a. Log in to the management console. | | | | | - | | | b. In the service list, choose **Security and Compliance** > **Data Encryption Workshop**. | + | | | b. In the service list, choose **Data Encryption Workshop** under **Security & Compliance**. | | | | | | | | c. In the navigation pane, choose **Key Pair Service**. On the **Private Key Pairs** tab page, click **Create Key Pair**. | | | | | @@ -68,7 +68,7 @@ Procedure #. Create a cluster and a node. - a. Log in to the CCE console, choose **Clusters**, and click **Buy** next to **CCE cluster**. + a. Log in to the CCE console. Choose **Clusters**. On the displayed page, select the type of the cluster to be created and click Create. Configure cluster parameters and select the VPC created in :ref:`1 `. diff --git a/umn/source/best_practice/networking/planning_cidr_blocks_for_a_cluster.rst b/umn/source/best_practice/networking/planning_cidr_blocks_for_a_cluster.rst index d00d229..dbf99ab 100644 --- a/umn/source/best_practice/networking/planning_cidr_blocks_for_a_cluster.rst +++ b/umn/source/best_practice/networking/planning_cidr_blocks_for_a_cluster.rst @@ -148,9 +148,9 @@ In the VPC network model, after creating a peering connection, you need to add r .. 
figure:: /_static/images/en-us_image_0261818886.png - :alt: **Figure 7** VPC Network - VPC interconnection scenario + :alt: **Figure 7** VPC network - VPC interconnection scenario - **Figure 7** VPC Network - VPC interconnection scenario + **Figure 7** VPC network - VPC interconnection scenario When creating a VPC peering connection between containers across VPCs, pay attention to the following points: diff --git a/umn/source/best_practice/networking/selecting_a_network_model.rst b/umn/source/best_practice/networking/selecting_a_network_model.rst index b48f95c..b1912ac 100644 --- a/umn/source/best_practice/networking/selecting_a_network_model.rst +++ b/umn/source/best_practice/networking/selecting_a_network_model.rst @@ -5,7 +5,7 @@ Selecting a Network Model ========================= -CCE uses self-proprietary, high-performance container networking add-ons to support the tunnel network, Cloud Native Network 2.0, and VPC network models. +CCE uses proprietary, high-performance container networking add-ons to support the tunnel network, Cloud Native Network 2.0, and VPC network models. .. caution:: diff --git a/umn/source/best_practice/storage/custom_storage_classes.rst b/umn/source/best_practice/storage/custom_storage_classes.rst index b647a0a..e312d5a 100644 --- a/umn/source/best_practice/storage/custom_storage_classes.rst +++ b/umn/source/best_practice/storage/custom_storage_classes.rst @@ -214,36 +214,6 @@ Other types of storage resources can be defined in the similar way. You can use reclaimPolicy: Delete volumeBindingMode: Immediate -Specifying an Enterprise Project for Storage Classes ----------------------------------------------------- - -CCE allows you to specify an enterprise project when creating EVS disks and OBS PVCs. The created storage resources (EVS disks and OBS) belong to the specified enterprise project. **The enterprise project can be the enterprise project to which the cluster belongs or the default enterprise project.** - -If you do no specify any enterprise project, the enterprise project in StorageClass is used by default. The created storage resources by using the csi-disk and csi-obs storage classes of CCE belong to the default enterprise project. - -If you want the storage resources created from the storage classes to be in the same enterprise project as the cluster, you can customize a storage class and specify the enterprise project ID, as shown below. - -.. note:: - - To use this function, the everest add-on must be upgraded to 1.2.33 or later. - -.. code-block:: - - kind: StorageClass - apiVersion: storage.k8s.io/v1 - metadata: - name: csi-disk-epid #Customize a storage class name. - provisioner: everest-csi-provisioner - parameters: - csi.storage.k8s.io/csi-driver-name: disk.csi.everest.io - csi.storage.k8s.io/fstype: ext4 - everest.io/disk-volume-type: SAS - everest.io/enterprise-project-id: 86bfc701-9d9e-4871-a318-6385aa368183 #Specify the enterprise project ID. 
- everest.io/passthrough: 'true' - reclaimPolicy: Delete - allowVolumeExpansion: true - volumeBindingMode: Immediate - Setting a Default Storage Class ------------------------------- diff --git a/umn/source/change_history.rst b/umn/source/change_history.rst index b09ab6f..284b946 100644 --- a/umn/source/change_history.rst +++ b/umn/source/change_history.rst @@ -12,7 +12,6 @@ Change History +===================================+=======================================================================================================================================================================================================================================+ | 2023-05-30 | - Added\ :ref:`Configuring a Node Pool `. | | | - Added\ :ref:`Configuring Health Check for Multiple Ports `. | - | | - Added\ :ref:`NetworkAttachmentDefinition `. | | | - Updated\ :ref:`Creating a Node `. | | | - Updated\ :ref:`Creating a Node Pool `. | | | - Updated\ :ref:`OS Patch Notes for Cluster Nodes `. | diff --git a/umn/source/clusters/cluster_overview/cce_turbo_clusters_and_cce_clusters.rst b/umn/source/clusters/cluster_overview/cce_turbo_clusters_and_cce_clusters.rst index f231bc6..2a5b2e4 100644 --- a/umn/source/clusters/cluster_overview/cce_turbo_clusters_and_cce_clusters.rst +++ b/umn/source/clusters/cluster_overview/cce_turbo_clusters_and_cce_clusters.rst @@ -12,26 +12,25 @@ The following table lists the differences between CCE Turbo clusters and CCE clu .. table:: **Table 1** Cluster types - +-----------------+-----------------------------+--------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ - | Dimension | Sub-dimension | CCE Turbo cluster | CCE cluster | - +=================+=============================+================================================================================================================================+========================================================================================+ - | Cluster | Positioning | Next-gen container cluster, with accelerated computing, networking, and scheduling. Designed for Cloud Native 2.0 | Standard cluster for common commercial use | - +-----------------+-----------------------------+--------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ - | | Node type | Hybrid deployment of VMs and bare-metal servers | Hybrid deployment of VMs and bare-metal servers | - +-----------------+-----------------------------+--------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ - | Network | Model | **Cloud Native Network 2.0**: applies to large-scale and high-performance scenarios. | **Cloud-native network 1.0**: applies to common, smaller-scale scenarios. 
| - | | | | | - | | | Max networking scale: 2,000 nodes | - Tunnel network model | - | | | | - VPC network model | - +-----------------+-----------------------------+--------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ - | | Network performance | Flattens the VPC network and container network into one. No performance loss. | Overlays the VPC network with the container network, causing certain performance loss. | - +-----------------+-----------------------------+--------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ - | | Container network isolation | Associates pods with security groups. Unifies security isolation in and out the cluster via security groups' network policies. | - Tunnel network model: supports network policies for intra-cluster communications. | - | | | | - VPC network model: supports no isolation. | - +-----------------+-----------------------------+--------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ - | Security | Isolation | - Physical machine: runs Kata containers, allowing VM-level isolation. | Common containers are deployed and isolated by cgroups. | - | | | - VM: runs common containers, isolated by cgroups. | | - +-----------------+-----------------------------+--------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ + +-----------------+-----------------------------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+ + | Dimension | Sub-dimension | CCE Turbo cluster | CCE cluster | + +=================+=============================+================================================================================================================================+================================================================================================+ + | Cluster | Positioning | Next-gen container cluster, with accelerated computing, networking, and scheduling. Designed for Cloud Native 2.0 | Standard cluster for common commercial use | + +-----------------+-----------------------------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+ + | | Node type | Deployment of VMs | Hybrid deployment of VMs and bare metal servers | + +-----------------+-----------------------------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+ + | Network | Model | **Cloud Native Network 2.0**: applies to large-scale and high-performance scenarios. 
| **Cloud-native network 1.0**: applies to common, smaller-scale scenarios. | + | | | | | + | | | Max networking scale: 2,000 nodes | - Container tunnel network model | + | | | | - VPC network model | + +-----------------+-----------------------------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+ + | | Network performance | Flattens the VPC network and container network into one. No performance loss. | Overlays the VPC network with the container network, causing certain performance loss. | + +-----------------+-----------------------------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+ + | | Container network isolation | Associates pods with security groups. Unifies security isolation in and out the cluster via security groups' network policies. | - Container tunnel network model: supports network policies for intra-cluster communications. | + | | | | - VPC network model: supports no isolation. | + +-----------------+-----------------------------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+ + | Security | Isolation | - VM: runs common containers, isolated by cgroups. | Common containers are deployed and isolated by cgroups. | + +-----------------+-----------------------------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+ QingTian Architecture --------------------- diff --git a/umn/source/clusters/managing_a_cluster/cluster_configuration_management.rst b/umn/source/clusters/managing_a_cluster/cluster_configuration_management.rst index 759bd53..abaf903 100644 --- a/umn/source/clusters/managing_a_cluster/cluster_configuration_management.rst +++ b/umn/source/clusters/managing_a_cluster/cluster_configuration_management.rst @@ -70,51 +70,51 @@ Procedure .. table:: **Table 3** kube-controller-manager parameters - +---------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ - | Parameter | Description | Value | - +=======================================+=====================================================================================================================================================================+=======================+ - | concurrent-deployment-syncs | Number of Deployments that are allowed to synchronize concurrently. | Default: 5 | - +---------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ - | concurrent-endpoint-syncs | Number of endpoints that are allowed to synchronize concurrently. 
| Default: 5 | - +---------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ - | concurrent-gc-syncs | Number of garbage collector workers that are allowed to synchronize concurrently. | Default: 20 | - +---------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ - | concurrent-job-syncs | Number of jobs that can be synchronized at the same time. | Default: 5 | - +---------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ - | concurrent-namespace-syncs | Number of namespaces that are allowed to synchronize concurrently. | Default: 10 | - +---------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ - | concurrent-replicaset-syncs | Number of ReplicaSets that are allowed to synchronize concurrently. | Default: 5 | - +---------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ - | concurrent-resource-quota-syncs | Number of resource quotas that are allowed to synchronize concurrently. | Default: 5 | - +---------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ - | concurrent-service-syncs | Number of Services that are allowed to synchronize concurrently. | Default: 10 | - +---------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ - | concurrent-serviceaccount-token-syncs | Number of service account tokens that are allowed to synchronize concurrently. | Default: 5 | - +---------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ - | concurrent-ttl-after-finished-syncs | Number of TTL-after-finished controller workers that are allowed to synchronize concurrently. | Default: 5 | - +---------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ - | concurrent_rc_syncs | Number of replication controllers that are allowed to synchronize concurrently. | Default: 5 | - | | | | - | | .. note:: | | - | | | | - | | This parameter is used only in clusters of v1.19 or earlier. 
| | - +---------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ - | concurrent-rc-syncs | Number of replication controllers that are allowed to synchronize concurrently. | Default: 5 | - | | | | - | | .. note:: | | - | | | | - | | This parameter is used only in clusters of v1.21 to v1.23. In clusters of v1.25 and later, this parameter is deprecated (officially deprecated from v1.25.3-r0). | | - +---------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ - | horizontal-pod-autoscaler-sync-period | How often HPA audits metrics in a cluster. | Default: 15 seconds | - +---------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ - | kube-api-qps | Query per second (QPS) to use while talking with kube-apiserver. | Default: 100 | - +---------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ - | kube-api-burst | Burst to use while talking with kube-apiserver. | Default: 100 | - +---------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ - | terminated-pod-gc-threshold | Number of terminated pods that can exist before the terminated pod garbage collector starts deleting terminated pods. | Default: 1000 | - | | | | - | | If <= 0, the terminated pod garbage collector is disabled. | | - +---------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + +---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Value | + +=======================================+========================================================================================================================================================================+=======================+ + | concurrent-deployment-syncs | Number of Deployments that are allowed to synchronize concurrently. | Default: 5 | + +---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | concurrent-endpoint-syncs | Number of endpoints that are allowed to synchronize concurrently. 
| Default: 5 | + +---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | concurrent-gc-syncs | Number of garbage collector workers that are allowed to synchronize concurrently. | Default: 20 | + +---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | concurrent-job-syncs | Number of jobs that can be synchronized at the same time. | Default: 5 | + +---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | concurrent-namespace-syncs | Number of namespaces that are allowed to synchronize concurrently. | Default: 10 | + +---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | concurrent-replicaset-syncs | Number of ReplicaSets that are allowed to synchronize concurrently. | Default: 5 | + +---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | concurrent-resource-quota-syncs | Number of resource quotas that are allowed to synchronize concurrently. | Default: 5 | + +---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | concurrent-service-syncs | Number of Services that are allowed to synchronize concurrently. | Default: 10 | + +---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | concurrent-serviceaccount-token-syncs | Number of service account tokens that are allowed to synchronize concurrently. | Default: 5 | + +---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | concurrent-ttl-after-finished-syncs | Number of TTL-after-finished controller workers that are allowed to synchronize concurrently. | Default: 5 | + +---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | concurrent_rc_syncs | Number of replication controllers that are allowed to synchronize concurrently. | Default: 5 | + | | | | + | | .. note:: | | + | | | | + | | This parameter is used only in clusters of v1.19 or earlier. 
| | + +---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | concurrent-rc-syncs | Number of replication controllers that are allowed to synchronize concurrently. | Default: 5 | + | | | | + | | .. note:: | | + | | | | + | | This parameter is used only in clusters of v1.21 to v1.23. In clusters of v1.25 and later, this parameter is deprecated (officially deprecated from v1.25.3-r0 on). | | + +---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | horizontal-pod-autoscaler-sync-period | How often HPA audits metrics in a cluster. | Default: 15 seconds | + +---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | kube-api-qps | Query per second (QPS) to use while talking with kube-apiserver. | Default: 100 | + +---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | kube-api-burst | Burst to use while talking with kube-apiserver. | Default: 100 | + +---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | terminated-pod-gc-threshold | Number of terminated pods that can exist before the terminated pod garbage collector starts deleting terminated pods. | Default: 1000 | + | | | | + | | If <= 0, the terminated pod garbage collector is disabled. | | + +---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ .. table:: **Table 4** kube-scheduler parameters @@ -139,7 +139,7 @@ Procedure +----------------------------+----------------------------------------------------------------------------------------------+-----------------------+ | nic-max-above-warm-target | Reclaim number of ENIs pre-bound to a node at the cluster level | Default: 2 | +----------------------------+----------------------------------------------------------------------------------------------+-----------------------+ - | prebound-subeni-percentage | Low threshold of the number of bound ENIs:High threshold of the number of bound ENIs | Default: 0:0 | + | prebound-subeni-percentage | Low threshold of the number of bound ENIs : High threshold of the number of bound ENIs | Default: 0:0 | | | | | | | .. 
note:: | | | | | | diff --git a/umn/source/clusters/upgrading_a_cluster/index.rst b/umn/source/clusters/upgrading_a_cluster/index.rst index abb1088..72114fb 100644 --- a/umn/source/clusters/upgrading_a_cluster/index.rst +++ b/umn/source/clusters/upgrading_a_cluster/index.rst @@ -7,9 +7,11 @@ Upgrading a Cluster - :ref:`Upgrade Overview ` - :ref:`Before You Start ` +- :ref:`Post-Upgrade Verification ` - :ref:`Performing Replace/Rolling Upgrade ` - :ref:`Performing In-place Upgrade ` - :ref:`Migrating Services Across Clusters of Different Versions ` +- :ref:`Troubleshooting for Pre-upgrade Check Exceptions ` .. toctree:: :maxdepth: 1 @@ -17,6 +19,8 @@ Upgrading a Cluster upgrade_overview before_you_start + post-upgrade_verification/index performing_replace_rolling_upgrade performing_in-place_upgrade migrating_services_across_clusters_of_different_versions + troubleshooting_for_pre-upgrade_check_exceptions/index diff --git a/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/index.rst b/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/index.rst new file mode 100644 index 0000000..095e6cd --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/index.rst @@ -0,0 +1,26 @@ +:original_name: cce_10_0560.html + +.. _cce_10_0560: + +Post-Upgrade Verification +========================= + +- :ref:`Service Verification ` +- :ref:`Pod Check ` +- :ref:`Node and Container Network Check ` +- :ref:`Node Label and Taint Check ` +- :ref:`New Node Check ` +- :ref:`New Pod Check ` +- :ref:`Node Skipping Check for Reset ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + service_verification + pod_check + node_and_container_network_check + node_label_and_taint_check + new_node_check + new_pod_check + node_skipping_check_for_reset diff --git a/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/new_node_check.rst b/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/new_node_check.rst new file mode 100644 index 0000000..4dc1a13 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/new_node_check.rst @@ -0,0 +1,21 @@ +:original_name: cce_10_0565.html + +.. _cce_10_0565: + +New Node Check +============== + +Check Item +---------- + +Check whether nodes can be created in the cluster. + +Procedure +--------- + +Go to the CCE console and access the cluster console. Choose **Nodes** in the navigation pane, and click **Create Node**. + +Solution +-------- + +If nodes cannot be created in your cluster after the cluster is upgraded, contact technical support. diff --git a/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/new_pod_check.rst b/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/new_pod_check.rst new file mode 100644 index 0000000..1a7f45c --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/new_pod_check.rst @@ -0,0 +1,64 @@ +:original_name: cce_10_0566.html + +.. _cce_10_0566: + +New Pod Check +============= + +Check Item +---------- + +- Check whether pods can be created on the existing nodes after the cluster is upgraded. +- Check whether pods can be created on new nodes after the cluster is upgraded. + +Procedure +--------- + +After creating a node based on :ref:`New Node Check `, create a DaemonSet workload to create pods on each node. + +Go to the CCE console, access the cluster console, and choose **Workloads** in the navigation pane. 
On the displayed page, switch to the **DaemonSets** tab page and click **Create Workload** or **Create from YAML** in the upper right corner. + +You are advised to use an image that you routinely use for tests as the base image. You can deploy a pod by referring to the following YAML file. + +.. note:: + + In this test, the YAML file deploys a DaemonSet in the **default** namespace, uses **nginx:perl** as the base image, requests 10m CPU and 10 MiB memory, and limits the resources to 100m CPU and 50 MiB memory. + +.. code-block:: + + apiVersion: apps/v1 + kind: DaemonSet + metadata: + name: post-upgrade-check + namespace: default + spec: + selector: + matchLabels: + app: post-upgrade-check + version: v1 + template: + metadata: + labels: + app: post-upgrade-check + version: v1 + spec: + containers: + - name: container-1 + image: nginx:perl + imagePullPolicy: IfNotPresent + resources: + requests: + cpu: 10m + memory: 10Mi + limits: + cpu: 100m + memory: 50Mi + +After the workload is created, check whether the pod status of the workload is normal. + +After the check is complete, go to the CCE console and access the cluster console. Choose **Workloads** in the navigation pane. On the displayed page, switch to the **DaemonSets** tab page, and choose **More** > **Delete** in the **Operation** column of the **post-upgrade-check** workload to delete the test workload. + +Solution +-------- + +If the pod cannot be created or the pod status is abnormal, contact technical support and specify whether the exception occurs on new nodes or existing nodes. diff --git a/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/node_and_container_network_check.rst b/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/node_and_container_network_check.rst new file mode 100644 index 0000000..49c6fd2 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/node_and_container_network_check.rst @@ -0,0 +1,68 @@ +:original_name: cce_10_0563.html + +.. _cce_10_0563: + +Node and Container Network Check +================================ + +Check Item +---------- + +- Check whether the nodes are running properly. +- Check whether the node network is normal. +- Check whether the container network is normal. + +Procedure +--------- + +The node status reflects whether the node components and the node network are normal. + +Go to the CCE console and access the cluster console. Choose **Nodes** in the navigation pane. You can filter nodes by status to check whether there are abnormal nodes. + +|image1| + +The container network affects services. Check whether your services are available. + +Solution +-------- + +If the node status is abnormal, contact technical support. + +If the container network is abnormal and your services are affected, contact technical support and confirm the abnormal network access path.
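Before consulting the table of access paths below, a few standard kubectl checks can help narrow down which path is faulty. The following is only a minimal sketch and is not CCE-specific: it assumes kubectl is already connected to the cluster, that the image of the test pod provides the **ping** and **curl** tools, and that **default**, **test-pod**, **<peer-pod-ip>**, and **<service-name>** are placeholders to be replaced with your own values.

.. code-block::

   # Check that all nodes are Ready
   kubectl get nodes -o wide

   # Check that network-related system pods (CNI, kube-proxy, CoreDNS) are running
   kubectl get pods -n kube-system -o wide

   # Check pod-to-pod connectivity across nodes (use the IP address of a pod on another node)
   kubectl exec -n default test-pod -- ping -c 3 <peer-pod-ip>

   # Check in-cluster Service access and DNS resolution
   kubectl exec -n default test-pod -- curl -sS -o /dev/null -w "%{http_code}\n" http://<service-name>.default.svc.cluster.local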
+ ++----------------------------------------------+------------------------------------------------------------------------------+--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ +| Source | Destination | Destination Type | Possible Fault | ++==============================================+==============================================================================+======================================+======================================================================================================================================+ +| - Pods (inside a cluster) | Public IP address of Service ELB | Cluster traffic load balancing entry | No record. | +| - Nodes (inside a cluster) | | | | +| - Nodes in the same VPC (outside a cluster) | | | | +| - Third-party clouds | | | | ++----------------------------------------------+------------------------------------------------------------------------------+--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ +| | Private IP address of Service ELB | Cluster traffic load balancing entry | No record. | ++----------------------------------------------+------------------------------------------------------------------------------+--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ +| | Public IP address of ingress ELB | Cluster traffic load balancing entry | No record. | ++----------------------------------------------+------------------------------------------------------------------------------+--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ +| | Private IP address of ingress ELB | Cluster traffic load balancing entry | No record. | ++----------------------------------------------+------------------------------------------------------------------------------+--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ +| | Public IP address of NodePort Service | Cluster traffic entry | The kube proxy configuration is overwritten. This fault has been rectified in the upgrade process. | ++----------------------------------------------+------------------------------------------------------------------------------+--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ +| | Private IP address of NodePort Service | Cluster traffic entry | No record. | ++----------------------------------------------+------------------------------------------------------------------------------+--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ +| | ClusterIP Service | Service network plane | No record. 
| ++----------------------------------------------+------------------------------------------------------------------------------+--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ +| | Non NodePort Service port | Container network | No record. | ++----------------------------------------------+------------------------------------------------------------------------------+--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ +| | Cross-node pods | Container network plane | No record. | ++----------------------------------------------+------------------------------------------------------------------------------+--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ +| | Pods on the same node | Container network plane | No record. | ++----------------------------------------------+------------------------------------------------------------------------------+--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ +| | Service and pod domain names are resolved by CoreDNS. | Domain name resolution | No record. | ++----------------------------------------------+------------------------------------------------------------------------------+--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ +| | External domain names are resolved based on the CoreDNS hosts configuration. | Domain name resolution | After the coredns add-on is upgraded, the configuration is overwritten. This fault has been rectified in the add-on upgrade process. | ++----------------------------------------------+------------------------------------------------------------------------------+--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ +| | External domain names are resolved based on the CoreDNS upstream server. | Domain name resolution | After the coredns add-on is upgraded, the configuration is overwritten. This fault has been rectified in the add-on upgrade process. | ++----------------------------------------------+------------------------------------------------------------------------------+--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ +| | External domain names are not resolved by CoreDNS. | Domain name resolution | No record. | ++----------------------------------------------+------------------------------------------------------------------------------+--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + +.. 
|image1| image:: /_static/images/en-us_image_0000001518062524.png diff --git a/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/node_label_and_taint_check.rst b/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/node_label_and_taint_check.rst new file mode 100644 index 0000000..61234d3 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/node_label_and_taint_check.rst @@ -0,0 +1,26 @@ +:original_name: cce_10_0564.html + +.. _cce_10_0564: + +Node Label and Taint Check +========================== + +Check Item +---------- + +- Check whether any node labels are lost. +- Check whether there are unexpected taints. + +Procedure +--------- + +Go to the CCE console, access the cluster console, and choose **Nodes** in the navigation pane. On the displayed page, click the **Nodes** tab, select all nodes, and click **Manage Labels and Taints** to view the labels and taints of the selected nodes. + +Solution +-------- + +User labels are not changed during the cluster upgrade. If you find that labels are lost or added abnormally, contact technical support. + +If you find a new taint (**node.kubernetes.io/upgrade**) on a node, the node may be skipped during the upgrade. For details, see :ref:`Node Skipping Check for Reset `. + +If you find that other taints are added to the node, contact technical support. diff --git a/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/node_skipping_check_for_reset.rst b/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/node_skipping_check_for_reset.rst new file mode 100644 index 0000000..69ab900 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/node_skipping_check_for_reset.rst @@ -0,0 +1,22 @@ +:original_name: cce_10_0567.html + +.. _cce_10_0567: + +Node Skipping Check for Reset +============================= + +Check Item +---------- + +After the cluster is upgraded, you need to reset the nodes that failed to be upgraded. + +Procedure +--------- + +Go back to the previous step, or check the upgrade details on the upgrade history page, to identify the nodes that were skipped during the upgrade. + +The skipped nodes are displayed on the upgrade details page. Reset the skipped nodes after the upgrade is complete. For details about how to reset a node, see :ref:`Resetting a Node `. + +.. note:: + + Resetting a node will reset all node labels, which may affect workload scheduling. Before resetting a node, check and retain the labels that you have manually added to the node. diff --git a/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/pod_check.rst b/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/pod_check.rst new file mode 100644 index 0000000..c3740f5 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/pod_check.rst @@ -0,0 +1,31 @@ +:original_name: cce_10_0562.html + +.. _cce_10_0562: + +Pod Check +========= + +Check Item +---------- + +- Check whether unexpected pods exist in the cluster. +- Check whether there are pods that restart unexpectedly in the cluster. + +Procedure +--------- + +Go to the CCE console and access the cluster console. Choose **Workloads** in the navigation pane. On the displayed page, switch to the **Pods** tab page. Select **All namespaces**, click **Status**, and check whether abnormal pods exist. + +|image1| + +View the **Restarts** column to check whether there are pods that are restarted abnormally.
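If you prefer the CLI, the following kubectl sketch lists abnormal pods and sorts pods by restart count. It assumes kubectl is already connected to the cluster and is not specific to CCE.

.. code-block::

   # List pods in all namespaces that are neither Running nor Completed
   kubectl get pods --all-namespaces | grep -Ev 'Running|Completed'

   # Sort pods by the restart count of their first container (frequently restarting pods are listed last)
   kubectl get pods --all-namespaces --sort-by='.status.containerStatuses[0].restartCount'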
+ +|image2| + +Solution -------- + +If there are abnormal pods in your cluster after the cluster upgrade, contact technical support. + +.. |image1| image:: /_static/images/en-us_image_0000001518222492.png +.. |image2| image:: /_static/images/en-us_image_0000001518062540.png diff --git a/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/service_verification.rst b/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/service_verification.rst new file mode 100644 index 0000000..ff9d008 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/post-upgrade_verification/service_verification.rst @@ -0,0 +1,28 @@ +:original_name: cce_10_0561.html + +.. _cce_10_0561: + +Service Verification +==================== + +Check Item +---------- + +After the cluster is upgraded, check whether the services are running normally. + +Procedure +--------- + +Different services have different verification modes. Select a suitable mode and verify the service before and after the upgrade. + +You can verify the service from the following aspects: + +- The service page is available. +- No new alarm or event is generated on the platform. +- No error log is generated for key processes. +- The API dial test is normal. + +Solution +-------- + +If your online services are abnormal after the cluster upgrade, contact technical support. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/cce-hpa-controller_restriction_check.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/cce-hpa-controller_restriction_check.rst new file mode 100644 index 0000000..125559c --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/cce-hpa-controller_restriction_check.rst @@ -0,0 +1,16 @@ +:original_name: cce_10_0479.html + +.. _cce_10_0479: + +cce-hpa-controller Restriction Check +==================================== + +Check Item +---------- + +Check whether the current cce-hpa-controller add-on has compatibility restrictions. + +Solution +-------- + +The current cce-hpa-controller add-on has compatibility restrictions. An add-on that can provide metrics APIs, for example, metrics-server, must be installed in the cluster. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_coredns_configuration_consistency.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_coredns_configuration_consistency.rst new file mode 100644 index 0000000..bf07507 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_coredns_configuration_consistency.rst @@ -0,0 +1,78 @@ +:original_name: cce_10_0493.html + +.. _cce_10_0493: + +Checking CoreDNS Configuration Consistency +========================================== + +Check Item +---------- + +Check whether the current key CoreDNS configuration (the Corefile) differs from the Helm Release record. Any difference may be overwritten during the add-on upgrade, affecting domain name resolution in the cluster. + +Solution +-------- + +You can upgrade the coredns add-on separately after confirming the configuration differences. + +#. For details about how to configure kubectl, see :ref:`Connecting to a Cluster Using kubectl `. + +#. .. _cce_10_0493__en-us_topic_0000001548755413_li1178291934910: + + Obtain the Corefile that takes effect currently. + + ..
code-block:: + + kubectl get cm -nkube-system coredns -o jsonpath='{.data.Corefile}' > corefile_now.txt + cat corefile_now.txt + +#. .. _cce_10_0493__en-us_topic_0000001548755413_li111544111811: + + Obtain the Corefile in the Helm Release record (depending on Python 3). + + .. code-block:: + + latest_release=`kubectl get secret -nkube-system -l owner=helm -l name=cceaddon-coredns --sort-by=.metadata.creationTimestamp | awk 'END{print $1}'` + kubectl get secret -nkube-system $latest_release -o jsonpath='{.data.release}' | base64 -d | base64 -d | gzip -d | python -m json.tool | python -c " + import json,sys,re,yaml; + manifests = json.load(sys.stdin)['manifest'] + files = re.split('(?:^|\s*\n)---\s*',manifests) + for file in files: + if 'coredns/templates/configmap.yaml' in file and 'Corefile' in file: + corefile = yaml.safe_load(file)['data']['Corefile'] + print(corefile,end='') + exit(0); + print('error') + exit(1); + " > corefile_record.txt + cat corefile_record.txt + +#. Compare the output information of :ref:`2 ` and :ref:`3 `. + + .. code-block:: + + diff corefile_now.txt corefile_record.txt -y; + + |image1| + +#. Return to the CCE console and click the cluster name to go to the cluster console. On the **Add-ons** page, select the coredns add-on and click **Upgrade**. + + To retain the differentiated configurations, use either of the following methods: + + - Set **parameterSyncStrategy** to **force**. You need to manually enter the differentiated configurations. For details, see :ref:`coredns (System Resource Add-On, Mandatory) `. + - If **parameterSyncStrategy** is set to **inherit**, differentiated configurations are automatically inherited. The system automatically parses, identifies, and inherits differentiated parameters. + + |image2| + +#. Click **OK**. After the add-on upgrade is complete, check whether all CoreDNS instances are available and whether the Corefile meets the expectation. + + .. code-block:: + + kubectl get cm -nkube-system coredns -o jsonpath='{.data.Corefile}' + +#. Change the value of **parameterSyncStrategy** to **ensureConsistent** to enable configuration consistency verification. + + Use the parameter configuration function of CCE add-on management to modify the Corefile configuration to avoid differences. + +.. |image1| image:: /_static/images/en-us_image_0000001628843805.png +.. |image2| image:: /_static/images/en-us_image_0000001578443828.png diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_deprecated_kubernetes_apis.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_deprecated_kubernetes_apis.rst new file mode 100644 index 0000000..dc9b337 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_deprecated_kubernetes_apis.rst @@ -0,0 +1,28 @@ +:original_name: cce_10_0487.html + +.. _cce_10_0487: + +Checking Deprecated Kubernetes APIs +=================================== + +Check Item +---------- + +The system scans the audit logs of the past day to check whether the user calls the deprecated APIs of the target Kubernetes version. + +.. note:: + + Due to the limited time range of audit logs, this check item is only an auxiliary method. APIs to be deprecated may have been used in the cluster, but their usage is not included in the audit logs of the past day. Check the API usage carefully. 
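Besides the audit log scan, you can query the cluster directly for objects that are still served through an API version removed in the target release. A minimal sketch, using the deprecated Ingress APIs described in the solution below as an example (adjust the group and version to the APIs flagged for your target version):

.. code-block::

   # Show which API versions the cluster currently serves for the affected groups
   kubectl api-versions | grep -E 'extensions|networking.k8s.io'

   # Request Ingresses explicitly through the deprecated extensions/v1beta1 API
   kubectl get ingresses.v1beta1.extensions -A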
+ +Solution +-------- + +**Description** + +The check result shows that your cluster calls a deprecated API of the target cluster version through kubectl or other applications. You can rectify the fault before the upgrade. Otherwise, the API will be intercepted by kube-apiserver after the upgrade. For details about each deprecated API, see `Deprecated API Migration Guide `__. + +**Cases** + +Ingresses of extensions/v1beta1 and networking.k8s.io/v1beta1 API are deprecated in clusters of v1.22. If you upgrade a CCE cluster from v1.19 or v1.21 to v1.23, existing resources are not affected, but the v1beta1 API version may be intercepted in the creation and editing scenarios. + +For details about the YAML configuration structure changes, see :ref:`Using kubectl to Create an ELB Ingress `. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_add-on.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_add-on.rst new file mode 100644 index 0000000..fc6b1e0 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_add-on.rst @@ -0,0 +1,33 @@ +:original_name: cce_10_0433.html + +.. _cce_10_0433: + +Checking the Add-on +=================== + +Check Item +---------- + +Check the following aspects: + +- Check whether the add-on status is normal. +- Check whether the add-on support the target version. + +Solution +-------- + +- **Scenario 1: The add-on status is abnormal.** + + Log in to the CCE console and go to the target cluster. Choose **O&M** > **Add-ons** to view and handle the abnormal add-on. + +- **Scenario 2: The target version does not support the current add-on.** + + The add-on cannot be automatically upgraded with the cluster. Log in to the CCE console and go to the target cluster. Choose **O&M** > **Add-ons** to manually upgrade the add-on. + +- **Scenario 3: The add-on does not support the target cluster even if the add-on is upgraded to the latest version. In this case, go to the cluster console and choose Add-ons in the navigation pane to manually uninstall the add-on.** + + For details about the supported add-on versions and replacement solutions, see the :ref:`add-on overview `. + + |image1| + +.. |image1| image:: /_static/images/en-us_image_0000001518062716.png diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_blocklist.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_blocklist.rst new file mode 100644 index 0000000..132914b --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_blocklist.rst @@ -0,0 +1,21 @@ +:original_name: cce_10_0432.html + +.. _cce_10_0432: + +Checking the Blocklist +====================== + +Check Item +---------- + +Check whether the current user is in the upgrade blocklist. + +Solution +-------- + +CCE temporarily disables the cluster upgrade function due to the following reasons: + +- The cluster is identified as the core production cluster. +- Other O&M tasks are being or will be performed to improve cluster stability, for example, 3AZ reconstruction of the master node. + +You can contact technical support. 
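For the add-on status check described above, you can also confirm from the CLI that the add-on workloads are healthy. A sketch, assuming the add-on components run in the **kube-system** namespace (add-on names vary by cluster):

.. code-block::

   # Check that add-on pods (for example, coredns and everest) are Running and ready
   kubectl get pods -n kube-system -o wide

   # Show only add-on pods that are not in the Running state
   kubectl get pods -n kube-system --field-selector=status.phase!=Running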
diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_helm_chart.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_helm_chart.rst new file mode 100644 index 0000000..2f30fc8 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_helm_chart.rst @@ -0,0 +1,20 @@ +:original_name: cce_10_0434.html + +.. _cce_10_0434: + +Checking the Helm Chart +======================= + +Check Item +---------- + +Check whether the current HelmRelease record contains discarded Kubernetes APIs that are not supported by the target cluster version. If yes, the Helm chart may be unavailable after the upgrade. + +Solution +-------- + +Convert the discarded Kubernetes APIs to APIs that are compatible with both the source and target versions. + +.. note:: + + This item has been automatically processed in the upgrade process. You can ignore this item. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_master_node_ssh_connectivity.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_master_node_ssh_connectivity.rst new file mode 100644 index 0000000..5e159fe --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_master_node_ssh_connectivity.rst @@ -0,0 +1,16 @@ +:original_name: cce_10_0435.html + +.. _cce_10_0435: + +Checking the Master Node SSH Connectivity +========================================= + +Check Item +---------- + +Check whether CCE can connect to your master nodes. + +Solution +-------- + +Contact technical support. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_node.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_node.rst new file mode 100644 index 0000000..059c7a2 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_node.rst @@ -0,0 +1,54 @@ +:original_name: cce_10_0431.html + +.. _cce_10_0431: + +Checking the Node +================= + +Check Item +---------- + +Check the following aspects: + +- Check whether the node is available. +- Check whether the node OS supports the upgrade. +- Check whether there are unexpected node pool tags in the node. +- Check whether the Kubernetes node name is consistent with the ECS name. + +Solution +-------- + +- **Scenario 1: The node is unavailable.** + + Log in to the CCE console and access the cluster console. Choose **Nodes** in the navigation pane and check the node status. Ensure that the node is in the **Running** status. A node in the **Installing** or **Deleting** status cannot be upgraded. + + If the node status is abnormal, restore the node by referring to and retry the check task. + +- **Scenario 2: The node OS does not support the upgrade.** + + The following table lists the node OSs that support the upgrade. You can reset the node OS to an available OS in the list. + + .. table:: **Table 1** OSs that support the upgrade + + ============================ =========== + OS Restriction + ============================ =========== + EulerOS 2.3/2.5/2.8/2.9/2.10 None. 
+ ============================ =========== + +- **Scenario 3: There are unexpected node pool tags in the node.** + + If a node is migrated from a node pool to the default node pool, the node pool label **cce.cloud.com/cce-nodepool** is retained, affecting cluster upgrade. Check whether the load scheduling on the node depends on the label. + + - If there is no dependency, delete the tag. + - If yes, modify the load balancing policy, remove the dependency, and then delete the tag. + +- **Scenario 4: The Kubernetes node name is consistent with the ECS name.** + + Kubernetes node name, which defaults to the node's private IP. If you select a cloud server name as the node name, the cluster cannot be upgraded. + + Log in to the CCE console and access the cluster console. Choose **Nodes** in the navigation pane, view the node label, and check whether the value of **kubernetes.io/hostname** is consistent with the ECS name. If they are the same, remove the node before the cluster upgrade. + + |image1| + +.. |image1| image:: /_static/images/en-us_image_0000001517903020.png diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_node_pool.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_node_pool.rst new file mode 100644 index 0000000..012260d --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_node_pool.rst @@ -0,0 +1,45 @@ +:original_name: cce_10_0436.html + +.. _cce_10_0436: + +Checking the Node Pool +====================== + +Check Item +---------- + +Check the following aspects: + +- Check the node status. +- Check whether the auto scaling function of the node pool is disabled. + +Solution +-------- + +- **Scenario 1: The node pool status is abnormal.** + + Log in to the CCE console, go to the target cluster and choose **Nodes**. On the displayed page, click **Node Pools** tab and check the node pool status. If the node pool is being scaled, wait until the scaling is complete, and disable the auto scaling function by referring to :ref:`Scenario 2 `. + +- .. _cce_10_0436__li2791152121810: + + **Scenario 2: The auto scaling function of the node pool is enabled.** + + **Solution 1 (Recommended)** + + Log in to the CCE console and go to the target cluster. Choose **O&M** > **Add-ons** and uninstall the autoscaler add-on. + + .. note:: + + Before uninstalling the autoscaler add-on, click **Upgrade** to back up the configuration so that the add-on configuration can be restored during reinstallation. + + Before uninstalling the autoscaler add-on, choose **O&M** > **Node Scaling** and back up the current scaling policies so that they can be restored during reinstallation. These policies will be deleted when the autoscaler add-on is uninstalled. + + Obtain and back up the node scaling policy by clicking **Edit**. + + **Solution 2** + + If you do not want to uninstall the autoscaler add-on, log in to the CCE console and access the cluster detail page. Choose **Nodes** in the navigation pane. On the displayed page, click the **Node Pools** tab and click **Edit** of the corresponding node pool to disable the auto scaling function. + + .. note:: + + Before disabling the auto scaling function, back up the autoscaling configuration so that the configuration can be restored when the function is enabled. 
diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_security_group.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_security_group.rst new file mode 100644 index 0000000..db669df --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/checking_the_security_group.rst @@ -0,0 +1,23 @@ +:original_name: cce_10_0437.html + +.. _cce_10_0437: + +Checking the Security Group +=========================== + +Check Item +---------- + +Check whether the security group allows the master node to access nodes using ICMP. + +Solution +-------- + +Log in to the VPC console, choose **Access Control** > **Security Groups**, and enter the target cluster name in the search box. Two security groups are displayed: + +- The security group name is **cluster name-node-xxx**. This security group is associated with the user nodes. +- The security group name is **cluster name-control-xxx**. This security group is associated with the master nodes. + +Click the security group of the node user and ensure that the following rules are configured to allow the master node to access the node using **ICMP**. + +Otherwise, add a rule to the node security group. Set **Source** to **Security group**. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/compatibility_risk.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/compatibility_risk.rst new file mode 100644 index 0000000..3dc0be0 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/compatibility_risk.rst @@ -0,0 +1,66 @@ +:original_name: cce_10_0441.html + +.. _cce_10_0441: + +Compatibility Risk +================== + +Check Item +---------- + +Read the version compatibility differences and ensure that they are not affected. + +The patch upgrade does not involve version compatibility differences. 
+ +Version compatibility +--------------------- + ++--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Major Version Upgrade Path | Precaution | Self-Check | ++======================================+==============================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+=============================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+ +| Upgrade from v1.19 to v1.21 or v1.23 | The bug of **exec probe timeouts** is fixed in Kubernetes 1.21. Before this bug fix, the exec probe does not consider the **timeoutSeconds** field. Instead, the probe will run indefinitely, even beyond its configured deadline. It will stop until the result is returned. If this field is not specified, the default value **1** is used. This field takes effect after the upgrade. If the probe runs over 1 second, the application health check may fail and the application may restart frequently. | Before the upgrade, check whether the timeout is properly set for the exec probe. 
| ++--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| | kube-apiserver of CCE 1.19 or later requires that the Subject Alternative Names (SANs) field be configured for the certificate of your webhook server. Otherwise, kube-apiserver fails to call the webhook server after the upgrade, and containers cannot be started properly. | Before the upgrade, check whether the SAN field is configured in the certificate of your webhook server. | +| | | | +| | Root cause: X.509 `CommonName `__ is discarded in Go 1.15. kube-apiserver of CCE 1.19 is compiled using Go 1.15. If your webhook certificate does not have SANs, kube-apiserver does not process the **CommonName** field of the X.509 certificate as the host name by default. As a result, the authentication fails. | - If you do not have your own webhook server, you can skip this check. | +| | | - If the field is not set, you are advised to use the SAN field to specify the IP address and domain name supported by the certificate. | ++--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| | Arm nodes are not supported in clusters of v1.21 and later. | Check whether your services will be affected if Arm nodes cannot be used. 
| ++--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Upgrade from v1.15 to v1.19 | The control plane of in the clusters v1.19 is incompatible with kubelet v1.15. If a node fails to be upgraded or the node to be upgraded restarts after the master node is successfully upgraded, there is a high probability that the node is in the **NotReady** status. | #. In normal cases, this scenario is not triggered. | +| | | #. After the master node is upgraded, do not suspend the upgrade so the node can be quickly upgraded. | +| | This is because the node failed to be upgraded restarts the kubelet and trigger the node registration. In clusters of v1.15, the default registration tags (**failure-domain.beta.kubernetes.io/is-baremetal** and **kubernetes.io/availablezone**) are regarded as invalid tags by the clusters of v1.19. | #. If a node fails to be upgraded and cannot be restored, evict applications on the node as soon as possible. Contact technical support and skip the node upgrade. After the upgrade is complete, reset the node. | +| | | | +| | The valid tags in the clusters of v1.19 are **node.kubernetes.io/baremetal** and **failure-domain.beta.kubernetes.io/zone**. | | ++--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| | In CCE 1.15 and 1.19 clusters, the Docker storage driver file system is switched from XFS to Ext4. 
As a result, the import package sequence in the pods of the upgraded Java application may be abnormal, causing pod exceptions. | Before the upgrade, check the Docker configuration file **/etc/docker/daemon.json** on the node. Check whether the value of **dm.fs** is **xfs**. | +| | | | +| | | - If the value is **ext4** or the storage driver is Overlay, you can skip the next steps. | +| | | - If the value is **xfs**, you are advised to deploy applications in the cluster of the new version in advance to test whether the applications are compatible with the new cluster version. | +| | | | +| | | .. code-block:: | +| | | | +| | | { | +| | | "storage-driver": "devicemapper", | +| | | "storage-opts": [ | +| | | "dm.thinpooldev=/dev/mapper/vgpaas-thinpool", | +| | | "dm.use_deferred_removal=true", | +| | | "dm.fs=xfs", | +| | | "dm.use_deferred_deletion=true" | +| | | ] | +| | | } | ++--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| | kube-apiserver of CCE 1.19 or later requires that the Subject Alternative Names (SANs) field be configured for the certificate of your webhook server. Otherwise, kube-apiserver fails to call the webhook server after the upgrade, and containers cannot be started properly. | Before the upgrade, check whether the SAN field is configured in the certificate of your webhook server. | +| | | | +| | Root cause: X.509 `CommonName `__ is discarded in Go 1.15. kube-apiserver of CCE 1.19 is compiled using Go 1.15. The **CommonName** field is processed as the host name. As a result, the authentication fails. | - If you do not have your own webhook server, you can skip this check. | +| | | - If the field is not set, you are advised to use the SAN field to specify the IP address and domain name supported by the certificate. | +| | | | +| | | .. important:: | +| | | | +| | | NOTICE: | +| | | To mitigate the impact of version differences on cluster upgrade, CCE performs special processing during the upgrade from 1.15 to 1.19 and still supports certificates without SANs. However, no special processing is required for subsequent upgrades. You are advised to rectify your certificate as soon as possible. 
| ++--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| | In clusters of v1.17.17 and later, CCE automatically creates pod security policies (PSPs) for you, which restrict the creation of pods with unsafe configurations, for example, pods for which **net.core.somaxconn** under a sysctl is configured in the security context. | After an upgrade, you can allow insecure system configurations as required. For details, see :ref:`Configuring a Pod Security Policy `. | ++--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Upgrade from v1.13 to v1.15 | After a VPC network cluster is upgraded, the master node occupies an extra CIDR block due to the upgrade of network components. If no container CIDR block is available for the new node, the pod scheduled to the node cannot run. | This problem occurs when almost all CIDR blocks are occupied. For example, the container CIDR block is 10.0.0.0/16, the number of available IP addresses is 65,536, and the VPC network is allocated a CIDR block with the fixed size (using the mask to determine the maximum number of container IP addresses allocated to each node). If the upper limit is 128, the cluster supports a maximum of 512 (65536/128) nodes, including the three master nodes. After the cluster is upgraded, each of the three master nodes occupies one CIDR block. As a result, 506 nodes are supported. 
| ++--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/containerd.sock_check.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/containerd.sock_check.rst new file mode 100644 index 0000000..4310841 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/containerd.sock_check.rst @@ -0,0 +1,21 @@ +:original_name: cce_10_0457.html + +.. _cce_10_0457: + +containerd.sock Check +===================== + +Check Item +---------- + +Check whether the containerd.sock file exists on the node. This file affects the startup of container runtime in the Euler OS. + +Solution +-------- + +**Scenario: The Docker used by the node is the customized Euler-dokcer.** + +#. Log in to the node. +#. Run the **rpm -qa \| grep docker \| grep euleros** command. If the command output is not empty, the Docker used on the node is Euler-docker. +#. Run the **ls /run/containerd/containerd.sock** command. If the file exists, Docker fails to be started. +#. Run the **rm -rf /run/containerd/containerd.sock** command and perform the cluster upgrade check again. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/controller_node_components_health.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/controller_node_components_health.rst new file mode 100644 index 0000000..16d08fc --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/controller_node_components_health.rst @@ -0,0 +1,16 @@ +:original_name: cce_10_0485.html + +.. _cce_10_0485: + +Controller Node Components Health +================================= + +Check Item +---------- + +Check whether the Kubernetes, container runtime, and network components of the controller node are healthy. + +Solution +-------- + +If a component on the controller node is abnormal, contact technical support. 
diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/crd_check.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/crd_check.rst new file mode 100644 index 0000000..1740da7 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/crd_check.rst @@ -0,0 +1,19 @@ +:original_name: cce_10_0444.html + +.. _cce_10_0444: + +CRD Check +========= + +Check Item +---------- + +Check the following aspects: + +- Check whether the key CRD **packageversions.version.cce.io** of the cluster is deleted. +- Check whether the cluster key CRD **network-attachment-definitions.k8s.cni.cncf.io** is deleted. + +Solution +-------- + +If check results are abnormal, contact technical support. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/discarded_kubernetes_resource.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/discarded_kubernetes_resource.rst new file mode 100644 index 0000000..ea69a7d --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/discarded_kubernetes_resource.rst @@ -0,0 +1,35 @@ +:original_name: cce_10_0440.html + +.. _cce_10_0440: + +Discarded Kubernetes Resource +============================= + +Check Item +---------- + +Check whether there are discarded resources in the clusters. + +Solution +-------- + +**Scenario 1: The PodSecurityPolicy resource object has been discarded since clusters of v1.25.** + +|image1| + +Run the **kubectl get psp -A** command in the cluster to obtain the existing PSP object. + +If these two objects are not used, skip the check. Otherwise, upgrade the corresponding functions to PodSecurity by referring to :ref:`Pod Security `. + +**Scenario 2: The discarded annotation (tolerate-unready-endpoints) exists in Services in clusters of 1.25 or later.** + +|image2| + +Check whether the Service in the log information contains the annotation **tolerate-unready-endpoints**. If yes, delete the annotation and add the following field to the spec of the corresponding Service to replace the annotation: + +.. code-block:: + + publishNotReadyAddresses: true + +.. |image1| image:: /_static/images/en-us_image_0000001569022901.png +.. |image2| image:: /_static/images/en-us_image_0000001517903056.png diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/enhanced_cpu_management_policy.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/enhanced_cpu_management_policy.rst new file mode 100644 index 0000000..fa08db3 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/enhanced_cpu_management_policy.rst @@ -0,0 +1,29 @@ +:original_name: cce_10_0480.html + +.. _cce_10_0480: + +Enhanced CPU Management Policy +============================== + +Check Item +---------- + +Check whether the current cluster version and the target version support enhanced CPU policy. + +Solution +-------- + +**Scenario**: The current cluster version uses the enhanced CPU management policy, but the target cluster version does not support the enhanced CPU management policy. + +Upgrade the cluster to a version that supports the enhanced CPU management policy. The following table lists the cluster versions that support the enhanced CPU management policy. + +.. 
table:: **Table 1** Cluster versions that support the enhanced CPU policy + + ================ ============================= + Cluster Version Enhanced CPU Policy Supported + ================ ============================= + v1.17 or earlier No + v1.19 No + v1.21 No + v1.23 or later Yes + ================ ============================= diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/everest_restriction_check.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/everest_restriction_check.rst new file mode 100644 index 0000000..c5f52d6 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/everest_restriction_check.rst @@ -0,0 +1,28 @@ +:original_name: cce_10_0478.html + +.. _cce_10_0478: + +everest Restriction Check +========================= + +Check Item +---------- + +Check whether the current everest add-on has compatibility restrictions. See :ref:`Table 1 `. + +.. _cce_10_0478__table1126154011128: + +.. table:: **Table 1** List of everest add-on versions that have compatibility restrictions + + +-----------------------------------+-----------------------------------+ + | Add-on Name | Versions Involved | + +===================================+===================================+ + | everest | v1.0.2-v1.0.7 | + | | | + | | v1.1.1-v1.1.5 | + +-----------------------------------+-----------------------------------+ + +Solution +-------- + +The current everest add-on has compatibility restrictions and cannot be upgraded with the cluster upgrade. Contact technical support. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/index.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/index.rst new file mode 100644 index 0000000..71bf744 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/index.rst @@ -0,0 +1,96 @@ +:original_name: cce_10_0550.html + +.. 
_cce_10_0550: + +Troubleshooting for Pre-upgrade Check Exceptions +================================================ + +- :ref:`Performing Pre-upgrade Check ` +- :ref:`Checking the Node ` +- :ref:`Checking the Blocklist ` +- :ref:`Checking the Add-on ` +- :ref:`Checking the Helm Chart ` +- :ref:`Checking the Master Node SSH Connectivity ` +- :ref:`Checking the Node Pool ` +- :ref:`Checking the Security Group ` +- :ref:`To-Be-Migrated Node ` +- :ref:`Discarded Kubernetes Resource ` +- :ref:`Compatibility Risk ` +- :ref:`Node CCEAgent Version ` +- :ref:`Node CPU Usage ` +- :ref:`CRD Check ` +- :ref:`Node Disk ` +- :ref:`Node DNS ` +- :ref:`Node Key Directory File Permissions ` +- :ref:`Kubelet ` +- :ref:`Node Memory ` +- :ref:`Node Clock Synchronization Server ` +- :ref:`Node OS ` +- :ref:`Node CPU Count ` +- :ref:`Node Python Command ` +- :ref:`Node Readiness ` +- :ref:`Node journald ` +- :ref:`containerd.sock Check ` +- :ref:`Internal Error ` +- :ref:`Node Mount Point ` +- :ref:`Kubernetes Node Taint ` +- :ref:`everest Restriction Check ` +- :ref:`cce-hpa-controller Restriction Check ` +- :ref:`Enhanced CPU Management Policy ` +- :ref:`User Node Components Health ` +- :ref:`Controller Node Components Health ` +- :ref:`Memory Resource Limit of Kubernetes Components ` +- :ref:`Checking Deprecated Kubernetes APIs ` +- :ref:`IPv6 Capability of a CCE Turbo Cluster ` +- :ref:`Node NetworkManager ` +- :ref:`Node ID File ` +- :ref:`Node Configuration Consistency ` +- :ref:`Node Configuration File ` +- :ref:`Checking CoreDNS Configuration Consistency ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + performing_pre-upgrade_check + checking_the_node + checking_the_blocklist + checking_the_add-on + checking_the_helm_chart + checking_the_master_node_ssh_connectivity + checking_the_node_pool + checking_the_security_group + to-be-migrated_node + discarded_kubernetes_resource + compatibility_risk + node_cceagent_version + node_cpu_usage + crd_check + node_disk + node_dns + node_key_directory_file_permissions + kubelet + node_memory + node_clock_synchronization_server + node_os + node_cpu_count + node_python_command + node_readiness + node_journald + containerd.sock_check + internal_error + node_mount_point + kubernetes_node_taint + everest_restriction_check + cce-hpa-controller_restriction_check + enhanced_cpu_management_policy + user_node_components_health + controller_node_components_health + memory_resource_limit_of_kubernetes_components + checking_deprecated_kubernetes_apis + ipv6_capability_of_a_cce_turbo_cluster + node_networkmanager + node_id_file + node_configuration_consistency + node_configuration_file + checking_coredns_configuration_consistency diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/internal_error.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/internal_error.rst new file mode 100644 index 0000000..a8ae92d --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/internal_error.rst @@ -0,0 +1,16 @@ +:original_name: cce_10_0458.html + +.. _cce_10_0458: + +Internal Error +============== + +Check Item +---------- + +Before the upgrade, check whether an internal error occurs. + +Solution +-------- + +If this check fails, contact technical support. 
diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/ipv6_capability_of_a_cce_turbo_cluster.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/ipv6_capability_of_a_cce_turbo_cluster.rst new file mode 100644 index 0000000..cff094f --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/ipv6_capability_of_a_cce_turbo_cluster.rst @@ -0,0 +1,22 @@ +:original_name: cce_10_0488.html + +.. _cce_10_0488: + +IPv6 Capability of a CCE Turbo Cluster +====================================== + +Check Item +---------- + +If IPv6 is enabled for a CCE Turbo cluster, check whether the target cluster version supports IPv6. + +Solution +-------- + +CCE Turbo clusters support IPv6 since v1.23. This feature is available in the following versions: + +- v1.23: 1.23.8-r0 or later +- v1.25: 1.25.3-r0 or later +- v1.25 or later + +If IPv6 has been enabled in the cluster before the upgrade, the target cluster version must also support IPv6. Select a proper cluster version. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/kubelet.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/kubelet.rst new file mode 100644 index 0000000..9c69cd3 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/kubelet.rst @@ -0,0 +1,22 @@ +:original_name: cce_10_0448.html + +.. _cce_10_0448: + +Kubelet +======= + +Check Item +---------- + +Check whether the kubelet on the node is running properly. + +Solution +-------- + +- **Scenario 1: The kubelet status is abnormal.** + + If the kubelet is abnormal, the node is unavailable. Restore the node by following the instructions provided in and check again. + +- **Scenario 2: The cce-pause version is abnormal.** + + The version of the pause container image on which kubelet depends is not cce-pause:3.1. If you continue the upgrade, pods will restart in batches. Currently, the upgrade is not supported. Contact technical support. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/kubernetes_node_taint.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/kubernetes_node_taint.rst new file mode 100644 index 0000000..eeb035e --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/kubernetes_node_taint.rst @@ -0,0 +1,40 @@ +:original_name: cce_10_0460.html + +.. _cce_10_0460: + +Kubernetes Node Taint +===================== + +Check Item +---------- + +Check whether the taint, as shown in :ref:`Table 1 `, exists on the node. + +.. _cce_10_0460__table1126154011128: + +.. table:: **Table 1** Taint checklist + + ========================== ========== + Name Impact + ========================== ========== + node.kubernetes.io/upgrade NoSchedule + ========================== ========== + +Solution +-------- + +Scenario 1: The node is skipped during the cluster upgrade. + +#. For details about how to configure kubectl, see :ref:`Connecting to a Cluster Using kubectl `. + +#. Check the kubelet version of the corresponding node. If the following information is expected: + + |image1| + + If the version of the node is different from that of other nodes, the node is skipped during the upgrade. Reset the node and upgrade the cluster again. 
For details about how to reset a node, see :ref:`Resetting a Node `. + + .. note:: + + Resetting a node will reset all node labels, which may affect workload scheduling. Before resetting a node, check and retain the labels that you have manually added to the node. + +.. |image1| image:: /_static/images/en-us_image_0000001568902601.png diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/memory_resource_limit_of_kubernetes_components.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/memory_resource_limit_of_kubernetes_components.rst new file mode 100644 index 0000000..f9bd8ec --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/memory_resource_limit_of_kubernetes_components.rst @@ -0,0 +1,22 @@ +:original_name: cce_10_0486.html + +.. _cce_10_0486: + +Memory Resource Limit of Kubernetes Components +============================================== + +Check Item +---------- + +Check whether the resources of Kubernetes components, such as etcd and kube-controller-manager, exceed the upper limit. + +Solution +-------- + +Solution 1: Reducing Kubernetes resources + +Solution 2: :ref:`Expanding cluster scale ` + +|image1| + +.. |image1| image:: /_static/images/en-us_image_0000001579008782.png diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_cceagent_version.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_cceagent_version.rst new file mode 100644 index 0000000..acf720c --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_cceagent_version.rst @@ -0,0 +1,56 @@ +:original_name: cce_10_0442.html + +.. _cce_10_0442: + +Node CCEAgent Version +===================== + +Check Item +---------- + +Check whether cce-agent on the current node is of the latest version. + +Solution +-------- + +If cce-agent is not of the latest version, the automatic update fails. This problem is usually caused by invalid OBS address or the version of the component is outdated. + +#. Log in to a normal node that passes the check, obtain the path of the cce-agent configuration file, and check the OBS address. + + .. code-block:: + + cat `ps aux | grep cce-agent | grep -v grep | awk -F '-f ''{print $2}'` + + The OBS configuration address field in the configuration file is **packageFrom.addr**. + + |image1| + +#. Log in to an abnormal node where the check fails, obtain the OBS address again by referring to the previous step, and check whether the OBS address is consistent. If they are different, change the OBS address of the abnormal node to the correct address. + +#. Run the following commands to download the latest binary file: + + - ARM + + .. code-block:: + + curl -k "https://{OBS address you have obtained}/cluster-versions/base/cce-agent-arm" > /tmp/cce-agent-arm + +#. Replace the original cce-agent binary file. + + - ARM + + .. code-block:: + + mv -f /tmp/cce-agent-arm /usr/local/bin/cce-agent-arm + chmod 750 /usr/local/bin/cce-agent-arm + chown root:root /usr/local/bin/cce-agent-arm + +#. Restart cce-agent. + + .. code-block:: + + systemctl restart cce-agent + + If you have any questions about the preceding operations, contact technical support. + +.. 
|image1| image:: /_static/images/en-us_image_0000001517903052.png diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_clock_synchronization_server.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_clock_synchronization_server.rst new file mode 100644 index 0000000..7b229c5 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_clock_synchronization_server.rst @@ -0,0 +1,37 @@ +:original_name: cce_10_0450.html + +.. _cce_10_0450: + +Node Clock Synchronization Server +================================= + +Check Item +---------- + +Check whether the clock synchronization server ntpd or chronyd of the node is running properly. + +Solution +-------- + +- **Scenario 1: ntpd is running abnormally.** + + Log in to the node and run the **systemctl status ntpd** command to query the running status of ntpd. If the command output is abnormal, run the **systemctl restart ntpd** command and query the status again. + + The normal command output is as follows: + + |image1| + + If the problem persists after ntpd is restarted, contact technical support. + +- **Scenario 2: chronyd is running abnormally.** + + Log in to the node and run the **systemctl status chronyd** command to query the running status of chronyd. If the command output is abnormal, run the **systemctl restart chronyd** command and query the status again. + + The normal command output is as follows: + + |image2| + + If the problem persists after chronyd is restarted, contact technical support. + +.. |image1| image:: /_static/images/en-us_image_0000001568902509.png +.. |image2| image:: /_static/images/en-us_image_0000001518062624.png diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_configuration_consistency.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_configuration_consistency.rst new file mode 100644 index 0000000..5955129 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_configuration_consistency.rst @@ -0,0 +1,60 @@ +:original_name: cce_10_0491.html + +.. _cce_10_0491: + +Node Configuration Consistency +============================== + +Check Item +---------- + +When you upgrade a CCE cluster to v1.19 or later, the system checks whether the following configuration files have been modified in the background: + +- /opt/cloud/cce/kubernetes/kubelet/kubelet +- /opt/cloud/cce/kubernetes/kubelet/kubelet_config.yaml +- /opt/cloud/cce/kubernetes/kube-proxy/kube-proxy +- /etc/containerd/default_runtime_spec.json +- /etc/sysconfig/docker +- /etc/default/docker +- /etc/docker/daemon.json + +If you modify some parameters in these files, the cluster upgrade may fail or services may be abnormal after the upgrade. If you confirm that the modification does not affect services, continue the upgrade. + +.. note:: + + CCE uses the standard image script to check node configuration consistency. If you use other custom images, the check may fail. + +The expected modification will not be intercepted. The following table lists the parameters that can be modified. + +.. 
table:: **Table 1** Parameters that can be modified + + +-----------+-------------------------------------------------------+-----------------------+------------------+ + | Component | Configuration File | Parameter | Upgrade Version | + +===========+=======================================================+=======================+==================+ + | kubelet | /opt/cloud/cce/kubernetes/kubelet/kubelet_config.yaml | cpuManagerPolicy | Later than v1.19 | + +-----------+-------------------------------------------------------+-----------------------+------------------+ + | kubelet | /opt/cloud/cce/kubernetes/kubelet/kubelet_config.yaml | maxPods | Later than v1.19 | + +-----------+-------------------------------------------------------+-----------------------+------------------+ + | kubelet | /opt/cloud/cce/kubernetes/kubelet/kubelet_config.yaml | kubeAPIQPS | Later than v1.19 | + +-----------+-------------------------------------------------------+-----------------------+------------------+ + | kubelet | /opt/cloud/cce/kubernetes/kubelet/kubelet_config.yaml | kubeAPIBurst | Later than v1.19 | + +-----------+-------------------------------------------------------+-----------------------+------------------+ + | kubelet | /opt/cloud/cce/kubernetes/kubelet/kubelet_config.yaml | podPidsLimit | Later than v1.19 | + +-----------+-------------------------------------------------------+-----------------------+------------------+ + | kubelet | /opt/cloud/cce/kubernetes/kubelet/kubelet_config.yaml | topologyManagerPolicy | Later than v1.19 | + +-----------+-------------------------------------------------------+-----------------------+------------------+ + | kubelet | /opt/cloud/cce/kubernetes/kubelet/kubelet_config.yaml | resolvConf | Later than v1.19 | + +-----------+-------------------------------------------------------+-----------------------+------------------+ + | kubelet | /opt/cloud/cce/kubernetes/kubelet/kubelet_config.yaml | eventRecordQPS | Later than v1.21 | + +-----------+-------------------------------------------------------+-----------------------+------------------+ + | kubelet | /opt/cloud/cce/kubernetes/kubelet/kubelet_config.yaml | topologyManagerScope | Later than v1.21 | + +-----------+-------------------------------------------------------+-----------------------+------------------+ + | kubelet | /opt/cloud/cce/kubernetes/kubelet/kubelet_config.yaml | allowedUnsafeSysctls | Later than v1.19 | + +-----------+-------------------------------------------------------+-----------------------+------------------+ + | Docker | /etc/docker/daemon.json | dm.basesize | Later than v1.19 | + +-----------+-------------------------------------------------------+-----------------------+------------------+ + +Solution +-------- + +If you modify some parameters in these files, exceptions may occur after the upgrade. If you are not sure whether the modified parameters will affect the upgrade, contact technical support. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_configuration_file.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_configuration_file.rst new file mode 100644 index 0000000..2c9be9b --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_configuration_file.rst @@ -0,0 +1,32 @@ +:original_name: cce_10_0492.html + +.. 
_cce_10_0492: + +Node Configuration File +======================= + +Check Item +---------- + +Check whether the configuration files of key components exist on the node. + +The following table lists the files to be checked. + ++-------------------------------------------------------+--------------------------------------------+------------------------------------------------------------------+ +| File Name | File Content | Remarks | ++=======================================================+============================================+==================================================================+ +| /opt/cloud/cce/kubernetes/kubelet/kubelet | kubelet command line startup parameters | ``-`` | ++-------------------------------------------------------+--------------------------------------------+------------------------------------------------------------------+ +| /opt/cloud/cce/kubernetes/kubelet/kubelet_config.yaml | kubelet startup parameters | ``-`` | ++-------------------------------------------------------+--------------------------------------------+------------------------------------------------------------------+ +| /opt/cloud/cce/kubernetes/kube-proxy/kube-proxy | kube-proxy command line startup parameters | ``-`` | ++-------------------------------------------------------+--------------------------------------------+------------------------------------------------------------------+ +| /etc/sysconfig/docker | Docker configuration file | Not checked when containerd or the Debian-Group machine is used. | ++-------------------------------------------------------+--------------------------------------------+------------------------------------------------------------------+ +| /etc/default/docker | Docker configuration file | Not checked when containerd or the Centos-Group machine is used. | ++-------------------------------------------------------+--------------------------------------------+------------------------------------------------------------------+ + +Solution +-------- + +Contact technical support to restore the configuration file and then perform the upgrade. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_cpu_count.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_cpu_count.rst new file mode 100644 index 0000000..6b3bbb7 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_cpu_count.rst @@ -0,0 +1,16 @@ +:original_name: cce_10_0452.html + +.. _cce_10_0452: + +Node CPU Count +============== + +Check Item +---------- + +Check whether the number of CPUs on the master node is greater than 2. + +Solution +-------- + +If the number of CPUs on the master node is 2, contact technical support to expand the number to 4 or more. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_cpu_usage.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_cpu_usage.rst new file mode 100644 index 0000000..61e3d71 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_cpu_usage.rst @@ -0,0 +1,17 @@ +:original_name: cce_10_0443.html + +.. _cce_10_0443: + +Node CPU Usage +============== + +Check Item +---------- + +Check whether the CPU usage of the node exceeds 90%.
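+A quick way to see the node's current CPU usage before the upgrade is to check it manually (a minimal sketch; the 90% threshold above is the one applied by the pre-upgrade check, and **kubectl top node** requires the metrics add-on to be installed):
+
+.. code-block::
+
+   # Summary lines from top in batch mode, including overall CPU usage
+   top -bn1 | head -5
+
+   # Per-node CPU and memory usage as reported by Kubernetes
+   kubectl top node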
+ +Solution +-------- + +- **Upgrade the cluster during off-peak hours.** +- Check whether too many pods are deployed on the node. If yes, reschedule pods to other idle nodes. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_disk.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_disk.rst new file mode 100644 index 0000000..8630d9a --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_disk.rst @@ -0,0 +1,55 @@ +:original_name: cce_10_0445.html + +.. _cce_10_0445: + +Node Disk +========= + +Check Item +---------- + +Check the following aspects: + +- Check whether the key data disks on the node meet the upgrade requirements. +- Check whether the **/tmp** directory has 500 MB available space. + +Solution +-------- + +During the node upgrade, the key disks store the upgrade component package, and the **/tmp** directory stores temporary files. + +- **Scenario 1: Check whether the disk meets the upgrade requirements.** + + Run the following commands to check the usage of each key disk. Ensure that the available space meets the requirements and then check again. If the space on the master node is insufficient, contact technical support. + + - Disk partition of Docker: 2 GB for master nodes and 1 GB for worker nodes + + .. code-block:: + + df -h /var/lib/docker + + - Disk partition of containerd: 2 GB for master nodes and 1 GB for worker nodes + + .. code-block:: + + df -h /var/lib/containerd + + - Disk partition of kubelet: 2 GB for master nodes and 1 GB for worker nodes + + .. code-block:: + + df -h /var/lib/docker + + - System disk: 10 GB for master nodes and 2 GB for worker nodes + + .. code-block:: + + df -h / + +- **Scenario 2: The /tmp directory space is insufficient.** + + Run the following command to check the space usage of the file system where the /tmp directory is located. Ensure that the space is greater than 500 MB and check again. + + .. code-block:: + + df -h /tmp diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_dns.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_dns.rst new file mode 100644 index 0000000..69ab0fb --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_dns.rst @@ -0,0 +1,19 @@ +:original_name: cce_10_0446.html + +.. _cce_10_0446: + +Node DNS +======== + +Check Item +---------- + +Check the following aspects: + +- Check whether the DNS configuration of the current node can resolve the OBS address. +- Check whether the current node can access the OBS address of the storage upgrade component package. + +Solution +-------- + +During the node upgrade, you need to obtain the upgrade component package from OBS. If this check fails, contact technical support. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_id_file.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_id_file.rst new file mode 100644 index 0000000..8649314 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_id_file.rst @@ -0,0 +1,39 @@ +:original_name: cce_10_0490.html + +.. _cce_10_0490: + +Node ID File +============ + +Check Item +---------- + +Check the ID file format. + +Solution +-------- + +#.
On the **Nodes** page of the CCE console, click the name of the abnormal node to go to the ECS page. + + |image1| + +#. Copy the node ID and save it to the local host. + + |image2| + +#. Log in to the abnormal node and back up files. + + .. code-block:: + + cp /var/lib/cloud/data/instance-id /tmp/instance-id + cp /var/paas/conf/server.conf /tmp/server.conf + +#. Log in to the abnormal node and write the obtained node ID to the file. + + .. code-block:: + + echo "Node ID" >> /var/lib/cloud/data/instance-id + echo "Node ID" >> /var/paas/conf/server.conf + +.. |image1| image:: /_static/images/en-us_image_0000001504661902.png +.. |image2| image:: /_static/images/en-us_image_0000001504821802.png diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_journald.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_journald.rst new file mode 100644 index 0000000..843a433 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_journald.rst @@ -0,0 +1,24 @@ +:original_name: cce_10_0456.html + +.. _cce_10_0456: + +Node journald +============= + +Check Item +---------- + +Check whether journald of a node is normal. + +Solution +-------- + +Log in to the node and run the **systemctl is-active systemd-journald** command to query the running status of journald. If the command output is abnormal, run the **systemctl restart systemd-journald** command and query the status again. + +The normal command output is as follows: + +|image1| + +If the problem persists after journald is restarted, contact technical support. + +.. |image1| image:: /_static/images/en-us_image_0000001517903128.png diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_key_directory_file_permissions.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_key_directory_file_permissions.rst new file mode 100644 index 0000000..e862e80 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_key_directory_file_permissions.rst @@ -0,0 +1,20 @@ +:original_name: cce_10_0447.html + +.. _cce_10_0447: + +Node Key Directory File Permissions +=================================== + +Check Item +---------- + +Check whether the key directory **/var/paas** on the nodes contain files with abnormal owners or owner groups. + +Solution +-------- + +CCE uses the **/var/paas** directory to manage nodes and store file data whose owner and owner group are both paas. + +During the current cluster upgrade, the owner and owner group of the files in the **/var/paas** directory are reset to paas. + +Check whether file data is stored in the **/var/paas** directory. If yes, do not use this directory, remove abnormal files from this directory, and check again. Otherwise, the upgrade is prohibited. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_memory.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_memory.rst new file mode 100644 index 0000000..9411066 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_memory.rst @@ -0,0 +1,17 @@ +:original_name: cce_10_0449.html + +.. _cce_10_0449: + +Node Memory +=========== + +Check Item +---------- + +Check whether the memory usage of the node exceeds 90%. 
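+To see how much memory is currently in use on the node before the upgrade, you can log in to the node and run the following command (a minimal sketch; the 90% threshold above is the one applied by the pre-upgrade check):
+
+.. code-block::
+
+   # Total, used, and available memory in MiB
+   free -m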
+ +Solution +-------- + +- **Upgrade the cluster during off-peak hours.** +- Check whether too many pods are deployed on the node. If yes, reschedule pods to other idle nodes. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_mount_point.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_mount_point.rst new file mode 100644 index 0000000..f454540 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_mount_point.rst @@ -0,0 +1,41 @@ +:original_name: cce_10_0459.html + +.. _cce_10_0459: + +Node Mount Point +================ + +Check Item +---------- + +Check whether inaccessible mount points exist on the node. + +Solution +-------- + +**Scenario: There are inaccessible mount points on the node.** + +If a network file system (such as OBS or SFS) is mounted on the node and the node is disconnected from the NFS server, the mount point becomes inaccessible, and all processes that access this mount point are suspended. + +#. Log in to the node. + +#. Run the following commands on the node in sequence: + + .. code-block:: + + df -h + for dir in `df -h | grep -v "Mounted on" | awk "{print \\$NF}"`;do cd $dir; done && echo "ok" + +#. If **ok** is returned, no problem occurs. + + Otherwise, start another terminal and run the following command to check whether the previous command is in the D state: + + .. code-block:: + + ps aux | grep "D " + +#. If a process is in the D state, the problem exists. You can only reset the node to solve the problem. Reset the node and upgrade the cluster again. For details about how to reset a node, see :ref:`Resetting a Node `. + + .. note:: + + Resetting a node will reset all node labels, which may affect workload scheduling. Before resetting a node, check and retain the labels that you have manually added to the node. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_networkmanager.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_networkmanager.rst new file mode 100644 index 0000000..310ae79 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_networkmanager.rst @@ -0,0 +1,18 @@ +:original_name: cce_10_0489.html + +.. _cce_10_0489: + +Node NetworkManager +=================== + +Check Item +---------- + +Check whether NetworkManager of a node is normal. + +Solution +-------- + +Log in to the node and run the **systemctl is-active NetworkManager** command to query the running status of NetworkManager. If the command output is abnormal, run the **systemctl restart NetworkManager** command and query the status again. + +If the problem persists after NetworkManager is restarted, contact technical support. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_os.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_os.rst new file mode 100644 index 0000000..70cdbad --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_os.rst @@ -0,0 +1,18 @@ +:original_name: cce_10_0451.html + +.. _cce_10_0451: + +Node OS +======= + +Check Item +---------- + +Check whether the OS kernel version of the node is supported by CCE.
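+To see which kernel the node is currently running so that it can be compared with the standard kernel version of the node image, log in to the node and run the following commands (a minimal sketch; the list of supported kernel versions depends on the node OS and the CCE version):
+
+.. code-block::
+
+   # Kernel release of the running node
+   uname -r
+
+   # OS distribution and version
+   cat /etc/os-release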
+ +Solution +-------- + +Running nodes depend on the initial standard kernel version when they are created. CCE has performed comprehensive compatibility tests based on this kernel version. A non-standard kernel version may cause unexpected compatibility issues during node upgrade and the node upgrade may fail. For details, see :ref:`High-Risk Operations and Solutions `. + +Currently, this type of nodes should not be upgraded. You are advised to reset the node to the standard kernel version before the upgrade by following the instructions in :ref:`Resetting a Node `. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_python_command.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_python_command.rst new file mode 100644 index 0000000..e2493dc --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_python_command.rst @@ -0,0 +1,26 @@ +:original_name: cce_10_0453.html + +.. _cce_10_0453: + +Node Python Command +=================== + +Check Item +---------- + +Check whether the Python commands are available on a node. + +Check Method +------------ + +.. code-block:: + + /usr/bin/python --version + echo $? + +If the command output is not 0, the check fails. + +Solution +-------- + +Install Python before the upgrade. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_readiness.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_readiness.rst new file mode 100644 index 0000000..03e0274 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/node_readiness.rst @@ -0,0 +1,25 @@ +:original_name: cce_10_0455.html + +.. _cce_10_0455: + +Node Readiness +============== + +Check Item +---------- + +Check whether the nodes in the cluster are ready. + +Solution +-------- + +- **Scenario 1: The nodes are in the unavailable status.** + + Log in to the CCE console and access the cluster console. Choose **Nodes** in the navigation pane and filter out unavailable nodes, rectify the faulty nodes by referring to the suggestions provided by the console, and check again. + +- **Scenario 2: The displayed node status is inconsistent with the actual status.** + + The possible causes are as follows: + + #. The node status is normal on the nodes page, but the check result shows that the node is not ready. Check again. + #. The node is not found on the nodes page, but the check result shows that the node is in the cluster. Contact technical support. diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/performing_pre-upgrade_check.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/performing_pre-upgrade_check.rst new file mode 100644 index 0000000..e64d1fd --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/performing_pre-upgrade_check.rst @@ -0,0 +1,104 @@ +:original_name: cce_10_0549.html + +.. _cce_10_0549: + +Performing Pre-upgrade Check +============================ + +The system performs a comprehensive pre-upgrade check before the cluster upgrade. If the cluster does not meet the pre-upgrade check conditions, the upgrade cannot continue. To prevent upgrade risks, you can perform pre-upgrade check according to the check items provided by this section. 
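+Before going through the individual check items below, you can get a quick manual overview of node status and versions with kubectl (a minimal sketch run from any host that can access the cluster; it does not replace the automated pre-upgrade check performed by CCE):
+
+.. code-block::
+
+   # Node status, kubelet version, OS image, and kernel version of each node
+   kubectl get nodes -o wide
+
+   # Client and API server versions
+   kubectl version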
+ +.. table:: **Table 1** Check items + + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Check Item | Description | + +=====================================================================+===========================================================================================================================================================================================================================+ + | :ref:`Checking the Node ` | - Check whether the node is available. | + | | - Check whether the node OS supports the upgrade. | + | | - Check whether there are unexpected node pool tags in the node. | + | | - Check whether the Kubernetes node name is consistent with the ECS name. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Checking the Blocklist ` | Check whether the current user is in the upgrade blocklist. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Checking the Add-on ` | - Check whether the add-on status is normal. | + | | - Check whether the add-on support the target version. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Checking the Helm Chart ` | Check whether the current HelmRelease record contains discarded Kubernetes APIs that are not supported by the target cluster version. If yes, the Helm chart may be unavailable after the upgrade. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Checking the Master Node SSH Connectivity ` | Check whether CCE can connect to your master nodes. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Checking the Node Pool ` | - Check the node status. | + | | - Check whether the auto scaling function of the node pool is disabled. 
| + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Checking the Security Group ` | Check whether the security group allows the master node to access nodes using ICMP. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`To-Be-Migrated Node ` | Check whether the node needs to be migrated. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Discarded Kubernetes Resource ` | Check whether there are discarded resources in the clusters. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Compatibility Risk ` | Read the version compatibility differences and ensure that they are not affected. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Node CCEAgent Version ` | Check whether cce-agent on the current node is of the latest version. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Node CPU Usage ` | Check whether the CPU usage of the node exceeds 90%. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`CRD Check ` | - Check whether the key CRD **packageversions.version.cce.io** of the cluster is deleted. | + | | - Check whether the cluster key CRD **network-attachment-definitions.k8s.cni.cncf.io** is deleted. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Node Disk ` | - Check whether the key data disks on the node meet the upgrade requirements. | + | | - Check whether the **/tmp** directory has 500 MB available space. 
| + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Node DNS ` | - Check whether the DNS configuration of the current node can resolve the OBS address. | + | | - Check whether the current node can access the OBS address of the storage upgrade component package. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Node Key Directory File Permissions ` | Check whether the key directory **/var/paas** on the nodes contain files with abnormal owners or owner groups. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Kubelet ` | Check whether the kubelet on the node is running properly. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Node Memory ` | Check whether the memory usage of the node exceeds 90%. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Node Clock Synchronization Server ` | Check whether the clock synchronization server ntpd or chronyd of the node is running properly. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Node OS ` | Check whether the OS kernel version of the node is supported by CCE. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Node CPU Count ` | Check whether the number of CPUs on the master node is greater than 2. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Node Python Command ` | Check whether the Python commands are available on a node. 
| + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Node Readiness ` | Check whether the nodes in the cluster are ready. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Node journald ` | Check whether journald of a node is normal. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`containerd.sock Check ` | Check whether the containerd.sock file exists on the node. This file affects the startup of container runtime in the Euler OS. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Internal Error ` | Before the upgrade, check whether an internal error occurs. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Node Mount Point ` | Check whether inaccessible mount points exist on the node. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Kubernetes Node Taint ` | Check whether the taint needed for cluster upgrade exists on the node. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`everest Restriction Check ` | Check whether the current everest add-on has compatibility restrictions. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`cce-hpa-controller Restriction Check ` | Check whether the current cce-controller-hpa add-on has compatibility restrictions. 
| + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Enhanced CPU Management Policy ` | Check whether the current cluster version and the target version support enhanced CPU policy. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`User Node Components Health ` | Check whether the container runtime and network components on the user node are healthy. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Controller Node Components Health ` | Check whether the Kubernetes, container runtime, and network components of the controller node are healthy. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Memory Resource Limit of Kubernetes Components ` | Check whether the resources of Kubernetes components, such as etcd and kube-controller-manager, exceed the upper limit. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Checking Deprecated Kubernetes APIs ` | Check whether the called API has been discarded in the target Kubernetes version. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`IPv6 Capability of a CCE Turbo Cluster ` | If IPv6 is enabled for a CCE Turbo cluster, check whether the target cluster version supports IPv6. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Node NetworkManager ` | Check whether NetworkManager of a node is normal. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Node ID File ` | Check the ID file format. 
| + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Node Configuration Consistency ` | When you upgrade a CCE cluster to v1.19 or later, the system checks whether the following configuration files have been modified in the background: | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Node Configuration File ` | Check whether the configuration files of key components exist on the node. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Checking CoreDNS Configuration Consistency ` | Check whether the current CoreDNS key configuration Corefile is different from the Helm release record. The difference may be overwritten during the add-on upgrade, **affecting domain name resolution in the cluster**. | + +---------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/to-be-migrated_node.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/to-be-migrated_node.rst new file mode 100644 index 0000000..247cdb3 --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/to-be-migrated_node.rst @@ -0,0 +1,28 @@ +:original_name: cce_10_0439.html + +.. _cce_10_0439: + +To-Be-Migrated Node +=================== + +Check Item +---------- + +Check whether the node needs to be migrated. + +Solution +-------- + +For the 1.15 cluster that is upgraded from 1.13 in rolling mode, you need to migrate (reset or create and replace) all nodes before performing the upgrade again. + +**Solution 1** + +Go the CCE console and access the cluster console. Choose **Nodes** in the navigation pane and click **More** > **Reset Node** in the **Operation** column of the corresponding node. For details, see :ref:`Resetting a Node `. After the node is reset, retry the check task. + +.. note:: + + Resetting a node will reset all node labels, which may affect workload scheduling. Before resetting a node, check and retain the labels that you have manually added to the node. + +**Solution 2** + +After creating a node, delete the faulty node. 
diff --git a/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/user_node_components_health.rst b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/user_node_components_health.rst new file mode 100644 index 0000000..8d972ec --- /dev/null +++ b/umn/source/clusters/upgrading_a_cluster/troubleshooting_for_pre-upgrade_check_exceptions/user_node_components_health.rst @@ -0,0 +1,16 @@ +:original_name: cce_10_0484.html + +.. _cce_10_0484: + +User Node Components Health +=========================== + +Check Item +---------- + +Check whether the container runtime and network components on the user node are healthy. + +Solution +-------- + +If a component is abnormal, log in to the node to check the status of the abnormal component and rectify the fault. diff --git a/umn/source/clusters/upgrading_a_cluster/upgrade_overview.rst b/umn/source/clusters/upgrading_a_cluster/upgrade_overview.rst index c5efbf0..7664f48 100644 --- a/umn/source/clusters/upgrading_a_cluster/upgrade_overview.rst +++ b/umn/source/clusters/upgrading_a_cluster/upgrade_overview.rst @@ -32,19 +32,27 @@ The following table describes the target version to which each cluster version c .. table:: **Table 1** Cluster upgrade paths and impacts - +-----------------+-----------------+------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Source Version | Target Version | Upgrade Modes | Impacts | - +=================+=================+==================+================================================================================================================================================================+ - | v1.19 | v1.21 | In-place upgrade | You need to learn about the differences between versions. For details, see :ref:`Precautions for Major Version Upgrade `. | - +-----------------+-----------------+------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | v1.17 | v1.19 | In-place upgrade | You need to learn about the differences between versions. For details, see :ref:`Precautions for Major Version Upgrade `. | - | | | | | - | v1.15 | | | | - +-----------------+-----------------+------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | v1.13 | v1.15 | Rolling upgrade | - **proxy** in the coredns add-on cannot be configured and needs to be replaced with **forward**. | - | | | | - The storage add-on is changed from storage-driver to everest. 
| - | | | Replace upgrade | | - +-----------------+-----------------+------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +-----------------------+-----------------------+-----------------------+ + | Source Version | Target Version | Upgrade Modes | + +=======================+=======================+=======================+ + | v1.23 | v1.25 | In-place upgrade | + +-----------------------+-----------------------+-----------------------+ + | v1.21 | v1.25 | In-place upgrade | + | | | | + | | v1.23 | | + +-----------------------+-----------------------+-----------------------+ + | v1.19 | v1.23 | In-place upgrade | + | | | | + | | v1.21 | | + +-----------------------+-----------------------+-----------------------+ + | v1.17 | v1.19 | In-place upgrade | + | | | | + | v1.15 | | | + +-----------------------+-----------------------+-----------------------+ + | v1.13 | v1.15 | Rolling upgrade | + | | | | + | | | Replace upgrade | + +-----------------------+-----------------------+-----------------------+ Upgrade Modes ------------- @@ -68,56 +76,3 @@ The upgrade processes are the same for master nodes. The differences between the +----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | **Replace upgrade** | The latest worker node image is used to reset the node OS. | This is the fastest upgrade mode and requires few manual interventions. | Data or configurations on the node will be lost, and services will be interrupted for a period of time. | +----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - -.. 
_cce_10_0197__section191131551162610: - -Precautions for Major Version Upgrade -------------------------------------- - -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| Upgrade Path | Difference | Self-Check | -+=======================+==============================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+ -| v1.19 to v1.21 | The bug of **exec probe timeouts** is fixed in Kubernetes 1.21. Before this bug fix, the exec probe does not consider the **timeoutSeconds** field. Instead, the probe will run indefinitely, even beyond its configured deadline. It will stop until the result is returned. If this field is not specified, the default value **1** is used. This field takes effect after the upgrade. If the probe runs over 1 second, the application health check may fail and the application may restart frequently. | Before the upgrade, check whether the timeout is properly set for the exec probe. 
| -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| | kube-apiserver of CCE 1.19 or later requires that the Subject Alternative Names (SANs) field be configured for the certificate of your webhook server. Otherwise, kube-apiserver fails to call the webhook server after the upgrade, and containers cannot be started properly. | Before the upgrade, check whether the SAN field is configured in the certificate of your webhook server. | -| | | | -| | Root cause: X.509 `CommonName `__ is discarded in Go 1.15. kube-apiserver of CCE 1.19 is compiled using Go 1.15. If your webhook certificate does not have SANs, kube-apiserver does not process the **CommonName** field of the X.509 certificate as the host name by default. As a result, the authentication fails. | - If you do not have your own webhook server, you can skip this check. | -| | | - If the field is not set, you are advised to use the SAN field to specify the IP address and domain name supported by the certificate. | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| v1.15 to v1.19 | The control plane of CCE 1.19 is incompatible with Kubelet 1.15. If the master node fails to be upgraded or the node to be upgraded restarts after the master node is successfully upgraded, there is a high probability that the node is in the **NotReady** status. | #. 
In normal cases, this scenario is not triggered. | -| | | #. After the master node is upgraded, do not suspend the upgrade. Upgrade the node quickly. | -| | There is a high probability that kubelet restarts on the node that fails to be upgraded, triggering the node registration process. The default registration labels of kubelet 1.15 (**failure-domain.beta.kubernetes.io/is-baremetal** and **kubernetes.io/availablezone**) are regarded as an invalid label by kube-apiserver 1.19. | #. If a node fails to be upgraded and cannot be restored, evict applications on the node as soon as possible. Contact technical support and skip the node upgrade. After the upgrade is complete, reset the node. | -| | | | -| | The valid labels in v1.19 are **node.kubernetes.io/baremetal** and **failure-domain.beta.kubernetes.io/zone**. | | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| | In CCE 1.15 and 1.19 clusters, the Docker storage driver file system is switched from XFS to Ext4. As a result, the import package sequence in the pods of the upgraded Java application may be abnormal, causing pod exceptions. | Before the upgrade, check the Docker configuration file **/etc/docker/daemon.json** on the node. Check whether the value of **dm.fs** is **xfs**. | -| | | | -| | | - If the value is **ext4** or the storage driver is Overlay, you can skip the next steps. | -| | | - If the value is **xfs**, you are advised to deploy applications in the cluster of the new version in advance to test whether the applications are compatible with the new cluster version. | -| | | | -| | | .. 
code-block:: | -| | | | -| | | { | -| | | "storage-driver": "devicemapper", | -| | | "storage-opts": [ | -| | | "dm.thinpooldev=/dev/mapper/vgpaas-thinpool", | -| | | "dm.use_deferred_removal=true", | -| | | "dm.fs=xfs", | -| | | "dm.use_deferred_deletion=true" | -| | | ] | -| | | } | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| | kube-apiserver of CCE 1.19 or later requires that the Subject Alternative Names (SANs) field be configured for the certificate of your webhook server. Otherwise, kube-apiserver fails to call the webhook server after the upgrade, and containers cannot be started properly. | Before the upgrade, check whether the SAN field is configured in the certificate of your webhook server. | -| | | | -| | Root cause: X.509 `CommonName `__ is discarded in Go 1.15. kube-apiserver of CCE 1.19 is compiled using Go 1.15. The **CommonName** field is processed as the host name. As a result, the authentication fails. | - If you do not have your own webhook server, you can skip this check. | -| | | - If the field is not set, you are advised to use the SAN field to specify the IP address and domain name supported by the certificate. | -| | | | -| | | .. important:: | -| | | | -| | | NOTICE: | -| | | To mitigate the impact of version differences on cluster upgrade, CCE performs special processing during the upgrade from 1.15 to 1.19 and still supports certificates without SANs. However, no special processing is required for subsequent upgrades. You are advised to rectify your certificate as soon as possible. 
| -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| | In clusters of v1.17.17 and later, CCE automatically creates pod security policies (PSPs) for you, which restrict the creation of pods with unsafe configurations, for example, pods for which **net.core.somaxconn** under a sysctl is configured in the security context. | After an upgrade, you can allow insecure system configurations as required. For details, see :ref:`Configuring a Pod Security Policy `. | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| v1.13 to v1.15 | After a VPC network cluster is upgraded, the master node occupies an extra CIDR block due to the upgrade of network components. If no container CIDR block is available for the new node, the pod scheduled to the node cannot run. | Generally, this problem occurs when the nodes in the cluster are about to fully occupy the container CIDR block. For example, the container CIDR block is 10.0.0.0/16, the number of available IP addresses is 65,536, and the VPC network is allocated a CIDR block with the fixed size (using the mask to determine the maximum number of container IP addresses allocated to each node). If the upper limit is 128, the cluster supports a maximum of 512 (65536/128) nodes, including the three master nodes. After the cluster is upgraded, each of the three master nodes occupies one CIDR block. 
As a result, 506 nodes are supported. | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/umn/source/getting_started/creating_a_deployment_nginx_from_an_image.rst b/umn/source/getting_started/creating_a_deployment_nginx_from_an_image.rst index 5e9296f..6ac3750 100644 --- a/umn/source/getting_started/creating_a_deployment_nginx_from_an_image.rst +++ b/umn/source/getting_started/creating_a_deployment_nginx_from_an_image.rst @@ -48,6 +48,7 @@ The following is the procedure for creating a containerized workload from a cont - **Workload Type**: Select **Deployment**. - **Workload Name**: Set it to **nginx**. + - **Namespace**: Select **default**. - **Pods**: Set the quantity of pods to **1**. **Container Settings** @@ -59,9 +60,9 @@ The following is the procedure for creating a containerized workload from a cont Click the plus sign (+) to create a Service for accessing the workload from an external network. In this example, create a LoadBalancer Service. Set the following parameters: - **Service Name**: name of the Service exposed to external networks. In this example, the Service name is **nginx**. - - **Access Type**: Select **LoadBalancer**. + - **Service Type**: Select **LoadBalancer**. - **Service Affinity**: Retain the default value. - - **Load Balancer**: If a load balancer is available, select an existing load balancer. If not, click **Create Load Balancer** to create one on the ELB console. + - **Load Balancer**: If a load balancer is available, select an existing load balancer. If not, choose **Auto create** to create one on the ELB console. - **Port**: - **Protocol**: Select **TCP**. diff --git a/umn/source/getting_started/creating_a_kubernetes_cluster.rst b/umn/source/getting_started/creating_a_kubernetes_cluster.rst index ab978c7..5393d63 100644 --- a/umn/source/getting_started/creating_a_kubernetes_cluster.rst +++ b/umn/source/getting_started/creating_a_kubernetes_cluster.rst @@ -51,7 +51,9 @@ Creating a Cluster +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | \* Master Node Subnet | Subnet where master nodes of the cluster are located. 
| +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | \*Container Network Segment | Retain the default value. | + | \* Container CIDR Block | Retain the default value. | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | \* IPv4 Service CIDR Block | CIDR block for Services used by containers in the same cluster to access each other. The value determines the maximum number of Services you can create. The value cannot be changed after creation. | +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ #. Click **Next: Add-on Configuration**. Retain the default settings. @@ -80,13 +82,10 @@ After a cluster is created, you need to create nodes in the cluster to run workl - **AZ**: Retain the default value. - **Node Type**: Select **Elastic Cloud Server (VM)**. - **Specifications**: Select node specifications that fit your business needs. + - **Container Engine**: Select a container engine as required. - **OS**: Select the operating system (OS) of the nodes to be created. - **Node Name**: Enter a node name. - - **Login Mode**: Use a password or key pair to log in to the node. - - - If the login mode is **Password**: The default username is **root**. Enter the password for logging to the node and confirm the password. - - Please remember the node login password. If you forget the password, the system is unable to retrieve your password and you will have to reset the password. + - **Login Mode**: - If the login mode is **Key pair**, select a key pair for logging to the node and select the check box to acknowledge that you have obtained the key file and without this file you will not be able to log in to the node. @@ -101,6 +100,8 @@ After a cluster is created, you need to create nodes in the cluster to run workl - **VPC**: Use the default value, that is, the subnet selected during cluster creation. - **Node Subnet**: Select a subnet in which the node runs. + - **Node IP**: IP address of the specified node. + - **EIP**: The default value is **Do not use**. You can select **Use existing** and **Auto create**. #. At the bottom of the page, select the node quantity, and click **Next: Confirm**. diff --git a/umn/source/getting_started/deploying_wordpress_and_mysql_that_depend_on_each_other/step_1_create_a_mysql_workload.rst b/umn/source/getting_started/deploying_wordpress_and_mysql_that_depend_on_each_other/step_1_create_a_mysql_workload.rst index d3da0e0..83e3bc6 100644 --- a/umn/source/getting_started/deploying_wordpress_and_mysql_that_depend_on_each_other/step_1_create_a_mysql_workload.rst +++ b/umn/source/getting_started/deploying_wordpress_and_mysql_that_depend_on_each_other/step_1_create_a_mysql_workload.rst @@ -48,7 +48,7 @@ Creating a MySQL Workload Click the plus sign (+) to create a Service for accessing MySQL from WordPress. - Select **ClusterIP** for **Access Type**, set **Service Name** to **mysql**, set both the **Container Port** and **Service Port** to **3306**, and click **OK**. 
+ Select **ClusterIP** for **Service Type**, set **Service Name** to **mysql**, set both the **Container Port** and **Service Port** to **3306**, and click **OK**. The default access port in the MySQL image is 3306. In this example, both the container port and Service port are set to **3306** for convenience. The access port can be changed to another port. diff --git a/umn/source/getting_started/deploying_wordpress_and_mysql_that_depend_on_each_other/step_2_create_a_wordpress_workload.rst b/umn/source/getting_started/deploying_wordpress_and_mysql_that_depend_on_each_other/step_2_create_a_wordpress_workload.rst index 62ae677..2982f2b 100644 --- a/umn/source/getting_started/deploying_wordpress_and_mysql_that_depend_on_each_other/step_2_create_a_wordpress_workload.rst +++ b/umn/source/getting_started/deploying_wordpress_and_mysql_that_depend_on_each_other/step_2_create_a_wordpress_workload.rst @@ -54,9 +54,9 @@ Creating a WordPress Blog Website Click the plus sign (+) to create a Service for accessing the workload from an external network. In this example, create a LoadBalancer Service. Set the following parameters: - **Service Name**: name of the Service exposed to external networks. In this example, the Service name is **wordpress**. - - **Access Type**: Select **LoadBalancer**. + - **Service Type**: Select **LoadBalancer**. - **Service Affinity**: Retain the default value. - - **Load Balancer**: If a load balancer is available, select an existing load balancer. If not, click **Create Load Balancer** to create one on the ELB console. + - **Load Balancer**: If a load balancer is available, select an existing load balancer. If not, choose **Auto create** to create one on the ELB console. - **Port**: - **Protocol**: Select **TCP**. diff --git a/umn/source/getting_started/introduction.rst b/umn/source/getting_started/introduction.rst index fa2bff7..7c677fb 100644 --- a/umn/source/getting_started/introduction.rst +++ b/umn/source/getting_started/introduction.rst @@ -18,9 +18,9 @@ Complete the following tasks to get started with CCE. **Figure 1** Procedure for getting started with CCE -#. **Register a Huawei Cloud account and grant permissions to IAM users.** +#. **Register an account and grant permissions to IAM users.** - Huawei Cloud accounts have the permissions to use CCE. However, IAM users created by a Huawei Cloud account do not have the permission. You need to manually grant the permission to IAM users. For details, see . + An account has the permissions to use CCE. However, IAM users created by an account do not have the permission. You need to manually grant the permission to IAM users. For details, see . #. **Create a cluster.** @@ -54,6 +54,6 @@ FAQs #. **How can I allow multiple workloads in the same cluster to access each other?** - Select the access type ClusterIP, which allows workloads in the same cluster to use their cluster-internal domain names to access each other. + Set **Service Type** to **ClusterIP**, which allows workloads in the same cluster to use their cluster-internal domain names to access each other. Cluster-internal domain names are in the format of ..svc.cluster.local:. For example, nginx.default.svc.cluster.local:80. 
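For reference, the following is a minimal sketch of a ClusterIP Service for the **nginx** Deployment used in this example. The selector label **app: nginx** is an assumption and must match the labels of your pods. Once the Service is created, any workload in the same cluster can reach it at the cluster-internal domain name **nginx.default.svc.cluster.local:80**, that is, <service-name>.<namespace>.svc.cluster.local:<port>.

.. code-block::

   apiVersion: v1
   kind: Service
   metadata:
     name: nginx              # Service name; resolves to nginx.default.svc.cluster.local
     namespace: default
   spec:
     type: ClusterIP          # reachable only from inside the cluster
     selector:
       app: nginx             # assumed pod label of the nginx Deployment
     ports:
       - protocol: TCP
         port: 80             # Service port
         targetPort: 80       # container port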
diff --git a/umn/source/getting_started/preparations.rst b/umn/source/getting_started/preparations.rst index 4d62cbe..0a7c2bc 100644 --- a/umn/source/getting_started/preparations.rst +++ b/umn/source/getting_started/preparations.rst @@ -17,7 +17,7 @@ Before using CCE, you need to make the following preparations: Creating an IAM user -------------------- -If you want to allow multiple users to manage your resources without sharing your password or keys, you can create users using IAM and grant permissions to the users. These users can use specified links and their own accounts to access Huawei Cloud and help you manage resources efficiently. You can also configure account security policies to ensure the security of these accounts. +If you want to allow multiple users to manage your resources without sharing your password or keys, you can create users using IAM and grant permissions to the users. These users can use specified links and their own accounts to access the cloud and manage resources efficiently. You can also configure account security policies to ensure the security of these accounts. Your accounts have the permissions to use CCE. However, IAM users created by your accounts do not have the permissions. You need to manually assign the permissions to IAM users. diff --git a/umn/source/high-risk_operations_and_solutions.rst b/umn/source/high-risk_operations_and_solutions.rst index 9372656..44a5f1d 100644 --- a/umn/source/high-risk_operations_and_solutions.rst +++ b/umn/source/high-risk_operations_and_solutions.rst @@ -15,7 +15,7 @@ Clusters and Nodes +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+ | Category | Operation | Impact | Solution | +=================+=======================================================================================================+======================================================================================================================================================================================================================================================================================+===================================================================================================================================================+ - | Master node | Modifying the security group of a node in a cluster | The master node may be unavailable. | Restore the security group by referring to :ref:`Creating a CCE Cluster ` and allow traffic from the security group to pass through. | + | Master node | Modifying the security group of a node in a cluster | The master node may be unavailable. | Restore the security group by referring to the security group of the new cluster and allow traffic from the security group to pass through. | | | | | | | | | .. 
note:: | | | | | | | diff --git a/umn/source/networking/container_network_models/overview.rst b/umn/source/networking/container_network_models/overview.rst index 491b6a9..8f1bf08 100644 --- a/umn/source/networking/container_network_models/overview.rst +++ b/umn/source/networking/container_network_models/overview.rst @@ -7,7 +7,7 @@ Overview The container network assigns IP addresses to pods in a cluster and provides networking services. In CCE, you can select the following network models for your cluster: -- :ref:`Tunnel network ` +- :ref:`Container tunnel network ` - :ref:`VPC network ` - :ref:`Cloud Native Network 2.0 ` @@ -25,7 +25,7 @@ Network Model Comparison .. table:: **Table 1** Network model comparison +------------------------+-----------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+ - | Dimension | Tunnel Network | VPC Network | Cloud Native Network 2.0 | + | Dimension | Container Tunnel Network | VPC Network | Cloud Native Network 2.0 | +========================+===================================================================================================================================+======================================================================================================================================================+============================================================================================================+ | Application scenarios | - Common container service scenarios | - Scenarios that have high requirements on network latency and bandwidth | - Scenarios that have high requirements on network latency, bandwidth, and performance | | | - Scenarios that do not have high requirements on network latency and bandwidth | - Containers can communicate with VMs using a microservice registration framework, such as Dubbo and CSE. | - Containers can communicate with VMs using a microservice registration framework, such as Dubbo and CSE. | diff --git a/umn/source/networking/index.rst b/umn/source/networking/index.rst index 8f25e95..b8697dc 100644 --- a/umn/source/networking/index.rst +++ b/umn/source/networking/index.rst @@ -13,7 +13,6 @@ Networking - :ref:`Configuring Intra-VPC Access ` - :ref:`Accessing Public Networks from a Container ` - :ref:`Network Policies ` -- :ref:`NetworkAttachmentDefinition ` - :ref:`Host Network ` .. toctree:: @@ -28,5 +27,4 @@ Networking configuring_intra-vpc_access accessing_public_networks_from_a_container network_policies - networkattachmentdefinition host_network diff --git a/umn/source/networking/network_policies.rst b/umn/source/networking/network_policies.rst index 5ad3576..c31529a 100644 --- a/umn/source/networking/network_policies.rst +++ b/umn/source/networking/network_policies.rst @@ -5,9 +5,7 @@ Network Policies ================ -NetworkPolicy is a Kubernetes object used to restrict pod access. In CCE, by setting network policies, you can define ingress rules specifying the addresses to access pods or egress rules specifying the addresses pods can access. This is equivalent to setting up a firewall at the application layer to further ensure network security. - -Network policies depend on the networking add-on of the cluster to which the policies apply. 
+Network policies are designed by Kubernetes to restrict pod access. It is equivalent to a firewall at the application layer to enhance network security. The capabilities supported by network policies depend on the capabilities of the network add-ons of the cluster. By default, if a namespace does not have any policy, pods in the namespace accept traffic from any source and send traffic to any destination. @@ -20,28 +18,35 @@ Network policy rules are classified into the following types: Notes and Constraints --------------------- -- Only clusters that use the tunnel network model support network policies. +- Only clusters that use the tunnel network model support network policies. Network policies are classified into the following types: + + - Ingress: All versions support this type. + + - Egress: Only clusters of v1.23 or later support egress rules. + + Egress rules are supported only in the following OSs: + + +-----------------------------------+-------------------------------------------+ + | OS | Verified Kernel Version | + +===================================+===========================================+ + | CentOS | 3.10.0-1062.18.1.el7.x86_64 | + | | | + | | 3.10.0-1127.19.1.el7.x86_64 | + | | | + | | 3.10.0-1160.25.1.el7.x86_64 | + | | | + | | 3.10.0-1160.76.1.el7.x86_64 | + +-----------------------------------+-------------------------------------------+ + | EulerOS 2.5 | 3.10.0-862.14.1.5.h591.eulerosv2r7.x86_64 | + | | | + | | 3.10.0-862.14.1.5.h687.eulerosv2r7.x86_64 | + +-----------------------------------+-------------------------------------------+ + | EulerOS 2.9 | 4.18.0-147.5.1.6.h541.eulerosv2r9.x86_64 | + | | | + | | 4.18.0-147.5.1.6.h766.eulerosv2r9.x86_64 | + +-----------------------------------+-------------------------------------------+ - Network isolation is not supported for IPv6 addresses. - -- Network policies only allow clusters of v1.23 or later to set ingress and egress rules. Egress rules cannot be set on clusters of other versions. - - Egress rules are supported only in the following operating systems: - - +-----------------------------------+-------------------------------------------+ - | OS | Kernel Version | - +===================================+===========================================+ - | CentOS | 3.10.0-1062.18.1.el7.x86_64 | - | | | - | | 3.10.0-1127.19.1.el7.x86_64 | - | | | - | | 3.10.0-1160.25.1.el7.x86_64 | - +-----------------------------------+-------------------------------------------+ - | EulerOS 2.5 | 3.10.0-862.14.1.5.h591.eulerosv2r7.x86_64 | - +-----------------------------------+-------------------------------------------+ - | EulerOS 2.9 | 4.18.0-147.5.1.6.h541.eulerosv2r9.x86_64 | - +-----------------------------------+-------------------------------------------+ - - If a cluster is upgraded to v1.23 in in-place mode, you cannot use egress rules because the node OS is not upgraded. In this case, reset the node. Using Ingress Rules diff --git a/umn/source/networking/networkattachmentdefinition.rst b/umn/source/networking/networkattachmentdefinition.rst deleted file mode 100644 index f85b33a..0000000 --- a/umn/source/networking/networkattachmentdefinition.rst +++ /dev/null @@ -1,190 +0,0 @@ -:original_name: cce_10_0196.html - -.. _cce_10_0196: - -NetworkAttachmentDefinition -=========================== - -Scenario --------- - -In a CCE Turbo cluster, you can set the subnet and security group for a container by namespace using NetworkAttachmentDefinition, a `CRD `__ resource in the cluster. 
After NetworkAttachmentDefinition is configured for a namespace, pods in the namespace support the following functions: - -- Binding a container with a subnet: The pod IP address is restricted in a specific CIDR block. Different namespaces can be isolated from each other. -- Binding a container with a security group: Security group rules can be set for pods in the same namespace to customize access policies. - -Constraints ------------ - -- NetworkAttachmentDefinition is available only in CCE Turbo clusters of v1.23.8-r0, v1.25.3-r0, and later. -- Only **default-network** supports ENI preheating. User-defined container subnets do not support ENI preheating. If ENI preheating is not enabled, workload instance creation slows down. Therefore, this function is not applicable to high-performance pod creation scenarios. -- To delete a NetworkAttachmentDefinition, delete pods (with the annotation named **cni.yangtse.io/network-status**) created using the configuration in the corresponding namespace first. For details, see :ref:`Deleting a Network Configuration `. - -Using the CCE Console ---------------------- - -#. Log in to the CCE console. -#. Click the cluster name to access the cluster console. Choose **System Configuration** in the navigation pane and click the **Network Configuration** tab. - - .. note:: - - Each cluster has a **default-network** for namespaces with no container subnets. The default container subnet displayed in the network information on the networking configuration area is the container subnet in **default-network**. The **default-network** cannot be deleted. - -#. Click **Create Network Configurations** in the upper right corner. Configure the basic parameters in the displayed dialog box. - - - **Name**: Enter a user-defined name. The name can contain a maximum of 253 characters. Do not use **default-network**, **default**, **mgnt0**, and **mgnt1**. - - **Namespace**: Select a namespace. The namespaces of different configurations must be unique. - - **Subnet**: Select a subnet. If no subnet is available, click **Create Subnet** to create a subnet. After the subnet is created, click the refresh button. A maximum of 20 subnets can be selected. - - **Associate Security Group**: The default value is the container ENI security group. You can also click **Create Security Group** to create one. After the security group is created, click the refresh button. - -#. Click **Create**. After the creation is complete, you will be redirected to the network configuration list. You can see that the newly added subnet is in the list. - -Using kubectl -------------- - -This section describes how to create an NAD using kubectl. - -#. Use kubectl to connect to the cluster. For details, see :ref:`Connecting to a Cluster Using kubectl `. - -#. Modify the **networkattachment-test.yaml** file. - - **vi networkattachment-test.yaml** - - .. code-block:: - - apiVersion: k8s.cni.cncf.io/v1 - kind: NetworkAttachmentDefinition - metadata: - annotations: - yangtse.io/project-id: 05e38** - name: example - namespace: kube-system - spec: - config: ' - { - "type":"eni-neutron", - "args":{ - "securityGroups":"41891**", - "subnets":[ - { - "subnetID":"27d95**" - } - ] - }, - "selector":{ - "namespaceSelector":{ - "matchLabels":{ - "kubernetes.io/metadata.name":"default" - } - } - } - }' - - .. 
table:: **Table 1** Key parameters - - +-----------------------+-----------+-------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------+ - | Parameter | Mandatory | Type | Description | - +=======================+===========+=====================================================================================+==========================================================================================+ - | apiVersion | Yes | String | API version. The value is fixed at **k8s.cni.cncf.io/v1**. | - +-----------------------+-----------+-------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------+ - | kind | Yes | String | Type of the object to be created. The value is fixed at **NetworkAttachmentDefinition**. | - +-----------------------+-----------+-------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------+ - | yangtse.io/project-id | Yes | String | Project ID. | - +-----------------------+-----------+-------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------+ - | name | Yes | String | Configuration item name. | - +-----------------------+-----------+-------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------+ - | namespace | Yes | String | Namespace of the configuration resource. The value is fixed to **kube-system**. | - +-----------------------+-----------+-------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------+ - | config | Yes | :ref:`Table 2 ` object | Configuration content, which is a string in JSON format. | - +-----------------------+-----------+-------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------+ - - .. _cce_10_0196__en-us_topic_0000001199021176_table452992692116: - - .. table:: **Table 2** config parameters - - +-----------------+-----------------+-------------------------------------------------------------------------------------+----------------------------------------------------+ - | Parameter | Mandatory | Type | Description | - +=================+=================+=====================================================================================+====================================================+ - | type | Yes | String | The value is fixed at **eni-neutron**. | - +-----------------+-----------------+-------------------------------------------------------------------------------------+----------------------------------------------------+ - | args | No | :ref:`Table 3 ` | Configuration parameters. | - | | | | | - | | | object | | - +-----------------+-----------------+-------------------------------------------------------------------------------------+----------------------------------------------------+ - | selector | No | :ref:`Table 4 ` object | Namespace on which the configuration takes effect. 
| - +-----------------+-----------------+-------------------------------------------------------------------------------------+----------------------------------------------------+ - - .. _cce_10_0196__en-us_topic_0000001199021176_table1253012616211: - - .. table:: **Table 3** args parameters - - +-----------------+-----------------+---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Parameter | Mandatory | Type | Description | - +=================+=================+===========================+=========================================================================================================================================================================================+ - | securityGroups | No | String | Security group ID. If no security group is planned, select the same security group as that in **default-network**. | - | | | | | - | | | | Obtaining the value: | - | | | | | - | | | | Log in to the VPC console. In the navigation pane, choose **Access Control** > **Security Groups**. Click the target security group name and copy the ID on the **Summary** tab page. | - +-----------------+-----------------+---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | subnets | Yes | Array of subnetID Objects | List of container subnet IDs. At least one subnet ID must be entered. The format is as follows: | - | | | | | - | | | | .. code-block:: | - | | | | | - | | | | [{"subnetID":"27d95**"},{"subnetID":"827bb**"},{"subnetID":"bdd6b**"}] | - | | | | | - | | | | Subnet ID not used by the cluster in the same VPC. | - | | | | | - | | | | Obtaining the value: | - | | | | | - | | | | Log in to the VPC console. In the navigation pane, choose **Virtual Private Cloud** > **Subnets**. Click the target subnet name and copy the **Subnet ID** on the **Summary** tab page. | - +-----------------+-----------------+---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - - .. _cce_10_0196__en-us_topic_0000001199021176_table696412574307: - - .. table:: **Table 4** selector parameters - - +-------------------+-----------------+--------------------+------------------------------------------------------------------------------------------------+ - | Parameter | Mandatory | Type | Description | - +===================+=================+====================+================================================================================================+ - | namespaceSelector | No | matchLabels Object | A Kubernetes standard selector. You need to enter the namespace label in the following format: | - | | | | | - | | | | .. code-block:: | - | | | | | - | | | | "matchLabels":{ | - | | | | "kubernetes.io/metadata.name":"default" | - | | | | } | - | | | | | - | | | | Namespaces of different configurations cannot overlap. | - +-------------------+-----------------+--------------------+------------------------------------------------------------------------------------------------+ - -#. Create a NetworkAttachmentDefinition. 
- - **kubectl create -f networkattachment-test.yaml** - - If information similar to the following is displayed, the NetworkAttachmentDefinition has been created. - - .. code-block:: - - networkattachmentdefinition.k8s.cni.cncf.io/example created - -.. _cce_10_0196__en-us_topic_0000001199021176_section2314125415245: - -Deleting a Network Configuration --------------------------------- - -You can delete the new network configuration or view its YAML file. - -.. note:: - - Before deleting a network configuration, delete the container corresponding to the configuration. Otherwise, the deletion fails. - - #. Run the following command to filter the pod that uses the configuration in the cluster (**example** is an example configuration name and you should replace it): - - .. code-block:: - - kubectl get po -A -o=jsonpath="{.items[?(@.metadata.annotations.cni\.yangtse\.io/network-status=='[{\"name\":\"example\"}]')]['metadata.namespace', 'metadata.name']}" - - The command output contains the pod name and namespace associated with the configuration. - - #. Delete the owner of the pod. The owner may be a Deployment, StatefulSet, DaemonSet, or Job. diff --git a/umn/source/networking/services/configuring_health_check_for_multiple_ports.rst b/umn/source/networking/services/configuring_health_check_for_multiple_ports.rst index 66fc911..914eac4 100644 --- a/umn/source/networking/services/configuring_health_check_for_multiple_ports.rst +++ b/umn/source/networking/services/configuring_health_check_for_multiple_ports.rst @@ -15,6 +15,7 @@ Constraints - v1.19: v1.19.16-r5 or later - v1.21: v1.21.8-r0 or later - v1.23: v1.23.6-r0 or later + - v1.25: v1.25.2-r0 or later - **kubernetes.io/elb.health-check-option** and **kubernetes.io/elb.health-check-options** cannot be configured at the same time. - The **target_service_port** field is mandatory and must be unique. diff --git a/umn/source/networking/services/loadbalancer.rst b/umn/source/networking/services/loadbalancer.rst index 91d455a..ee1859c 100644 --- a/umn/source/networking/services/loadbalancer.rst +++ b/umn/source/networking/services/loadbalancer.rst @@ -631,7 +631,7 @@ This is because when the LoadBalancer Service is created, kube-proxy adds the EL When the value of **externalTrafficPolicy** is **Local**, the situation varies according to the container network model and service forwarding mode. 
The details are as follows: +---------------------------------------------------------------------------+-----------------------------+---------------------------------------------------------------------+-------------------------------------------------------+------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+ -| Server | Client | Tunnel Network Cluster (IPVS) | VPC Network Cluster (IPVS) | Tunnel Network Cluster (iptables) | VPC Network Cluster (iptables) | +| Server | Client | Container Tunnel Network Cluster (IPVS) | VPC Network Cluster (IPVS) | Container Tunnel Network Cluster (iptables) | VPC Network Cluster (iptables) | +---------------------------------------------------------------------------+-----------------------------+---------------------------------------------------------------------+-------------------------------------------------------+------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+ | NodePort Service | Same node | OK. The node where the pod runs is accessible, not any other nodes. | OK. The node where the pod runs is accessible. | OK. The node where the pod runs is accessible. | OK. The node where the pod runs is accessible. | +---------------------------------------------------------------------------+-----------------------------+---------------------------------------------------------------------+-------------------------------------------------------+------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+ diff --git a/umn/source/node_pools/creating_a_node_pool.rst b/umn/source/node_pools/creating_a_node_pool.rst index ccb6e46..82f34aa 100644 --- a/umn/source/node_pools/creating_a_node_pool.rst +++ b/umn/source/node_pools/creating_a_node_pool.rst @@ -93,12 +93,10 @@ Procedure | Node Type | CCE cluster: | | | | | | - ECS (VM): Containers run on ECSs. | - | | - ECS (physical): Containers run on servers using the QingTian architecture. | | | | | | CCE Turbo cluster: | | | | | | - ECS (VM): Containers run on ECSs. Only Trunkport ECSs (models that can be bound with multiple elastic network interfaces (ENIs)) are supported. | - | | - ECS (physical): Containers run on servers using the QingTian architecture. | +-----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Container Engine | CCE clusters support Docker and containerd in some scenarios. | | | | diff --git a/umn/source/node_pools/managing_a_node_pool/index.rst b/umn/source/node_pools/managing_a_node_pool/index.rst index 9a240d8..39fe66d 100644 --- a/umn/source/node_pools/managing_a_node_pool/index.rst +++ b/umn/source/node_pools/managing_a_node_pool/index.rst @@ -10,210 +10,6 @@ Notes and Constraints The default node pool DefaultPool does not support the following management operations. -Editing a Node Pool -------------------- - -.. 
important:: - - - When editing the container engine, OS, resource tags, pre-installation and post-installation scripts, and data disk space allocation of the node pool. The modified configuration takes effect only for new nodes. To synchronize the configuration to the existing nodes, you need to manually reset the existing nodes. - - The modification of the system disk or data disk size of a node pool takes effect only for new nodes. The configuration cannot be synchronized even if the existing nodes are reset. - - Updates of kubernetes labels and taints are automatically synchronized to existing nodes. You do not need to reset nodes. - -#. Log in to the CCE console. - -#. Click the cluster name and access the cluster console. Choose **Nodes** in the navigation pane and click the **Node Pools** tab on the right. - -#. Click **Edit** next to the name of the node pool you will edit. Edit the parameters in the displayed **Edit Node Pool** page. - - **Basic Settings** - - .. table:: **Table 1** Basic settings - - +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Parameter | Description | - +===================================+=================================================================================================================================================================================================================================================================================================================================================================================================================================================+ - | Node Pool Name | Name of the node pool. | - +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Auto Scaling | By default, this parameter is disabled. | - | | | - | | After you enable autoscaler by clicking |image1|, nodes in the node pool are automatically created or deleted based on service requirements. | - | | | - | | - **Maximum Nodes** and **Minimum Nodes**: You can set the maximum and minimum number of nodes to ensure that the number of nodes to be scaled is within a proper range. | - | | - **Priority**: A larger value indicates a higher priority. For example, if this parameter is set to **1** and **4** respectively for node pools A and B, B has a higher priority than A, and auto scaling is first triggered for B. If the priorities of multiple node pools are set to the same value, for example, **2**, the node pools are not prioritized and the system performs scaling based on the minimum resource waste principle. | - | | - **Cooldown Period**: Required. The unit is minute. This parameter indicates the interval between the previous scale-out action and the next scale-in action. 
| - | | | - | | If the **Autoscaler** field is set to on, install the :ref:`autoscaler add-on ` to use the autoscaler feature. | - +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - - **Compute Settings** - - .. table:: **Table 2** Configuration parameters - - +-----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Parameter | Description | - +===================================+====================================================================================================================================================================================================================================+ - | AZ | AZ where the node is located. Nodes in a cluster can be created in different AZs for higher reliability. The value cannot be changed after the node is created. | - | | | - | | You are advised to select **Random** to deploy your node in a random AZ based on the selected node flavor. | - | | | - | | An AZ is a physical region where resources use independent power supply and networks. AZs are physically isolated but interconnected through an internal network. To enhance workload availability, create nodes in different AZs. | - | | | - | | .. note:: | - | | | - | | The modification of AZ configuration takes effect only for new nodes. AZs of existing nodes cannot be modified. | - +-----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Node Type | CCE cluster: | - | | | - | | - ECS (VM): Containers run on ECSs. | - | | - ECS (physical): Containers run on servers using the QingTian architecture. | - | | | - | | CCE Turbo cluster: | - | | | - | | - ECS (VM): Containers run on ECSs. Only Trunkport ECSs (models that can be bound with multiple elastic network interfaces (ENIs)) are supported. | - | | - ECS (physical): Containers run on servers using the QingTian architecture. | - | | | - | | .. note:: | - | | | - | | This setting cannot be modified now. | - +-----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Container Engine | CCE clusters support Docker and containerd in some scenarios. | - | | | - | | - VPC network clusters of v1.23 and later versions support containerd. Container tunnel network clusters of v1.23.2-r0 and later versions support containerd. | - | | - For a CCE Turbo cluster, both **Docker** and **containerd** are supported. For details, see :ref:`Mapping between Node OSs and Container Engines `. | - | | | - | | .. 
note:: | - | | | - | | After the container engine is modified, the modification automatically takes effect when a node is added. For existing nodes, you need to manually reset the nodes for the modification to take effect. | - +-----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Specifications | Select node specifications that best fit your business needs. | - +-----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | OS | Select an OS type. Different types of nodes support different OSs. For details, see :ref:`Supported Node Specifications `. | - | | | - | | **Public image**: Select an OS for the node. | - | | | - | | **Private image**: Private images are supported. | - | | | - | | .. note:: | - | | | - | | After the OS is modified, the modification automatically takes effect when a node is added. For existing nodes, you need to manually reset the nodes for the modification to take effect. | - +-----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - - **Storage Settings** - - .. note:: - - - After the data disk space allocation of a node is modified, the modification automatically takes effect when a node is added. For existing nodes, you need to manually reset the nodes for the modification to take effect. - - The modification of the disk type and size of the system disk or data disk takes effect only for new nodes. The configuration cannot be synchronized even if the existing nodes are reset. - - .. table:: **Table 3** Parameters for storage settings - - +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Parameter | Description | - +===================================+===============================================================================================================================================================================================================================================================================================+ - | System Disk | System disk used by the node OS. The value ranges from 40 GB to 1,024 GB. The default value is 50 GB. | - | | | - | | **Encryption**: Data disk encryption safeguards your data. Snapshots generated from encrypted disks and disks created using these snapshots automatically inherit the encryption function. **This function is available only in certain regions.** | - | | | - | | - **Encryption** is not selected by default. | - | | - After you select **Encryption**, you can select an existing key in the displayed dialog box. If no key is available, click **View Key List** to create a key. After the key is created, click the refresh icon. 
| - +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Data Disk | **At least one data disk is required** for the container runtime and kubelet. **The data disk cannot be deleted or uninstalled. Otherwise, the node will be unavailable.** | - | | | - | | - First data disk: used for container runtime and kubelet components. The value ranges from 20 GB to 32,768 GB. The default value is 100 GB. | - | | - Other data disks: You can set the data disk size to a value ranging from 10 GB to 32,768 GB. The default value is 100 GB. | - | | | - | | **Advanced Settings** | - | | | - | | Click **Expand** to set the following parameters: | - | | | - | | - **Allocate Disk Space**: Select this option to define the disk space occupied by the container runtime to store the working directories, container image data, and image metadata. For details about how to allocate data disk space, see :ref:`Data Disk Space Allocation `. | - | | - **Encryption**: Data disk encryption safeguards your data. Snapshots generated from encrypted disks and disks created using these snapshots automatically inherit the encryption function. **This function is available only in certain regions.** | - | | | - | | - **Encryption** is not selected by default. | - | | - After you select **Encryption**, you can select an existing key in the displayed dialog box. If no key is available, click **View Key List** to create a key. After the key is created, click the refresh icon. | - | | | - | | **Adding Multiple Data Disks** | - | | | - | | A maximum of four data disks can be added. By default, raw disks are created without any processing. You can also click **Expand** and select any of the following options: | - | | | - | | - **Default**: By default, a raw disk is created without any processing. | - | | - **Mount Disk**: The data disk is attached to a specified directory. | - | | | - | | **Local Disk Description** | - | | | - | | If the node flavor is disk-intensive or ultra-high I/O, one data disk can be a local disk. | - | | | - | | Local disks may break down and do not ensure data reliability. It is recommended that you store service data in EVS disks, which are more reliable than local disks. | - +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - - **Advanced Settings** - - .. 
table:: **Table 4** Advanced settings - - +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Parameter | Description | - +===================================+================================================================================================================================================================================================================================================================+ - | Kubernetes Label | Click **Add Label** to set the key-value pair attached to the Kubernetes objects (such as pods). A maximum of 20 labels can be added. | - | | | - | | Labels can be used to distinguish nodes. With workload affinity settings, container pods can be scheduled to a specified node. For more information, see `Labels and Selectors `__. | - | | | - | | .. note:: | - | | | - | | After a **K8s label** is modified, the inventory nodes in the node pool are updated synchronously. | - +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Resource Tag | You can add resource tags to classify resources. | - | | | - | | You can create **predefined tags** in Tag Management Service (TMS). Predefined tags are visible to all service resources that support the tagging function. You can use these tags to improve tagging and resource migration efficiency. | - | | | - | | CCE will automatically create the "CCE-Dynamic-Provisioning-Node=\ *node id*" tag. | - | | | - | | .. note:: | - | | | - | | After a **resource tag** is modified, the modification automatically takes effect when a node is added. For existing nodes, you need to manually reset the nodes for the modification to take effect. | - +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Taint | This field is left blank by default. You can add taints to set anti-affinity for the node. A maximum of 10 taints are allowed for each node. Each taint contains the following parameters: | - | | | - | | - **Key**: A key must contain 1 to 63 characters starting with a letter or digit. Only letters, digits, hyphens (-), underscores (_), and periods (.) are allowed. A DNS subdomain name can be used as the prefix of a key. | - | | - **Value**: A value must start with a letter or digit and can contain a maximum of 63 characters, including letters, digits, hyphens (-), underscores (_), and periods (.). | - | | - **Effect**: Available options are **NoSchedule**, **PreferNoSchedule**, and **NoExecute**. | - | | | - | | For details, see :ref:`Managing Node Taints `. | - | | | - | | .. note:: | - | | | - | | After a **taint** is modified, the inventory nodes in the node pool are updated synchronously. 
| - +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Edit Key pair | Only node pools that use key pairs for login support key pair editing. You can select another key pair. | - | | | - | | .. note:: | - | | | - | | The edited key pair automatically takes effect when a node is added. For existing nodes, you need to manually reset the nodes for the key pair to take effect. | - +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Pre-installation Command | Enter commands. A maximum of 1,000 characters are allowed. | - | | | - | | The script will be executed before Kubernetes software is installed. Note that if the script is incorrect, Kubernetes software may fail to be installed. | - | | | - | | .. note:: | - | | | - | | The modified pre-installation command automatically takes effect when a node is added. For existing nodes, you need to manually reset the nodes for the modification to take effect. | - +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Post-installation Command | Enter commands. A maximum of 1,000 characters are allowed. | - | | | - | | The script will be executed after Kubernetes software is installed and will not affect the installation. | - | | | - | | .. note:: | - | | | - | | The modified post-installation command automatically takes effect when a node is added. For existing nodes, you need to manually reset the nodes for the modification to take effect. | - +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - -#. Click **OK**. - - After the node pool parameters are updated, go to the **Nodes** page to check whether the node to which the node pool belongs is updated. You can reset the node to synchronize the configuration updates for the node pool. - - .. important:: - - The modification of the system disk or data disk size of a node pool takes effect only for new nodes. The configuration cannot be synchronized even if the existing nodes are reset. - Deleting a Node Pool -------------------- @@ -238,33 +34,14 @@ You can copy the configuration of an existing node pool to create a new node poo #. The configurations of the selected node pool are replicated to the **Clone Node Pool** page. You can edit the configurations as required and click **Next: Confirm**. #. On the **Confirm** page, confirm the node pool configuration and click **Create Now**. Then, a new node pool is created based on the edited configuration. -Migrating a Node ----------------- - -Nodes in a node pool can be migrated. Currently, nodes in a node pool can be migrated only to the default node pool (defaultpool) in the same cluster. 
- -#. Log in to the CCE console and click the cluster name to access the cluster. - -#. In the navigation pane, choose **Nodes** and switch to the **Node Pools** tab page. - -#. Click **View Node** in the **Operation** column of the node pool to be migrated. - -#. Select the nodes to be migrated and choose **More** > **Migrate** to migrate the nodes to the default node pool in batches. - - You can also choose **More** > **Migrate** in the **Operation** column of a single node to migrate the node. - -#. In the displayed **Migrate Node** window, confirm the information. - - .. note:: - - The migration has no impacts on the original resource tags, Kubernetes labels, and taints of the node. - - :ref:`Configuring a Node Pool ` - -.. |image1| image:: /_static/images/en-us_image_0000001568822861.png +- :ref:`Synchronizing Node Pools ` +- :ref:`Upgrading the OS ` .. toctree:: :maxdepth: 1 :hidden: configuring_a_node_pool + synchronizing_node_pools + upgrading_the_os diff --git a/umn/source/node_pools/managing_a_node_pool/synchronizing_node_pools.rst b/umn/source/node_pools/managing_a_node_pool/synchronizing_node_pools.rst new file mode 100644 index 0000000..83fbff9 --- /dev/null +++ b/umn/source/node_pools/managing_a_node_pool/synchronizing_node_pools.rst @@ -0,0 +1,39 @@ +:original_name: cce_10_0654.html + +.. _cce_10_0654: + +Synchronizing Node Pools +======================== + +After the configuration of a node pool is updated, some configurations cannot be automatically synchronized for existing nodes. You can manually synchronize configurations for these nodes. + +.. important:: + + - Do not delete or reset nodes during batch synchronization. Otherwise, the synchronization of node pool configuration may fail. + - This operation involves resetting nodes. **Workload services running on the nodes may be interrupted due to single-instance deployment or insufficient schedulable resources.** Evaluate the upgrade risks and perform the upgrade during off-peak hours. You can also `specify a disruption budget for your application `__ to ensure the availability of key services during the upgrade. + - During configuration synchronization for existing nodes, the system disk and data disk will be cleared. Back up important data before the synchronization. + - Only some node pool parameters can be synchronized by resetting nodes. The constraints are as follows: + + - If you edit the resource tags of the node pool, the modified configuration takes effect only for new nodes. To synchronize the configuration to the existing nodes, you need to manually reset the existing nodes. + - Updates of Kubernetes labels and taints are automatically synchronized to existing nodes. You do not need to reset nodes. + +Synchronizing a Single Node +--------------------------- + +#. Log in to the CCE console. +#. Access the cluster console, choose **Nodes** in the navigation pane, and click the **Node Pools** tab on the right. +#. **upgrade** is displayed in the **Node Pool** column of the existing nodes in the node pool. +#. Click **update**. In the dialog box that is displayed, confirm whether to reset the node immediately. + +Batch Synchronization +--------------------- + +#. Log in to the CCE console. +#. Click the cluster name and access the cluster console. Choose **Nodes** in the navigation pane and click the **Node Pools** tab on the right. +#. Choose **More > Synchronize** in the **Operation** column of the target node pool. +#. In the **Batch Synchronization** window, set the synchronization parameters.
+ + - **Synchronization Policy**: **Node Reset** is supported. + - **Max. Nodes for Batch Synchronize**: Nodes will be unavailable during synchronization in **Node Reset** mode. Set this parameter properly to prevent pod scheduling failures caused by too many unavailable nodes in the cluster. + +#. In the node list, select the nodes to be synchronized and click **OK** to start the synchronization. diff --git a/umn/source/node_pools/managing_a_node_pool/upgrading_the_os.rst b/umn/source/node_pools/managing_a_node_pool/upgrading_the_os.rst new file mode 100644 index 0000000..a745e31 --- /dev/null +++ b/umn/source/node_pools/managing_a_node_pool/upgrading_the_os.rst @@ -0,0 +1,59 @@ +:original_name: cce_10_0660.html + +.. _cce_10_0660: + +Upgrading the OS +================ + +When CCE releases a new OS image, existing nodes cannot be automatically upgraded. You can manually upgrade them in batches. + +.. important:: + + - This operation will upgrade the OS by resetting the node. **Workloads running on the node may be interrupted due to single-instance deployment or insufficient schedulable resources.** Evaluate the upgrade risks and upgrade the OS during off-peak hours. You can also set a Pod Disruption Budget (PDB, that is, a `disruption budget `__) policy for important applications to ensure their availability. A minimal PDB example is provided at the end of this section. + - Nodes that use private images cannot be upgraded. + +Procedure +--------- + +**Default node pool** + +#. Log in to the CCE console. +#. Click the cluster name and access the cluster console. Choose **Nodes** in the navigation pane and click the **Node Pools** tab on the right. +#. Click **Upgrade** next to the default node pool. +#. In the displayed **Operating system upgrade** window, set upgrade parameters. + + - **Max. Unavailable Nodes**: specifies the maximum number of unavailable nodes during node synchronization. + + - **Target Operating System**: You do not need to set this parameter. It is used to display the image information of the target version. + + - **Node List**: Select the nodes to be upgraded. + + - **Login Mode**: + + - **Key Pair** + + Select the key pair used to log in to the node. You can select a shared key. + + A key pair is used for identity authentication when you remotely log in to a node. If no key pair is available, click **Create Key Pair**. + + - **Pre-installation Command**: Enter a maximum of 1,000 characters. + + The script will be executed before Kubernetes software is installed. Note that if the script is incorrect, Kubernetes software may fail to be installed. + + - **Post-installation Command**: Enter a maximum of 1,000 characters. + + The script will be executed after Kubernetes software is installed and will not affect the installation. An illustrative pre-installation/post-installation script is provided at the end of this section. + +#. Click **OK**. + +**Non-default node pool** + +#. Log in to the CCE console. +#. Click the cluster name and access the cluster console. Choose **Nodes** in the navigation pane and click the **Node Pools** tab on the right. +#. Choose **More > Synchronize** next to a node pool name. +#. In the displayed **Batch synchronization** dialog box, set upgrade parameters. + + - **Max. Unavailable Nodes**: specifies the maximum number of unavailable nodes during node synchronization. - **Node List**: This parameter cannot be set. By default, nodes that can be upgraded are selected. + +#. Click **OK**.
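The preceding notes recommend a disruption budget to keep important applications available while their nodes are reset. The manifest below is only a minimal sketch and is not part of this document: the name **nginx-pdb**, the namespace, and the **app: nginx** selector are placeholders that you would replace with the values of your own workload.

.. code-block::

   apiVersion: policy/v1
   kind: PodDisruptionBudget
   metadata:
     name: nginx-pdb        # hypothetical name
     namespace: default     # namespace of the workload to protect
   spec:
     minAvailable: 1        # keep at least one replica running while a node is reset
     selector:
       matchLabels:
         app: nginx         # must match the labels of the protected pods

After applying the manifest with **kubectl apply -f nginx-pdb.yaml**, you can run **kubectl get pdb** to check the allowed disruptions before starting a synchronization or OS upgrade.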
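The **Pre-installation Command** and **Post-installation Command** parameters accept plain shell commands of up to 1,000 characters. The snippet below is only an illustration of what a post-installation command might look like; the log path and the kernel parameter are hypothetical and are not defined anywhere in this document.

.. code-block::

   #!/bin/bash
   # Hypothetical post-installation command: record when the node was reset and
   # apply a custom kernel parameter after Kubernetes software is installed.
   echo "node reset on $(date)" >> /var/log/node-reset.log
   sysctl -w net.ipv4.tcp_keepalive_time=600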
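After a batch synchronization or OS upgrade finishes, you can optionally verify the result from the command line, assuming you have kubectl access to the cluster. **<node-name>** is a placeholder for an actual node name.

.. code-block::

   # The OS-IMAGE and KERNEL-VERSION columns show the image and kernel each node runs.
   kubectl get nodes -o wide

   # Inspect the conditions, labels, and taints of a specific node after the reset.
   kubectl describe node <node-name>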
diff --git a/umn/source/nodes/creating_a_node.rst b/umn/source/nodes/creating_a_node.rst index 320b73c..04c26a4 100644 --- a/umn/source/nodes/creating_a_node.rst +++ b/umn/source/nodes/creating_a_node.rst @@ -47,12 +47,10 @@ After a cluster is created, you can create nodes for the cluster. | Node Type | CCE cluster: | | | | | | - ECS (VM): Containers run on ECSs. | - | | - ECS (physical): Containers run on servers using the QingTian architecture. | | | | | | CCE Turbo cluster: | | | | - | | - ECS (VM): Containers run on ECSs. Only Trunkport ECSs (models that can be bound with multiple elastic network interfaces (ENIs)) are supported. | - | | - ECS (physical): Containers run on servers using the QingTian architecture. | + | | - ECS (VM): Containers run on ECSs. Only Trunkport ECSs (models that can be bound with multiple elastic network interfaces) are supported. | +-----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Container Engine | CCE clusters support Docker and containerd in some scenarios. | | | | @@ -91,7 +89,7 @@ After a cluster is created, you can create nodes for the cluster. +===================================+===============================================================================================================================================================================================================================================================================================+ | System Disk | System disk used by the node OS. The value ranges from 40 GB to 1,024 GB. The default value is 50 GB. | | | | - | | **Encryption**: Data disk encryption safeguards your data. Snapshots generated from encrypted disks and disks created using these snapshots automatically inherit the encryption function. **This function is available only in certain regions.** | + | | **Encryption**: System disk encryption safeguards your data. Snapshots generated from encrypted disks and disks created using these snapshots automatically inherit the encryption function. **This function is available only in certain regions.** | | | | | | - **Encryption** is not selected by default. | | | - After you select **Encryption**, you can select an existing key in the displayed dialog box. If no key is available, click **View Key List** to create a key. After the key is created, click the refresh icon. | diff --git a/umn/source/nodes/index.rst b/umn/source/nodes/index.rst index acf84f2..28f26de 100644 --- a/umn/source/nodes/index.rst +++ b/umn/source/nodes/index.rst @@ -17,7 +17,6 @@ Nodes - :ref:`Deleting a Node ` - :ref:`Stopping a Node ` - :ref:`Performing Rolling Upgrade for Nodes ` -- :ref:`Node Fault Detection Policy ` .. toctree:: :maxdepth: 1 @@ -35,4 +34,3 @@ Nodes deleting_a_node stopping_a_node performing_rolling_upgrade_for_nodes - node_fault_detection_policy diff --git a/umn/source/nodes/node_fault_detection_policy.rst b/umn/source/nodes/node_fault_detection_policy.rst deleted file mode 100644 index bf36adc..0000000 --- a/umn/source/nodes/node_fault_detection_policy.rst +++ /dev/null @@ -1,343 +0,0 @@ -:original_name: cce_10_0659.html - -.. _cce_10_0659: - -Node Fault Detection Policy -=========================== - -The node fault detection function depends on the :ref:`node-problem-detector (npd) ` add-on. The add-on instances run on nodes and monitor nodes. 
This section describes how to enable node fault detection. - -Prerequisites -------------- - -The :ref:`npd ` add-on has been installed in the cluster. - -Enabling Node Fault Detection ------------------------------ - -#. Log in to the CCE console and click the cluster name to access the cluster console. - -#. In the navigation pane on the left, choose **Nodes**. Check whether the npd add-on has been installed in the cluster or whether the add-on has been upgraded to the latest version. After the npd add-on has been installed, you can use the fault detection function. - - |image1| - -#. If the npd add-on is running properly, click **Node Fault Detection Policy** to view the current fault detection items. For details about the npd check item list, see :ref:`npd Check Items `. - -#. If the check result of the current node is abnormal, a message is displayed in the node list, indicating that the metric is abnormal. - - |image2| - -#. You can click **Abnormal metrics** and rectify the fault as prompted. - - |image3| - -Customized Check Items ----------------------- - -#. Log in to the CCE console and click the cluster name to access the cluster console. - -#. Choose Node Management on the left and click **Node Fault Detection Policy**. - -#. On the displayed page, view the current check items. Click **Edit** in the **Operation** column and edit checks. - - Currently, the following configurations are supported: - - - **Enable/Disable**: Enable or disable a check item. - - - **Target Node**: By default, check items run on all nodes. You can change the fault threshold based on special scenarios. For example, the spot price ECS interruption reclamation check runs only on the spot price ECS node. - - |image4| - - - **Trigger Threshold**: The default thresholds match common fault scenarios. You can customize and modify the fault thresholds as required. For example, change the threshold for triggering connection tracking table exhaustion from 90% to 80%. - - |image5| - - - **Check Period**: The default check period is 30 seconds. You can modify this parameter as required. - - |image6| - - - **Troubleshooting Strategy**: After a fault occurs, you can select the strategies listed in the following table. - - .. table:: **Table 1** Troubleshooting strategies - - +--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Troubleshooting Strategy | Effect | - +==========================+======================================================================================================================================================================================================+ - | Prompting Exception | Reports the Kuberentes events. | - +--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Disabling scheduling | Reports the Kuberentes events and adds the **NoSchedule** taint to the node. | - +--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Evict Node Load | Reports the Kuberentes events and adds the **NoExecute** taint to the node. 
This operation will evict workloads on the node and interrupt services. Exercise caution when performing this operation. | - +--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - -.. _cce_10_0659__en-us_topic_0000001519314622_section321984418184: - -npd Check Items ---------------- - -.. note:: - - Check items are supported only in 1.16.0 and later versions. - -Check items cover events and statuses. - -- Event-related - - For event-related check items, when a problem occurs, npd reports an event to the API server. The event type can be **Normal** (normal event) or **Warning** (abnormal event). - - .. table:: **Table 2** Event-related check items - - +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+ - | Check Item | Function | Description | - +=======================+==============================================================================================================================================================================================================================================================+=======================================================================================================+ - | OOMKilling | Listen to the kernel logs and check whether OOM events occur and are reported. | Warning event | - | | | | - | | Typical scenario: When the memory usage of a process in a container exceeds the limit, OOM is triggered and the process is terminated. | Listening object: **/dev/kmsg** | - | | | | - | | | Matching rule: "Killed process \\\\d+ (.+) total-vm:\\\\d+kB, anon-rss:\\\\d+kB, file-rss:\\\\d+kB.*" | - +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+ - | TaskHung | Listen to the kernel logs and check whether taskHung events occur and are reported. | Warning event | - | | | | - | | Typical scenario: Disk I/O suspension causes process suspension. | Listening object: **/dev/kmsg** | - | | | | - | | | Matching rule: "task \\\\S+:\\\\w+ blocked for more than \\\\w+ seconds\\\\." | - +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+ - | ReadonlyFilesystem | Check whether the **Remount root filesystem read-only** error occurs in the system kernel by listening to the kernel logs. | Warning event | - | | | | - | | Typical scenario: A user detaches a data disk from a node by mistake on the ECS, and applications continuously write data to the mount point of the data disk. 
As a result, an I/O error occurs in the kernel and the disk is remounted as a read-only disk. | Listening object: **/dev/kmsg** | - | | | | - | | | Matching rule: **Remounting filesystem read-only** | - +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+ - -- Status-related - - For status-related check items, when a problem occurs, npd reports an event to the API server and changes the node status synchronously. This function can be used together with :ref:`Node-problem-controller fault isolation ` to isolate nodes. - - **If the check period is not specified in the following check items, the default period is 30 seconds.** - - .. table:: **Table 3** Checking system components - - +-----------------------------------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ - | Check Item | Function | Description | - +===================================+===========================================================================================================+=========================================================================================================================================+ - | Container network component error | Check the status of the CNI components (container network components). | None | - | | | | - | CNIProblem | | | - +-----------------------------------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ - | Container runtime component error | Check the status of Docker and containerd of the CRI components (container runtime components). | Check object: Docker or containerd | - | | | | - | CRIProblem | | | - +-----------------------------------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ - | Frequent restarts of Kubelet | Periodically backtrack system logs to check whether the key component Kubelet restarts frequently. | - Default threshold: 10 restarts within 10 minutes | - | | | | - | FrequentKubeletRestart | | If Kubelet restarts for 10 times within 10 minutes, it indicates that the system restarts frequently and a fault alarm is generated. | - | | | | - | | | - Listening object: logs in the **/run/log/journal** directory | - +-----------------------------------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ - | Frequent restarts of Docker | Periodically backtrack system logs to check whether the container runtime Docker restarts frequently. 
| | - | | | | - | FrequentDockerRestart | | | - +-----------------------------------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ - | Frequent restarts of containerd | Periodically backtrack system logs to check whether the container runtime containerd restarts frequently. | | - | | | | - | FrequentContainerdRestart | | | - +-----------------------------------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ - | kubelet error | Check the status of the key component Kubelet. | None | - | | | | - | KubeletProblem | | | - +-----------------------------------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ - | kube-proxy error | Check the status of the key component kube-proxy. | None | - | | | | - | KubeProxyProblem | | | - +-----------------------------------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ - - .. table:: **Table 4** Checking system metrics - - +--------------------------------+------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+ - | Check Item | Function | Description | - +================================+==============================================================================================================================+============================================================================================================+ - | Conntrack table full | Check whether the conntrack table is full. | - Default threshold: 90% | - | | | | - | ConntrackFullProblem | | - Usage: **nf_conntrack_count** | - | | | - Maximum value: **nf_conntrack_max** | - +--------------------------------+------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+ - | Insufficient disk resources | Check the usage of the system disk and CCE data disks (including the CRI logical disk and kubelet logical disk) on the node. | - Default threshold: 90% | - | | | | - | DiskProblem | | - Source: | - | | | | - | | | .. code-block:: | - | | | | - | | | df -h | - | | | | - | | | Currently, additional data disks are not supported. | - +--------------------------------+------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+ - | Insufficient file handles | Check whether FD file handles are used up. 
| - Default threshold: 90% | - | | | - Usage: the first value in **/proc/sys/fs/file-nr** | - | FDProblem | | - Maximum value: the third value in **/proc/sys/fs/file-nr** | - +--------------------------------+------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+ - | Insufficient node memory | Check whether memory is used up. | - Default threshold: 80% | - | | | - Usage: **MemTotal-MemAvailable** in **/proc/meminfo** | - | MemoryProblem | | - Maximum value: **MemTotal** in **/proc/meminfo** | - +--------------------------------+------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+ - | Insufficient process resources | Check whether PID process resources are exhausted. | - Default threshold: 90% | - | | | - Usage: **nr_threads in /proc/loadavg** | - | PIDProblem | | - Maximum value: smaller value between **/proc/sys/kernel/pid_max** and **/proc/sys/kernel/threads-max**. | - +--------------------------------+------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+ - - .. table:: **Table 5** Checking the storage - - +--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Check Item | Function | Description | - +================================+====================================================================================================================================================================================================================================================================================================================================================================================================+=======================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+ - | Disk read-only | Periodically perform read and write tests on the system disk 
and CCE data disks (including the CRI logical disk and Kubelet logical disk) of the node to check the availability of key disks. | Detection paths: | - | | | | - | DiskReadonly | | - /mnt/paas/kubernetes/kubelet/ | - | | | - /var/lib/docker/ | - | | | - /var/lib/containerd/ | - | | | - /var/paas/sys/log/cceaddon-npd/ | - | | | | - | | | The temporary file **npd-disk-write-ping** is generated in the detection path. | - | | | | - | | | Currently, additional data disks are not supported. | - +--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Insufficient disk resources | Check the usage of the system disk and CCE data disks (including the CRI logical disk and kubelet logical disk) on the node. | - Default threshold: 90% | - | | | | - | DiskProblem | | - Source: | - | | | | - | | | .. code-block:: | - | | | | - | | | df -h | - | | | | - | | | Currently, additional data disks are not supported. | - +--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | emptyDir storage pool error | Check whether the ephemeral volume group on the node is normal. | - Detection period: 30s | - | | | | - | EmptyDirVolumeGroupStatusError | Impact: The pod that depends on the storage pool cannot write data to the temporary volume. The temporary volume is remounted as a read-only file system by the kernel due to an I/O error. | - Source: | - | | | | - | | Typical scenario: When creating a node, a user configures two data disks as a temporary volume storage pool. The user deletes some data disks by mistake. As a result, the storage pool becomes abnormal. | .. code-block:: | - | | | | - | | | vgs -o vg_name, vg_attr | - | | | | - | | | - Principle: Check whether the VG (storage pool) is in the P state. If yes, some PVs (data disks) are lost. 
| - | | | | - | | | - Joint scheduling: The scheduler can automatically identify a PV storage pool error and prevent pods that depend on the storage pool from being scheduled to the node. | - | | | | - | | | - Exceptional scenario: The npd add-on cannot detect the loss of all PVs (data disks), resulting in the loss of VGs (storage pools). In this case, kubelet automatically isolates the node, detects the loss of VGs (storage pools), and updates the corresponding resources in **nodestatus.allocatable** to **0**. This prevents pods that depend on the storage pool from being scheduled to the node. The damage of a single PV cannot be detected by this check item, but by the ReadonlyFilesystem check item. | - +--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | PV storage pool error | Check the PV group on the node. | | - | | | | - | LocalPvVolumeGroupStatusError | Impact: Pods that depend on the storage pool cannot write data to the persistent volume. The persistent volume is remounted as a read-only file system by the kernel due to an I/O error. | | - | | | | - | | Typical scenario: When creating a node, a user configures two data disks as a persistent volume storage pool. Some data disks are deleted by mistake. | | - +--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Mount point error | Check the mount point on the node. | Alternatively, you can run the following command: | - | | | | - | MountPointProblem | Exceptional definition: You cannot access the mount point by running the **cd** command. | .. code-block:: | - | | | | - | | Typical scenario: Network File System (NFS), for example, obsfs and s3fs is mounted to a node. When the connection is abnormal due to network or peer NFS server exceptions, all processes that access the mount point are suspended. 
For example, during a cluster upgrade, a kubelet is restarted, and all mount points are scanned. If the abnormal mount point is detected, the upgrade fails. | for dir in `df -h | grep -v "Mounted on" | awk "{print \\$NF}"`;do cd $dir; done && echo "ok" | - +--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Suspended disk I/O | Check whether I/O suspension occurs on all disks on the node, that is, whether I/O read and write operations are not responded. | - Check object: all data disks | - | | | | - | DiskHung | Definition of I/O suspension: The system does not respond to disk I/O requests, and some processes are in the D state. | - Source: | - | | | | - | | Typical scenario: Disks cannot respond due to abnormal OS hard disk drivers or severe faults on the underlying network. | /proc/diskstat | - | | | | - | | | Alternatively, you can run the following command: | - | | | | - | | | .. code-block:: | - | | | | - | | | iostat -xmt 1 | - | | | | - | | | - Threshold: | - | | | | - | | | - Average usage: ioutil >= 0.99 | - | | | - Average I/O queue length: avgqu-sz >= 1 | - | | | - Average I/O transfer volume: iops (w/s) + ioth (wMB/s) <= 1 | - | | | | - | | | .. note:: | - | | | | - | | | In some OSs, no data changes during I/O. In this case, calculate the CPU I/O time usage. The value of iowait should be greater than 0.8. | - +--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Slow disk I/O | Check whether all disks on the node have slow I/Os, that is, whether I/Os respond slowly. | - Check object: all data disks | - | | | | - | DiskSlow | Typical scenario: EVS disks have slow I/Os due to network fluctuation. | - Source: | - | | | | - | | | /proc/diskstat | - | | | | - | | | Alternatively, you can run the following command: | - | | | | - | | | .. 
code-block:: | - | | | | - | | | iostat -xmt 1 | - | | | | - | | | - Default threshold: | - | | | | - | | | Average I/O latency: await >= 5000 ms | - | | | | - | | | .. note:: | - | | | | - | | | If I/O requests are not responded and the **await** data is not updated, this check item is invalid. | - +--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - - .. table:: **Table 6** Other check items - - +--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ - | Check Item | Function | Description | - +==========================+=========================================================================================================================================================================================================+=========================================================================================================================================+ - | Abnormal NTP | Check whether the node clock synchronization service ntpd or chronyd is running properly and whether a system time drift is caused. | Default clock offset threshold: 8000 ms | - | | | | - | NTPProblem | | | - +--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ - | Process D error | Check whether there is a process D on the node. | Default threshold: 10 abnormal processes detected for three consecutive times | - | | | | - | ProcessD | | Source: | - | | | | - | | | - /proc/{PID}/stat | - | | | - Alternately, you can run the **ps aux** command. | - | | | | - | | | Exceptional scenario: ProcessD ignores the resident D processes (heartbeat and update) on which the SDI driver on the BMS node depends. | - +--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ - | Process Z error | Check whether the node has processes in Z state. 
| | - | | | | - | ProcessZ | | | - +--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ - | ResolvConf error | Check whether the ResolvConf file is lost. | Object: **/etc/resolv.conf** | - | | | | - | ResolvConfFileProblem | Check whether the ResolvConf file is normal. | | - | | | | - | | Exceptional definition: No upstream domain name resolution server (nameserver) is included. | | - +--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ - | Existing scheduled event | Check whether scheduled live migration events exist on the node. A live migration plan event is usually triggered by a hardware fault and is an automatic fault rectification method at the IaaS layer. | Source: | - | | | | - | ScheduledEvent | Typical scenario: The host is faulty. For example, the fan is damaged or the disk has bad sectors. As a result, live migration is triggered for VMs. | - http://169.254.169.254/meta-data/latest/events/scheduled | - | | | | - | | | This check item is an Alpha feature and is disabled by default. | - +--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ - - The kubelet component has the following default check items, which have bugs or defects. You can fix them by upgrading the cluster or using npd. - - .. table:: **Table 7** Default kubelet check items - - +-----------------------------+------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Check Item | Function | Description | - +=============================+========================================================================+==========================================================================================================================================================================================================================================================================================================================+ - | Insufficient PID resources | Check whether PIDs are sufficient. | - Interval: 10 seconds | - | | | - Threshold: 90% | - | PIDPressure | | - Defect: In community version 1.23.1 and earlier versions, this check item becomes invalid when over 65535 PIDs are used. For details, see `issue 107107 `__. In community version 1.24 and earlier versions, thread-max is not considered in this check item. 
| - +-----------------------------+------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Insufficient memory | Check whether the allocable memory for the containers is sufficient. | - Interval: 10 seconds | - | | | - Threshold: max. 100 MiB | - | MemoryPressure | | - Allocable = Total memory of a node - Reserved memory of a node | - | | | - Defect: This check item checks only the memory consumed by containers, and does not consider that consumed by other elements on the node. | - +-----------------------------+------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Insufficient disk resources | Check the disk usage and inodes usage of the kubelet and Docker disks. | - Interval: 10 seconds | - | | | - Threshold: 90% | - | DiskPressure | | | - +-----------------------------+------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - -.. |image1| image:: /_static/images/en-us_image_0000001519067438.png -.. |image2| image:: /_static/images/en-us_image_0000001520080400.png -.. |image3| image:: /_static/images/en-us_image_0000001571360421.png -.. |image4| image:: /_static/images/en-us_image_0000001570344789.png -.. |image5| image:: /_static/images/en-us_image_0000001519063542.png -.. |image6| image:: /_static/images/en-us_image_0000001519544422.png diff --git a/umn/source/permissions_management/namespace_permissions_kubernetes_rbac-based.rst b/umn/source/permissions_management/namespace_permissions_kubernetes_rbac-based.rst index 45b3a32..4c45b1c 100644 --- a/umn/source/permissions_management/namespace_permissions_kubernetes_rbac-based.rst +++ b/umn/source/permissions_management/namespace_permissions_kubernetes_rbac-based.rst @@ -30,6 +30,8 @@ On the CCE console, you can assign permissions to a user or user group to access - edit (development): read and write permissions on most resources in all or selected namespaces. If this ClusterRole is configured for all namespaces, its capability is the same as the O&M permission. - admin (O&M): read and write permissions on most resources in all namespaces, and read-only permission on nodes, storage volumes, namespaces, and quota management. - cluster-admin (administrator): read and write permissions on all resources in all namespaces. +- drainage-editor: drain a node. +- drainage-viewer: view the nodal drainage status but cannot drain a node. .. 
_cce_10_0189__section207514572488: diff --git a/umn/source/product_bulletin/os_patch_notes_for_cluster_nodes.rst b/umn/source/product_bulletin/os_patch_notes_for_cluster_nodes.rst index a8540e0..aa6534c 100644 --- a/umn/source/product_bulletin/os_patch_notes_for_cluster_nodes.rst +++ b/umn/source/product_bulletin/os_patch_notes_for_cluster_nodes.rst @@ -8,7 +8,7 @@ OS Patch Notes for Cluster Nodes Nodes in Hybrid Clusters ------------------------ -CCE nodes in Hybrid clusters can run on EulerOS 2.5, EulerOS 2.9, CentOS 7.7and Ubuntu 22.04. The following table lists the supported patches for these OSs. +CCE nodes in Hybrid clusters can run on EulerOS 2.5, EulerOS 2.9, CentOS 7.7 and Ubuntu 22.04. The following table lists the supported patches for these OSs. .. table:: **Table 1** Node OS patches @@ -41,8 +41,6 @@ CCE nodes in Hybrid clusters can run on EulerOS 2.5, EulerOS 2.9, CentOS 7.7and +--------------------------+-----------------+-------------------------------------------+ | Ubuntu 22.04 | v1.25 | 5.15.0-53-generic | +--------------------------+-----------------+-------------------------------------------+ - | | v1.23 | 5.15.0-53-generic | - +--------------------------+-----------------+-------------------------------------------+ .. table:: **Table 2** Mappings between BMS node OS versions and cluster versions @@ -61,8 +59,6 @@ CCE nodes in Hybrid clusters can run on EulerOS 2.5, EulerOS 2.9, CentOS 7.7and +==========================+=================+=============+================+==========================+ | Ubuntu 22.04 | v1.25 | Y | x | Y | +--------------------------+-----------------+-------------+----------------+--------------------------+ - | | v1.23 | Y | x | Y | - +--------------------------+-----------------+-------------+----------------+--------------------------+ | CentOS Linux release 7.7 | v1.25 | Y | Y | Y | +--------------------------+-----------------+-------------+----------------+--------------------------+ | | v1.23 | Y | Y | Y | diff --git a/umn/source/service_overview/application_scenarios/hybrid_cloud_architecture.rst b/umn/source/service_overview/application_scenarios/hybrid_cloud_architecture.rst index 500f4d8..e7f088b 100644 --- a/umn/source/service_overview/application_scenarios/hybrid_cloud_architecture.rst +++ b/umn/source/service_overview/application_scenarios/hybrid_cloud_architecture.rst @@ -54,7 +54,7 @@ Related Services Elastic Cloud Server (ECS), Direct Connect (DC), Virtual Private Network (VPN), SoftWare Repository for Container (SWR) -.. figure:: /_static/images/en-us_image_0000001499725826.png +.. figure:: /_static/images/en-us_image_0000001626725269.png :alt: **Figure 1** How hybrid cloud works **Figure 1** How hybrid cloud works diff --git a/umn/source/service_overview/basic_concepts/regions_and_azs.rst b/umn/source/service_overview/basic_concepts/regions_and_azs.rst index 0f76817..c34cde6 100644 --- a/umn/source/service_overview/basic_concepts/regions_and_azs.rst +++ b/umn/source/service_overview/basic_concepts/regions_and_azs.rst @@ -13,15 +13,7 @@ A region and availability zone (AZ) identify the location of a data center. You - Regions are divided based on geographical location and network latency. Public services, such as Elastic Cloud Server (ECS), Elastic Volume Service (EVS), Object Storage Service (OBS), Virtual Private Cloud (VPC), Elastic IP (EIP), and Image Management Service (IMS), are shared within the same region. Regions are classified as universal regions and dedicated regions. 
A universal region provides universal cloud services for common domains. A dedicated region provides services of the same type only or for specific domains. - An AZ contains one or more physical data centers. Each AZ has independent cooling, fire extinguishing, moisture-proof, and electricity facilities. Within an AZ, computing, network, storage, and other resources are logically divided into multiple clusters. AZs in a region are interconnected through high-speed optic fibers. This is helpful if you will deploy systems across AZs to achieve higher availability. -shows the relationship between the region and AZ. - - -.. figure:: /_static/images/en-us_image_0000001550365693.png - :alt: **Figure 1** Regions and AZs - - **Figure 1** Regions and AZs - -Huawei Cloud provides services in many regions around the world. You can select a region and AZ as needed. +Cloud services are available in many regions around the world. You can select a region and AZ as needed. How to Select a Region? ----------------------- diff --git a/umn/source/service_overview/notes_and_constraints.rst b/umn/source/service_overview/notes_and_constraints.rst index 87ffb6c..b19d8c7 100644 --- a/umn/source/service_overview/notes_and_constraints.rst +++ b/umn/source/service_overview/notes_and_constraints.rst @@ -21,7 +21,7 @@ Clusters and Nodes - Underlying resources, such as ECSs (nodes), are limited by quotas and their inventory. Therefore, only some nodes may be successfully created during cluster creation, cluster scaling, or auto scaling. - The ECS (node) specifications must be higher than 2 cores and 4 GB memory. - To access a CCE cluster through a VPN, ensure that the VPN CIDR block does not conflict with the VPC CIDR block where the cluster resides and the container CIDR block. -- Ubuntu 22.04 does not support the container tunnel network model. +- Ubuntu 22.04 does not support the tunnel network model. Networking ---------- diff --git a/umn/source/service_overview/permissions.rst b/umn/source/service_overview/permissions.rst index f5e4176..5b1d666 100644 --- a/umn/source/service_overview/permissions.rst +++ b/umn/source/service_overview/permissions.rst @@ -164,12 +164,13 @@ Role and ClusterRole specify actions that can be performed on specific resources **Figure 1** Role binding -On the CCE console, you can assign permissions to a user or user group to access resources in one or multiple namespaces. By default, the CCE console provides the following five ClusterRoles: +On the CCE console, you can assign permissions to a user or user group to access resources in one or all namespaces. By default, the CCE console provides the following ClusterRoles: -- view: has the permission to view namespace resources. -- edit: has the permission to modify namespace resources. -- admin: has all permissions on the namespace. -- cluster-admin: has all permissions on the cluster. -- psp-global: controls sensitive security aspects of the pod specification. +- view (read-only): read-only permission on most resources in all or selected namespaces. +- edit (development): read and write permissions on most resources in all or selected namespaces. If this ClusterRole is configured for all namespaces, its capability is the same as the O&M permission. +- admin (O&M): read and write permissions on most resources in all namespaces, and read-only permission on nodes, storage volumes, namespaces, and quota management. +- cluster-admin (administrator): read and write permissions on all resources in all namespaces. +- drainage-editor: drain a node. 
+- drainage-viewer: view the nodal drainage status but cannot drain a node. In addition to cluster-admin, admin, edit, and view, you can define Roles and RoleBindings to configure the permissions to add, delete, modify, and query resources, such as pods, Deployments, and Services, in the namespace. diff --git a/umn/source/storage/storageclass.rst b/umn/source/storage/storageclass.rst index ccdd24f..3739a38 100644 --- a/umn/source/storage/storageclass.rst +++ b/umn/source/storage/storageclass.rst @@ -191,36 +191,6 @@ Other types of storage resources can be defined in the similar way. You can use reclaimPolicy: Delete volumeBindingMode: Immediate -Specifying an Enterprise Project for Storage Classes ----------------------------------------------------- - -CCE allows you to specify an enterprise project when creating EVS disks and OBS PVCs. The created storage resources (EVS disks and OBS) belong to the specified enterprise project. **The enterprise project can be the enterprise project to which the cluster belongs or the default enterprise project.** - -If you do no specify any enterprise project, the enterprise project in StorageClass is used by default. The created storage resources by using the csi-disk and csi-obs storage classes of CCE belong to the default enterprise project. - -If you want the storage resources created from the storage classes to be in the same enterprise project as the cluster, you can customize a storage class and specify the enterprise project ID, as shown below. - -.. note:: - - To use this function, the everest add-on must be upgraded to 1.2.33 or later. - -.. code-block:: - - kind: StorageClass - apiVersion: storage.k8s.io/v1 - metadata: - name: csi-disk-epid #Customize a storage class name. - provisioner: everest-csi-provisioner - parameters: - csi.storage.k8s.io/csi-driver-name: disk.csi.everest.io - csi.storage.k8s.io/fstype: ext4 - everest.io/disk-volume-type: SAS - everest.io/enterprise-project-id: 86bfc701-9d9e-4871-a318-6385aa368183 #Specify the enterprise project ID. - everest.io/passthrough: 'true' - reclaimPolicy: Delete - allowVolumeExpansion: true - volumeBindingMode: Immediate - Setting a Default Storage Class ------------------------------- diff --git a/umn/source/workloads/volcano_scheduling/hybrid_deployment_of_online_and_offline_jobs.rst b/umn/source/workloads/volcano_scheduling/hybrid_deployment_of_online_and_offline_jobs.rst index d60a02f..03b7138 100644 --- a/umn/source/workloads/volcano_scheduling/hybrid_deployment_of_online_and_offline_jobs.rst +++ b/umn/source/workloads/volcano_scheduling/hybrid_deployment_of_online_and_offline_jobs.rst @@ -83,14 +83,15 @@ Notes and Constraints - Kubernetes version: - - 1.19: 1.19.16-r4 or later - - 1.21: 1.21.7-r0 or later - - 1.23: 1.23.5-r0 or later + - v1.19: v1.19.16-r4 or later + - v1.21: v1.21.7-r0 or later + - v1.23: v1.23.5-r0 or later + - v1.25 or later -- Cluster Type: CCE or CCE Turbo -- Node OS: EulerOS 2.9 (kernel-4.18.0-147.5.1.6.h729.6.eulerosv2r9.x86_64) or Huawei Cloud EulerOS 2.0 -- Node Type: ECS -- The volcano add-on version: 1.7.0 or later +- Cluster type: CCE or CCE Turbo +- Node OS: EulerOS 2.9 (kernel-4.18.0-147.5.1.6.h729.6.eulerosv2r9.x86_64) +- Node type: ECS +- volcano add-on version: 1.7.0 or later **Constraints** @@ -505,6 +506,132 @@ The following uses an example to describe how to deploy online and offline jobs online-6f44bb68bd-b8z9p 1/1 Running 0 24m 192.168.10.18 192.168.0.173 online-6f44bb68bd-g6xk8 1/1 Running 0 24m 192.168.10.69 192.168.0.173 +#. 
Log in to the CCE console and access the cluster console. + +#. In the navigation pane on the left, choose **Nodes**. Click the **Node Pools** tab. When creating or updating a node pool, enable hybrid deployment of online and offline services in **Advanced Settings**. + +#. In the navigation pane on the left, choose **Add-ons**. Click **Install** under volcano. In the **Advanced Settings** area, set **colocation_enable** to **true** to enable hybrid deployment of online and offline services. For details about the installation, see :ref:`volcano `. + + If the volcano add-on has been installed, click **Edit** to view or modify the parameter **colocation_enable**. + +#. Enable CPU Burst. + + After confirming that the volcano add-on is working, run the following command to edit the parameter **configmap** of **volcano-agent-configuration** in the namespace **kube-system**. If **enable** is set to **true**, CPU Burst is enabled. If **enable** is set to **false**, CPU Burst is disabled. + + .. code-block:: + + kubectl edit configmap -nkube-system volcano-agent-configuration + + Example: + + .. code-block:: + + cpuBurstConfig: + enable: true + +#. Deploy a workload in a node pool where hybrid deployment has been enabled. Take Nginx as an example. Set **cpu** under **requests** to **2** and **cpu** under **limits** to **4**, and create a Service that can be accessed in the cluster for the workload. + + .. code-block:: + + apiVersion: apps/v1 + kind: Deployment + metadata: + name: nginx + namespace: default + spec: + replicas: 2 + selector: + matchLabels: + app: nginx + template: + metadata: + labels: + app: nginx + annotations: + volcano.sh/enable-quota-burst=true + volcano.sh/quota-burst-time=200000 + spec: + containers: + - name: container-1 + image: nginx:latest + resources: + limits: + cpu: "4" + requests: + cpu: "2" + imagePullSecrets: + - name: default-secret + --- + apiVersion: v1 + kind: Service + metadata: + name: nginx + namespace: default + labels: + app: nginx + spec: + selector: + app: nginx + ports: + - name: cce-service-0 + targetPort: 80 + nodePort: 0 + port: 80 + protocol: TCP + type: ClusterIP + + +------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Annotation | Mandatory | Description | + +====================================+=======================+=================================================================================================================================================================================================================================================================================================================================================+ + | volcano.sh/enable-quota-burst=true | Yes | CPU Burst is enabled for the workload. 
+
+#. Verify CPU Burst.
+
+   You can use the wrk tool to increase the load on the workload and observe the service latency, CPU throttling, and CPU limit exceeding behavior when CPU Burst is enabled and disabled, respectively.
+
+   a. Run the following command to increase the load on the pod. *$service_ip* indicates the service IP address associated with the pod.
+
+      .. code-block::
+
+         # You need to download and install the wrk tool on the node.
+         # The Gzip compression module is enabled in the Apache configuration to simulate the computing logic for the server to process requests.
+         # Run the following command to increase the load. Note that you need to change the IP address to that of the target application.
+         wrk -H "Accept-Encoding: deflate, gzip" -t 4 -c 28 -d 120 --latency --timeout 2s http://$service_ip
+
+   b. Obtain the pod ID. Replace *<pod_name>* and *<namespace>* with the name and namespace of the pod created in the previous step.
+
+      .. code-block::
+
+         kubectl get pod <pod_name> -n <namespace> -o jsonpath='{.metadata.uid}'
+
+   c. Run the following command on the node where the pod runs to view the CPU throttling and CPU limit exceeding statistics. In the command, *$PodID* indicates the pod ID obtained in the previous step. Note that the counters in **cpu.stat** are cumulative. A sketch that compares the values before and after a load test follows the result table below.
+
+      .. code-block::
+
+         cat /sys/fs/cgroup/cpuacct/kubepods/$PodID/cpu.stat
+         nr_periods 0        # Number of scheduling periods
+         nr_throttled 0      # Number of times the CPU was throttled
+         throttled_time 0    # Total CPU throttling duration (ns)
+         nr_bursts 0         # Number of times the CPU limit was exceeded (CPU Burst)
+         burst_time 0        # Total duration for which the CPU limit was exceeded
+
+   .. table:: **Table 3** Result summary in this example
+
+      +-----------------------+-------------+------------------------+---------------------------+-----------------------+--------------------------------+
+      | CPU Burst             | P99 Latency | nr_throttled           | throttled_time            | nr_bursts             | burst_time                     |
+      |                       |             |                        |                           |                       |                                |
+      |                       |             | Throttling Times       | Throttling Duration       | Limit Exceeding Times | Total Limit Exceeding Duration |
+      +=======================+=============+========================+===========================+=======================+================================+
+      | CPU Burst not enabled | 2.96 ms     | 986                    | 14.3s                     | 0                     | 0                              |
+      +-----------------------+-------------+------------------------+---------------------------+-----------------------+--------------------------------+
+      | CPU Burst enabled     | 456 µs      | 0                      | 0                         | 469                   | 3.7s                           |
+      +-----------------------+-------------+------------------------+---------------------------+-----------------------+--------------------------------+
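+   Because the counters in **cpu.stat** are cumulative, comparing snapshots taken before and after a load test makes the throttling and CPU Burst deltas easier to read. The following is a minimal sketch of such a comparison. It assumes the example workload above and the cgroup path shown in the previous step; *<pod_name>* is a placeholder for an actual pod name, and on some nodes the pod cgroup directory may sit under an additional QoS subdirectory.
+
+   .. code-block::
+
+      # Minimal sketch: snapshot cpu.stat before and after a wrk run, then print the differences.
+      # Replace <pod_name> with the name of one of the nginx pods and run the commands on the node that hosts it.
+      POD_ID=$(kubectl get pod <pod_name> -n default -o jsonpath='{.metadata.uid}')
+      STAT_FILE=/sys/fs/cgroup/cpuacct/kubepods/$POD_ID/cpu.stat    # Adjust if your node uses a different cgroup layout.
+      cat $STAT_FILE > /tmp/cpu_stat_before
+      wrk -H "Accept-Encoding: deflate, gzip" -t 4 -c 28 -d 120 --latency --timeout 2s http://$service_ip
+      cat $STAT_FILE > /tmp/cpu_stat_after
+      diff /tmp/cpu_stat_before /tmp/cpu_stat_after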
+
 Handling Suggestions
 --------------------