:original_name: cce_10_0054.html

.. _cce_10_0054:

High-Risk Operations and Solutions
==================================

When you deploy or run services, certain operations at different levels can cause service faults or interruptions. To help you assess and avoid these risks, this section describes the impact of high-risk operations and the corresponding solutions for clusters, nodes, networking, load balancing, logs, and EVS disks.

Clusters and Nodes
------------------

.. table:: **Table 1** High-risk operations and solutions

   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   | Category        | Operation                                                                                             | Impact                                                                                                                                                                                                                                                                               | Solution                                                                                                                                          |
   +=================+=======================================================================================================+======================================================================================================================================================================================================================================================================================+===================================================================================================================================================+
   | Master node     | Modifying the security group of a node in a cluster                                                   | The master node may be unavailable.                                                                                                                                                                                                                                                  | Restore the security group by referring to the security group of the new cluster and allow traffic from the security group to pass through.       |
   |                 |                                                                                                       |                                                                                                                                                                                                                                                                                      |                                                                                                                                                   |
   |                 |                                                                                                       | .. note::                                                                                                                                                                                                                                                                            |                                                                                                                                                   |
   |                 |                                                                                                       |                                                                                                                                                                                                                                                                                      |                                                                                                                                                   |
   |                 |                                                                                                       |    Naming rule of a master node: *Cluster name*\ ``-``\ **cce-control**\ ``-``\ *Random number*                                                                                                                                                                                      |                                                                                                                                                   |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   |                 | Letting the node expire or destroying the node                                                        | The master node will be unavailable.                                                                                                                                                                                                                                                 | This operation cannot be undone.                                                                                                                  |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   |                 | Reinstalling the OS                                                                                   | Components on the master node will be deleted.                                                                                                                                                                                                                                       | This operation cannot be undone.                                                                                                                  |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   |                 | Upgrading components on the master or etcd node                                                       | The cluster may be unavailable.                                                                                                                                                                                                                                                      | Roll back to the original version.                                                                                                                |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   |                 | Deleting or formatting core directory data such as **/etc/kubernetes** on the node                    | The master node will be unavailable.                                                                                                                                                                                                                                                 | This operation cannot be undone.                                                                                                                  |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   |                 | Changing the node IP address                                                                          | The master node will be unavailable.                                                                                                                                                                                                                                                 | Change the IP address back to the original one.                                                                                                   |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   |                 | Modifying parameters of core components (such as etcd, kube-apiserver, and docker)                    | The master node may be unavailable.                                                                                                                                                                                                                                                  | Restore the parameter settings to the recommended values. For details, see :ref:`Cluster Configuration Management <cce_10_0213>`.                 |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   |                 | Replacing the master or etcd certificate                                                              | The cluster may become unavailable.                                                                                                                                                                                                                                                  | This operation cannot be undone.                                                                                                                  |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   | Worker node     | Modifying the security group of a node in a cluster                                                   | The node may be unavailable.                                                                                                                                                                                                                                                         | Restore the security group by referring to :ref:`Creating a CCE Cluster <cce_10_0028>` and allow traffic from the security group to pass through. |
   |                 |                                                                                                       |                                                                                                                                                                                                                                                                                      |                                                                                                                                                   |
   |                 |                                                                                                       | .. note::                                                                                                                                                                                                                                                                            |                                                                                                                                                   |
   |                 |                                                                                                       |                                                                                                                                                                                                                                                                                      |                                                                                                                                                   |
   |                 |                                                                                                       |    Naming rule of a worker node: *Cluster name*\ ``-``\ **cce-node**\ ``-``\ *Random number*                                                                                                                                                                                         |                                                                                                                                                   |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   |                 | Deleting the node                                                                                     | The node will become unavailable.                                                                                                                                                                                                                                                    | This operation cannot be undone.                                                                                                                  |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   |                 | Reinstalling the OS                                                                                   | Node components are deleted, and the node becomes unavailable.                                                                                                                                                                                                                       | Reset the node. For details, see :ref:`Resetting a Node <cce_10_0003>`.                                                                           |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   |                 | Upgrading the node kernel                                                                             | The node may be unavailable or the network may be abnormal.                                                                                                                                                                                                                          | Reset the node. For details, see :ref:`Resetting a Node <cce_10_0003>`.                                                                           |
   |                 |                                                                                                       |                                                                                                                                                                                                                                                                                      |                                                                                                                                                   |
   |                 |                                                                                                       | .. note::                                                                                                                                                                                                                                                                            |                                                                                                                                                   |
   |                 |                                                                                                       |                                                                                                                                                                                                                                                                                      |                                                                                                                                                   |
   |                 |                                                                                                       |    Node running depends on the system kernel version. Do not use the **yum update** command to update or reinstall the operating system kernel of a node unless necessary. (Reinstalling the operating system kernel using the original image or other images is a risky operation.) |                                                                                                                                                   |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   |                 | Changing the node IP address                                                                          | The node will become unavailable.                                                                                                                                                                                                                                                    | Change the IP address back to the original one.                                                                                                   |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   |                 | Modifying parameters of core components (such as kubelet and kube-proxy)                              | The node may become unavailable, and components may be insecure if security-related configurations are modified.                                                                                                                                                                     | Restore the parameter settings to the recommended values. For details, see :ref:`Configuring a Node Pool <cce_10_0652>`.                          |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   |                 | Modifying OS configuration                                                                            | The node may be unavailable.                                                                                                                                                                                                                                                         | Restore the configuration items or reset the node. For details, see :ref:`Resetting a Node <cce_10_0003>`.                                        |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   |                 | Deleting or modifying the **/opt/cloud/cce** and **/var/paas** directories, and delete the data disk. | The node will become unready.                                                                                                                                                                                                                                                        | You can reset the node. For details, see :ref:`Resetting a Node <cce_10_0003>`.                                                                   |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   |                 | Modifying the node directory permission and the container directory permission                        | The permissions will be abnormal.                                                                                                                                                                                                                                                    | You are not advised to modify the permissions. Restore the permissions if they are modified.                                                      |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   |                 | Formatting or partitioning system disks, Docker disks, and kubelet disks on nodes.                    | The node may be unavailable.                                                                                                                                                                                                                                                         | You can reset the node. For details, see :ref:`Resetting a Node <cce_10_0003>`.                                                                   |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   |                 | Installing other software on nodes                                                                    | This may cause exceptions on Kubernetes components installed on the node, and make the node unavailable.                                                                                                                                                                             | Uninstall the software that has been installed and restore or reset the node. For details, see :ref:`Resetting a Node <cce_10_0003>`.             |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   |                 | Modifying NetworkManager configurations                                                               | The node will become unavailable.                                                                                                                                                                                                                                                    | Reset the node. For details, see :ref:`Resetting a Node <cce_10_0003>`.                                                                           |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   |                 | Deleting system images such as **cfe-pause** from the node                                            | Containers cannot be created and system images cannot be pulled.                                                                                                                                                                                                                     | Copy the image from another normal node for restoration.                                                                                          |
   +-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
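
The naming rules noted in Table 1 can help you tell whether a resource belongs to a master node or a worker node before you change it. The following is a minimal Python sketch of such a check, not an official CCE tool. It assumes that the random suffix consists only of letters and digits (the table only says "Random number"), and ``classify_node`` is a hypothetical helper name introduced for illustration.

```python
import re

# Pattern taken from the notes in Table 1:
#   <cluster name>-cce-control-<random number>  (master node)
#   <cluster name>-cce-node-<random number>     (worker node)
# Assumption: the random suffix is a run of letters and digits.
_NODE_NAME = re.compile(
    r"^(?P<cluster>.+)-cce-(?P<role>control|node)-(?P<suffix>[A-Za-z0-9]+)$"
)


def classify_node(name):
    """Return (cluster, kind) for a matching name, or None if it does not match."""
    match = _NODE_NAME.match(name)
    if match is None:
        return None
    kind = "master" if match.group("role") == "control" else "worker"
    return match.group("cluster"), kind


if __name__ == "__main__":
    # Hypothetical names used only to exercise the helper.
    for example in ("demo-cce-control-12345", "demo-cce-node-67890", "unrelated-name"):
        print(example, "->", classify_node(example))
```

A name that does not match either rule returns ``None``, so a script can skip resources that were not created by CCE instead of guessing.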

Networking and Load Balancing
-----------------------------

.. table:: **Table 2** High-risk operations and solutions

   +-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   | Operation                                                                                                         | Impact                                                                     | How to Avoid/Fix                                                                                                                                  |
   +===================================================================================================================+============================================================================+===================================================================================================================================================+
   | Changing the value of the kernel parameter **net.ipv4.ip_forward** to **0**                                       | The network becomes inaccessible.                                          | Change the value to **1**.                                                                                                                        |
   +-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   | Changing the value of the kernel parameter **net.ipv4.tcp_tw_recycle** to **1**                                   | The NAT service becomes abnormal.                                          | Change the value to **0**.                                                                                                                        |
   +-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   | Changing the value of the kernel parameter **net.ipv4.tcp_tw_reuse** to **1**                                     | The network becomes abnormal.                                              | Change the value to **0**.                                                                                                                        |
   +-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   | Not configuring the node security group to allow UDP packets to pass through port 53 of the container CIDR block  | The DNS in the cluster cannot work properly.                               | Restore the security group by referring to :ref:`Creating a CCE Cluster <cce_10_0028>` and allow traffic from the security group to pass through. |
   +-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   | Creating a custom listener on the ELB console for the load balancer managed by CCE                                | The modified items are reset by CCE or the ingress is faulty.              | Use the YAML file of the Service to automatically create a listener.                                                                              |
   +-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   | Binding a user-defined backend on the ELB console to the load balancer managed by CCE.                            | The modified items are reset by CCE or the ingress is faulty.              | Do not manually bind any backend.                                                                                                                 |
   +-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   | Changing the ELB certificate on the ELB console for the load balancer managed by CCE.                             | The modified items are reset by CCE or the ingress is faulty.              | Use the YAML file of the ingress to automatically manage certificates.                                                                            |
   +-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   | Changing the listener name on the ELB console for the ELB listener managed by CCE.                                | The modified items are reset by CCE or the ingress is faulty.              | Do not change the name of the ELB listener managed by CCE.                                                                                        |
   +-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   | Changing the description of load balancers, listeners, and forwarding policies managed by CCE on the ELB console. | The modified items are reset by CCE or the ingress is faulty.              | Do not modify the description of load balancers, listeners, or forwarding policies managed by CCE.                                                |
   +-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
   | Deleting the default-network resource of the network-attachment-definitions CRD                                   | The container network is disconnected, or the cluster fails to be deleted. | If the resource is deleted by mistake, re-create the default-network resource with the correct configuration.                                     |
   +-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
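
The kernel parameter rows in Table 2 can be checked on a node with a short script. The following is a minimal sketch, not part of CCE: the ``check_sysctls`` function and the ``RECOMMENDED`` mapping are hypothetical names, and the script assumes a Linux node that exposes kernel parameters under ``/proc/sys``. It only reports values that differ from the ones the table recommends and does not change anything.

.. code-block:: python

   from pathlib import Path

   # Values that Table 2 gives as the fix for each kernel parameter.
   RECOMMENDED = {
       "net.ipv4.ip_forward": "1",
       "net.ipv4.tcp_tw_recycle": "0",
       "net.ipv4.tcp_tw_reuse": "0",
   }


   def check_sysctls(proc_sys="/proc/sys", recommended=RECOMMENDED):
       """Return {parameter: (current, recommended)} for values that differ."""
       mismatches = {}
       for name, wanted in recommended.items():
           # net.ipv4.ip_forward is exposed as <proc_sys>/net/ipv4/ip_forward.
           path = Path(proc_sys) / name.replace(".", "/")
           if not path.is_file():
               # Skip parameters the running kernel does not provide
               # (tcp_tw_recycle, for example, was removed in Linux 4.12).
               continue
           current = path.read_text().strip()
           if current != wanted:
               mismatches[name] = (current, wanted)
       return mismatches


   if __name__ == "__main__":
       for name, (current, wanted) in check_sysctls().items():
           print(f"{name} = {current}; Table 2 recommends {wanted}")

If the script reports a mismatch, restore the value as described in the table, for example by running ``sysctl -w net.ipv4.ip_forward=1`` as root on the node.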

Logs
----

.. table:: **Table 3** High-risk operations and solutions

   +------------------------------------------------------------------------------+--------------------------------+----------+
   | Operation                                                                    | Impact                         | Solution |
   +==============================================================================+================================+==========+
   | Deleting the **/tmp/ccs-log-collector/pos** directory on the host machine    | Logs are collected repeatedly. | None     |
   +------------------------------------------------------------------------------+--------------------------------+----------+
   | Deleting the **/tmp/ccs-log-collector/buffer** directory on the host machine | Logs are lost.                 | None     |
   +------------------------------------------------------------------------------+--------------------------------+----------+

EVS Disks
---------

.. table:: **Table 4** High-risk operations and solutions

   +------------------------------------------------+----------------------------------------------------------------------------+-----------------------------------------------------------------+---------------------------------------------------------------------------+
   | Operation                                      | Impact                                                                     | Solution                                                        | Remarks                                                                   |
   +================================================+============================================================================+=================================================================+===========================================================================+
   | Manually unmounting an EVS disk on the console | An I/O error is reported when the pod data is being written into the disk. | Delete the mount path from the node and schedule the pod again. | The file in the pod records the location where files are to be collected. |
   +------------------------------------------------+----------------------------------------------------------------------------+-----------------------------------------------------------------+---------------------------------------------------------------------------+
   | Unmounting the disk mount path on the node     | Pod data is written into a local disk.                                     | Remount the corresponding path to the pod.                      | The buffer contains log cache files to be consumed.                       |
   +------------------------------------------------+----------------------------------------------------------------------------+-----------------------------------------------------------------+---------------------------------------------------------------------------+
   | Operating EVS disks on the node                | Pod data is written into a local disk.                                     | None                                                            | None                                                                      |
   +------------------------------------------------+----------------------------------------------------------------------------+-----------------------------------------------------------------+---------------------------------------------------------------------------+