:original_name: cce_10_0054.html
.. _cce_10_0054:
High-Risk Operations and Solutions
==================================
During service deployment or operation, you may perform high-risk operations at different levels that cause service faults or interruptions. To help you estimate and avoid these risks, this section describes the impact of high-risk operations and their solutions for clusters, nodes, networking, load balancing, logs, and EVS disks.
Clusters and Nodes
------------------
.. table:: **Table 1** High-risk operations and solutions
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Category | Operation | Impact | Solution |
+=================+=======================================================================================================+======================================================================================================================================================================================================================================================================================+===================================================================================================================================================+
| Master node | Modifying the security group of a node in a cluster | The master node may be unavailable. | Restore the security group by referring to the security group of the new cluster and allow traffic from the security group to pass through. |
| | | | |
| | | .. note:: | |
| | | | |
| | | Naming rule of a master node: *Cluster name*\ ``-``\ **cce-control**\ ``-``\ *Random number* | |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| | Letting the node expire or destroying the node | The master node will be unavailable. | This operation cannot be undone. |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| | Reinstalling the OS | Components on the master node will be deleted. | This operation cannot be undone. |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| | Upgrading components on the master or etcd node | The cluster may be unavailable. | Roll back to the original version. |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| | Deleting or formatting core directory data such as **/etc/kubernetes** on the node | The master node will be unavailable. | This operation cannot be undone. |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| | Changing the node IP address | The master node will be unavailable. | Change the IP address back to the original one. |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| | Modifying parameters of core components (such as etcd, kube-apiserver, and docker) | The master node may be unavailable. | Restore the parameter settings to the recommended values. For details, see :ref:`Cluster Configuration Management `. |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| | Replacing the master or etcd certificate | The cluster may become unavailable. | This operation cannot be undone. |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Worker node | Modifying the security group of a node in a cluster | The node may be unavailable. | Restore the security group by referring to :ref:`Creating a CCE Cluster ` and allow traffic from the security group to pass through. |
| | | | |
| | | .. note:: | |
| | | | |
| | | Naming rule of a worker node: *Cluster name*\ ``-``\ **cce-node**\ ``-``\ *Random number* | |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| | Deleting the node | The node will become unavailable. | This operation cannot be undone. |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| | Reinstalling the OS | Node components are deleted, and the node becomes unavailable. | Reset the node. For details, see :ref:`Resetting a Node `. |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| | Upgrading the node kernel | The node may be unavailable or the network may be abnormal. | Reset the node. For details, see :ref:`Resetting a Node `. |
| | | | |
| | | .. note:: | |
| | | | |
| | | Nodes depend on the system kernel version to run. Do not use the **yum update** command to update or reinstall the operating system kernel of a node unless necessary. (Reinstalling the operating system kernel using the original image or other images is a risky operation.) | |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| | Changing the node IP address | The node will become unavailable. | Change the IP address back to the original one. |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| | Modifying parameters of core components (such as kubelet and kube-proxy) | The node may become unavailable, and components may be insecure if security-related configurations are modified. | Restore the parameter settings to the recommended values. For details, see :ref:`Configuring a Node Pool `. |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| | Modifying OS configuration | The node may be unavailable. | Restore the configuration items or reset the node. For details, see :ref:`Resetting a Node `. |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| | Deleting or modifying the **/opt/cloud/cce** and **/var/paas** directories, and deleting the data disk | The node will become unready. | Reset the node. For details, see :ref:`Resetting a Node `. |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| | Modifying the node directory permission and the container directory permission | The permissions will be abnormal. | Do not modify the permissions. If they have been modified, restore them. |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| | Formatting or partitioning system disks, Docker disks, and kubelet disks on nodes | The node may be unavailable. | Reset the node. For details, see :ref:`Resetting a Node `. |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| | Installing other software on nodes | This may cause exceptions on Kubernetes components installed on the node, and make the node unavailable. | Uninstall the software that has been installed and restore or reset the node. For details, see :ref:`Resetting a Node `. |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| | Modifying NetworkManager configurations | The node will become unavailable. | Reset the node. For details, see :ref:`Resetting a Node `. |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| | Deleting system images such as **cfe-pause** from the node | Containers cannot be created and system images cannot be pulled. | Copy the image from another normal node for restoration. |
+-----------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
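As a rough illustration of the worker-node rows in Table 1, the following Python sketch reports whether the directories named there still exist on a node. The function name ``missing_dirs`` and its ``root`` parameter are hypothetical and were introduced only for this example. It is not a CCE tool, and a directory being present does not prove that the node is healthy.

```python
from pathlib import Path

# Directories that Table 1 lists as required on a worker node.
CRITICAL_DIRS = ("/opt/cloud/cce", "/var/paas")


def missing_dirs(paths=CRITICAL_DIRS, root="/"):
    """Return the entries of `paths` that are not existing directories under `root`.

    `root` is a hypothetical parameter that lets the check run against a
    copy of the file system (for example, in a test) instead of the live node.
    """
    base = Path(root)
    return [p for p in paths if not (base / p.lstrip("/")).is_dir()]


if __name__ == "__main__":
    # Print one line per missing directory; print nothing if all are present.
    for path in missing_dirs():
        print(f"missing: {path}")
```

If the sketch reports a missing directory, the solution in the table still applies: reset the node.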
Networking and Load Balancing
-----------------------------
.. table:: **Table 2** High-risk operations and solutions
+-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Operation | Impact | How to Avoid/Fix |
+===================================================================================================================+============================================================================+===================================================================================================================================================+
| Changing the value of the kernel parameter **net.ipv4.ip_forward** to **0** | The network becomes inaccessible. | Change the value to **1**. |
+-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Changing the value of the kernel parameter **net.ipv4.tcp_tw_recycle** to **1** | The NAT service becomes abnormal. | Change the value to **0**. |
+-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Changing the value of the kernel parameter **net.ipv4.tcp_tw_reuse** to **1** | The network becomes abnormal. | Change the value to **0**. |
+-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Not configuring the node security group to allow UDP packets to pass through port 53 of the container CIDR block | The DNS in the cluster cannot work properly. | Restore the security group by referring to :ref:`Creating a CCE Cluster ` and allow traffic from the security group to pass through. |
+-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Creating a custom listener on the ELB console for the load balancer managed by CCE | The modified items are reset by CCE or the ingress is faulty. | Use the YAML file of the Service to automatically create a listener. |
+-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Binding a user-defined backend on the ELB console to the load balancer managed by CCE | The modified items are reset by CCE or the ingress is faulty. | Do not manually bind any backend. |
+-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Changing the ELB certificate on the ELB console for the load balancer managed by CCE | The modified items are reset by CCE or the ingress is faulty. | Use the YAML file of the ingress to automatically manage certificates. |
+-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Changing the listener name on the ELB console for the ELB listener managed by CCE | The modified items are reset by CCE or the ingress is faulty. | Do not change the name of the ELB listener managed by CCE. |
+-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Changing the description of load balancers, listeners, and forwarding policies managed by CCE on the ELB console | The modified items are reset by CCE or the ingress is faulty. | Do not modify the description of load balancers, listeners, or forwarding policies managed by CCE. |
+-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Deleting the default-network resource of the network-attachment-definitions CRD | The container network is disconnected, or the cluster fails to be deleted. | If the resource is deleted by mistake, use the correct configurations to create the default-network resource again. |
+-------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
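The kernel parameter rows in Table 2 can be checked with a short script. The following Python sketch is a minimal example, not a CCE tool: it assumes a Linux node where kernel parameters are exposed under ``/proc/sys``, and the names ``EXPECTED``, ``read_sysctl``, and ``find_risky`` are hypothetical. Parameters that do not exist on the running kernel are skipped.

```python
from pathlib import Path

# Values that Table 2 gives as the fix; any other value is reported.
EXPECTED = {
    "net.ipv4.ip_forward": "1",
    "net.ipv4.tcp_tw_recycle": "0",
    "net.ipv4.tcp_tw_reuse": "0",
}


def read_sysctl(name, root=Path("/proc/sys")):
    """Return the current value of a kernel parameter, or None if it cannot be read."""
    path = Path(root).joinpath(*name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        # The parameter is absent on this kernel or is not readable.
        return None


def find_risky(current, expected=EXPECTED):
    """Return {name: (actual, expected)} for parameters that differ from `expected`."""
    risky = {}
    for name, want in expected.items():
        got = current.get(name)
        if got is not None and got != want:
            risky[name] = (got, want)
    return risky


if __name__ == "__main__":
    current = {name: read_sysctl(name) for name in EXPECTED}
    for name, (got, want) in find_risky(current).items():
        print(f"{name} = {got}, expected {want}")
```

The script only reads values. Changing a parameter back to the value in the table remains a manual step.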
Logs
----
.. table:: **Table 3** High-risk operations and solutions
+------------------------------------------------------------------------------+--------------------------------+----------+
| Operation | Impact | Solution |
+==============================================================================+================================+==========+
| Deleting the **/tmp/ccs-log-collector/pos** directory on the host machine | Logs are collected repeatedly. | None |
+------------------------------------------------------------------------------+--------------------------------+----------+
| Deleting the **/tmp/ccs-log-collector/buffer** directory on the host machine | Logs are lost. | None |
+------------------------------------------------------------------------------+--------------------------------+----------+
EVS Disks
---------
.. table:: **Table 4** High-risk operations and solutions
+------------------------------------------------+----------------------------------------------------------------------------+-----------------------------------------------------------------+---------------------------------------------------------------------------+
| Operation | Impact | Solution | Remarks |
+================================================+============================================================================+=================================================================+===========================================================================+
| Manually unmounting an EVS disk on the console | An I/O error is reported when pod data is written to the disk. | Delete the mount path from the node and schedule the pod again. | The file in the pod records the location where files are to be collected. |
+------------------------------------------------+----------------------------------------------------------------------------+-----------------------------------------------------------------+---------------------------------------------------------------------------+
| Unmounting the disk mount path on the node | Pod data is written into a local disk. | Remount the corresponding path to the pod. | The buffer contains log cache files to be consumed. |
+------------------------------------------------+----------------------------------------------------------------------------+-----------------------------------------------------------------+---------------------------------------------------------------------------+
| Operating EVS disks on the node | Pod data is written into a local disk. | None | None |
+------------------------------------------------+----------------------------------------------------------------------------+-----------------------------------------------------------------+---------------------------------------------------------------------------+