:original_name: cce_10_0054.html

.. _cce_10_0054:

High-Risk Operations and Solutions
==================================

During service deployment or operation, certain high-risk operations can cause service faults or interruptions. To help you assess and avoid these risks, this section describes the impact of high-risk operations and their solutions across several areas: clusters, nodes, networking, load balancing, logs, and EVS disks.

Clusters and Nodes
------------------

.. list-table:: **Table 1** High-risk operations and solutions
   :header-rows: 1

   * - Category
     - Operation
     - Impact
     - Solution
   * - Master node
     - Modifying the security group of a node in a cluster
     - The master node may be unavailable.

       .. note::

          Naming rule of a master node: *Cluster name*\ ``-``\ **cce-control**\ ``-``\ *Random number*

     - Restore the security group by referring to the security group of the new cluster and allow traffic from the security group to pass through.
   * -
     - Letting the node expire or destroying the node
     - The master node will be unavailable.
     - This operation cannot be undone.
   * -
     - Reinstalling the OS
     - Components on the master node will be deleted.
     - This operation cannot be undone.
   * -
     - Upgrading components on the master or etcd node
     - The cluster may be unavailable.
     - Roll back to the original version.
   * -
     - Deleting or formatting core directory data such as **/etc/kubernetes** on the node
     - The master node will be unavailable.
     - This operation cannot be undone.
   * -
     - Changing the node IP address
     - The master node will be unavailable.
     - Change the IP address back to the original one.
   * -
     - Modifying parameters of core components (such as etcd, kube-apiserver, and docker)
     - The master node may be unavailable.
     - Restore the parameter settings to the recommended values. For details, see :ref:`Cluster Configuration Management `.
   * -
     - Replacing the master or etcd certificate
     - The cluster may become unavailable.
     - This operation cannot be undone.
   * - Worker node
     - Modifying the security group of a node in a cluster
     - The node may be unavailable.

       .. note::

          Naming rule of a worker node: *Cluster name*\ ``-``\ **cce-node**\ ``-``\ *Random number*

     - Restore the security group by referring to :ref:`Creating a CCE Cluster ` and allow traffic from the security group to pass through.
   * -
     - Deleting the node
     - The node will become unavailable.
     - This operation cannot be undone.
   * -
     - Reinstalling the OS
     - Node components are deleted, and the node becomes unavailable.
     - Reset the node. For details, see :ref:`Resetting a Node `.
   * -
     - Upgrading the node kernel
     - The node may be unavailable or the network may be abnormal.

       .. note::

          Nodes depend on the system kernel version. Do not use the **yum update** command to update or reinstall the operating system kernel of a node unless necessary. (Reinstalling the operating system kernel using the original image or other images is a risky operation.)

     - For details, see :ref:`Resetting a Node `.
   * -
     - Changing the node IP address
     - The node will become unavailable.
     - Change the IP address back to the original one.
   * -
     - Modifying parameters of core components (such as kubelet and kube-proxy)
     - The node may become unavailable, and components may be insecure if security-related configurations are modified.
     - Restore the parameter settings to the recommended values. For details, see :ref:`Configuring a Node Pool `.
   * -
     - Modifying OS configuration
     - The node may be unavailable.
     - Restore the configuration items or reset the node. For details, see :ref:`Resetting a Node `.
   * -
     - Deleting or modifying the **/opt/cloud/cce** and **/var/paas** directories, and deleting the data disk
     - The node will become unready.
     - Reset the node. For details, see :ref:`Resetting a Node `.
   * -
     - Modifying the node directory permission and the container directory permission
     - The permissions will be abnormal.
     - Do not modify the permissions. If they have been modified, restore them.
   * -
     - Formatting or partitioning system disks, Docker disks, and kubelet disks on nodes
     - The node may be unavailable.
     - Reset the node. For details, see :ref:`Resetting a Node `.
   * -
     - Installing other software on nodes
     - Kubernetes components installed on the node may malfunction, making the node unavailable.
     - Uninstall the software that has been installed and restore or reset the node. For details, see :ref:`Resetting a Node `.
   * -
     - Modifying NetworkManager configurations
     - The node will become unavailable.
     - Reset the node. For details, see :ref:`Resetting a Node `.
   * -
     - Deleting system images such as **cfe-pause** from the node
     - Containers cannot be created and system images cannot be pulled.
     - Copy the image from another normal node for restoration.

Networking and Load Balancing
-----------------------------

.. list-table:: **Table 2** High-risk operations and solutions
   :header-rows: 1

   * - Operation
     - Impact
     - How to Avoid/Fix
   * - Changing the value of the kernel parameter **net.ipv4.ip_forward** to **0**
     - The network becomes inaccessible.
     - Change the value to **1**.
   * - Changing the value of the kernel parameter **net.ipv4.tcp_tw_recycle** to **1**
     - The NAT service becomes abnormal.
     - Change the value to **0**.
   * - Changing the value of the kernel parameter **net.ipv4.tcp_tw_reuse** to **1**
     - The network becomes abnormal.
     - Change the value to **0**.
   * - Not configuring the node security group to allow UDP packets to pass through port 53 of the container CIDR block
     - The DNS in the cluster cannot work properly.
     - Restore the security group by referring to :ref:`Creating a CCE Cluster ` and allow traffic from the security group to pass through.
   * - Creating a custom listener on the ELB console for the load balancer managed by CCE
     - The modified items are reset by CCE or the ingress is faulty.
     - Use the YAML file of the Service to automatically create a listener.
   * - Binding a user-defined backend on the ELB console to the load balancer managed by CCE
     -
     - Do not manually bind any backend.
   * - Changing the ELB certificate on the ELB console for the load balancer managed by CCE
     -
     - Use the YAML file of the ingress to automatically manage certificates.
   * - Changing the listener name on the ELB console for the ELB listener managed by CCE
     -
     - Do not change the name of the ELB listener managed by CCE.
   * - Changing the description of load balancers, listeners, and forwarding policies managed by CCE on the ELB console
     -
     - Do not modify the description of load balancers, listeners, or forwarding policies managed by CCE.
   * - Deleting CRD resources of network-attachment-definitions of default-network
     - The container network is disconnected, or the cluster fails to be deleted.
     - If the resources are deleted by mistake, use the correct configurations to create the default-network resources.

Logs
----

.. list-table:: **Table 3** High-risk operations and solutions
   :header-rows: 1

   * - Operation
     - Impact
     - Solution
   * - Deleting the **/tmp/ccs-log-collector/pos** directory on the host machine
     - Logs are collected repeatedly.
     - None
   * - Deleting the **/tmp/ccs-log-collector/buffer** directory on the host machine
     - Logs are lost.
     - None

EVS Disks
---------

.. list-table:: **Table 4** High-risk operations and solutions
   :header-rows: 1

   * - Operation
     - Impact
     - Solution
     - Remarks
   * - Manually unmounting an EVS disk on the console
     - An I/O error is reported when the pod data is being written into the disk.
     - Delete the mount path from the node and schedule the pod again.
     - The file in the pod records the location where files are to be collected.
   * - Unmounting the disk mount path on the node
     - Pod data is written into a local disk.
     - Remount the corresponding path to the pod.
     - The buffer contains log cache files to be consumed.
   * - Operating EVS disks on the node
     - Pod data is written into a local disk.
     - None
     - None
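
The three kernel parameter rows in Table 2 can be checked automatically. The following is a minimal sketch, not part of CCE, that assumes a Linux node where sysctl values are readable under ``/proc/sys``. The names ``RISKY_SYSCTLS``, ``read_sysctl``, and ``find_risky_sysctls`` are hypothetical and chosen only for this illustration. A parameter that does not exist on the running kernel is skipped rather than reported.

```python
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple

# Kernel parameters from Table 2: name -> (risky value, value to restore).
RISKY_SYSCTLS: Dict[str, Tuple[str, str]] = {
    "net.ipv4.ip_forward": ("0", "1"),
    "net.ipv4.tcp_tw_recycle": ("1", "0"),
    "net.ipv4.tcp_tw_reuse": ("1", "0"),
}


def read_sysctl(name: str, root: Path = Path("/proc/sys")) -> Optional[str]:
    """Return the current value of a sysctl, or None if it cannot be read."""
    path = root.joinpath(*name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        # The parameter is absent on this kernel or is not readable.
        return None


def find_risky_sysctls(
    reader: Callable[[str], Optional[str]] = read_sysctl,
) -> Dict[str, str]:
    """Map each parameter currently set to its risky value to the value to restore."""
    findings: Dict[str, str] = {}
    for name, (risky, restore) in RISKY_SYSCTLS.items():
        if reader(name) == risky:
            findings[name] = restore
    return findings
```

Calling ``find_risky_sysctls()`` on a node returns an empty dictionary when none of the three parameters holds the risky value listed in Table 2. Passing a custom ``reader`` makes it possible to check values collected elsewhere without touching the local ``/proc/sys``.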