diff --git a/umn/source/add-ons/dolphin.rst b/umn/source/add-ons/dolphin.rst deleted file mode 100644 index 090746a..0000000 --- a/umn/source/add-ons/dolphin.rst +++ /dev/null @@ -1,189 +0,0 @@ -:original_name: cce_10_0371.html - -.. _cce_10_0371: - -dolphin -======= - -Introduction ------------- - -dolphin is an add-on for monitoring and managing container network traffic. dolphin of the current version can collect traffic statistics of Kata containers in CCE Turbo clusters and common containers of containerd. - -This add-on collects how many IPv4 packets and bytes are received and sent (including those sent to the public network). PodSelectors can be used to select monitoring backends to support multiple monitoring tasks and optional monitoring metrics. You can also obtain label information of pods. The monitoring information has been adapted to the Prometheus format. You can call the Prometheus API to view monitoring data. - -Constraints ------------ - -- This add-on can be installed only in CCE Turbo clusters of version 1.19 or later and cannot be installed on Arm nodes. -- The add-on instances can be deployed on nodes whose container engine is containerd or Docker and OS is EulerOS. In containerd nodes, it can trace pod updates in real time. In Docker nodes, it can query pod updates in polling mode. -- Only traffic statistics of secure containers (Kata as the container runtime) and common containers (runC as the container runtime) in a CCE Turbo cluster can be collected. -- After the add-on is installed, traffic is not monitored by default. You need to create a CR to configure a monitoring task for traffic monitoring. -- Ensure that there are sufficient resources on a node for installing the add-on. -- The source of monitoring labels and user labels must be already available before a pod is created. - -Installing the Add-on ---------------------- - -#. Log in to the CCE console. In the navigation pane, choose **Add-ons**. 
On the **Add-ons** page, click **Install** under **dolphin**. -#. On the **Install Add-on** page, select a cluster in the **Basic Information** step. - -Delivering a Monitoring Task ----------------------------- - -You can deliver a monitoring task by creating a CR. Currently, a CR can be created by calling an API or using the **kubectl apply** command after logging in to a worker node. In later versions, a CR can be created on the console. A CR represents a monitoring task and provides optional parameters such as **selector**, **podLable**, and **ip4Tx**. For details, see the CR creation template below. - -.. code-block:: - - apiVersion: crd.dolphin.io/v1 - kind: MonitorPolicy - metadata: - name: example-task # Monitoring task name. - namespace: kube-system # The value must be kube-system. This field is mandatory. - spec: - selector: # (Optional) Backend monitored by the dolphin add-on, for example, labelSelector. By default, all containers on the node are monitored. - matchLabels: - app: nginx - matchExpressions: - - key: app - operator: In - values: - - nginx - podLable: [app] # Pod label. This field is optional. - ip4Tx: # (Optional) Indicates whether to collect statistics about the number of sent IPv4 packets and the number of sent IPv4 bytes. This function is disabled by default. - enable: true - ip4Rx: # (Optional) Indicates whether to collect statistics about the number of received IPv4 packets and the number of received IPv4 bytes. This function is disabled by default. - enable: true - ip4TxInternet: # (Optional) Indicates whether to collect statistics about the number of sent IPv4 packets and the number of sent IPv4 bytes. This function is disabled by default. - enable: true - -**PodLable**: You can enter the labels of multiple pods and separate them with commas (,), for example, [app, version]. - -Labels must comply with the following rules. 
The corresponding regular expression is (^[a-zA-Z_]$)|(^([a-zA-Z][a-zA-Z0-9_]|_[a-zA-Z0-9])([a-zA-Z0-9_]){0,254}$). - -- A maximum of five labels can be entered. Each label contains a maximum of 256 characters. -- The value cannot start with a digit or double underscores (_). -- The format of a single label must comply with A-Za-z_0-9. - -Example 1 - -.. code-block:: - - apiVersion: crd.dolphin.io/v1 - kind: MonitorPolicy - metadata: - name: example-task - namespace: kube-system - spec: - podLable: [app] - ip4Tx: - enable: true - -In the preceding example, the monitoring task name is **example-task**, which monitors all pods on a node and generates the number of sent IPv4 packets and the number of sent bytes. If the monitored container contains the **app** label, the key-value information of the corresponding label is carried in the monitoring metrics. Otherwise, the value of the corresponding label is **not found**. - -Example 2 - -.. code-block:: - - apiVersion: crd.dolphin.io/v1 - kind: MonitorPolicy - metadata: - name: example-task - namespace: kube-system - spec: - selector: - matchLabels: - app: nginx - podLable: [test, app] - ip4Tx: - enable: true - ip4Rx: - enable: true - ip4TxInternet: - enable: true - -In the preceding example, the monitoring task name is **example-task**, which monitors all pods that meet the labelselector with app=nginx on a node and generates the fix metrics. If the monitored container contains **test** and **app** labels, the key-value information of the corresponding label is carried in the monitoring metrics. Otherwise, the value of the corresponding label is **not found**. - -You can create, modify, and delete monitoring tasks in the preceding format. Currently, a maximum of 10 monitoring tasks can be created. When multiple monitoring tasks match the same monitoring backend, each monitoring backend generates the monitoring metric specific to the number of monitoring tasks. - -.. 
note:: - - - If you modify or delete a monitoring task, monitoring data collected by the monitoring task will be lost. Therefore, exercise caution when performing this operation. - - After the add-on is uninstalled, the CR of the monitoring task is removed together with the add-on. - -Checking Traffic Statistics ---------------------------- - -The monitoring data collected by this add-on is exported in Prometheus exporter format, which can be obtained in the following ways: - -- Install the prometheus add-on, which automatically interconnects with the dolphin add-on and periodically collects monitoring information. -- Directly access service port 10001 provided by the dolphin add-on, for example, http://{POD_IP}:10001/metrics. - -Note that if you access the dolphin service port on a node, you need to allow access from the security group of the node and pod. - -.. table:: **Table 1** Supported monitoring metrics - - ================================================= ====================== - Metric Parameter - ================================================= ====================== - Number of IPv4 packets sent to the public network ip4_send_pkt_internet - Number of IPv4 bytes sent to the public network ip4_send_byte_internet - Number of received IPv4 packets ip4_rcv_pkt - Number of received IPv4 bytes ip4_rcv_byte - Number of sent IPv4 packets ip4_send_pkt - Number of sent IPv4 bytes ip4_send_byte - ================================================= ====================== - -- Example 1 (number of IPv4 packets sent to the public network): - - .. code-block:: - - dolphin_ip4_send_pkt_internet{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task "} 241 - - In the preceding example, the namespace of the pod is **default**, the pod name is **nginx-66c9c65dbf-zjg24**, the label is **app**, and the value is **nginx**. 
This metric is created by monitoring task **example-task**, and the number of IPv4 packets sent by the pod to the public network is **241**. - -- Example 2 (number of IPv4 bytes sent to the public network): - - .. code-block:: - - dolphin_ip4_send_byte_internet{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task" } 23618 - - In the preceding example, the namespace of the pod is **default**, the pod name is **nginx-66c9c65dbf-zjg24**, the label is **app**, and the value is **nginx**. This metric is created by monitoring task **example-task**, and the number of IPv4 bytes sent by the pod to the public network is **23618**. - -- Example 3 (number of sent IPv4 packets): - - .. code-block:: - - dolphin_ip4_send_pkt{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task "} 379 - - In the preceding example, the namespace of the pod is **default**, the pod name is **nginx-66c9c65dbf-zjg24**, the label is **app**, and the value is **nginx**. This metric is created by monitoring task **example-task**, and the number of IPv4 packets sent by the pod is **379**. - -- Example 4 (number of sent IPv4 bytes): - - .. code-block:: - - dolphin_ip4_send_byte{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task "} 33129 - - In the preceding example, the namespace of the pod is **default**, the pod name is **nginx-66c9c65dbf-zjg24**, the label is **app**, and the value is **nginx**. This metric is created by monitoring task **example-task**, and the number of IPv4 bytes sent by the pod is **33129**. - -- Example 5 (number of received IPv4 packets): - - .. code-block:: - - dolphin_ip4_rcv_pkt{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task "} 464 - - In the preceding example, the namespace of the pod is **default**, the pod name is **nginx-66c9c65dbf-zjg24**, the label is **app**, and the value is **nginx**. 
This metric is created by monitoring task **example-task**, and the number of IPv4 packets received by the pod is **464**. - -- Example 6 (number of received IPv4 bytes): - - .. code-block:: - - dolphin_ip4_rcv_byte{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task "} 34654 - - In the preceding example, the namespace of the pod is **default**, the pod name is **nginx-66c9c65dbf-zjg24**, the label is **app**, and the value is **nginx**. This metric is created by monitoring task **example-task**, and the number of IPv4 bytes received by the pod is **34654**. - -.. note:: - - If the container does not contain the specified label, the label value in the response body is **not found**. The format is as follows: - - dolphin_ip4_send_byte_internet{test="not found", pod="default/nginx-66c9c65dbf-zjg24",task="default" } 23618 diff --git a/umn/source/add-ons/index.rst b/umn/source/add-ons/index.rst index b618d3c..e622c8b 100644 --- a/umn/source/add-ons/index.rst +++ b/umn/source/add-ons/index.rst @@ -14,7 +14,6 @@ Add-ons - :ref:`metrics-server ` - :ref:`gpu-beta ` - :ref:`volcano ` -- :ref:`dolphin ` .. toctree:: :maxdepth: 1 @@ -29,4 +28,3 @@ Add-ons metrics-server gpu-beta volcano - dolphin diff --git a/umn/source/add-ons/overview.rst b/umn/source/add-ons/overview.rst index 725c230..cdb5d04 100644 --- a/umn/source/add-ons/overview.rst +++ b/umn/source/add-ons/overview.rst @@ -9,30 +9,26 @@ CCE provides multiple types of add-ons to extend cluster functions and meet feat .. important:: - CCE uses Helm templates to deploy add-ons. To modify or upgrade an add-on, perform operations on the **Add-ons** page or use open APIs. Exceptions may occur if you modify add-on resources in the background. + CCE uses Helm templates to deploy add-ons. To modify or upgrade an add-on, perform operations on the **Add-ons** page or use open APIs. Do not directly modify resources related to add-ons in the background. 
Otherwise, add-on exceptions or other unexpected problems may occur. .. table:: **Table 1** Add-on list - +-------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | Add-on Name | Introduction | - +=========================================================================+=====================================================================================================================================================================================================================================================================================================================================================================================================================================+ - | :ref:`coredns (System Resource Add-On, Mandatory) ` | The coredns add-on is a DNS server that provides domain name resolution services for Kubernetes clusters. coredns chains plug-ins to provide additional features. 
| - +-------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | :ref:`storage-driver (System Resource Add-On, Discarded) ` | storage-driver is a FlexVolume driver used to support IaaS storage services such as EVS, SFS, and OBS. | - +-------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | :ref:`everest (System Resource Add-On, Mandatory) ` | Everest is a cloud native container storage system. Based on the Container Storage Interface (CSI), clusters of Kubernetes v1.15.6 or later obtain access to cloud storage services. 
| - +-------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | :ref:`npd ` | node-problem-detector (npd for short) is an add-on that monitors abnormal events of cluster nodes and connects to a third-party monitoring platform. It is a daemon running on each node. It collects node issues from different daemons and reports them to the API server. The npd add-on can run as a DaemonSet or a daemon. | - +-------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | :ref:`autoscaler ` | The autoscaler add-on resizes a cluster based on pod scheduling status and resource usage. 
| - +-------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | :ref:`metrics-server ` | metrics-server is an aggregator for monitoring data of core cluster resources. | - +-------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | :ref:`gpu-beta ` | gpu-beta is a device management add-on that supports GPUs in containers. It supports only NVIDIA drivers. 
| - +-------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | :ref:`volcano ` | Volcano provides general-purpose, high-performance computing capabilities, such as job scheduling, heterogeneous chip management, and job running management, serving end users through computing frameworks for different industries, such as AI, big data, gene sequencing, and rendering. | - +-------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | :ref:`dolphin ` | dolphin is a pod network monitoring add-on, which, in the current version, can be used to collect statistics about public network traffic of Kata containers in CCE Turbo clusters and common containers that use containerd as the runtime. | - | | | - | | This add-on collects how many IPv4 packets and bytes are received and sent (including those sent to the public network). PodSelectors can be used to select monitoring backends to support multiple monitoring tasks and optional monitoring metrics. You can also obtain label information of pods. The monitoring information has been adapted to the Prometheus format. 
You can call the Prometheus API to view monitoring data. | - +-------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +-------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Add-on Name | Introduction | + +=========================================================================+=================================================================================================================================================================================================================================================================================================================================+ + | :ref:`coredns (System Resource Add-On, Mandatory) ` | The coredns add-on is a DNS server that provides domain name resolution services for Kubernetes clusters. coredns chains plug-ins to provide additional features. 
| + +-------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`storage-driver (System Resource Add-On, Discarded) ` | storage-driver is a FlexVolume driver used to support IaaS storage services such as EVS, SFS, and OBS. | + +-------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`everest (System Resource Add-On, Mandatory) ` | Everest is a cloud native container storage system. Based on the Container Storage Interface (CSI), clusters of Kubernetes v1.15.6 or later obtain access to cloud storage services. | + +-------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`npd ` | node-problem-detector (npd for short) is an add-on that monitors abnormal events of cluster nodes and connects to a third-party monitoring platform. It is a daemon running on each node. It collects node issues from different daemons and reports them to the API server. The npd add-on can run as a DaemonSet or a daemon. 
| + +-------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`autoscaler ` | The autoscaler add-on resizes a cluster based on pod scheduling status and resource usage. | + +-------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`metrics-server ` | metrics-server is an aggregator for monitoring data of core cluster resources. | + +-------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`gpu-beta ` | gpu-beta is a device management add-on that supports GPUs in containers. It supports only NVIDIA drivers. 
| + +-------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`volcano ` | Volcano provides general-purpose, high-performance computing capabilities, such as job scheduling, heterogeneous chip management, and job running management, serving end users through computing frameworks for different industries, such as AI, big data, gene sequencing, and rendering. | + +-------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/umn/source/best_practice/permission/configuring_kubeconfig_for_fine-grained_management_on_cluster_resources.rst b/umn/source/best_practice/permission/configuring_kubeconfig_for_fine-grained_management_on_cluster_resources.rst index 205aab5..875f596 100644 --- a/umn/source/best_practice/permission/configuring_kubeconfig_for_fine-grained_management_on_cluster_resources.rst +++ b/umn/source/best_practice/permission/configuring_kubeconfig_for_fine-grained_management_on_cluster_resources.rst @@ -29,13 +29,19 @@ Configuration Method #. Set the service account name to **my-sa** and namespace to **test**. - kubectl create sa **my-sa** -n **test** + .. code-block:: + + kubectl create sa my-sa -n test |image1| #. Configure the role table and assign operation permissions to different resources. - vi **role-test.yaml** + .. 
code-block:: + + vi role-test.yaml + + The content is as follows: .. code-block:: @@ -68,13 +74,21 @@ Configuration Method - watch - create - kubectl create -f **role-test.yaml** + Create a Role. + + .. code-block:: + + kubectl create -f role-test.yaml |image2| #. Create a RoleBinding and bind the service account to the role so that the user can obtain the corresponding permissions. - vi **myrolebinding.yaml** + .. code-block:: + + vi myrolebinding.yaml + + The content is as follows: .. code-block:: @@ -92,7 +106,11 @@ Configuration Method name: my-sa namespace: test - kubectl create -f **myrolebinding.yaml** + Create a RoleBinding. + + .. code-block:: + + kubectl create -f myrolebinding.yaml |image3| @@ -104,23 +122,31 @@ Configuration Method a. Use the sa name **my-sa** to obtain the secret corresponding to the sa. In the following example, **my-sa-token-z4967** in the first column is the secret name. - kubectl get secret -n **test** \|grep **my-sa** + .. code-block:: + + kubectl get secret -n test |grep my-sa |image4| b. Decrypt the **ca.crt** file in the secret and export it. - kubectl get secret **my-sa-token-5gpl4** -n **test** -oyaml \|grep ca.crt:|awk '{print $2}' \|base64 -d > /home/ca.crt + .. code-block:: + + kubectl get secret my-sa-token-5gpl4 -n test -oyaml |grep ca.crt: | awk '{print $2}' |base64 -d > /home/ca.crt c. Set the cluster access mode. **test-arm** indicates the cluster to be accessed, **10.0.1.100** indicates the IP address of the API server in the cluster and **/home/test.config** indicates the path for storing the configuration file. - If the internal API server address is used, run the following command: - kubectl config set-cluster **test-arm** --server=https://**10.0.1.100**:5443 --certificate-authority=/home/ca.crt --embed-certs=true --kubeconfig=\ **/home/test.config** + .. 
code-block:: + + kubectl config set-cluster test-arm --server=https://10.0.1.100:5443 --certificate-authority=/home/ca.crt --embed-certs=true --kubeconfig=/home/test.config - If the public API server address is used, run the following command: - kubectl config set-cluster **test-arm** --server=https://**10.0.1.100**:5443 --kubeconfig=\ **/home/test.config** --insecure-skip-tls-verify=true + .. code-block:: + + kubectl config set-cluster test-arm --server=https://10.0.1.100:5443 --kubeconfig=/home/test.config --insecure-skip-tls-verify=true |image5| @@ -134,11 +160,15 @@ Configuration Method a. Obtain the cluster token. (If the token is obtained in GET mode, you need to run **base64 -d** to decode the token.) - token=$(kubectl describe secret **my-sa-token-5gpl4** -n **test** \| awk '/token:/{print $2}') + .. code-block:: + + token=$(kubectl describe secret my-sa-token-5gpl4 -n test | awk '/token:/{print $2}') b. Set the cluster user **ui-admin**. - kubectl config set-credentials **ui-admin** --token=$token --kubeconfig=\ **/home/test.config** + .. code-block:: + + kubectl config set-credentials ui-admin --token=$token --kubeconfig=/home/test.config |image6| @@ -146,7 +176,9 @@ Configuration Method Configure the context information for cluster authentication. **ui-admin@test** is the context name. - kubectl config set-context **ui-admin@test** --cluster=\ **test-arm** --user=\ **ui-admin** --kubeconfig=\ **/home/test.config** + .. code-block:: + + kubectl config set-context ui-admin@test --cluster=test-arm --user=ui-admin --kubeconfig=/home/test.config |image7| @@ -154,7 +186,9 @@ Configuration Method Set the context. For details about how to use the context, see :ref:`Permissions Verification `. - kubectl config use-context **ui-admin@test** --kubeconfig=\ **/home/test.config** + .. code-block:: + + kubectl config use-context ui-admin@test --kubeconfig=/home/test.config |image8| @@ -169,7 +203,9 @@ Permissions Verification #. 
Pods in the **test** namespace cannot access pods in other namespaces. - kubectl get pod -n **test** --kubeconfig=\ **/home/test.config** + .. code-block:: + + kubectl get pod -n test --kubeconfig=/home/test.config |image9| diff --git a/umn/source/change_history.rst b/umn/source/change_history.rst index ac53345..b09ab6f 100644 --- a/umn/source/change_history.rst +++ b/umn/source/change_history.rst @@ -13,7 +13,6 @@ Change History | 2023-05-30 | - Added\ :ref:`Configuring a Node Pool `. | | | - Added\ :ref:`Configuring Health Check for Multiple Ports `. | | | - Added\ :ref:`NetworkAttachmentDefinition `. | - | | - Added\ :ref:`dolphin `. | | | - Updated\ :ref:`Creating a Node `. | | | - Updated\ :ref:`Creating a Node Pool `. | | | - Updated\ :ref:`OS Patch Notes for Cluster Nodes `. | diff --git a/umn/source/conf.py b/umn/source/conf.py index b502764..cba9a5d 100644 --- a/umn/source/conf.py +++ b/umn/source/conf.py @@ -18,7 +18,7 @@ import os import sys extensions = [ - 'otcdocstheme' + 'otcdocstheme', ] otcdocs_auto_name = False diff --git a/umn/source/networking/network_policies.rst b/umn/source/networking/network_policies.rst index bbccf11..5ad3576 100644 --- a/umn/source/networking/network_policies.rst +++ b/umn/source/networking/network_policies.rst @@ -24,7 +24,7 @@ Notes and Constraints - Network isolation is not supported for IPv6 addresses. -- Network policies do not support egress rules except for clusters of v1.23 or later. +- Network policies only allow clusters of v1.23 or later to set ingress and egress rules. Egress rules cannot be set on clusters of other versions. 
Egress rules are supported only in the following operating systems: diff --git a/umn/source/nodes/node_fault_detection_policy.rst b/umn/source/nodes/node_fault_detection_policy.rst index 9f65a99..bf36adc 100644 --- a/umn/source/nodes/node_fault_detection_policy.rst +++ b/umn/source/nodes/node_fault_detection_policy.rst @@ -5,7 +5,7 @@ Node Fault Detection Policy =========================== -The node fault check function depends on :ref:`node-problem-detector (npd) `. npd is a cluster node monitoring add-on. The add-on instances run on each node. This section describe how to enable node fault detection. +The node fault detection function depends on the :ref:`node-problem-detector (npd) ` add-on. The add-on instances run on nodes and monitor nodes. This section describes how to enable node fault detection. Prerequisites ------------- @@ -15,7 +15,7 @@ The :ref:`npd ` add-on has been installed in the cluster. Enabling Node Fault Detection ----------------------------- -#. Log in to the CCE console and access the cluster console. +#. Log in to the CCE console and click the cluster name to access the cluster console. #. In the navigation pane on the left, choose **Nodes**. Check whether the npd add-on has been installed in the cluster or whether the add-on has been upgraded to the latest version. After the npd add-on has been installed, you can use the fault detection function. @@ -34,7 +34,7 @@ Enabling Node Fault Detection Customized Check Items ---------------------- -#. Log in to the CCE console and access the cluster console. +#. Log in to the CCE console and click the cluster name to access the cluster console. #. Choose Node Management on the left and click **Node Fault Detection Policy**. 
@@ -65,7 +65,7 @@ Customized Check Items
  +==========================+======================================================================================================================================================================================================+
  | Prompting Exception | Reports the Kubernetes events. |
  +--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
- | Disabling scheduling | Reports the Kubernetes events and adds the NoSchedule taint to the node. |
+ | Disabling scheduling | Reports the Kubernetes events and adds the **NoSchedule** taint to the node. |
  +--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | Evict Node Load | Reports the Kubernetes events and adds the **NoExecute** taint to the node. This operation will evict workloads on the node and interrupt services. Exercise caution when performing this operation. |
  +--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+