CCE Node Problem Detector (node-problem-detector, NPD) is an add-on that monitors abnormal events on cluster nodes and connects to a third-party monitoring platform. It runs as a daemon on each node, collects node issues from different daemons, and reports them to the API server. NPD can run as a DaemonSet or as a standalone daemon.

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| basic | No | object | Basic configuration parameters, which do not need to be specified |
| flavor | Yes | Table 2 object | Flavor parameters |
| custom | Yes | Table 3 object | Custom parameters |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| description | No | String | Add-on description |
| name | Yes | String | Add-on specification name. The value is fixed at Single-instance. |
| replicas | Yes | String | Number of pods. The default value is 1. |
| resources | Yes | resources object | Container resource (CPU and memory) quotas |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| feature_gate | No | String | Feature gate, which is used to enable beta features |
| multiAZBalance | No | Bool | Multi-AZ deployment |
| multiAZEnabled | No | Bool | Whether to deploy the add-on pods in multiple AZs. The default value is false. If this parameter is set to true, the pods are forcibly deployed across AZs. If it is set to false, cross-AZ deployment is preferred but not enforced. |
| npc | Yes | Table 5 object | node-problem-controller configuration |
| tolerations | No | List<Object> Table 7 | Tolerations of the add-on |
| node_match_expressions | No | List<Object> Table 7 | Node affinity configuration of the add-on |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| limitsCpu | Yes | String | CPU size limit (unit: m) |
| limitsMem | Yes | String | Memory size limit (unit: Mi) |
| name | Yes | String | Add-on name. The value is fixed at custom-resources. |
| requestsCpu | Yes | String | Requested CPU size (unit: m) |
| requestsMem | Yes | String | Requested memory size (unit: Mi) |
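
The resource fields above are strings that carry their unit (m for CPU, Mi for memory). The following is a minimal sketch of how a client might check an entry before sending it. The helper names are hypothetical and not part of the add-on API, and the rule that a request must not exceed its limit is the general Kubernetes rule, assumed to apply here as well.

```python
def parse_cpu_millicores(value):
    """Convert a CPU string such as '100m' to an integer number of millicores."""
    if not value.endswith("m"):
        raise ValueError(f"expected a CPU value in m, got {value!r}")
    return int(value[:-1])


def parse_memory_mib(value):
    """Convert a memory string such as '300Mi' to an integer number of MiB."""
    if not value.endswith("Mi"):
        raise ValueError(f"expected a memory value in Mi, got {value!r}")
    return int(value[:-2])


def check_resources(entry):
    """Raise ValueError if a request is larger than the matching limit."""
    if parse_cpu_millicores(entry["requestsCpu"]) > parse_cpu_millicores(entry["limitsCpu"]):
        raise ValueError("requestsCpu exceeds limitsCpu")
    if parse_memory_mib(entry["requestsMem"]) > parse_memory_mib(entry["limitsMem"]):
        raise ValueError("requestsMem exceeds limitsMem")


# Entry copied from the example request in this section.
entry = {
    "limitsCpu": "100m",
    "limitsMem": "300Mi",
    "name": "node-problem-detector",
    "requestsCpu": "30m",
    "requestsMem": "100Mi",
}
check_resources(entry)  # passes silently for a consistent entry
```

Checking on the client side only catches obvious mistakes early; the server remains the authority on which values are accepted.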

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| maxTaintedNode | Yes | String or Int | Maximum number of nodes that NPC can taint when the same fault occurs on multiple nodes. This cap limits the impact of a widespread fault. The value can be an integer or a percentage. |
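
Because maxTaintedNode accepts either an integer or a percentage, it can help to see what a percentage means for a given cluster size. The sketch below is illustrative only: the function name is hypothetical, and rounding down is an assumption, since this section does not state how NPC rounds a fractional result.

```python
def resolve_max_tainted(max_tainted_node, total_nodes):
    """Translate maxTaintedNode into an absolute node count.

    Accepts an int (3), a numeric string ("3"), or a percentage ("10%").
    Percentages are rounded down, which is an assumption of this sketch.
    """
    if isinstance(max_tainted_node, int):
        return max_tainted_node
    text = str(max_tainted_node).strip()
    if text.endswith("%"):
        percent = int(text[:-1])
        # Integer arithmetic avoids floating-point rounding surprises.
        return total_nodes * percent // 100
    return int(text)


# With the "10%" value from the example request and a 25-node cluster:
limit = resolve_max_tainted("10%", 25)
print(limit)
```

A percentage scales with the cluster, whereas an integer gives a fixed cap regardless of how many nodes exist.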

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| key | No | String | Taint key |
| effect | No | String | Taint policy |
| operator | No | String | Operator |
| tolerationSeconds | No | Int | Toleration time window |
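
The toleration fields above mirror the Kubernetes toleration object. The following sketch builds toleration entries with basic checks. The helper name is hypothetical. The allowed operator and effect values, and the rule that tolerationSeconds only applies to the NoExecute effect, come from upstream Kubernetes; it is an assumption that the add-on accepts exactly the same values.

```python
# Value sets taken from upstream Kubernetes, assumed to apply to this add-on.
VALID_OPERATORS = {"Exists", "Equal"}
VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


def make_toleration(key, operator="Exists", effect="NoExecute", toleration_seconds=None):
    """Build one toleration dict using the field names from the table above."""
    if operator not in VALID_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    if effect not in VALID_EFFECTS:
        raise ValueError(f"unsupported effect: {effect!r}")
    if toleration_seconds is not None and effect != "NoExecute":
        raise ValueError("tolerationSeconds only applies to the NoExecute effect")
    toleration = {"key": key, "operator": operator, "effect": effect}
    if toleration_seconds is not None:
        toleration["tolerationSeconds"] = int(toleration_seconds)
    return toleration


# The two tolerations used in the example request of this section.
tolerations = [
    make_toleration("node.kubernetes.io/not-ready", toleration_seconds=60),
    make_toleration("node.kubernetes.io/unreachable", toleration_seconds=60),
]
```

The table marks every field as optional, so requiring a key here is a simplification made for the sketch.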

    {
        "kind": "Addon",
        "apiVersion": "v3",
        "metadata": {
            "annotations": {
                "addon.install/type": "install"
            }
        },
        "spec": {
            "clusterID": "b78fb690-b82c-11ee-83cf-0255ac100b0f",
            "version": "1.18.48",
            "addonTemplateName": "npd",
            "values": {
                "basic": {
                    "image_version": "1.18.48",
                    "swr_addr": "***",
                    "swr_user": "***",
                    "rbac_enabled": true,
                    "cluster_version": "v1.23"
                },
                "flavor": {
                    "description": "custom resources",
                    "name": "custom-resources",
                    "replicas": 2,
                    "resources": [
                        {
                            "limitsCpu": "100m",
                            "limitsMem": "300Mi",
                            "name": "node-problem-controller",
                            "requestsCpu": "30m",
                            "requestsMem": "100Mi"
                        },
                        {
                            "limitsCpu": "100m",
                            "limitsMem": "300Mi",
                            "name": "node-problem-detector",
                            "requestsCpu": "30m",
                            "requestsMem": "100Mi"
                        }
                    ],
                    "category": [
                        "CCE",
                        "Turbo"
                    ]
                },
                "custom": {
                    "annotations": {},
                    "common": {},
                    "feature_gates": "",
                    "multiAZBalance": false,
                    "multiAZEnabled": false,
                    "node_match_expressions": [],
                    "npc": {
                        "maxTaintedNode": "10%"
                    },
                    "tolerations": [
                        {
                            "key": "node.kubernetes.io/not-ready",
                            "operator": "Exists",
                            "effect": "NoExecute",
                            "tolerationSeconds": 60
                        },
                        {
                            "key": "node.kubernetes.io/unreachable",
                            "operator": "Exists",
                            "effect": "NoExecute",
                            "tolerationSeconds": 60
                        }
                    ]
                }
            }
        }
    }
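
Before submitting a request like the one above, a client could check that every parameter marked as mandatory in the tables is present under spec.values. The sketch below does only that. The function and constant names are hypothetical, the key lists are transcribed from the tables in this section, and no value or type checking is attempted.

```python
# Mandatory keys per object, transcribed from the parameter tables in this section.
REQUIRED_VALUES = ["flavor", "custom"]
REQUIRED_FLAVOR = ["name", "replicas", "resources"]
REQUIRED_RESOURCE = ["limitsCpu", "limitsMem", "name", "requestsCpu", "requestsMem"]
REQUIRED_CUSTOM = ["npc"]
REQUIRED_NPC = ["maxTaintedNode"]


def missing_parameters(values):
    """Return dotted paths of mandatory parameters absent from spec.values."""
    missing = []

    def require(obj, keys, path):
        for key in keys:
            if key not in obj:
                missing.append(f"{path}.{key}")

    require(values, REQUIRED_VALUES, "values")
    if "flavor" in values:
        flavor = values["flavor"]
        require(flavor, REQUIRED_FLAVOR, "values.flavor")
        for index, entry in enumerate(flavor.get("resources", [])):
            require(entry, REQUIRED_RESOURCE, f"values.flavor.resources[{index}]")
    if "custom" in values:
        custom = values["custom"]
        require(custom, REQUIRED_CUSTOM, "values.custom")
        if "npc" in custom:
            require(custom["npc"], REQUIRED_NPC, "values.custom.npc")
    return missing


# A trimmed version of the values object from the example request.
example_values = {
    "flavor": {
        "name": "custom-resources",
        "replicas": 2,
        "resources": [
            {
                "limitsCpu": "100m",
                "limitsMem": "300Mi",
                "name": "node-problem-detector",
                "requestsCpu": "30m",
                "requestsMem": "100Mi",
            }
        ],
    },
    "custom": {"npc": {"maxTaintedNode": "10%"}},
}
print(missing_parameters(example_values))
```

For the trimmed example the function returns an empty list; removing a mandatory key makes its dotted path appear in the result.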