After changing the cluster scale, adjust the add-on resource quotas accordingly so that the add-on pods can run properly. For example, if you expand the cluster from 50 worker nodes to 200 or more, increase the CPU and memory quotas of the add-on pods to avoid exceptions such as out-of-memory (OOM) errors, which can occur when the add-ons have to serve more nodes than their quotas allow.
The queries per second (QPS) handled by the coredns add-on is positively correlated with its CPU consumption. As the number of nodes or containers in the cluster grows, the coredns pods bear heavier workloads. Adjust the number of coredns pods and their CPU and memory quotas based on the cluster scale.
Nodes | Recommended Configuration | Pods | CPU Request | CPU Limit | Memory Request | Memory Limit |
---|---|---|---|---|---|---|
50 | 2500 QPS | 2 | 500m | 500m | 512Mi | 512Mi |
200 | 5000 QPS | 2 | 1000m | 1000m | 1024Mi | 1024Mi |
1000 | 10,000 QPS | 2 | 2000m | 2000m | 2048Mi | 2048Mi |
2000 | 20,000 QPS | 4 | 2000m | 2000m | 2048Mi | 2048Mi |
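The coredns table above can be read as a tier lookup. The following is a minimal sketch, not an official tool: the function name `recommend_coredns` is hypothetical, and rounding a node count up to the next listed tier is an assumption, since the table only lists four cluster sizes.

```python
from bisect import bisect_left

# Tiers copied from the coredns table: (nodes, pods, cpu, memory).
# In every row the request equals the limit.
COREDNS_TIERS = [
    (50, 2, "500m", "512Mi"),
    (200, 2, "1000m", "1024Mi"),
    (1000, 2, "2000m", "2048Mi"),
    (2000, 4, "2000m", "2048Mi"),
]


def recommend_coredns(nodes: int) -> dict:
    """Return the smallest listed tier that covers `nodes`.

    Rounding up to the next tier is an assumption; the table itself
    does not say how to handle sizes between the listed rows.
    """
    if nodes < 1:
        raise ValueError("nodes must be a positive integer")
    tier_sizes = [tier[0] for tier in COREDNS_TIERS]
    idx = bisect_left(tier_sizes, nodes)
    if idx == len(COREDNS_TIERS):
        raise ValueError("cluster is larger than the largest listed tier")
    tier_nodes, pods, cpu, memory = COREDNS_TIERS[idx]
    return {"tier_nodes": tier_nodes, "pods": pods, "cpu": cpu, "memory": memory}


if __name__ == "__main__":
    print(recommend_coredns(120))
```

For a 120-node cluster this sketch selects the 200-node row. Clusters beyond 2000 nodes are rejected rather than extrapolated, because the table gives no guidance for them.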
After the cluster scale is adjusted, modify the everest specifications based on the cluster scale and the number of PVCs. Increase the requested CPU and memory according to the number of nodes and PVCs. For details, see Table 2.
In non-typical scenarios, the formulas for estimating the limit values are as follows:
Nodes | PVs/PVCs | Add-on Instances | everest-csi-controller vCPUs (Limit = Requested) | everest-csi-controller Memory (Limit = Requested) | everest-csi-driver vCPUs (Limit = Requested) | everest-csi-driver Memory (Limit = Requested) |
---|---|---|---|---|---|---|
50 | 1000 | 2 | 250m | 600 MiB | 300m | 300 MiB |
200 | 1000 | 2 | 250m | 1 GiB | 300m | 300 MiB |
1000 | 1000 | 2 | 350m | 2 GiB | 500m | 600 MiB |
1000 | 5000 | 2 | 450m | 3 GiB | 500m | 600 MiB |
2000 | 5000 | 2 | 550m | 4 GiB | 800m | 900 MiB |
2000 | 10000 | 2 | 650m | 5 GiB | 800m | 900 MiB |
The autoscaler add-on automatically adjusts the number of nodes in a cluster based on workloads. Adjust the number of autoscaler pods and their CPU and memory quotas based on the cluster scale.
Nodes | Pods | CPU Request | CPU Limit | Memory Request | Memory Limit |
---|---|---|---|---|---|
50 | 2 | 1000m | 1000m | 1000Mi | 1000Mi |
200 | 2 | 4000m | 4000m | 2000Mi | 2000Mi |
1000 | 2 | 8000m | 8000m | 8000Mi | 8000Mi |
2000 | 2 | 8000m | 8000m | 8000Mi | 8000Mi |
After the cluster scale is increased, modify the resource quotas of volcano based on the new cluster scale.
Formulas for calculating the requests:
CPU request: Multiply the number of nodes by the number of pods, then take the request and limit from the closest specification in the table below, rounding up.
For example, for 2000 nodes and 20,000 pods, the product of the number of nodes and the number of pods is 40 million, which rounds up to the 700/70,000 specification (Number of nodes x Number of pods = 49 million). Set the CPU request to 4000m and the limit to 5500m.
Memory request = Number of nodes/1000 x 2.4 GiB + Number of pods/10000 x 1 GiB
For example, for 2000 nodes and 20,000 pods, the memory request is 6.8 GiB (2000/1000 x 2.4 GiB + 20000/10000 x 1 GiB = 4.8 GiB + 2 GiB).
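The memory formula above is simple enough to script. The sketch below is only an illustration: the function names are hypothetical, and rounding up to a whole MiB for use in a pod spec is an assumption rather than something the formula prescribes.

```python
import math


def volcano_memory_request_gib(nodes: int, pods: int) -> float:
    """Memory request in GiB: nodes/1000 x 2.4 GiB + pods/10000 x 1 GiB."""
    if nodes < 0 or pods < 0:
        raise ValueError("nodes and pods must be non-negative")
    return nodes / 1000 * 2.4 + pods / 10000 * 1.0


def volcano_memory_request_mib(nodes: int, pods: int) -> int:
    """Same value in whole MiB, rounded up (the rounding is an assumption)."""
    return math.ceil(volcano_memory_request_gib(nodes, pods) * 1024)


if __name__ == "__main__":
    gib = volcano_memory_request_gib(2000, 20000)
    mib = volcano_memory_request_mib(2000, 20000)
    print(f"{gib:.1f} GiB, i.e. {mib}Mi")
```

With the 2000-node, 20,000-pod example from the text, the first function reproduces the 6.8 GiB figure.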
Nodes/Pods in a Cluster | Requested vCPUs (m) | vCPU Limit (m) | Requested Memory (MiB) | Memory Limit (MiB) |
---|---|---|---|---|
50/5000 | 500 | 2000 | 500 | 2000 |
100/10,000 | 1000 | 2500 | 1500 | 2500 |
200/20,000 | 1500 | 3000 | 2500 | 3500 |
300/30,000 | 2000 | 3500 | 3500 | 4500 |
400/40,000 | 2500 | 4000 | 4500 | 5500 |
500/50,000 | 3000 | 4500 | 5500 | 6500 |
600/60,000 | 3500 | 5000 | 6500 | 7500 |
700/70,000 | 4000 | 5500 | 7500 | 8500 |
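The CPU example for volcano (2000 nodes and 20,000 pods mapped to the 700/70,000 specification) can be sketched as a lookup on the product of nodes and pods. This is a hedged illustration: the function name `volcano_cpu_quota` is hypothetical, and choosing the first specification whose product is at least as large as the cluster's product is an assumption inferred from that single worked example.

```python
# Specifications copied from the volcano table:
# (nodes, pods, requested vCPUs in m, vCPU limit in m).
VOLCANO_SPECS = [
    (50, 5000, 500, 2000),
    (100, 10000, 1000, 2500),
    (200, 20000, 1500, 3000),
    (300, 30000, 2000, 3500),
    (400, 40000, 2500, 4000),
    (500, 50000, 3000, 4500),
    (600, 60000, 3500, 5000),
    (700, 70000, 4000, 5500),
]


def volcano_cpu_quota(nodes: int, pods: int) -> tuple[int, int]:
    """Return (request_m, limit_m) for the first specification whose
    nodes x pods product is >= the cluster's product (assumed round-up rule)."""
    product = nodes * pods
    for spec_nodes, spec_pods, request_m, limit_m in VOLCANO_SPECS:
        if product <= spec_nodes * spec_pods:
            return request_m, limit_m
    raise ValueError("product exceeds the largest listed specification")


if __name__ == "__main__":
    request_m, limit_m = volcano_cpu_quota(2000, 20000)
    print(f"CPU request {request_m}m, limit {limit_m}m")
```

For 2000 nodes and 20,000 pods the product is 40 million, so the sketch lands on the 700/70,000 row (49 million), matching the 4000m request and 5500m limit given in the text.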
Resource quotas of other add-ons may also become insufficient as the cluster scale expands. If the CPU or memory usage of an add-on's pods increases, or OOM errors occur, modify the resource quotas as required.
For example, the resources consumed by the kube-prometheus-stack add-on are related to the number of pods in the cluster. If the cluster scale is expanded, the number of pods may also grow. In this case, increase the resource quotas of the prometheus pods.