<a name="cce_faq_00209"></a><a name="cce_faq_00209"></a>
<h1 class="topictitle1">What Should I Do If a Pod Fails to Be Evicted?</h1>
<div id="body1570879278266"><div class="section" id="cce_faq_00209__section1752181225011"><h4 class="sectiontitle">Principle of Eviction</h4><p id="cce_faq_00209__p12520182365019">When a node is abnormal, Kubernetes will evict pods on the node to ensure workload availability.</p>
<p id="cce_faq_00209__p3106163118227">In Kubernetes, both kube-controller-manager and kubelet can evict pods.</p>
<ul id="cce_faq_00209__ul1691275455017"><li id="cce_faq_00209__li16326151759"><strong id="cce_faq_00209__b522101312445">Eviction implemented by kube-controller-manager</strong><p id="cce_faq_00209__p1351151910510">kube-controller-manager consists of multiple controllers, and eviction is implemented by node controller. node controller periodically checks the status of all nodes. If a node is in the <strong id="cce_faq_00209__b27657912467">NotReady</strong> state for a period of time, all pods on the node will be evicted.</p>
<p id="cce_faq_00209__p1273514211758">kube-controller-manager supports the following startup parameters:</p>
<ul id="cce_faq_00209__ul163137267515"><li id="cce_faq_00209__li79139545507"><strong id="cce_faq_00209__b1858115472511">pod-eviction-timeout</strong>: indicates an interval when a node is down, after which pods on that node are evicted. The default interval is 5 minutes.</li><li id="cce_faq_00209__li15913175495017"><strong id="cce_faq_00209__b1323015918546">node-eviction-rate</strong>: indicates the number of nodes to be evicted per second. The default value is <strong id="cce_faq_00209__b20261459545">0.1</strong>, indicating that pods are evicted from one node every 10 seconds.</li><li id="cce_faq_00209__li169131354125015"><strong id="cce_faq_00209__b16691626103814">secondary-node-eviction-rate</strong>: specifies a rate at which nodes are evicted in the second grade. If a large number of nodes are down in the cluster, the eviction rate will be reduced to <strong id="cce_faq_00209__b56851721055">secondary-node-eviction-rate</strong>. The default value is <strong id="cce_faq_00209__b16650105712315">0.01</strong>.</li><li id="cce_faq_00209__li59133541507"><strong id="cce_faq_00209__b152251526557">unhealthy-zone-threshold</strong>: specifies a threshold for an AZ to be considered unhealthy. The default value is <strong id="cce_faq_00209__b3454161567">0.55</strong>, meaning that if the percentage of faulty nodes in an AZ exceeds 55%, the AZ will be considered unhealthy.</li><li id="cce_faq_00209__li1591317541508"><strong id="cce_faq_00209__b331617397611">large-cluster-size-threshold</strong>: specifies a threshold for a cluster to be considered large. The parameter defaults to <strong id="cce_faq_00209__b1327471311712">50</strong>. If there are more nodes than this threshold, the cluster is considered as a large one. If there are more than 55% faulty nodes in a cluster, the eviction rate is reduced to 0.01. If the cluster is a small one, the eviction rate is reduced to 0, which means, pods running on the nodes in the cluster will not be evicted.</li></ul>
</li><li id="cce_faq_00209__li032181517512"><strong id="cce_faq_00209__b13331173545918">Eviction implemented by kubelet</strong><p id="cce_faq_00209__p175411392815">If resources of a node are to be used up, kubelet executes the eviction policy based on the pod priority, resource usage, and resource request. If pods have the same priority, the pod that uses the most resources or requests for the most resources will be evicted first.</p>
<p id="cce_faq_00209__p163999204318">kube-controller-manager evicts all pods on a faulty node, while kubelet evicts some pods on a faulty node. kubelet periodically checks the memory and disk resources of nodes. If the resources are insufficient, it will evict some pods based on the priority. For details about the pod eviction priority, see <a href="https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#pod-selection-for-kubelet-eviction" target="_blank" rel="noopener noreferrer">Pod selection for kubelet eviction</a>.</p>
<p id="cce_faq_00209__p16515800341">There are soft eviction thresholds and hard eviction thresholds.</p>
<ul id="cce_faq_00209__ul11812173581817"><li id="cce_faq_00209__li182751015191119"><strong id="cce_faq_00209__b201251735124811">Soft eviction thresholds</strong>: A grace period is configured for node resources. kubelet will reclaim node resources associated with these thresholds if that grace period elapses. If the node resource usage reaches these thresholds but falls below them before the grace period elapses, kubelet will not evict pods on the node.<div class="p" id="cce_faq_00209__p685635845819">You can configure soft eviction thresholds using the following parameters:<ul id="cce_faq_00209__ul331145815816"><li id="cce_faq_00209__li73117581585"><strong id="cce_faq_00209__b13707113312162">eviction-soft</strong>: indicates a soft eviction threshold. If a node's <a href="https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#eviction-signals" target="_blank" rel="noopener noreferrer">eviction signal</a> reaches a certain threshold, for example, <strong id="cce_faq_00209__b205021311194713">memory.available&lt;1.5Gi</strong>, kubelet will not immediately evict some pods on the node but wait for a grace period configured by <strong id="cce_faq_00209__b455815244818">eviction-soft-grace-period</strong>. If the threshold is reached after the grace period elapses, kubelet will evict some pods on the node.</li><li id="cce_faq_00209__li3311458155813"><strong id="cce_faq_00209__b32251651114813">eviction-soft-grace-period</strong>: indicates an eviction grace period. If a pod reaches the soft eviction threshold, it will be terminated after the configured grace period elapses. This parameter indicates the time difference for a terminating pod to respond to the threshold being met. The default grace period is 90 seconds.</li><li id="cce_faq_00209__li7311358155816"><strong id="cce_faq_00209__b134114531504">eviction-max-pod-grace-period</strong>: indicates the maximum allowed grace period to use when terminating pods in response to a soft eviction threshold being met.</li></ul>
</div>
</li><li id="cce_faq_00209__li824223811184"><strong id="cce_faq_00209__b6900142819556">Hard eviction thresholds</strong>: Pods are immediately evicted once these thresholds are reached.<p id="cce_faq_00209__p143161527717">You can configure hard eviction thresholds using the following parameters:</p>
<p id="cce_faq_00209__p420104012817"><strong id="cce_faq_00209__b19400174162217">eviction-hard</strong>: indicates a hard eviction threshold. When the <a href="https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#eviction-signals" target="_blank" rel="noopener noreferrer">eviction signal</a> of a node reaches a certain threshold, for example, <strong id="cce_faq_00209__b3845012171413">memory.available&lt;1Gi</strong>, which means, when the available memory of the node is less than 1 GiB, a pod eviction will be triggered immediately.</p>
<p id="cce_faq_00209__p153318421381">kubelet supports the following default hard eviction thresholds:</p>
<ul id="cce_faq_00209__ul15341421684"><li id="cce_faq_00209__li15341042788"><strong id="cce_faq_00209__b1677516490510">memory.available&lt;100Mi</strong></li><li id="cce_faq_00209__li17534542484"><strong id="cce_faq_00209__b8836151125110">nodefs.available&lt;10%</strong></li><li id="cce_faq_00209__li25340427813"><strong id="cce_faq_00209__b18154145305119">imagefs.available&lt;15%</strong></li><li id="cce_faq_00209__li125347426813"><strong id="cce_faq_00209__b1231645717519">nodefs.inodesFree&lt;5%</strong> (for Linux nodes)</li></ul>
</li></ul>
<p id="cce_faq_00209__p13442191315917">kubelet also supports other parameters:</p>
<ul id="cce_faq_00209__ul1147213131198"><li id="cce_faq_00209__li647231320919"><strong id="cce_faq_00209__b9324229135217">eviction-pressure-transition-period</strong>: indicates a period for which the kubelet has to wait before transitioning out of an eviction pressure condition. The default value is 5 minutes. If the time exceeds the threshold, the node is set to <strong id="cce_faq_00209__b161415019191">DiskPressure</strong> or <strong id="cce_faq_00209__b33531130194">MemoryPressure</strong>. Then some pods running on the node will be evicted. This parameter can prevent mistaken eviction decisions when a node is oscillating above and below a soft eviction threshold in some cases.</li><li id="cce_faq_00209__li74722131192"><strong id="cce_faq_00209__b1937517536522">eviction-minimum-reclaim</strong>: indicates the minimum number of resources that must be reclaimed in each eviction. This parameter can prevent kubelet from repeatedly evicting pods because only a small number of resources are reclaimed during pod evictions in some cases.</li></ul>
</li></ul>
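<p>The following sketch shows how the parameters described above map to the command-line flags of kube-controller-manager and kubelet. The values are only the defaults and examples mentioned in this section and are shown for illustration; in CCE clusters, both components are managed by the platform, so these flags do not need to be changed manually.</p>
<pre class="screen"># Eviction-related kube-controller-manager flags (illustrative values)
kube-controller-manager --pod-eviction-timeout=5m \
                        --node-eviction-rate=0.1 \
                        --secondary-node-eviction-rate=0.01 \
                        --unhealthy-zone-threshold=0.55 \
                        --large-cluster-size-threshold=50 ...

# Eviction-related kubelet flags (illustrative values)
kubelet --eviction-hard=memory.available&lt;100Mi,nodefs.available&lt;10%,imagefs.available&lt;15%,nodefs.inodesFree&lt;5% \
        --eviction-soft=memory.available&lt;1.5Gi \
        --eviction-soft-grace-period=memory.available=90s \
        --eviction-max-pod-grace-period=60 \
        --eviction-pressure-transition-period=5m \
        --eviction-minimum-reclaim=memory.available=500Mi ...</pre>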
</div>
<div class="section" id="cce_faq_00209__section1774111124510"><h4 class="sectiontitle">Fault Locating</h4><p id="cce_faq_00209__p12607113911412">If the pods are not evicted when the node is faulty, perform the following steps to locate the fault:</p>
<p id="cce_faq_00209__p14291135404815">After the following command is run, the command output shows that many pods are in the <strong id="cce_faq_00209__b1556018296580">Evicted</strong> state.</p>
<pre class="screen" id="cce_faq_00209__screen202541196506">kubectl get pods</pre>
<div class="p" id="cce_faq_00209__p163121043963">Check results will be recorded in kubelet logs of the node. You can run the following command to search for the information:<pre class="screen" id="cce_faq_00209__screen1577352152319">cat /var/paas/sys/log/kubernetes/kubelet.log | grep -i Evicted -C3</pre>
</div>
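<p>Evictions triggered by kubelet are also reported as Kubernetes events with the reason <strong>Evicted</strong>. If the events have not expired yet (they are kept for about one hour by default), you can list them as follows:</p>
<pre class="screen">kubectl get events -A --field-selector reason=Evicted</pre>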
</div>
<div class="section" id="cce_faq_00209__section422013548124"><h4 class="sectiontitle">Troubleshooting Process</h4><p id="cce_faq_00209__p17987205213470">The issues here are described in order of how likely they are to occur.</p>
<p id="cce_faq_00209__p997211345215">Check these causes one by one until you find the cause of the fault.</p>
<ul id="cce_faq_00209__ul13738882414"><li id="cce_faq_00209__li147211240112419"><a href="#cce_faq_00209__section1147819574162">Check Item 1: Whether the Node Is Under Resource Pressure</a></li><li id="cce_faq_00209__li6732862419"><a href="#cce_faq_00209__section156641841181420">Check Item 2: Whether Tolerations Have Been Configured for the Workload</a></li><li id="cce_faq_00209__li13864914122411"><a href="#cce_faq_00209__section9833172419151">Check Item 3: Whether the Conditions for Stopping Pod Eviction Are Met</a></li><li id="cce_faq_00209__li1656991911249"><a href="#cce_faq_00209__section14911135124710">Check Item 4: Whether the Allocated Resources of the Pod Are the Same as Those of the Node</a></li><li id="cce_faq_00209__li9475924172415"><a href="#cce_faq_00209__section127261381585">Check Item 5: Whether the Workload Pod Fails Continuously and Is Redeployed</a></li></ul>
</div>
<div class="section" id="cce_faq_00209__section1147819574162"><a name="cce_faq_00209__section1147819574162"></a><a name="section1147819574162"></a><h4 class="sectiontitle">Check Item 1: Whether the Node Is Under Resource Pressure</h4><p id="cce_faq_00209__p616602719177">If a node suffers resource pressure, kubelet will change the <a href="https://kubernetes.io/docs/reference/node/node-status/#condition" target="_blank" rel="noopener noreferrer">node status</a> and add taints to the node. Perform the following steps to check whether the corresponding taint exists on the node:</p>
<pre class="screen" id="cce_faq_00209__screen8412535161614">$ kubectl describe node 192.168.0.37
Name: 192.168.0.37
...
Taints: key1=value1:NoSchedule
...</pre>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="cce_faq_00209__table845314081915" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Statuses of nodes with resource pressure and solutions</caption><thead align="left"><tr id="cce_faq_00209__row10454840171916"><th align="left" class="cellrowborder" valign="top" width="15.359559007767476%" id="mcps1.3.4.4.2.5.1.1"><p id="cce_faq_00209__p9455114010191">Node Status</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="26.659984966173887%" id="mcps1.3.4.4.2.5.1.2"><p id="cce_faq_00209__p164841248113411">Taint</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="24.05412177399148%" id="mcps1.3.4.4.2.5.1.3"><p id="cce_faq_00209__p5455640141912">Eviction Signal</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.92633425206715%" id="mcps1.3.4.4.2.5.1.4"><p id="cce_faq_00209__p134551407191">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="cce_faq_00209__row1345518409196"><td class="cellrowborder" valign="top" width="15.359559007767476%" headers="mcps1.3.4.4.2.5.1.1 "><p id="cce_faq_00209__p15390122012018">MemoryPressure</p>
</td>
<td class="cellrowborder" valign="top" width="26.659984966173887%" headers="mcps1.3.4.4.2.5.1.2 "><p id="cce_faq_00209__p874614413356">node.kubernetes.io/memory-pressure</p>
</td>
<td class="cellrowborder" valign="top" width="24.05412177399148%" headers="mcps1.3.4.4.2.5.1.3 "><p id="cce_faq_00209__p292432513200">memory.available</p>
</td>
<td class="cellrowborder" valign="top" width="33.92633425206715%" headers="mcps1.3.4.4.2.5.1.4 "><p id="cce_faq_00209__p1649821014209">The available memory on the node reaches the eviction thresholds.</p>
</td>
</tr>
<tr id="cce_faq_00209__row164556404190"><td class="cellrowborder" valign="top" width="15.359559007767476%" headers="mcps1.3.4.4.2.5.1.1 "><p id="cce_faq_00209__p6498141012204">DiskPressure</p>
</td>
<td class="cellrowborder" valign="top" width="26.659984966173887%" headers="mcps1.3.4.4.2.5.1.2 "><p id="cce_faq_00209__p8703162018355">node.kubernetes.io/disk-pressure</p>
</td>
<td class="cellrowborder" valign="top" width="24.05412177399148%" headers="mcps1.3.4.4.2.5.1.3 "><p id="cce_faq_00209__p1549821062017">nodefs.available, nodefs.inodesFree, imagefs.available or imagefs.inodesFree</p>
</td>
<td class="cellrowborder" valign="top" width="33.92633425206715%" headers="mcps1.3.4.4.2.5.1.4 "><p id="cce_faq_00209__p9498810162017">The available disk space and inode on the root file system or image file system of the node reach the eviction thresholds.</p>
</td>
</tr>
<tr id="cce_faq_00209__row195251472016"><td class="cellrowborder" valign="top" width="15.359559007767476%" headers="mcps1.3.4.4.2.5.1.1 "><p id="cce_faq_00209__p10811103882016">PIDPressure</p>
</td>
<td class="cellrowborder" valign="top" width="26.659984966173887%" headers="mcps1.3.4.4.2.5.1.2 "><p id="cce_faq_00209__p15485134893412">node.kubernetes.io/pid-pressure</p>
</td>
<td class="cellrowborder" valign="top" width="24.05412177399148%" headers="mcps1.3.4.4.2.5.1.3 "><p id="cce_faq_00209__p19519428208">pid.available</p>
</td>
<td class="cellrowborder" valign="top" width="33.92633425206715%" headers="mcps1.3.4.4.2.5.1.4 "><p id="cce_faq_00209__p1449831072011">The available process identifier on the node is below the eviction thresholds.</p>
</td>
</tr>
</tbody>
</table>
</div>
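<p>You can also print the node conditions directly to see which pressure condition, if any, is set. The following uses the example node above:</p>
<pre class="screen">kubectl get node 192.168.0.37 -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'</pre>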
</div>
<div class="section" id="cce_faq_00209__section156641841181420"><a name="cce_faq_00209__section156641841181420"></a><a name="section156641841181420"></a><h4 class="sectiontitle">Check Item 2: Whether Tolerations Have Been Configured for the Workload</h4><p id="cce_faq_00209__p444410613172">Use kubectl or locate the row containing the target workload and choose <strong id="cce_faq_00209__b670114315258">More</strong> &gt; <strong id="cce_faq_00209__b954618972513">Edit YAML</strong> in the <strong id="cce_faq_00209__b10992351810">Operation</strong> column to check whether tolerance is configured for the workload. For details, see <a href="https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/" target="_blank" rel="noopener noreferrer">Taints and Tolerations</a>.</p>
</div>
<div class="section" id="cce_faq_00209__section9833172419151"><a name="cce_faq_00209__section9833172419151"></a><a name="section9833172419151"></a><h4 class="sectiontitle">Check Item 3: Whether the Conditions for Stopping Pod Eviction Are Met</h4><p id="cce_faq_00209__p1552484110155">In a cluster that runs less than 50 worker nodes, if the number of faulty nodes accounts for over 55% of the total nodes, the pod eviction will be suspended. In this case, Kubernetes will not attempt to evict the workload on the faulty node. For details, see <a href="https://kubernetes.io/docs/concepts/architecture/nodes/#rate-limits-on-eviction" target="_blank" rel="noopener noreferrer">Rate limits on eviction</a>.</p>
</div>
<div class="section" id="cce_faq_00209__section14911135124710"><a name="cce_faq_00209__section14911135124710"></a><a name="section14911135124710"></a><h4 class="sectiontitle">Check Item 4: Whether the Allocated Resources of the Pod Are the Same as Those of the Node</h4><p id="cce_faq_00209__p36508494166">An evicted pod will be frequently scheduled to the original node.</p>
<p id="cce_faq_00209__p163181135181619"><strong id="cce_faq_00209__b35763261306">Possible Causes</strong></p>
<p id="cce_faq_00209__p193531013542">Pods on a node are evicted based on the node resource usage. The evicted pods are scheduled based on the allocated node resources. Eviction and scheduling are based on different rules. Therefore, an evicted container may be scheduled to the original node again.</p>
<p id="cce_faq_00209__p288552111614"><strong id="cce_faq_00209__b4868241808">Solution</strong></p>
<p id="cce_faq_00209__p1935313113546">Properly allocate resources to each container.</p>
</div>
<div class="section" id="cce_faq_00209__section127261381585"><a name="cce_faq_00209__section127261381585"></a><a name="section127261381585"></a><h4 class="sectiontitle">Check Item 5: Whether the Workload Pod Fails Continuously and Is Redeployed</h4><p id="cce_faq_00209__p171431685190">A workload pod fails and is being redeployed constantly.</p>
<p id="cce_faq_00209__p12469151461911"><strong id="cce_faq_00209__b677914541303">Analysis</strong></p>
<p id="cce_faq_00209__p7106153118226">After a pod is evicted and scheduled to a new node, if pods in that node are also being evicted, the pod will be evicted again. Pods may be evicted repeatedly.</p>
<p id="cce_faq_00209__p137158248">If a pod is evicted by kube-controller-manager, it would be in the <strong id="cce_faq_00209__b795910245411">Terminating</strong> state. This pod will be automatically deleted only after the node where the container is located is restored. If the node has been deleted or cannot be restored due to other reasons, you can forcibly delete the pod.</p>
<p id="cce_faq_00209__p12633151516465">If a pod is evicted by kubelet, it would be in the <strong id="cce_faq_00209__b1929135135419">Evicted</strong> state. This pod is only used for subsequent fault locating and can be directly deleted.</p>
<p id="cce_faq_00209__p3912520162017"><strong id="cce_faq_00209__b79929211111">Solution</strong></p>
<p id="cce_faq_00209__p121911619815">Run the following command to delete the evicted pods:</p>
<pre class="screen" id="cce_faq_00209__screen1994143612810">kubectl get pods <em id="cce_faq_00209__i4704112544616">&lt;namespace&gt;</em> | grep Evicted | awk '{print $1}' | xargs kubectl delete pod <em id="cce_faq_00209__i11741130114617">&lt;namespace&gt;</em> </pre>
<p id="cce_faq_00209__p18263335194615">In the preceding command, <em id="cce_faq_00209__i17922191193617">&lt;namespace&gt;</em> indicates the namespace name. Configure it based on your requirements.</p>
</div>
<div class="section" id="cce_faq_00209__section125827455817"><h4 class="sectiontitle">References</h4><p id="cce_faq_00209__p179470557224"><a href="https://github.com/kubernetes/kubernetes/issues/55051" target="_blank" rel="noopener noreferrer">Kubelet does not delete evicted pods</a></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="cce_faq_00029.html">Workload Abnormalities</a></div>
</div>
</div>