<a name="cce_faq_00098"></a><a name="cce_faq_00098"></a>
<h1 class="topictitle1">What Should I Do If Pod Scheduling Fails?</h1>
<div id="body1527579861399"><div class="section" id="cce_faq_00098__section1678344818322"><h4 class="sectiontitle">Fault Locating</h4><p id="cce_faq_00098__p9305182610535">If the pod is in the <span class="uicontrol" id="cce_faq_00098__uicontrol1116193992210"><b>Pending</b></span> state and the event contains pod scheduling failure information, locate the cause based on the event information. For details about how to view events, see <a href="cce_faq_00134.html">How Do I Use Events to Fix Abnormal Workloads?</a></p>
</div>
<div class="section" id="cce_faq_00098__section1197903153210"><h4 class="sectiontitle">Troubleshooting Process</h4><p id="cce_faq_00098__p997211345215">Determine the cause based on the event information, as listed in <a href="#cce_faq_00098__table230510269532">Table 1</a>.</p>
<div class="tablenoborder"><a name="cce_faq_00098__table230510269532"></a><a name="table230510269532"></a><table cellpadding="4" cellspacing="0" summary="" id="cce_faq_00098__table230510269532" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Pod scheduling failure</caption><thead align="left"><tr id="cce_faq_00098__row7305112615310"><th align="left" class="cellrowborder" valign="top" width="49.97%" id="mcps1.3.2.3.2.3.1.1"><p id="cce_faq_00098__p43051526105316">Event Information</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50.029999999999994%" id="mcps1.3.2.3.2.3.1.2"><p id="cce_faq_00098__p5305142645320">Cause and Solution</p>
</th>
</tr>
</thead>
<tbody><tr id="cce_faq_00098__row8305192615537"><td class="cellrowborder" valign="top" width="49.97%" headers="mcps1.3.2.3.2.3.1.1 "><p id="cce_faq_00098__p163051326175310">no nodes available to schedule pods.</p>
</td>
<td class="cellrowborder" valign="top" width="50.029999999999994%" headers="mcps1.3.2.3.2.3.1.2 "><p id="cce_faq_00098__p030592685319">No node is available in the cluster.</p>
<p id="cce_faq_00098__p83059266530"><a href="#cce_faq_00098__section133416392418">Check Item 1: Whether a Node Is Available in the Cluster</a></p>
</td>
</tr>
<tr id="cce_faq_00098__row23051626145317"><td class="cellrowborder" valign="top" width="49.97%" headers="mcps1.3.2.3.2.3.1.1 "><p id="cce_faq_00098__p16306112665317">0/2 nodes are available: 2 Insufficient cpu.</p>
<p id="cce_faq_00098__p53066269537">0/2 nodes are available: 2 Insufficient memory.</p>
</td>
<td class="cellrowborder" valign="top" width="50.029999999999994%" headers="mcps1.3.2.3.2.3.1.2 "><p id="cce_faq_00098__p13306182615312">Node resources (CPU and memory) are insufficient.</p>
<p id="cce_faq_00098__p13061926185314"><a href="#cce_faq_00098__section29231833141817">Check Item 2: Whether Node Resources (CPU and Memory) Are Sufficient</a></p>
</td>
</tr>
<tr id="cce_faq_00098__row63061226105316"><td class="cellrowborder" valign="top" width="49.97%" headers="mcps1.3.2.3.2.3.1.1 "><p id="cce_faq_00098__p1930622617531">0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) didn't match pod affinity rules, 1 node(s) didn't match pod affinity/anti-affinity.</p>
</td>
<td class="cellrowborder" valign="top" width="50.029999999999994%" headers="mcps1.3.2.3.2.3.1.2 "><p id="cce_faq_00098__p73064267535">The node and pod affinity configurations are mutually exclusive. No node meets the pod requirements.</p>
<p id="cce_faq_00098__p230692616533"><a href="#cce_faq_00098__section794092214205">Check Item 3: Affinity and Anti-Affinity Configuration of the Workload</a></p>
</td>
</tr>
<tr id="cce_faq_00098__row1130662675311"><td class="cellrowborder" valign="top" width="49.97%" headers="mcps1.3.2.3.2.3.1.1 "><p id="cce_faq_00098__p030611266535">0/2 nodes are available: 2 node(s) had volume node affinity conflict.</p>
</td>
<td class="cellrowborder" valign="top" width="50.029999999999994%" headers="mcps1.3.2.3.2.3.1.2 "><p id="cce_faq_00098__p1830617263533">The EVS volume mounted to the pod and the node are not in the same AZ.</p>
<p id="cce_faq_00098__p730619263533"><a href="#cce_faq_00098__section197421559143010">Check Item 4: Whether the Workload's Volume and Node Reside in the Same AZ</a></p>
</td>
</tr>
<tr id="cce_faq_00098__row1851432918127"><td class="cellrowborder" valign="top" width="49.97%" headers="mcps1.3.2.3.2.3.1.1 "><p id="cce_faq_00098__p0211734171217">0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.</p>
</td>
<td class="cellrowborder" valign="top" width="50.029999999999994%" headers="mcps1.3.2.3.2.3.1.2 "><p id="cce_faq_00098__p72193491210">Taints exist on the node, but the pod cannot tolerate these taints.</p>
<p id="cce_faq_00098__p1929112012431"><a href="#cce_faq_00098__section188241489126">Check Item 5: Taint Toleration of Pods</a></p>
</td>
</tr>
<tr id="cce_faq_00098__row176641617155715"><td class="cellrowborder" valign="top" width="49.97%" headers="mcps1.3.2.3.2.3.1.1 "><p id="cce_faq_00098__p864914153588">0/7 nodes are available: 7 Insufficient ephemeral-storage.</p>
</td>
<td class="cellrowborder" valign="top" width="50.029999999999994%" headers="mcps1.3.2.3.2.3.1.2 "><p id="cce_faq_00098__p866517179574">The ephemeral storage space of the node is insufficient.</p>
<p id="cce_faq_00098__p195391244183213"><a href="#cce_faq_00098__section096718509019">Check Item 6: Ephemeral Volume Usage</a></p>
</td>
</tr>
<tr id="cce_faq_00098__row82521555181218"><td class="cellrowborder" valign="top" width="49.97%" headers="mcps1.3.2.3.2.3.1.1 "><p id="cce_faq_00098__p52531255111216">0/1 nodes are available: 1 everest driver not found at node</p>
</td>
<td class="cellrowborder" valign="top" width="50.029999999999994%" headers="mcps1.3.2.3.2.3.1.2 "><p id="cce_faq_00098__p10253155510129">The everest-csi-driver on the node is not in the running state.</p>
<p id="cce_faq_00098__p14408134117136"><a href="#cce_faq_00098__section136595495137">Check Item 7: Whether everest Works Properly</a></p>
</td>
</tr>
<tr id="cce_faq_00098__row1637354597"><td class="cellrowborder" valign="top" width="49.97%" headers="mcps1.3.2.3.2.3.1.1 "><p id="cce_faq_00098__p1280935010598">Failed to create pod sandbox: ...</p>
<p id="cce_faq_00098__p1563717545913">Create more free space in thin pool or use dm.min_free_space option to change behavior</p>
</td>
<td class="cellrowborder" valign="top" width="50.029999999999994%" headers="mcps1.3.2.3.2.3.1.2 "><p id="cce_faq_00098__p736516467186">The node thin pool space is insufficient.</p>
<p id="cce_faq_00098__p106376513595"><a href="#cce_faq_00098__section1739734419111">Check Item 8: Thin Pool Space</a></p>
</td>
</tr>
<tr id="cce_faq_00098__row2978152311325"><td class="cellrowborder" valign="top" width="49.97%" headers="mcps1.3.2.3.2.3.1.1 "><p id="cce_faq_00098__p697810233326">0/1 nodes are available: 1 Too many pods.</p>
</td>
<td class="cellrowborder" valign="top" width="50.029999999999994%" headers="mcps1.3.2.3.2.3.1.2 "><p id="cce_faq_00098__p84242479323">The number of pods scheduled to the node exceeded the maximum number allowed by the node.</p>
<p id="cce_faq_00098__p842414716327"><a href="#cce_faq_00098__section24491119103316">Check Item 9: Number of Pods Scheduled onto the Node</a></p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="cce_faq_00098__section133416392418"><a name="cce_faq_00098__section133416392418"></a><a name="section133416392418"></a><h4 class="sectiontitle">Check Item 1: Whether a Node Is Available in the Cluster</h4><p id="cce_faq_00098__p1841011249415">Log in to the CCE console and check whether the node status is <strong id="cce_faq_00098__b3677194712120">Available</strong>. Alternatively, run the following command to check whether the node status is <strong id="cce_faq_00098__b5221259162114">Ready</strong>:</p>
<pre class="screen" id="cce_faq_00098__screen699214341475">$ kubectl get node
NAME           STATUS   ROLES    AGE   VERSION
192.168.0.37   Ready    <none>   21d   v1.19.10-r1.0.0-source-121-gb9675686c54267
192.168.0.71   Ready    <none>   21d   v1.19.10-r1.0.0-source-121-gb9675686c54267</pre>
<p id="cce_faq_00098__p10911295504">If the status of all nodes is <strong id="cce_faq_00098__b14590718122210">Not Ready</strong>, no node is available in the cluster.</p>
<p id="cce_faq_00098__p1232104414917"><strong id="cce_faq_00098__b1427122913221">Solution</strong></p>
<ul id="cce_faq_00098__ul15651150142115"><li id="cce_faq_00098__li1610195433310">Add a node. If an affinity policy is not configured for the workload, the pod will be automatically migrated to the new node to ensure that services are running properly.</li><li id="cce_faq_00098__li856516500210">Locate the unavailable node and rectify the fault. For details, see <a href="cce_faq_00120.html">What Should I Do If a Cluster Is Available But Some Nodes Are Unavailable?</a></li><li id="cce_faq_00098__li1556625072113">Reset the unavailable node.</li></ul>
</div>
<div class="section" id="cce_faq_00098__section29231833141817"><a name="cce_faq_00098__section29231833141817"></a><a name="section29231833141817"></a><h4 class="sectiontitle">Check Item 2: Whether Node Resources (CPU and Memory) Are Sufficient</h4><p id="cce_faq_00098__p1498013310339"><strong id="cce_faq_00098__b3729101337">0/2 nodes are available: 2 Insufficient cpu.</strong> This means insufficient CPUs.</p>
<p id="cce_faq_00098__p2981238337"><strong id="cce_faq_00098__b274181043319">0/2 nodes are available: 2 Insufficient memory.</strong> This means insufficient memory.</p>
<p id="cce_faq_00098__p12581252181019">If the allocatable resources of a node are less than the resources that a pod requests, the node cannot meet the pod's resource requirements, and scheduling onto that node will fail.</p>
<p id="cce_faq_00098__p18201251142310"><strong id="cce_faq_00098__b9928102181517">Solution</strong></p>
<p id="cce_faq_00098__p66471216175920">Add nodes to the cluster. Scale-out is the common solution to insufficient resources.</p>
</div>
<div class="section" id="cce_faq_00098__section794092214205"><a name="cce_faq_00098__section794092214205"></a><a name="section794092214205"></a><h4 class="sectiontitle">Check Item 3: Affinity and Anti-Affinity Configuration of the Workload</h4><p id="cce_faq_00098__p3690103662116">Inappropriate affinity policies will cause pod scheduling to fail.</p>
<p id="cce_faq_00098__p222814544552">Example:</p>
<p id="cce_faq_00098__p47431261224">An anti-affinity relationship is established between workload 1 and workload 2. Workload 1 is deployed on node 1 while workload 2 is deployed on node 2.</p>
<p id="cce_faq_00098__p77501221623">When you try to deploy workload 3 on node 1 and establish an affinity relationship with workload 2, a conflict occurs, resulting in a workload deployment failure.</p>
<p id="cce_faq_00098__p153961752183918">0/2 nodes are available: 1 node(s) didn't match <strong id="cce_faq_00098__b1569851716404">node selector</strong>, 1 node(s) didn't match <strong id="cce_faq_00098__b1220862012404">pod affinity rules</strong>, 1 node(s) didn't match <strong id="cce_faq_00098__b1287152474010">pod affinity/anti-affinity</strong>.</p>
<ul id="cce_faq_00098__ul01061846112317"><li id="cce_faq_00098__li310744682318"><strong id="cce_faq_00098__b860392382315">node selector</strong> indicates that the node affinity is not met.</li><li id="cce_faq_00098__li9107046142312"><strong id="cce_faq_00098__b216183816236">pod affinity rules</strong> indicate that the pod affinity is not met.</li><li id="cce_faq_00098__li101071546122316"><strong id="cce_faq_00098__b917834215231">pod affinity/anti-affinity</strong> indicates that the pod affinity/anti-affinity is not met.</li></ul>
<p id="cce_faq_00098__p5845375279"><strong id="cce_faq_00098__b1896020315359">Solution</strong></p>
<ul id="cce_faq_00098__ul113635413218"><li id="cce_faq_00098__li7319343142111">When adding workload-workload affinity and workload-node affinity policies, ensure that the two types of policies do not conflict each other. Otherwise, workload deployment will fail.</li><li id="cce_faq_00098__li19819172752717">If the workload has a node affinity policy, make sure that <strong id="cce_faq_00098__b111241146151112">supportContainer</strong> in the label of the affinity node is set to <strong id="cce_faq_00098__b212511461114">true</strong>. Otherwise, pods cannot be scheduled onto the affinity node and the following event is generated:<pre class="screen" id="cce_faq_00098__screen4206153932918">No nodes are available that match all of the following predicates: MatchNode Selector, NodeNotSupportsContainer</pre>
<p id="cce_faq_00098__p4903951132618">If the value is <strong id="cce_faq_00098__b10771127185718">false</strong>, the scheduling fails.</p>
</li></ul>
</div>
<div class="section" id="cce_faq_00098__section197421559143010"><a name="cce_faq_00098__section197421559143010"></a><a name="section197421559143010"></a><h4 class="sectiontitle">Check Item 4: Whether the Workload's Volume and Node Reside in the Same AZ</h4><p id="cce_faq_00098__p090455873013"><strong id="cce_faq_00098__b159047586301">0/2 nodes are available: 2 node(s) had volume node affinity conflict.</strong> An affinity conflict occurs between volumes and nodes. As a result, the scheduling fails.</p>
<p id="cce_faq_00098__p1725421820296">This is because EVS disks cannot be attached to nodes across AZs. For example, if the EVS volume is located in AZ 1 and the node is located in AZ 2, scheduling fails.</p>
<p id="cce_faq_00098__p17452132516304">The EVS volume created on CCE has affinity settings by default, as shown below.</p>
<pre class="screen" id="cce_faq_00098__screen9753862319">kind: PersistentVolume
apiVersion: v1
metadata:
  name: pvc-c29bfac7-efa3-40e6-b8d6-229d8a5372ac
spec:
  ...
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          - </pre>
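<p>To confirm whether the volume and the nodes are in the same AZ, compare their zone labels. The following is a minimal sketch using the PV name from the example above:</p>
<pre class="screen"># AZ required by the volume
kubectl get pv pvc-c29bfac7-efa3-40e6-b8d6-229d8a5372ac -o jsonpath='{.spec.nodeAffinity}'
# AZ of each node
kubectl get node -L failure-domain.beta.kubernetes.io/zone</pre>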
<p id="cce_faq_00098__p128612406323"><strong id="cce_faq_00098__b144771634193016">Solution</strong></p>
<p id="cce_faq_00098__p145982036103213">In the AZ where the workload's node resides, create a volume. Alternatively, create an identical workload and select an automatically assigned cloud storage volume.</p>
</div>
<div class="section" id="cce_faq_00098__section188241489126"><a name="cce_faq_00098__section188241489126"></a><a name="section188241489126"></a><h4 class="sectiontitle">Check Item 5: Taint Toleration of Pods</h4><p id="cce_faq_00098__p199719376274"><strong id="cce_faq_00098__b1385310694014">0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.</strong> This means the node is tainted and the pod cannot be scheduled to the node.</p>
<p id="cce_faq_00098__p137913268147">Check the taints on the node. If the following information is displayed, taints exist on the node:</p>
<pre class="screen" id="cce_faq_00098__screen8412535161614">$ kubectl describe node 192.168.0.37
Name: 192.168.0.37
...
Taints: key1=value1:NoSchedule
...</pre>
<p id="cce_faq_00098__p194724311342">In some cases, the system automatically adds a taint to a node. The current built-in taints include:</p>
<ul id="cce_faq_00098__ul2498126173818"><li id="cce_faq_00098__li64721233348"><a name="cce_faq_00098__li64721233348"></a><a name="li64721233348"></a>node.kubernetes.io/not-ready: The node is not ready.</li><li id="cce_faq_00098__li14472835349">node.kubernetes.io/unreachable: The node controller cannot access the node.</li><li id="cce_faq_00098__li54724373414">node.kubernetes.io/memory-pressure: The node has memory pressure.</li><li id="cce_faq_00098__li204721938344">node.kubernetes.io/disk-pressure: The node has disk pressure. Follow the instructions described in <a href="cce_faq_00015.html#cce_faq_00015__section165209286116">Check Item 4: Whether the Node Disk Space Is Insufficient</a> to handle it.</li><li id="cce_faq_00098__li8473834347">node.kubernetes.io/pid-pressure: The node is under PID pressure. </li><li id="cce_faq_00098__li14737353411">node.kubernetes.io/network-unavailable: The node network is unavailable.</li><li id="cce_faq_00098__li94731438340">node.kubernetes.io/unschedulable: The node cannot be scheduled.</li><li id="cce_faq_00098__li4498122618384">node.cloudprovider.kubernetes.io/uninitialized: If an external cloud platform driver is specified when kubelet is started, kubelet adds a taint to the current node and marks it as unavailable. After <strong id="cce_faq_00098__b10407174514374">cloud-controller-manager</strong> initializes the node, kubelet deletes the taint.</li></ul>
<p id="cce_faq_00098__p1185723212388"><strong id="cce_faq_00098__b185461277385">Solution</strong></p>
<p id="cce_faq_00098__p857417114176">To schedule the pod to the node, use either of the following methods:</p>
<ul id="cce_faq_00098__ul86312554206"><li id="cce_faq_00098__li563185572015">If the taint is added by a user, you can delete the taint on the node. If the taint is <a href="#cce_faq_00098__li64721233348">automatically added by the system</a>, the taint will be automatically deleted after the fault is rectified.</li><li id="cce_faq_00098__li9632055132011">Specify a toleration for the pod containing the taint. For details, see <a href="https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/" target="_blank" rel="noopener noreferrer">Taints and Tolerations</a>.<pre class="screen" id="cce_faq_00098__screen738310419191">apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:alpine
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule" </pre>
</li></ul>
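<p>If the taint was added manually and is no longer needed, it can be removed with <strong>kubectl taint</strong>. The following example assumes the node and the <strong>key1=value1:NoSchedule</strong> taint shown above; the trailing hyphen removes the taint:</p>
<pre class="screen">kubectl taint nodes 192.168.0.37 key1=value1:NoSchedule-</pre>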
</div>
<div class="section" id="cce_faq_00098__section096718509019"><a name="cce_faq_00098__section096718509019"></a><a name="section096718509019"></a><h4 class="sectiontitle">Check Item 6: Ephemeral Volume Usage</h4><p id="cce_faq_00098__p129009269310"><strong id="cce_faq_00098__b13130173210311">0/7 nodes are available: 7 Insufficient ephemeral-storage.</strong> This means insufficient ephemeral storage of the node.</p>
<p id="cce_faq_00098__p440516254111">Check whether the size of the ephemeral volume in the pod is limited. If the size of the ephemeral volume required by the application exceeds the existing capacity of the node, the application cannot be scheduled. To solve this problem, change the size of the ephemeral volume or expand the disk capacity of the node.</p>
<pre class="screen" id="cce_faq_00098__screen44001321042">apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        ephemeral-storage: "2Gi"
      limits:
        ephemeral-storage: "4Gi"
    volumeMounts:
    - name: ephemeral
      mountPath: "/tmp"
  volumes:
  - name: ephemeral
    emptyDir: {}</pre>
</div>
<p id="cce_faq_00098__p55317134288">To obtain the total capacity (<strong id="cce_faq_00098__b2080611581012">Capacity</strong>) and available capacity (<strong id="cce_faq_00098__b1515672110102">Allocatable</strong>) of the temporary volume mounted to the node, run the <strong id="cce_faq_00098__b1452618115393">kubectl describe node</strong> command, and view the application value and limit value of the temporary volume mounted to the node.</p>
<p id="cce_faq_00098__p577213581252">The following is an example of the output:</p>
<pre class="screen" id="cce_faq_00098__screen13581424704">...
Capacity:
  cpu:                4
  ephemeral-storage:  61607776Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  localssd:           0
  localvolume:        0
  memory:             7614352Ki
  pods:               40
Allocatable:
  cpu:                3920m
  ephemeral-storage:  56777726268
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  localssd:           0
  localvolume:        0
  memory:             6180752Ki
  pods:               40
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1605m (40%)   6530m (166%)
  memory             2625Mi (43%)  5612Mi (92%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)
  localssd           0             0
  localvolume        0             0
Events:              <none></pre>
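<p>To see which pods on the node are requesting ephemeral storage, you can list their requests. The following is a minimal sketch; the node IP is the one from the examples above:</p>
<pre class="screen"># Ephemeral storage requested by each pod scheduled onto the node
kubectl get pods --all-namespaces --field-selector spec.nodeName=192.168.0.37 -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.containers[*].resources.requests.ephemeral-storage}{"\n"}{end}'</pre>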
<div class="section" id="cce_faq_00098__section136595495137"><a name="cce_faq_00098__section136595495137"></a><a name="section136595495137"></a><h4 class="sectiontitle">Check Item 7: Whether everest Works Properly</h4><p id="cce_faq_00098__p2023592311417"><strong id="cce_faq_00098__b1032816114613">0/1 nodes are available: 1 everest driver not found at node</strong>. This means the everest-csi-driver of everest is not started properly on the node.</p>
<p id="cce_faq_00098__p1899018297225">Check the daemon named <strong id="cce_faq_00098__b177131644765">everest-csi-driver</strong> in the kube-system namespace and check whether the pod is started properly. If not, delete the pod. The daemon will restart the pod.</p>
</div>
<div class="section" id="cce_faq_00098__section1739734419111"><a name="cce_faq_00098__section1739734419111"></a><a name="section1739734419111"></a><h4 class="sectiontitle">Check Item 8: Thin Pool Space</h4><p id="cce_faq_00098__p19689520219">A data disk dedicated for kubelet and the container engine will be attached to a new node. If the data disk space is insufficient, the pod cannot be created.</p>
<p id="cce_faq_00098__p152190294356"><strong id="cce_faq_00098__b645112201592">Solution 1: Clearing images</strong></p>
<div class="p" id="cce_faq_00098__p8482201996">Perform the following operations to clear unused images:<ul id="cce_faq_00098__ul921533814520"><li id="cce_faq_00098__li2046965217521">Nodes that use containerd<ol id="cce_faq_00098__ol578417521309"><li id="cce_faq_00098__li1278419521014">Obtain local images on the node.<pre class="screen" id="cce_faq_00098__screen678410524011">crictl images -v</pre>
</li><li id="cce_faq_00098__li478485214010">Delete the images that are not required by image ID.<pre class="screen" id="cce_faq_00098__screen2784352202">crictl rmi <em id="cce_faq_00098__i1669994917149">Image ID</em></pre>
</li></ol>
</li><li id="cce_faq_00098__li1121533815523">Nodes that use Docker<ol id="cce_faq_00098__ol1723885415319"><li id="cce_faq_00098__li0238185415314">Obtain local images on the node.<pre class="screen" id="cce_faq_00098__screen230933054816">docker images</pre>
</li><li id="cce_faq_00098__li11135715134912">Delete the images that are not required by image ID.<pre class="screen" id="cce_faq_00098__screen754875916481">docker rmi <em id="cce_faq_00098__i12578102141511">Image ID</em></pre>
</li></ol>
</li></ul>
<div class="note" id="cce_faq_00098__note1327520159174"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="cce_faq_00098__p927612158179">Do not delete system images such as the cce-pause image. Otherwise, pods may fail to be created.</p>
</div></div>
</div>
<p id="cce_faq_00098__p132271148115716"><strong id="cce_faq_00098__b11877191512189">Solution 2: Expanding the disk capacity</strong></p>
<p id="cce_faq_00098__p4914182563411">To expand a disk capacity, perform the following steps:</p>
<ol id="cce_faq_00098__ol11506174116352"><li id="cce_faq_00098__cce_bestpractice_00198_en-us_topic_0196817407_li1091823811013"><span>Expand the capacity of the data disk on the EVS console.</span></li><li id="cce_faq_00098__cce_bestpractice_00198_li15327184914542"><span>Log in to the CCE console and click the cluster. In the navigation pane, choose <strong id="cce_faq_00098__cce_bestpractice_00198_b176491516203817">Nodes</strong>. Click <strong id="cce_faq_00098__cce_bestpractice_00198_b464971673810">More</strong> > <strong id="cce_faq_00098__cce_bestpractice_00198_b9649161615380">Sync Server Data</strong> in the row containing the target node.</span></li><li id="cce_faq_00098__cce_bestpractice_00198_en-us_topic_0196817407_li209187382011"><span>Log in to the target node.</span></li><li id="cce_faq_00098__cce_bestpractice_00198_li128005014232"><span>Run the <strong id="cce_faq_00098__cce_bestpractice_00198_b6455184022316">lsblk</strong> command to check the block device information of the node.</span><p><p id="cce_faq_00098__cce_bestpractice_00198_p980018092312">A data disk is divided depending on the container storage <strong id="cce_faq_00098__cce_bestpractice_00198_b687813596016">Rootfs</strong>:</p>
<ul id="cce_faq_00098__cce_bestpractice_00198_ul89731919102417"><li id="cce_faq_00098__cce_bestpractice_00198_li1536084418247">Overlayfs: No independent thin pool is allocated. Image data is stored in the <strong id="cce_faq_00098__cce_bestpractice_00198_b14504233414">dockersys</strong> disk.<pre class="screen" id="cce_faq_00098__cce_bestpractice_00198_screen736044442417"># lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 8:0 0 50G 0 disk
└─vda1 8:1 0 50G 0 part /
<strong id="cce_faq_00098__cce_bestpractice_00198_b9542144551613">vdb</strong> 8:16 0 200G 0 disk
├─vgpaas-dockersys 253:0 0 90G 0 lvm /var/lib/docker # Space used by the container engine
└─vgpaas-kubernetes 253:1 0 10G 0 lvm /mnt/paas/kubernetes/kubelet # Space used by Kubernetes</pre>
<p id="cce_faq_00098__cce_bestpractice_00198_p1599151113360">Run the following commands on the node to add the new disk capacity to the <strong id="cce_faq_00098__cce_bestpractice_00198_b746642417811">dockersys</strong> disk:</p>
<pre class="screen" id="cce_faq_00098__cce_bestpractice_00198_screen10503202016363">pvresize /dev/vdb
lvextend -l+100%FREE -n vgpaas/dockersys
resize2fs /dev/vgpaas/dockersys</pre>
</li><li id="cce_faq_00098__cce_bestpractice_00198_li7973131913245">Devicemapper: A thin pool is allocated to store image data.<pre class="screen" id="cce_faq_00098__cce_bestpractice_00198_screen10480142251"># lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 8:0 0 50G 0 disk
└─vda1 8:1 0 50G 0 part /
<strong id="cce_faq_00098__cce_bestpractice_00198_b9505458151516">vdb</strong> 8:16 0 200G 0 disk
├─<strong id="cce_faq_00098__cce_bestpractice_00198_b170511380163">vgpaas-dockersys</strong> 253:0 0 18G 0 lvm /var/lib/docker
├─vgpaas-thinpool_tmeta 253:1 0 3G 0 lvm
│ └─<strong id="cce_faq_00098__cce_bestpractice_00198_b10865144019161">vgpaas-thinpool</strong> 253:3 0 67G 0 lvm # Space used by thinpool
│ ...
├─vgpaas-thinpool_tdata 253:2 0 67G 0 lvm
│ └─vgpaas-thinpool 253:3 0 67G 0 lvm
│ ...
└─vgpaas-kubernetes 253:4 0 10G 0 lvm /mnt/paas/kubernetes/kubelet</pre>
<ul id="cce_faq_00098__cce_bestpractice_00198_ul151541148142616"><li id="cce_faq_00098__cce_bestpractice_00198_li8154948152611">Run the following commands on the node to add the new disk capacity to the <strong id="cce_faq_00098__cce_bestpractice_00198_b169691932144611">thinpool</strong> disk:<pre class="screen" id="cce_faq_00098__cce_bestpractice_00198_screen1941742617282">pvresize /dev/vdb
lvextend -l+100%FREE -n vgpaas/thinpool</pre>
</li><li id="cce_faq_00098__cce_bestpractice_00198_li715464810269">Run the following commands on the node to add the new disk capacity to the <strong id="cce_faq_00098__cce_bestpractice_00198_b143201925134616">dockersys</strong> disk:<pre class="screen" id="cce_faq_00098__cce_bestpractice_00198_screen3309227102613">pvresize /dev/vdb
lvextend -l+100%FREE -n vgpaas/dockersys
resize2fs /dev/vgpaas/dockersys</pre>
</li></ul>
</li></ul>
</p></li></ol>
</div>
<div class="section" id="cce_faq_00098__section24491119103316"><a name="cce_faq_00098__section24491119103316"></a><a name="section24491119103316"></a><h4 class="sectiontitle">Check Item 9: Number of Pods Scheduled onto the Node</h4><p id="cce_faq_00098__p19401121333418"><strong id="cce_faq_00098__b54015132347">0/1 nodes are available: 1 Too many pods.</strong> indicates excessive number of pods have been scheduled to the node.</p>
<p id="cce_faq_00098__p1052316424366">When creating a node, configure <strong id="cce_faq_00098__b16451122452210">Max. Pods</strong> in <strong id="cce_faq_00098__b12869183132213">Advanced Settings</strong> to specify the maximum number of pods that can run properly on the node. The default value varies with the node flavor. You can change the value as needed.</p>
<p id="cce_faq_00098__p8996942173817">On the <strong id="cce_faq_00098__b830954561314">Nodes</strong> page, obtain the <strong id="cce_faq_00098__b7194102551416">Pods (Allocated/Total)</strong> value of the node, and check whether the number of pods scheduled onto the node has reached the upper limit. If so, add nodes or change the maximum number of pods.</p>
<p id="cce_faq_00098__p859694015414">To change the maximum number of pods that can run on a node, do as follows:</p>
<ul id="cce_faq_00098__ul924291719437"><li id="cce_faq_00098__li62428173433">For nodes in the default node pool: Change the <strong id="cce_faq_00098__b912374141714">Max. Pods</strong> value when resetting the node.</li><li id="cce_faq_00098__li20423102114411">For nodes in a customized node pool: Change the value of the node pool parameter <strong id="cce_faq_00098__b143271327131817">max-pods</strong>. </li></ul>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="cce_faq_00029.html">Workload Abnormalities</a></div>
</div>
</div>