If the workload is in the Unready state and reports the "InstanceSchedulingFailed" event, check the workload's K8s events to identify the cause.
For example, the K8s event "0/163 nodes are available: 133 Insufficient memory" indicates that no node can accept the pod and that 133 of the 163 nodes lack sufficient memory for the pod's request.
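If the console does not show the complete event, the same information can be retrieved with kubectl. The namespace and pod names below are placeholders:

```bash
# List recent events in the workload's namespace; scheduling failures
# appear as FailedScheduling events from the scheduler.
kubectl get events -n <namespace> --sort-by=.lastTimestamp

# Show all events for a single pod, including the full scheduler message,
# for example "0/163 nodes are available: 133 Insufficient memory".
kubectl describe pod <pod-name> -n <namespace>
```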
The following is the fault locating procedure:
Log in to the CCE console. In the navigation pane, choose Resource Management > Nodes to check whether the node where the workload runs is in the available state.
For example, the event "0/1 nodes are available: 1 node(s) were not ready, 1 node(s) were out of disk space" indicates that scheduling failed because the only node is not ready and has run out of disk space.
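The same check can be performed with kubectl; pods are scheduled only onto nodes in the Ready state. The node name below is a placeholder:

```bash
# Schedulable nodes report STATUS "Ready"; nodes showing "NotReady" or
# "Ready,SchedulingDisabled" cannot accept new pods.
kubectl get nodes

# Inspect the node's conditions (Ready, DiskPressure, MemoryPressure,
# PIDPressure) to find out why it is unavailable.
kubectl describe node <node-name>
```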
Solution
Restore the unavailable node or add new nodes to the cluster, and then deploy the workload again.
If the resources requested by the workload exceed the available resources of the node, the node cannot provide the resources required to run new pods, and scheduling onto that node fails.
The event "0/1 nodes are available: 1 Insufficient cpu" indicates that the pod fails to be scheduled due to insufficient node resources.
In this example, the node has 0.88 vCPUs and 0.8 GiB of memory available, while the pod requests 2 vCPUs and 0.5 GiB of memory. The CPU request exceeds the available CPU resources, which causes pod scheduling to fail.
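As a quick check, the pod's requests can be compared with the node's remaining resources using kubectl (names are placeholders):

```bash
# In the output, "Allocatable" minus the "Allocated resources" section is
# what the node can still offer (0.88 vCPUs and 0.8 GiB in this example).
kubectl describe node <node-name>

# Print the pod's resource requests (2 vCPUs and 0.5 GiB in this example);
# scheduling succeeds only if they fit within the remaining resources.
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.spec.containers[*].resources.requests}'
```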
Solution
On the ECS console, modify node specifications to expand node resources.
Inappropriate affinity policies cause pod scheduling to fail with an event similar to the following:
0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't match pod anti-affinity rules.
Solution
Ensure that workload-workload affinity policies and workload-node affinity policies do not conflict. For example, assume that an anti-affinity relationship is established between workload 1 and workload 2, so that workload 1 is deployed on node 1 while workload 2 is deployed on node 2. If you then try to deploy workload 3 on node 3 while also requiring affinity with workload 2 (which runs on node 2), the two rules conflict and the deployment fails.
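The following is a minimal sketch of this conflict, with hypothetical node and label names: the pod's node affinity pins it to node-3, while its pod affinity requires co-location with workload 2 on node-2, so no node can satisfy both rules:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: ["node-3"]      # pins the pod to node 3
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: kubernetes.io/hostname
      labelSelector:
        matchLabels:
          app: workload-2         # requires co-location with workload 2, which runs on node 2
```

Removing one of the two rules, or pointing both at the same node, resolves the conflict.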
No nodes are available that match all of the following predicates: MatchNodeSelector, NodeNotSupportsContainer
If supportContainer is set to false on the node, scheduling fails and the preceding error is reported.
Pod scheduling fails if the workload's volume and node reside in different AZs.
0/1 nodes are available: 1 NoVolumeZoneConflict.
Solution
Create a new volume in the AZ where the workload's node resides. Alternatively, create an identical workload that uses an automatically assigned cloud storage volume.
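To confirm the mismatch, compare the AZ label of the node with that of the volume. The label key depends on the cluster version (older clusters use failure-domain.beta.kubernetes.io/zone instead of topology.kubernetes.io/zone), and the names below are placeholders:

```bash
# AZ of the node the workload is expected to run on.
kubectl get node <node-name> -L topology.kubernetes.io/zone

# AZ of the volume bound to the workload's PVC; in-tree volumes carry it
# as a label, while CSI volumes record it in the PV's nodeAffinity.
kubectl get pv <pv-name> -L topology.kubernetes.io/zone
kubectl get pv <pv-name> -o yaml | grep -A 8 nodeAffinity
```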