<a name="cce_faq_00409"></a><a name="cce_faq_00409"></a>
<h1 class="topictitle1">What Should I Do If There Is a Service Access Failure After a Backend Service Upgrade or a 1-Second Latency When a Service Accesses a CCE Cluster?</h1>
<div id="body0000001488600874"><div class="section" id="cce_faq_00409__en-us_topic_0000001250194017_section74289466235"><h4 class="sectiontitle">Symptom</h4><p id="cce_faq_00409__en-us_topic_0000001250194017_p711265372416">If the kernel version of a node is earlier than 5.9 and a CCE cluster runs in IPVS forwarding mode, there may be a service access failure after a backend service upgrade or a 1-second latency when a service accesses the CCE cluster. This is caused by a bug in reusing Kubernetes IPVS connections.</p>
</div>
<div class="section" id="cce_faq_00409__en-us_topic_0000001250194017_section11811753182518"><h4 class="sectiontitle">IPVS Connection Reuse Parameters</h4><p id="cce_faq_00409__en-us_topic_0000001250194017_p172242314261">The port reuse policy of IPVS is determined by the kernel parameter <strong id="cce_faq_00409__b23571563200">net.ipv4.vs.conn_reuse_mode</strong>.</p>
<ol id="cce_faq_00409__en-us_topic_0000001250194017_ol137221523162618"><li id="cce_faq_00409__en-us_topic_0000001250194017_li172242392613">If <strong id="cce_faq_00409__b1338285612516">net.ipv4.vs.conn_reuse_mode</strong> is set to <strong id="cce_faq_00409__b13821756142510">0</strong>, IPVS does not reschedule a new connection, but forwards the new connection to the original RS (IPVS backend).</li><li id="cce_faq_00409__en-us_topic_0000001250194017_li272252315261">If <strong id="cce_faq_00409__b69294558266">net.ipv4.vs.conn_reuse_mode</strong> is set to <strong id="cce_faq_00409__b09291555132616">1</strong>, IPVS reschedules a new connection.</li></ol>
</div>
<div class="section" id="cce_faq_00409__en-us_topic_0000001250194017_section1944534518255"><h4 class="sectiontitle">Problems Caused by IPVS Connection Reuse</h4><ul id="cce_faq_00409__en-us_topic_0000001250194017_ul273324311328"><li id="cce_faq_00409__en-us_topic_0000001250194017_li4733164313217"><a name="cce_faq_00409__en-us_topic_0000001250194017_li4733164313217"></a><a name="en-us_topic_0000001250194017_li4733164313217"></a><strong id="cce_faq_00409__b1469914241271">Problem 1</strong><p id="cce_faq_00409__en-us_topic_0000001250194017_p4126164514329">If <strong id="cce_faq_00409__b2693632161212">net.ipv4.vs.conn_reuse_mode</strong> is set to <strong id="cce_faq_00409__b969353214121">0</strong>, IPVS does not proactively schedule new connections with port reuse or trigger any connection termination or drop operations. Data packets of the new connections will be directly forwarded to the previously used backend pod. If the backend pod has been deleted or recreated, an exception occurs. However, according to the current implementation logic, in a high-concurrency service access scenario, connection requests for port reuse are continuously forwarded, while kube-proxy did not delete the old ones, resulting in a service access failure.</p>
</li><li id="cce_faq_00409__en-us_topic_0000001250194017_li6733154317325"><a name="cce_faq_00409__en-us_topic_0000001250194017_li6733154317325"></a><a name="en-us_topic_0000001250194017_li6733154317325"></a><strong id="cce_faq_00409__b17469587273">Problem 2</strong><p id="cce_faq_00409__en-us_topic_0000001250194017_p96817468324">If <strong id="cce_faq_00409__b17541103285">net.ipv4.vs.conn_reuse_mode</strong> is set to <strong id="cce_faq_00409__b375490132814">1</strong> and the source port is the same as that of a previous connection in a high-concurrency scenario, the connection is not reused but rescheduled. According to the processing logic of ip_vs_in(), if <strong id="cce_faq_00409__b984341711332">net.ipv4.vs.conntrack</strong> is enabled, the first SYN packet is dropped. As a result, the SYN packet will be retransmitted, leading to a 1-second latency, and the performance deteriorates.</p>
</li></ul>
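<p>One way to observe Problem 2 from a client is to time a burst of short-lived connections to the Service and count those that take roughly one extra second, which matches the dropped-SYN retransmission described above. The following is a minimal Python sketch; the Service address and port are placeholders rather than values from this FAQ, and the 0.9-second threshold is only a heuristic.</p>
<pre class="screen"># Minimal sketch: time short TCP connections to a Service and flag those
# that take about 1 s, the SYN retransmission latency described above.
import socket
import time

SERVICE_HOST = "10.0.0.10"   # placeholder ClusterIP
SERVICE_PORT = 80            # placeholder Service port

slow = 0
for _ in range(200):
    start = time.monotonic()
    with socket.create_connection((SERVICE_HOST, SERVICE_PORT), timeout=5):
        pass
    elapsed = time.monotonic() - start
    if elapsed &gt;= 0.9:       # roughly one retransmitted SYN
        slow += 1
print("connections that took about 1 s or longer: " + str(slow))</pre>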
</div>
<div class="section" id="cce_faq_00409__en-us_topic_0000001250194017_section436920820279"><h4 class="sectiontitle">Community Settings and Impact on CCE Clusters</h4><p id="cce_faq_00409__p20517029167">The default value of <strong id="cce_faq_00409__b969035182219">net.ipv4.vs.conn_reuse_mode</strong> on a node is <strong id="cce_faq_00409__b6785974226">1</strong>. However, the Kubernetes kube-proxy resets this parameter.</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="cce_faq_00409__table16314203111328" frame="border" border="1" rules="all"><thead align="left"><tr id="cce_faq_00409__row631513116324"><th align="left" class="cellrowborder" valign="top" width="8.19081908190819%" id="mcps1.3.4.3.1.4.1.1"><p id="cce_faq_00409__p1231543193220">Cluster Version</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="38.14381438143815%" id="mcps1.3.4.3.1.4.1.2"><p id="cce_faq_00409__p4315631193218">kube-proxy Action</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="53.66536653665366%" id="mcps1.3.4.3.1.4.1.3"><p id="cce_faq_00409__p83151431133214">Impact on CCE Cluster</p>
</th>
</tr>
</thead>
<tbody><tr id="cce_faq_00409__row931533123220"><td class="cellrowborder" valign="top" width="8.19081908190819%" headers="mcps1.3.4.3.1.4.1.1 "><p id="cce_faq_00409__p0315231183216">1.17 or earlier</p>
</td>
<td class="cellrowborder" valign="top" width="38.14381438143815%" headers="mcps1.3.4.3.1.4.1.2 "><p id="cce_faq_00409__p15315143193219">By default, kube-proxy sets <strong id="cce_faq_00409__b31131912183519">net.ipv4.vs.conn_reuse_mode</strong> to <strong id="cce_faq_00409__b168721213183520">0</strong>. For details, see <a href="https://github.com/kubernetes/kubernetes/pull/71114" target="_blank" rel="noopener noreferrer">Fix IPVS low throughput issue</a>.</p>
</td>
<td class="cellrowborder" valign="top" width="53.66536653665366%" headers="mcps1.3.4.3.1.4.1.3 "><p id="cce_faq_00409__p1439552312618">If CCE clusters of 1.17 or earlier versions use the IPVS service forwarding mode, kube-proxy will set the <strong id="cce_faq_00409__b6819102319384">net.ipv4.vs.conn_reuse_mode</strong> value of all nodes to <strong id="cce_faq_00409__b19670153643810">0</strong> by default. This causes <a href="#cce_faq_00409__en-us_topic_0000001250194017_li4733164313217">Problem 1</a>: The RS cannot be removed when the port is reused.</p>
</td>
</tr>
<tr id="cce_faq_00409__row13154314327"><td class="cellrowborder" valign="top" width="8.19081908190819%" headers="mcps1.3.4.3.1.4.1.1 "><p id="cce_faq_00409__p17315331103212">1.19 or later</p>
</td>
<td class="cellrowborder" valign="top" width="38.14381438143815%" headers="mcps1.3.4.3.1.4.1.2 "><p id="cce_faq_00409__p1740010483414">kube-proxy sets the value of <strong id="cce_faq_00409__b19717136125112">net.ipv4.vs.conn_reuse_mode</strong> based on the kernel version. For details, see <a href="https://github.com/kubernetes/kubernetes/pull/88541" target="_blank" rel="noopener noreferrer">ipvs: only attempt setting of sysctlconnreuse on supported kernels</a>.</p>
<ul id="cce_faq_00409__ul18506187143418"><li id="cce_faq_00409__li923691943412">If the kernel version is later than 4.1, kube-proxy will set <strong id="cce_faq_00409__b9548126104913">net.ipv4.vs.conn_reuse_mode</strong> to <strong id="cce_faq_00409__b5310102864912">0</strong>.</li><li id="cce_faq_00409__li145061178342">In other cases, the default value <strong id="cce_faq_00409__b10232142714300">1</strong> will be retained.</li></ul>
<div class="note" id="cce_faq_00409__note1471616191219"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="cce_faq_00409__p13471616141215">This issue has been resolved in Linux kernel 5.9. Since Kubernetes 1.22, kube-proxy does not modify the <strong id="cce_faq_00409__b1036976165215">net.ipv4.vs.conn_reuse_mode</strong> parameter of nodes that use the kernel 5.9 or later. For details, see <a href="https://github.com/kubernetes/kubernetes/pull/102122" target="_blank" rel="noopener noreferrer">Don't set sysctl net.ipv4.vs.conn_reuse_mode for kernels >=5.9</a>.</p>
</div></div>
</td>
<td class="cellrowborder" valign="top" width="53.66536653665366%" headers="mcps1.3.4.3.1.4.1.3 "><div class="p" id="cce_faq_00409__p5414123117214">If the IPVS service forwarding mode is used in CCE clusters of 1.19.16-r0 or later, the value of <strong id="cce_faq_00409__b135131316174">net.ipv4.vs.conn_reuse_mode</strong> varies with the kernel versions of node OSs.<ul id="cce_faq_00409__en-us_topic_0000001250194017_ul19050428333"><li id="cce_faq_00409__en-us_topic_0000001250194017_li0905842193320">For a node running EulerOS 2.5, if the kernel version is earlier than 4.1, kube-proxy will keep <strong id="cce_faq_00409__b29651942102810">net.ipv4.vs.conn_reuse_mode</strong> at <strong id="cce_faq_00409__b14965154222819">1</strong>. This results in <a href="#cce_faq_00409__en-us_topic_0000001250194017_li6733154317325">Problem 2</a>, which is, there is a 1-second latency in the high-concurrency scenarios.</li><li id="cce_faq_00409__li1990105708">For a node running EulerOS 2.9, if the kernel version is too early, kube-proxy will set <strong id="cce_faq_00409__b1882734620117">net.ipv4.vs.conn_reuse_mode</strong> to <strong id="cce_faq_00409__b182661498119">0</strong>. This results in <a href="#cce_faq_00409__en-us_topic_0000001250194017_li4733164313217">Problem 1</a>. To resolve this problem, upgrade the kernel version. For details, see <a href="#cce_faq_00409__en-us_topic_0000001250194017_section059433819273">Rectification Plan</a>.</li><li id="cce_faq_00409__li0686192103518">For a node running HCE OS 2.0 or Ubuntu 22.04, the kernel version is later than 5.9. The problem has been resolved.</li></ul>
</div>
</td>
</tr>
</tbody>
</table>
</div>
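<p>To see which row of the table applies to a given node, the kernel version can be compared with the thresholds above. The following is a minimal Python sketch of that comparison for clusters of 1.19 or later; it only mirrors the table and the linked pull requests and does not reproduce the exact kube-proxy code.</p>
<pre class="screen"># Minimal sketch: map the node kernel version to the expected kube-proxy
# behavior summarized in the table above (clusters of 1.19 or later).
import os

def kernel_version():
    # For example, "4.18.0-147.5.1.6.h686.eulerosv2r9.x86_64" becomes (4, 18, 0).
    numeric = os.uname().release.split("-")[0]
    return tuple(int(part) for part in numeric.split("."))

version = kernel_version()
if version &gt;= (5, 9):
    print("kernel 5.9 or later: kube-proxy 1.22+ leaves conn_reuse_mode unchanged")
elif version &gt; (4, 1):
    print("kernel later than 4.1: kube-proxy sets conn_reuse_mode to 0")
else:
    print("earlier kernel: the default value 1 is retained")</pre>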
</div>
<div class="section" id="cce_faq_00409__en-us_topic_0000001250194017_section1660952412720"><h4 class="sectiontitle">Suggestions</h4><p id="cce_faq_00409__en-us_topic_0000001250194017_p57213339279">Evaluate the impact of these problems. If they affect your services, take the following measures:</p>
<ol id="cce_faq_00409__en-us_topic_0000001250194017_ol11721163315272"><li id="cce_faq_00409__li12144145610367">Use an OS that is not affected by the preceding issues, for example, HCE OS 2.0 or Ubuntu 22.04. The newly created nodes which run EulerOS 2.9 are not affected by the preceding issues. Upgrade the earlier kernel versions used by existing nodes to the fixed version. For details, see <a href="#cce_faq_00409__en-us_topic_0000001250194017_section059433819273">Rectification Plan</a>.</li><li id="cce_faq_00409__en-us_topic_0000001250194017_li772133317272">Use a cluster whose forwarding mode is iptables.</li></ol>
</div>
<div class="section" id="cce_faq_00409__en-us_topic_0000001250194017_section059433819273"><a name="cce_faq_00409__en-us_topic_0000001250194017_section059433819273"></a><a name="en-us_topic_0000001250194017_section059433819273"></a><h4 class="sectiontitle">Rectification Plan</h4><p id="cce_faq_00409__en-us_topic_0000001250194017_p995510487279">If you use a node running EulerOS 2.9, check whether the kernel version meets the requirements. If the kernel version of the node is too early, reset the node or create a new one.</p>
<div class="p" id="cce_faq_00409__p1811735914432">The following kernel versions are recommended:<ul id="cce_faq_00409__ul13106122415437"><li id="cce_faq_00409__li6208163704313">x86: 4.18.0-147.5.1.6.h686.eulerosv2r9.x86_64</li><li id="cce_faq_00409__li81069242439">Arm: 4.19.90-vhulk2103.1.0.h584.eulerosv2r9.aarch64</li></ul>
</div>
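<p>Whether an existing EulerOS 2.9 node already runs one of the recommended kernels can be checked on the node itself. The following is a minimal Python sketch based only on the versions listed above; a newer kernel that also contains the fix would need to be verified separately.</p>
<pre class="screen"># Minimal sketch: compare the running kernel with the recommended
# EulerOS 2.9 kernel versions listed above.
import os

RECOMMENDED = (
    "4.18.0-147.5.1.6.h686.eulerosv2r9.x86_64",        # x86
    "4.19.90-vhulk2103.1.0.h584.eulerosv2r9.aarch64",  # Arm
)

release = os.uname().release
if release in RECOMMENDED:
    print("kernel " + release + " matches a recommended version")
else:
    print("kernel " + release + " differs; consider resetting or recreating the node")</pre>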
<p id="cce_faq_00409__en-us_topic_0000001250194017_p11955848112715">Kubernetes community issue: <a href="https://github.com/kubernetes/kubernetes/issues/81775" target="_blank" rel="noopener noreferrer">https://github.com/kubernetes/kubernetes/issues/81775</a></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="cce_faq_00407.html">OSs</a></div>
</div>
</div>