doc-exports/docs/dws/dev/dws_04_0489.html
Lu, Huayi e6fa411af0 DWS DEV 830.201 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lu, Huayi <luhuayi@huawei.com>
Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
2024-05-16 07:24:04 +00:00

96 lines
12 KiB
HTML

<a name="EN-US_TOPIC_0000001188642100"></a><a name="EN-US_TOPIC_0000001188642100"></a>
<h1 class="topictitle1">Case: Rewriting SQL Statements and Deleting in-clause</h1>
<div id="body8662426"><div class="section" id="EN-US_TOPIC_0000001188642100__sc5ec6aba5ded40458c3df395f979d427"><h4 class="sectiontitle">Before Optimization</h4><p id="EN-US_TOPIC_0000001188642100__a219cad1bf7084b9594a5de0749218185">in-clause/any-clause is a common SQL statement constraint. Sometimes, the clause following <strong id="EN-US_TOPIC_0000001188642100__b191961192595010">in</strong> or <strong id="EN-US_TOPIC_0000001188642100__b10218466595010">any</strong> is a constant. For example:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188642100__sc51278c1221c4786bc7119d54cf678a0"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">select</span><span class="w"> </span>
<span class="k">count</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span>
<span class="k">from</span><span class="w"> </span><span class="n">calc_empfyc_c1_result_tmp_t1</span><span class="w"> </span>
<span class="k">where</span><span class="w"> </span><span class="n">ls_pid_cusr1</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="p">(</span><span class="s1">'20120405'</span><span class="p">,</span><span class="w"> </span><span class="s1">'20130405'</span><span class="p">);</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001188642100__add62b0b5c6294385a5e5245c4fe38cb8">or</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188642100__s23a8baefb30a477783ae85906f88870b"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">select</span><span class="w"> </span>
<span class="k">count</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span>
<span class="k">from</span><span class="w"> </span><span class="n">calc_empfyc_c1_result_tmp_t1</span><span class="w"> </span>
<span class="k">where</span><span class="w"> </span><span class="n">ls_pid_cusr1</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="k">any</span><span class="p">(</span><span class="s1">'20120405'</span><span class="p">,</span><span class="w"> </span><span class="s1">'20130405'</span><span class="p">);</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001188642100__a5a6948d583634ba493504e9f1e31c7e1">Some special usages are as follows:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188642100__s87e8e77eb889406eb21a0295cc83ea2d"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">SELECT</span><span class="w"> </span>
<span class="n">ls_pid_cusr1</span><span class="p">,</span><span class="k">COALESCE</span><span class="p">(</span><span class="k">max</span><span class="p">(</span><span class="n">round</span><span class="p">((</span><span class="k">current_date</span><span class="o">-</span><span class="n">bthdate</span><span class="p">)</span><span class="o">/</span><span class="mi">365</span><span class="p">)),</span><span class="mi">0</span><span class="p">)</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">calc_empfyc_c1_result_tmp_t1</span><span class="w"> </span><span class="n">t1</span><span class="p">,</span><span class="n">p10_md_tmp_t2</span><span class="w"> </span><span class="n">t2</span>
<span class="k">WHERE</span><span class="w"> </span><span class="n">t1</span><span class="p">.</span><span class="n">ls_pid_cusr1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">any</span><span class="p">(</span><span class="k">values</span><span class="p">(</span><span class="n">id</span><span class="p">),(</span><span class="n">id15</span><span class="p">))</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">ls_pid_cusr1</span><span class="p">;</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001188642100__a441ae7b510b44c828f52e365b0308dba">Where <strong id="EN-US_TOPIC_0000001188642100__b178031715595010">id</strong> and <strong id="EN-US_TOPIC_0000001188642100__b94213802095010">id15</strong> are columns of p10_md_tmp_t2. ls_pid_cusr1 = any(values(id),(id15)) equals t1. ls_pid_cusr1 = id or t1. ls_pid_cusr1 = id15.</p>
<p id="EN-US_TOPIC_0000001188642100__a52c4ca7966b04a43a551fd226c3a0b49">Therefore, join-condition is essentially an inequality, and nestloop must be used for this join operation. The execution plan is as follows:</p>
<p id="EN-US_TOPIC_0000001188642100__a25fef504bddd496ea63e78e4d686709a"><span><img id="EN-US_TOPIC_0000001188642100__ied62f5b5f51d4f11b510025f8897e5ff" src="figure/en-us_image_0000001188163812.png"></span></p>
</div>
<div class="section" id="EN-US_TOPIC_0000001188642100__s0589c30fef2b4748b0f46112fb8e6611"><h4 class="sectiontitle">After Optimization</h4><p id="EN-US_TOPIC_0000001188642100__aef9f245c318947ea83240809785807d3">The test result shows that both result sets are too large. As a result, nestloop is time-consuming with more than one hour to return results. Therefore, the key to performance optimization is to eliminate nestloop, using more efficient hashjoin. From the perspective of semantic equivalence, the SQL statements can be written as follows:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188642100__saa694eb7a9e645a7bdbe99234542c257"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
<span class="normal"> 2</span>
<span class="normal"> 3</span>
<span class="normal"> 4</span>
<span class="normal"> 5</span>
<span class="normal"> 6</span>
<span class="normal"> 7</span>
<span class="normal"> 8</span>
<span class="normal"> 9</span>
<span class="normal">10</span>
<span class="normal">11</span>
<span class="normal">12</span>
<span class="normal">13</span>
<span class="normal">14</span>
<span class="normal">15</span>
<span class="normal">16</span>
<span class="normal">17</span>
<span class="normal">18</span>
<span class="normal">19</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">select</span>
<span class="n">ls_pid_cusr1</span><span class="p">,</span><span class="k">COALESCE</span><span class="p">(</span><span class="k">max</span><span class="p">(</span><span class="n">round</span><span class="p">(</span><span class="n">ym</span><span class="o">/</span><span class="mi">365</span><span class="p">)),</span><span class="mi">0</span><span class="p">)</span>
<span class="k">from</span>
<span class="p">(</span>
<span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span>
<span class="w"> </span><span class="n">ls_pid_cusr1</span><span class="p">,(</span><span class="k">current_date</span><span class="o">-</span><span class="n">bthdate</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">ym</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">calc_empfyc_c1_result_tmp_t1</span><span class="w"> </span><span class="n">t1</span><span class="p">,</span><span class="n">p10_md_tmp_t2</span><span class="w"> </span><span class="n">t2</span>
<span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">t1</span><span class="p">.</span><span class="n">ls_pid_cusr1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">t2</span><span class="p">.</span><span class="n">id</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">t1</span><span class="p">.</span><span class="n">ls_pid_cusr1</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">t2</span><span class="p">.</span><span class="n">id15</span>
<span class="w"> </span><span class="p">)</span>
<span class="w"> </span><span class="k">union</span><span class="w"> </span><span class="k">all</span>
<span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">SELECT</span><span class="w"> </span>
<span class="w"> </span><span class="n">ls_pid_cusr1</span><span class="p">,(</span><span class="k">current_date</span><span class="o">-</span><span class="n">bthdate</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">ym</span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">calc_empfyc_c1_result_tmp_t1</span><span class="w"> </span><span class="n">t1</span><span class="p">,</span><span class="n">p10_md_tmp_t2</span><span class="w"> </span><span class="n">t2</span>
<span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">t1</span><span class="p">.</span><span class="n">ls_pid_cusr1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">id15</span>
<span class="w"> </span><span class="p">)</span>
<span class="p">)</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">ls_pid_cusr1</span><span class="p">;</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001188642100__p41233963">Note: Use <strong id="EN-US_TOPIC_0000001188642100__b1213137124513">UNION ALL</strong> instead of <strong id="EN-US_TOPIC_0000001188642100__b17276161064515">UNION</strong> if possible. <strong id="EN-US_TOPIC_0000001188642100__b117377828795010">UNION</strong> eliminates duplicate rows while merging two result sets but <strong id="EN-US_TOPIC_0000001188642100__b117436808895010">UNION ALL</strong> merges the two result sets without deduplication. Therefore, replace <strong id="EN-US_TOPIC_0000001188642100__b65143178995010">UNION</strong> with <strong id="EN-US_TOPIC_0000001188642100__b156541963795010">UNION ALL</strong> if you are sure that the two result sets do not contain duplicate rows based on the service logic.</p>
<p id="EN-US_TOPIC_0000001188642100__aa90831364364461b86d44f19edd69766">The optimized SQL queries consist of two equivalent join subqueries, and each subquery can be used for hashjoin in this scenario. The optimized execution plan is as follows:</p>
<p id="EN-US_TOPIC_0000001188642100__p1722451513212"><span><img id="EN-US_TOPIC_0000001188642100__image192249159329" src="figure/en-us_image_0000001233761929.jpg"></span></p>
<p id="EN-US_TOPIC_0000001188642100__a929ba163b3594730922832837a8c1c6b">Before the optimization, no result is returned for more than 1 hour. After the optimization, the result is returned within 7s.</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_04_0474.html">Optimization Cases</a></div>
</div>
</div>