doc-exports/docs/dws/dev/dws_04_0478.html
Lu, Huayi e6fa411af0 DWS DEV 830.201 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lu, Huayi <luhuayi@huawei.com>
Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
2024-05-16 07:24:04 +00:00


<a name="EN-US_TOPIC_0000001188642182"></a>
<h1 class="topictitle1">Case: Pushing Down Sort Operations to DNs</h1>
<div id="body8662426"><p id="EN-US_TOPIC_0000001188642182__p19126183813503">In this execution plan, more than 95% of the execution time is spent on <strong id="EN-US_TOPIC_0000001188642182__b55851251448">window agg</strong>, which is performed on the CN. The statement computes <strong id="EN-US_TOPIC_0000001188642182__b16836459319459">sum</strong> for each of the two columns and then adds the two results together (a second <strong id="EN-US_TOPIC_0000001188642182__b4294522439459">sum</strong>). The total is then truncated with trunc and sorted. You can rewrite the statement to use a subquery so that the sort operation is pushed down to DNs.</p>
<div class="section" id="EN-US_TOPIC_0000001188642182__s07947b256b6b4780a6a6a749c807a082"><h4 class="sectiontitle">Before optimization</h4><p id="EN-US_TOPIC_0000001188642182__ab66903e7f54c413688aa2261652bd160">The table structure is as follows:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188642182__s4a6d1aa414c145fca8f0d222705f1aba"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="k">public</span><span class="p">.</span><span class="n">test</span><span class="p">(</span><span class="n">imsi</span><span class="w"> </span><span class="nb">int</span><span class="p">,</span><span class="n">L4_DW_THROUGHPUT</span><span class="w"> </span><span class="nb">int</span><span class="p">,</span><span class="n">L4_UL_THROUGHPUT</span><span class="w"> </span><span class="nb">int</span><span class="p">)</span>
<span class="k">with</span><span class="w"> </span><span class="p">(</span><span class="n">orientation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">column</span><span class="p">)</span><span class="w"> </span><span class="n">DISTRIBUTE</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">hash</span><span class="p">(</span><span class="n">imsi</span><span class="p">);</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001188642182__a73f38ee1e7e64dd2b3414592456800e8">The query statement is as follows:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188642182__sbeff6568c153439987ab5048c9c0df5b"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span>
<span class="normal">6</span>
<span class="normal">7</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="n">over</span><span class="p">()</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">DATACNT</span><span class="p">,</span>
<span class="n">IMSI</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">IMSI_IMSI</span><span class="p">,</span>
<span class="k">CAST</span><span class="p">(</span><span class="n">TRUNC</span><span class="p">(((</span><span class="k">SUM</span><span class="p">(</span><span class="n">L4_UL_THROUGHPUT</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="k">SUM</span><span class="p">(</span><span class="n">L4_DW_THROUGHPUT</span><span class="p">))),</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span>
<span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">20</span><span class="p">))</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">TOTAL_VOLOME_KPIID</span>
<span class="k">FROM</span><span class="w"> </span><span class="k">public</span><span class="p">.</span><span class="n">test</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">test</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">IMSI</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">TOTAL_VOLOME_KPIID</span><span class="w"> </span><span class="k">DESC</span><span class="p">;</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001188642182__a802c8948f44d4544bbcdfaf606efab0a">The execution plan is as follows:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188642182__sa4048a4a2e8649cda5ef8245d6bea47a"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span>
<span class="normal">6</span>
<span class="normal">7</span>
<span class="normal">8</span>
<span class="normal">9</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">Row</span><span class="w"> </span><span class="n">Adapter</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">10</span><span class="p">.</span><span class="mi">70</span><span class="p">..</span><span class="mi">10</span><span class="p">.</span><span class="mi">70</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">10</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
<span class="w"> </span><span class="o">-&gt;</span><span class="w"> </span><span class="n">Vector</span><span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">10</span><span class="p">.</span><span class="mi">68</span><span class="p">..</span><span class="mi">10</span><span class="p">.</span><span class="mi">70</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">10</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
<span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="k">Key</span><span class="p">:</span><span class="w"> </span><span class="p">((</span><span class="n">trunc</span><span class="p">((((</span><span class="k">sum</span><span class="p">(</span><span class="n">l4_ul_throughput</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="k">sum</span><span class="p">(</span><span class="n">l4_dw_throughput</span><span class="p">))))::</span><span class="nb">numeric</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">))::</span><span class="nb">numeric</span><span class="p">(</span><span class="mi">20</span><span class="p">,</span><span class="mi">0</span><span class="p">))</span>
<span class="w"> </span><span class="o">-&gt;</span><span class="w"> </span><span class="n">Vector</span><span class="w"> </span><span class="n">WindowAgg</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">10</span><span class="p">.</span><span class="mi">09</span><span class="p">..</span><span class="mi">10</span><span class="p">.</span><span class="mi">51</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">10</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
<span class="w"> </span><span class="o">-&gt;</span><span class="w"> </span><span class="n">Vector</span><span class="w"> </span><span class="n">Streaming</span><span class="w"> </span><span class="p">(</span><span class="k">type</span><span class="p">:</span><span class="w"> </span><span class="n">GATHER</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">242</span><span class="p">.</span><span class="mi">04</span><span class="p">..</span><span class="mi">246</span><span class="p">.</span><span class="mi">84</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">240</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
<span class="w"> </span><span class="n">Node</span><span class="o">/</span><span class="n">s</span><span class="p">:</span><span class="w"> </span><span class="k">All</span><span class="w"> </span><span class="n">datanodes</span>
<span class="w"> </span><span class="o">-&gt;</span><span class="w"> </span><span class="n">Vector</span><span class="w"> </span><span class="n">Hash</span><span class="w"> </span><span class="k">Aggregate</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">10</span><span class="p">.</span><span class="mi">09</span><span class="p">..</span><span class="mi">10</span><span class="p">.</span><span class="mi">29</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">10</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
<span class="w"> </span><span class="k">Group</span><span class="w"> </span><span class="k">By</span><span class="w"> </span><span class="k">Key</span><span class="p">:</span><span class="w"> </span><span class="n">imsi</span>
<span class="w"> </span><span class="o">-&gt;</span><span class="w"> </span><span class="n">CStore</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">test</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">10</span><span class="p">.</span><span class="mi">01</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">10</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001188642182__a889f6f5437f040a3aa98c9d0561b10b5">In this plan, both <strong id="EN-US_TOPIC_0000001188642182__b12353518039459">window agg</strong> and <strong id="EN-US_TOPIC_0000001188642182__b8955251159459">sort</strong> appear above the GATHER stream, so they are performed on the CN after the rows from all DNs have been collected. This is time-consuming.</p>
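<p>To check where each operator runs for your own data, you can prefix the statement with <strong>EXPLAIN</strong>. The following is a minimal sketch rather than part of the original case. It assumes the <strong>public.test</strong> table defined above, and the plan that is returned depends on your cluster, data, and version. Operators listed above the <strong>Streaming (type: GATHER)</strong> node run on the CN, and operators listed below it run on DNs.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Show the plan without executing the statement.
-- Assumption: if your version supports it, EXPLAIN PERFORMANCE can be used
-- instead to execute the statement and report the time spent in each operator.
EXPLAIN
SELECT COUNT(1) OVER() AS DATACNT,
       IMSI AS IMSI_IMSI,
       CAST(TRUNC((SUM(L4_UL_THROUGHPUT) + SUM(L4_DW_THROUGHPUT)), 0) AS DECIMAL(20)) AS TOTAL_VOLOME_KPIID
FROM public.test AS test
GROUP BY IMSI
ORDER BY TOTAL_VOLOME_KPIID DESC;
</pre></div></div>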
</div>
<div class="section" id="EN-US_TOPIC_0000001188642182__s2853f5534a9447d8af70b01b99f7e554"><h4 class="sectiontitle">After optimization</h4><p id="EN-US_TOPIC_0000001188642182__a873821e7070c4dfcab9f0d351370ae12">Rewrite the statement to use a subquery, as follows:</p>
</div>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188642182__sd2785499b44449fb8f394ada6609bea5"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span>
<span class="normal">6</span>
<span class="normal">7</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="n">over</span><span class="p">()</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">DATACNT</span><span class="p">,</span><span class="w"> </span><span class="n">IMSI_IMSI</span><span class="p">,</span><span class="w"> </span><span class="n">TOTAL_VOLOME_KPIID</span>
<span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="k">SELECT</span><span class="w"> </span><span class="n">IMSI</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">IMSI_IMSI</span><span class="p">,</span>
<span class="k">CAST</span><span class="p">(</span><span class="n">TRUNC</span><span class="p">(((</span><span class="k">SUM</span><span class="p">(</span><span class="n">L4_UL_THROUGHPUT</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="k">SUM</span><span class="p">(</span><span class="n">L4_DW_THROUGHPUT</span><span class="p">))),</span>
<span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">20</span><span class="p">))</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">TOTAL_VOLOME_KPIID</span>
<span class="k">FROM</span><span class="w"> </span><span class="k">public</span><span class="p">.</span><span class="n">test</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">test</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">IMSI</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">TOTAL_VOLOME_KPIID</span><span class="w"> </span><span class="k">DESC</span><span class="p">);</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001188642182__a575723cc8c434f4dbda2babf5a352bb4">The subquery applies <strong id="EN-US_TOPIC_0000001188642182__b842352706103028">trunc</strong> to the <strong id="EN-US_TOPIC_0000001188642182__b842352706103023">sum</strong> of the two column totals and sorts the result. The outer query then performs <strong id="EN-US_TOPIC_0000001188642182__b84235270610316">window agg</strong> on the subquery result. As a result, the sort operation is pushed down to DNs, as shown in the following execution plan:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188642182__sb54e7b3e24e742eb83188d0e753dd16a"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span>
<span class="normal">6</span>
<span class="normal">7</span>
<span class="normal">8</span>
<span class="normal">9</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">Row</span><span class="w"> </span><span class="n">Adapter</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">10</span><span class="p">.</span><span class="mi">70</span><span class="p">..</span><span class="mi">10</span><span class="p">.</span><span class="mi">70</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">10</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">24</span><span class="p">)</span>
<span class="w"> </span><span class="o">-&gt;</span><span class="w"> </span><span class="n">Vector</span><span class="w"> </span><span class="n">WindowAgg</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">10</span><span class="p">.</span><span class="mi">45</span><span class="p">..</span><span class="mi">10</span><span class="p">.</span><span class="mi">70</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">10</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">24</span><span class="p">)</span>
<span class="w"> </span><span class="o">-&gt;</span><span class="w"> </span><span class="n">Vector</span><span class="w"> </span><span class="n">Streaming</span><span class="w"> </span><span class="p">(</span><span class="k">type</span><span class="p">:</span><span class="w"> </span><span class="n">GATHER</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">250</span><span class="p">.</span><span class="mi">83</span><span class="p">..</span><span class="mi">253</span><span class="p">.</span><span class="mi">83</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">240</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">24</span><span class="p">)</span>
<span class="w"> </span><span class="n">Node</span><span class="o">/</span><span class="n">s</span><span class="p">:</span><span class="w"> </span><span class="k">All</span><span class="w"> </span><span class="n">datanodes</span>
<span class="w"> </span><span class="o">-&gt;</span><span class="w"> </span><span class="n">Vector</span><span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">10</span><span class="p">.</span><span class="mi">45</span><span class="p">..</span><span class="mi">10</span><span class="p">.</span><span class="mi">48</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">10</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
<span class="w"> </span><span class="n">Sort</span><span class="w"> </span><span class="k">Key</span><span class="p">:</span><span class="w"> </span><span class="p">((</span><span class="n">trunc</span><span class="p">(((</span><span class="k">sum</span><span class="p">(</span><span class="n">test</span><span class="p">.</span><span class="n">l4_ul_throughput</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="k">sum</span><span class="p">(</span><span class="n">test</span><span class="p">.</span><span class="n">l4_dw_throughput</span><span class="p">)))::</span><span class="nb">numeric</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">))::</span><span class="nb">numeric</span><span class="p">(</span><span class="mi">20</span><span class="p">,</span><span class="mi">0</span><span class="p">))</span>
<span class="w"> </span><span class="o">-&gt;</span><span class="w"> </span><span class="n">Vector</span><span class="w"> </span><span class="n">Hash</span><span class="w"> </span><span class="k">Aggregate</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">10</span><span class="p">.</span><span class="mi">09</span><span class="p">..</span><span class="mi">10</span><span class="p">.</span><span class="mi">29</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">10</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
<span class="w"> </span><span class="k">Group</span><span class="w"> </span><span class="k">By</span><span class="w"> </span><span class="k">Key</span><span class="p">:</span><span class="w"> </span><span class="n">test</span><span class="p">.</span><span class="n">imsi</span>
<span class="w"> </span><span class="o">-&gt;</span><span class="w"> </span><span class="n">CStore</span><span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">test</span><span class="w"> </span><span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">10</span><span class="p">.</span><span class="mi">01</span><span class="w"> </span><span class="k">rows</span><span class="o">=</span><span class="mi">10</span><span class="w"> </span><span class="n">width</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001188642182__a89ed7f10e82b49109f73ddd341cbd260">The optimized SQL statement reduces the execution time from 120 seconds to 7 seconds.</p>
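<p>Before replacing the statement, you may want to confirm that the rewrite returns the same rows as the original. The following is a minimal sketch rather than part of the original case. It uses <strong>EXCEPT</strong> to list rows that the original statement produces but the rewritten statement does not. The subquery alias <strong>t</strong> is a hypothetical name, and the <strong>ORDER BY</strong> clauses are omitted because <strong>EXCEPT</strong> compares rows without regard to order.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Rows returned by the original statement ...
(SELECT COUNT(1) OVER() AS DATACNT,
        IMSI AS IMSI_IMSI,
        CAST(TRUNC((SUM(L4_UL_THROUGHPUT) + SUM(L4_DW_THROUGHPUT)), 0) AS DECIMAL(20)) AS TOTAL_VOLOME_KPIID
 FROM public.test
 GROUP BY IMSI)
EXCEPT
-- ... minus the rows returned by the rewritten statement.
(SELECT COUNT(1) OVER() AS DATACNT, IMSI_IMSI, TOTAL_VOLOME_KPIID
 FROM (SELECT IMSI AS IMSI_IMSI,
              CAST(TRUNC((SUM(L4_UL_THROUGHPUT) + SUM(L4_DW_THROUGHPUT)), 0) AS DECIMAL(20)) AS TOTAL_VOLOME_KPIID
       FROM public.test
       GROUP BY IMSI) AS t);
</pre></div></div>
<p>An empty result means that every row of the original statement also appears in the rewritten one. Swap the two operands to check the opposite direction.</p>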
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_04_0474.html">Optimization Cases</a></div>
</div>
</div>