Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="mrs_01_1948"></a><a name="mrs_01_1948"></a>
<h1 class="topictitle1">Enhancing Stability in a Limited Memory Condition</h1>
<div id="body1595920206812"><div class="section" id="mrs_01_1948__s5ccaa08c9e3d4e8891c4a082f7b1bd0f"><h4 class="sectiontitle">Scenario</h4><p id="mrs_01_1948__a018e28a4ce214eaf87e410983d0f659a">Spark SQL requires a large amount of memory to execute a query, especially during Aggregate and Join operations. If memory is limited, OutOfMemoryError may occur. The stability feature for limited memory conditions allows queries to run in limited memory without triggering OutOfMemoryError.</p>
<div class="note" id="mrs_01_1948__n77573a24ecdc4b0cb1652b712b72f2a9"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_1948__a4df999fce09141b88a5069a8c64a8e2f">Limited memory does not mean arbitrarily small memory. The feature keeps queries stable by spilling data to disks when the data volume is several times larger than the available memory. Some data must still fit in memory. For example, for queries involving Join, the data that shares the same join key must be stored in memory. If that data is too large to be stored in the available memory, OutOfMemoryError still occurs.</p>
</div></div>
<p id="mrs_01_1948__afb70693b302446429aabc3525d43169e">Stability in a limited memory condition involves the following sub-functions:</p>
<ol id="mrs_01_1948__o4d68b12d83254cf4accc1f3c2ad5050d"><li id="mrs_01_1948__l7552e2c805b94675a7ce3edf7053ba0d">ExternalSort<p id="mrs_01_1948__a16d6318d9b424c0f96416f5c36acbeea"><a name="mrs_01_1948__l7552e2c805b94675a7ce3edf7053ba0d"></a><a name="l7552e2c805b94675a7ce3edf7053ba0d"></a>If memory is insufficient during sorting, part of the data is spilled to disks.</p>
</li><li id="mrs_01_1948__l7330289aa7b54d1cb6fd9134c3a222e6">TungstenAggregate<p id="mrs_01_1948__a5d014e3791e941438f54f7700fac37f9"><a name="mrs_01_1948__l7330289aa7b54d1cb6fd9134c3a222e6"></a><a name="l7330289aa7b54d1cb6fd9134c3a222e6"></a>By default, ExternalSort is used to sort data before aggregation. Therefore, if memory is insufficient, data is spilled to disks during sorting. Because the data is already sorted before aggregation, only the aggregation result of the current key is retained in memory, which requires only a small amount of memory.</p>
</li><li id="mrs_01_1948__lb149c12433be4812ae83f5d0217a63a6">SortMergeJoin and SortMergeOuterJoin<p id="mrs_01_1948__a81a67592e912440daa2bb31653a0a862"><a name="mrs_01_1948__lb149c12433be4812ae83f5d0217a63a6"></a><a name="lb149c12433be4812ae83f5d0217a63a6"></a>SortMergeJoin and SortMergeOuterJoin are equi-joins performed on sorted data. By default, ExternalSort is used to sort the data before the equi-join. Therefore, if memory is insufficient, data is spilled to disks during sorting. Because the data is already sorted before the equi-join, only the data of the current key is retained in memory, which requires only a small amount of memory.</p>
</li></ol>
</div>
<div class="section" id="mrs_01_1948__sb255977d7f914000beaa6f33c29a9cf7"><h4 class="sectiontitle">Configuration</h4><p id="mrs_01_1948__a8c256ca7444445ca813c63bcf3709f80"><strong id="mrs_01_1948__b164463913352">Navigation path for setting parameters:</strong></p>
<p id="mrs_01_1948__a13b4514f5cec479ea55e07fb363962d4">When submitting an application, set the following parameters using <span class="parmname" id="mrs_01_1948__parmname468218112359"><b>--conf</b></span> or adjust the parameters in the <span class="filepath" id="mrs_01_1948__filepath11682201133520"><b>spark-defaults.conf</b></span> configuration file on the client.</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_1948__tfda7ed6906cd49f9834bff000d8d1f90" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameter description</caption><thead align="left"><tr id="mrs_01_1948__rec3b282c92cb4717aae24defa6fdc2e3"><th align="left" class="cellrowborder" valign="top" width="25.91259125912591%" id="mcps1.3.2.4.2.5.1.1"><p id="mrs_01_1948__abf5a0b7e5ce6416285870eb9fbbf64c1">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="15.791579157915791%" id="mcps1.3.2.4.2.5.1.2"><p id="mrs_01_1948__a8a0a9a60b6fa482689d1895bd8f259ec">Scenario</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="46.41464146414641%" id="mcps1.3.2.4.2.5.1.3"><p id="mrs_01_1948__a48000d547d83464f9c7769abe951b5d3">Description</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="11.881188118811883%" id="mcps1.3.2.4.2.5.1.4"><p id="mrs_01_1948__ae9c45759b8294ad6bccc82929058b322">Default Value</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_1948__r431b0672ccd947af83b8ff634ef916c9"><td class="cellrowborder" valign="top" width="25.91259125912591%" headers="mcps1.3.2.4.2.5.1.1 "><p id="mrs_01_1948__af6570b074dc04d19ac3e961825c7e636">spark.sql.tungsten.enabled</p>
</td>
<td class="cellrowborder" rowspan="2" valign="top" width="15.791579157915791%" headers="mcps1.3.2.4.2.5.1.2 "><p id="mrs_01_1948__p8588325175720">/</p>
</td>
<td class="cellrowborder" valign="top" width="46.41464146414641%" headers="mcps1.3.2.4.2.5.1.3 "><p id="mrs_01_1948__aa9f60c02d88d402387077385eb887b72">Type: Boolean</p>
<ul id="mrs_01_1948__u2ea46f3d6562440296cbc992302e546c"><li id="mrs_01_1948__lb4e6f84e2d3d4b1780cd6fe7f7930183">If the value is <strong id="mrs_01_1948__b0796155114417">true</strong>, Tungsten is enabled. That is, the logical plan is processed in the same way as when code generation is enabled, and the physical plan uses the corresponding Tungsten execution plan.</li><li id="mrs_01_1948__l212fb19841984c1986d2471a96d00afd">If the value is <strong id="mrs_01_1948__b63381854124219">false</strong>, Tungsten is disabled.</li></ul>
</td>
<td class="cellrowborder" valign="top" width="11.881188118811883%" headers="mcps1.3.2.4.2.5.1.4 "><p id="mrs_01_1948__a7150839aeec44332991012d57910530c">true</p>
</td>
</tr>
<tr id="mrs_01_1948__r1113474f10774860983b04a7aea105e9"><td class="cellrowborder" valign="top" headers="mcps1.3.2.4.2.5.1.1 "><p id="mrs_01_1948__a43b2279fe6f144d4ace54a32f1d9e8ea">spark.sql.codegen.wholeStage</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.2.4.2.5.1.2 "><p id="mrs_01_1948__ace86f0211e184c72ab85d49ba2235d65">Type: Boolean</p>
<ul id="mrs_01_1948__u0ada36a8044749979dced7f1783ba144"><li id="mrs_01_1948__l4702b4e35592456cafd3a76fc91469ca">If the value is <strong id="mrs_01_1948__b5769115917421">true</strong>, code generation is enabled. That is, for certain queries, the code for the logical plan is generated dynamically at run time.</li><li id="mrs_01_1948__l007c1b1175254255867b32a0253c858c">If the value is <strong id="mrs_01_1948__b13873317442">false</strong>, code generation is disabled and the existing static code is used.</li></ul>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.2.4.2.5.1.3 "><p id="mrs_01_1948__a904168e06b984d45aeac6d064ee7caa5">true</p>
</td>
</tr>
</tbody>
</table>
</div>
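<p>As a minimal sketch, the parameters in Table 1 could be set in either of the two ways described above. The application class <b>com.example.MyApp</b> and the JAR file <b>my-app.jar</b> are hypothetical placeholders and do not come from this guide.</p>
<pre>
# Option 1: pass the parameters with --conf when submitting the application
spark-submit \
  --class com.example.MyApp \
  --conf spark.sql.tungsten.enabled=true \
  --conf spark.sql.codegen.wholeStage=true \
  my-app.jar

# Option 2: add the parameters to spark-defaults.conf on the client
spark.sql.tungsten.enabled      true
spark.sql.codegen.wholeStage    true
</pre>
<p>Values passed with <b>--conf</b> apply only to the submitted application, whereas values in <b>spark-defaults.conf</b> act as defaults for applications submitted from that client.</p>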
<div class="note" id="mrs_01_1948__n45e1033a80d34e9f8c580b1bc69b3e0e"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ol id="mrs_01_1948__oa9788d8076414fe6ae5d2762c56100c3"><li id="mrs_01_1948__lbe86f5c051c5437c8fcb1f1a63929c95">To enable ExternalSort, set <strong id="mrs_01_1948__b280319594417">spark.sql.planner.externalSort</strong> to <strong id="mrs_01_1948__b18808651442">true</strong>, and set either <strong id="mrs_01_1948__b9808115114413">spark.sql.unsafe.enabled</strong> to <strong id="mrs_01_1948__b880820517442">false</strong> or <strong id="mrs_01_1948__b1080825104413">spark.sql.codegen.wholeStage</strong> to <strong id="mrs_01_1948__b16808152449">false</strong>.</li><li id="mrs_01_1948__l26e63aafca414d43b6115f2598754514">To enable TungstenAggregate, use either of the following methods:<p id="mrs_01_1948__acfe3f250b83a4a25b12ac8002de3a622"><a name="mrs_01_1948__l26e63aafca414d43b6115f2598754514"></a><a name="l26e63aafca414d43b6115f2598754514"></a>Set both <strong id="mrs_01_1948__b4688115815441">spark.sql.codegen.wholeStage</strong> and <strong id="mrs_01_1948__b769305815449">spark.sql.unsafe.enabled</strong> to <strong id="mrs_01_1948__b11694135813442">true</strong> in the configuration file or CLI.</p>
<p id="mrs_01_1948__af9616e4fbb2f4a2e8160055ef3b5440a">If <strong id="mrs_01_1948__b15751133474615">spark.sql.codegen.wholeStage</strong> and <strong id="mrs_01_1948__b19537114104620">spark.sql.unsafe.enabled</strong> are not both <strong id="mrs_01_1948__b153713494616">true</strong> (that is, neither or only one of them is <strong id="mrs_01_1948__b653715444611">true</strong>), TungstenAggregate is still enabled as long as <strong id="mrs_01_1948__b45382494617">spark.sql.tungsten.enabled</strong> is set to <strong id="mrs_01_1948__b185383444618">true</strong>.</p>
</li></ol>
</div></div>
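<p>For illustration only, the settings described in the preceding note could be written in <b>spark-defaults.conf</b> as sketched below. The two groups are alternatives that set <b>spark.sql.unsafe.enabled</b> differently, so use only one of them at a time. The same values can be passed with <b>--conf</b> instead.</p>
<pre>
# Alternative A: enable ExternalSort
# (externalSort is true, and unsafe.enabled or codegen.wholeStage is false)
spark.sql.planner.externalSort    true
spark.sql.unsafe.enabled          false

# Alternative B: enable TungstenAggregate
# (codegen.wholeStage and unsafe.enabled are both true)
spark.sql.codegen.wholeStage      true
spark.sql.unsafe.enabled          true
</pre>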
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1941.html">Scenario-Specific Configuration</a></div>
</div>
</div>