Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00

<a name="mrs_01_1990"></a><a name="mrs_01_1990"></a>
<h1 class="topictitle1">Multiple JDBC Clients Concurrently Connecting to JDBCServer</h1>
<div id="body1595920218431"><div class="section" id="mrs_01_1990__se9244b68372944928499ce32dff51467"><h4 class="sectiontitle">Scenario</h4><p id="mrs_01_1990__ae05b426ffd02443088f37989be8328e2">Multiple clients can connect to JDBCServer at the same time. However, if the number of concurrent tasks is large, the default JDBCServer configuration must be tuned to handle the load.</p>
</div>
<div class="section" id="mrs_01_1990__sd4eb11d4a4da4433ba638ee8a1dc6c1d"><h4 class="sectiontitle">Procedure</h4><ol id="mrs_01_1990__o721cf9b2d8954d9c8883c62f4af0cdfd"><li id="mrs_01_1990__lcc0bc45459384d8791573751d8996a96">Set the fair scheduling policy of JDBCServer.<div class="p" id="mrs_01_1990__a28b8e936795c4ced9cd3c6481fa0cf07"><a name="mrs_01_1990__lcc0bc45459384d8791573751d8996a96"></a><a name="lcc0bc45459384d8791573751d8996a96"></a>Spark uses the <strong id="mrs_01_1990__b4557603578">FIFO</strong> scheduling policy by default, which may cause short tasks to fail in multi-task scenarios. Use the fair scheduling policy in such scenarios to prevent task failures.<ol type="a" id="mrs_01_1990__o4a1a780875f24401a8a499525e6ab98f"><li id="mrs_01_1990__ld6f4f549733a47b09446c05e98fb4bc9">Configure Fair Scheduler in Spark. For details, visit <a href="http://spark.apache.org/docs/3.1.1/job-scheduling.html#scheduling-within-an-application" target="_blank" rel="noopener noreferrer">http://spark.apache.org/docs/3.1.1/job-scheduling.html#scheduling-within-an-application</a>.</li><li id="mrs_01_1990__l4cab9a3de03e4eb8aefe85f396efcf04">Configure Fair Scheduler on the JDBC client.<ol class="substepthirdol" id="mrs_01_1990__od906f962b0e44f1688693ce7f120d5b5"><li id="mrs_01_1990__l5a14c99fd3a649ff96046a42237e61fe">In the Beeline command-line client or in your JDBC client code, run the following statement:<p id="mrs_01_1990__ad2fae1cfd15e438886b3d0458ee39f2a"><a name="mrs_01_1990__l5a14c99fd3a649ff96046a42237e61fe"></a><a name="l5a14c99fd3a649ff96046a42237e61fe"></a><strong id="mrs_01_1990__b179168391583">PoolName</strong> is the name of a Fair Scheduler scheduling pool.</p>
<pre class="screen" id="mrs_01_1990__s3f60da592e9741ac840e55dc23e9667c">SET spark.sql.thriftserver.scheduler.pool=PoolName;</pre>
</li><li id="mrs_01_1990__l1e843234bbb64356a27229f9f1369f45">Run the SQL statement. The resulting Spark task is executed in the scheduling pool set in the previous step.</li></ol>
</li></ol>
</div>
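<p>The client-side steps above can be sketched in JDBC code as follows. This is a minimal sketch, not the guide's reference implementation: the class name, the helper names, and the pool-name character check are assumptions made for illustration, and the caller is expected to supply an already opened connection to JDBCServer.</p>

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class FairPoolJdbcExample {

    // Builds the SET statement from the procedure for a given pool name.
    // The character whitelist is an assumption of this sketch, not a Spark rule;
    // it only keeps stray characters out of the statement text.
    static String poolStatement(String poolName) {
        if (poolName == null || !poolName.matches("[A-Za-z0-9_.-]+")) {
            throw new IllegalArgumentException("Unexpected pool name: " + poolName);
        }
        return "SET spark.sql.thriftserver.scheduler.pool=" + poolName;
    }

    // Selects the pool on this connection, then runs the query and returns the row count.
    static int runInPool(Connection conn, String poolName, String sql) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(poolStatement(poolName));
            int rows = 0;
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    rows++;
                }
            }
            return rows;
        }
    }

    public static void main(String[] args) {
        // Prints the statement only; connecting needs a JDBC driver and a JDBCServer URL.
        System.out.println(poolStatement("PoolName"));
    }
}
```

<p>Each client would call the hypothetical <strong>runInPool</strong> helper on its own connection, so that the pool is set before the query is submitted on that connection.</p>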
</li><li id="mrs_01_1990__lea4c072fbb1f4db1a6a1502966420c71">Set the <strong id="mrs_01_1990__b0931125855920">BroadcastHashJoin</strong> timeout interval.<div class="p" id="mrs_01_1990__a2cb1dc9e9a5a4c7085d3cb9b9c0a75f0"><strong id="mrs_01_1990__b13478141017">BroadcastHashJoin</strong> has a timeout parameter. A query fails if the broadcast takes longer than the preset timeout interval. In multi-task scenarios, BroadcastHashJoin tasks may time out and fail because of resource preemption. Therefore, modify the timeout interval in the <strong id="mrs_01_1990__b1454913321010">spark-defaults.conf</strong> file of JDBCServer.
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_1990__tf783bc54dbb048eea7ef0c02f29c79ac" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameter description</caption><thead align="left"><tr id="mrs_01_1990__rf433b3c6c9bd4a2b921d5811baba8e57"><th align="left" class="cellrowborder" valign="top" width="24.75%" id="mcps1.3.2.2.2.2.3.2.4.1.1"><p id="mrs_01_1990__ad6c7ae59e82d460a9959752d1ebedd4f">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="48.620000000000005%" id="mcps1.3.2.2.2.2.3.2.4.1.2"><p id="mrs_01_1990__a1a3dbb084ca44163815247126e356d0e">Description</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="26.63%" id="mcps1.3.2.2.2.2.3.2.4.1.3"><p id="mrs_01_1990__a8b72f43487f0422d9350009d7996f8bc">Default Value</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_1990__ra51a01bddd7d41cb95fdf5d257e36967"><td class="cellrowborder" valign="top" width="24.75%" headers="mcps1.3.2.2.2.2.3.2.4.1.1 "><p id="mrs_01_1990__aff366ae080dc4bd5814567ba92ea182c">spark.sql.broadcastTimeout</p>
</td>
<td class="cellrowborder" valign="top" width="48.620000000000005%" headers="mcps1.3.2.2.2.2.3.2.4.1.2 "><p id="mrs_01_1990__a1a85f1e55af34c60ba44793632bf0f64">Timeout interval for broadcasting a table in <strong id="mrs_01_1990__b17167122717210">BroadcastHashJoin</strong>. If there are many concurrent tasks, set this parameter to a larger value or to a negative number.</p>
</td>
<td class="cellrowborder" valign="top" width="26.63%" headers="mcps1.3.2.2.2.2.3.2.4.1.3 "><p id="mrs_01_1990__a7ea3a8f6f6294d39ba690f90faaac26d">-1 (Numeric type. The actual value is 5 minutes.)</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
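<p>To illustrate the shape of the entry in <strong>spark-defaults.conf</strong>, the following sketch parses sample configuration text and reads back the timeout. The class name, the helper name, and the value 600 are illustrative assumptions, not recommendations from this guide. Open-source Spark documents this parameter in seconds, and the file uses one whitespace-separated key and value per line, which <strong>java.util.Properties</strong> can parse.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class BroadcastTimeoutConfExample {

    static final String KEY = "spark.sql.broadcastTimeout";

    // Reads the timeout entry from spark-defaults.conf style text ("key value" per line).
    // Returns null when the key is not set.
    static String readTimeout(String confText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(confText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(KEY);
        return value == null ? null : value.trim();
    }

    public static void main(String[] args) {
        // 600 is an illustrative value only, not a recommendation from this guide.
        String sample = "# JDBCServer settings\n" + KEY + " 600\n";
        System.out.println(KEY + " = " + readTimeout(sample));
    }
}
```

<p>A check of this kind can confirm that the edited file actually contains the intended value before JDBCServer is restarted.</p>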
</li></ol>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1985.html">Spark SQL and DataFrame Tuning</a></div>
</div>
</div>