
<a name="mrs_01_2000"></a><a name="mrs_01_2000"></a>
<h1 class="topictitle1">SQL Optimization for Multi-level Nesting and Hybrid Join</h1>
<div id="body1595920219239"><div class="section" id="mrs_01_2000__s201c1a02f75b4a58b028d211ec2f245b"><h4 class="sectiontitle">Scenario</h4><p id="mrs_01_2000__a86f0f70172914caea5fbe89720bb14ca">This section describes the optimization suggestions for SQL statements in multi-level nesting and hybrid join scenarios.</p>
</div>
<div class="section" id="mrs_01_2000__s7f1275efcd2c492f8345df7761b66d3f"><h4 class="sectiontitle">Prerequisites</h4><p id="mrs_01_2000__aec5668c59419427f9542c7cf285b7619">The following provides an example of complex query statements:</p>
<pre class="screen" id="mrs_01_2000__sc93031d1706f45fd8f27097a60389341">select
s_name,
count(1) as numwait
from (
select s_name from (
select
s_name,
t2.l_orderkey,
l_suppkey,
count_suppkey,
max_suppkey
from
test2 t2 right outer join (
select
s_name,
l_orderkey,
l_suppkey from (
select
s_name,
t1.l_orderkey,
l_suppkey,
count_suppkey,
max_suppkey
from
test1 t1 join (
select
s_name,
l_orderkey,
l_suppkey
from
orders o join (
select
s_name,
l_orderkey,
l_suppkey
from
nation n join supplier s
on
s.s_nationkey = n.n_nationkey
and n.n_name = 'SAUDI ARABIA'
join lineitem l
on
s.s_suppkey = l.l_suppkey
where
l.l_receiptdate &gt; l.l_commitdate
and l.l_orderkey is not null
) l1 on o.o_orderkey = l1.l_orderkey and o.o_orderstatus = 'F'
) l2 on l2.l_orderkey = t1.l_orderkey
) a
where
(count_suppkey &gt; 1)
or ((count_suppkey=1)
and (l_suppkey &lt;&gt; max_suppkey))
) l3 on l3.l_orderkey = t2.l_orderkey
) b
where
(count_suppkey is null)
or ((count_suppkey=1)
and (l_suppkey = max_suppkey))
) c
group by
s_name
order by
numwait desc,
s_name
limit 100;</pre>
</div>
<div class="section" id="mrs_01_2000__s9f019f7314af4ebcbb937fb59623050f"><h4 class="sectiontitle">Procedure</h4><ol id="mrs_01_2000__o8a3a607ec5e04997a97403a942c6ee68"><li id="mrs_01_2000__laf109f64057d456ea76c0b28e5d8c5f1"><span>Analyze the service logic.</span><p><p id="mrs_01_2000__aa2fd89024dd14b8ebb1b527ccf33034a">Analyze the service logic to determine whether the SQL statements can be simplified, for example, by combining tables to reduce the number of nesting levels and the number of join operations.</p>
</p></li><li id="mrs_01_2000__l24fe6a426a7f45dbbf4e2ef4ca3b1eda"><span>If the SQL statements cannot be simplified, configure the driver memory.</span><p><ul id="mrs_01_2000__u8084ba2344f84aa58ad286db2189973f"><li id="mrs_01_2000__l59fc9ef852c84f4bafb17eadb61b89f8">If SQL statements are executed through spark-submit or spark-sql, go to <a href="#mrs_01_2000__l808aae986fc948d6903dc2cc981034dd">3</a>.</li><li id="mrs_01_2000__l82c71aeba0464a16b4b1c375116ed139">If SQL statements are executed through spark-beeline, go to <a href="#mrs_01_2000__l156d16075fcf45fdb9aaea523ea81729">4</a>.</li></ul>
</p></li><li id="mrs_01_2000__l808aae986fc948d6903dc2cc981034dd"><a name="mrs_01_2000__l808aae986fc948d6903dc2cc981034dd"></a><a name="l808aae986fc948d6903dc2cc981034dd"></a><span>When executing the SQL statements, specify the <strong id="mrs_01_2000__a9d4af25026374733aca8cde424da2e1d">driver-memory</strong> parameter. An example command is as follows:</span><p><p id="mrs_01_2000__a29dc94932ac9466bafd15b5eeabf8f36"><strong id="mrs_01_2000__af5975b82b6fb42ec850983e772b9fd01">/spark-sql --master=local[4] --driver-memory=512M -f /tpch.sql</strong></p>
</p></li><li id="mrs_01_2000__l156d16075fcf45fdb9aaea523ea81729"><a name="mrs_01_2000__l156d16075fcf45fdb9aaea523ea81729"></a><a name="l156d16075fcf45fdb9aaea523ea81729"></a><span>Before running SQL statements, change the memory size as the <span id="mrs_01_2000__ph46491861246">system </span>administrator.</span><p><ol type="a" id="mrs_01_2000__od28bebef18a4424793f05f196d9e111a"><li id="mrs_01_2000__ld1a8e5e18b64401aa3babf90ceaec296">Log in to FusionInsight Manager and choose <strong id="mrs_01_2000__b12293512913">Cluster &gt; </strong><em id="mrs_01_2000__i132319512919">Name of the desired cluster</em><strong id="mrs_01_2000__b2229135091"> &gt; Service<span id="mrs_01_2000__ph15149528134019">s</span> </strong>&gt; <strong id="mrs_01_2000__a6bda9b42e87d47b8ba33b7bcf8d7f67a">Spark2x </strong>&gt;<strong id="mrs_01_2000__a2de281bd0b564000af0ee3a02fb2e53b"> Configuration<span id="mrs_01_2000__ph13721183164010">s</span></strong>.</li><li id="mrs_01_2000__ldf78b9a543704fd58fdde9068231b054">On the displayed page, click <strong id="mrs_01_2000__b1018995413527">All Configurations</strong> and search for <strong id="mrs_01_2000__a620cf81477774280bc46d229917d9bb1">SPARK_DRIVER_MEMORY</strong>.</li><li id="mrs_01_2000__l8160516a56e14f36a391fb6aedc590e3">Modify the <strong id="mrs_01_2000__ad2c0bee85f524f3f9e21956196f61c6b">SPARK_DRIVER_MEMORY</strong> parameter value to increase the memory size. The parameter value consists of two parts: memory size (an integer) and the unit (M or G), for example, <strong id="mrs_01_2000__aebbb87aa1aab4482963bb9481bd58e54">512M</strong>.</li></ol>
</p></li></ol>
</div>
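<p>To illustrate step 1, note that the intermediate tables <strong>test1</strong> and <strong>test2</strong> in the example query both join against per-order supplier statistics (<strong>count_suppkey</strong>, <strong>max_suppkey</strong>). If such statistics can be materialized once into a combined table, one level of nesting and one repeated aggregation disappear. The following is only a sketch of the idea; the table name <strong>supp_stats</strong> and its column list are assumptions for illustration, not part of the original query:</p>

```sql
-- Hypothetical: materialize the per-order supplier statistics once,
-- instead of recomputing them inside each nested subquery.
create table supp_stats as
select
  l_orderkey,
  count(l_suppkey) as count_suppkey,
  max(l_suppkey)   as max_suppkey
from lineitem
group by l_orderkey;

-- The outer query can then join supp_stats directly on l_orderkey,
-- which removes one nesting level and one join against a derived table.
```

<p>Whether this rewrite is applicable depends on the service: materializing an intermediate table trades storage and refresh cost for a flatter, cheaper query plan.</p>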
<div class="section" id="mrs_01_2000__s59d6c99a99b24e27b0fd8e04cb07796e"><h4 class="sectiontitle">Reference</h4><p id="mrs_01_2000__aa14f9e85681d43e39ada793c2c43016f">If the driver memory is insufficient, an error similar to the following may be reported during the query:</p>
<pre class="screen" id="mrs_01_2000__s6e9732da79a54885aeb79eaa445b9877">2018-02-11 09:13:14,683 | WARN | Executor task launch worker for task 5 | Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0. | org.apache.spark.sql.catalyst.expressions.RowBasedKeyValueBatch.spill(RowBasedKeyValueBatch.java:173)
2018-02-11 09:13:14,682 | WARN | Executor task launch worker for task 3 | Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0. | org.apache.spark.sql.catalyst.expressions.RowBasedKeyValueBatch.spill(RowBasedKeyValueBatch.java:173)
2018-02-11 09:13:14,704 | ERROR | Executor task launch worker for task 2 | Exception in task 2.0 in stage 1.0 (TID 2) | org.apache.spark.internal.Logging$class.logError(Logging.scala:91)
java.lang.OutOfMemoryError: Unable to acquire 262144 bytes of memory, got 0
at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:100)
at org.apache.spark.unsafe.map.BytesToBytesMap.allocate(BytesToBytesMap.java:791)
at org.apache.spark.unsafe.map.BytesToBytesMap.&lt;init&gt;(BytesToBytesMap.java:208)
at org.apache.spark.unsafe.map.BytesToBytesMap.&lt;init&gt;(BytesToBytesMap.java:223)
at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.&lt;init&gt;(UnsafeFixedWidthAggregationMap.java:104)
at org.apache.spark.sql.execution.aggregate.HashAggregateExec.createHashMap(HashAggregateExec.scala:307)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:381)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)</pre>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1985.html">Spark SQL and DataFrame Tuning</a></div>
</div>
</div>