Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00

<a name="mrs_01_2023"></a>
<h1 class="topictitle1">What Should I Note When Using Spark SQL ROLLUP and CUBE?</h1>
<div id="body1595920219938"><div class="section" id="mrs_01_2023__s1b32dcfc91ac45dc9e36c58cb8c07e7e"><h4 class="sectiontitle">Question</h4><p id="mrs_01_2023__a3bbbef80445b4b6f9db995464f621bac">Suppose that there is a table src(d1, d2, m) with the following data:</p>
<pre class="screen" id="mrs_01_2023__s974c6a586c5947fdaf7a911f9f7a90fc">1 a 1
1 b 1
2 b 2</pre>
<p id="mrs_01_2023__a12e3e06b09e9493290887461d9debf23">The statement "select d1, sum(d1) from src group by d1, d2 with rollup" returns the following results:</p>
<pre class="screen" id="mrs_01_2023__s88fcdace58834d3d9feecf4e6a99d900">NULL 0
1 2
2 2
1 1
1 1
2 2</pre>
<p id="mrs_01_2023__ac8cc27e319794a2e998344f65c3461e1">Why is the first line of the results (NULL,0) rather than (NULL,4)?</p>
</div>
<div class="section" id="mrs_01_2023__sa34deb2d2ebb4d8ba22c79c3e70aa9da"><h4 class="sectiontitle">Answer</h4><p id="mrs_01_2023__aad162a4b59c84f26aa87316bb015977a">ROLLUP and CUBE operations are typically used for dimension-based analysis, where the value of interest is the aggregated measure. The dimensions themselves are therefore not expected to be aggregated.</p>
<p id="mrs_01_2023__a01abd3f7275e48539aa012b3c9de9743">Take the table src(d1, d2, m) as an example. Statement 1, "select d1, sum(m) from src group by d1, d2 with rollup", rolls up the dimensions d1 and d2 and aggregates the measure m. It has a clear business meaning, and its results are as expected. Statement 2, "select d1, sum(d1) from src group by d1, d2 with rollup", aggregates the dimension d1 itself and has no meaningful business interpretation. For statement 2, every aggregate function (sum/avg/max/min) returns 0, as in the (NULL,0) row shown in the question.</p>
<div class="note" id="mrs_01_2023__n02f1bd80f55b4e22a6224acdd9d4d683"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_2023__a7e18d2d652e44267bf4b88868bcc081c">The result is 0 only when a rollup or cube operation aggregates a field that also appears in the "group by" clause. For non-rollup and non-cube operations, the results are as expected.</p>
</div></div>
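<p>To make the aggregation levels concrete, the following is a minimal plain-Python sketch of the arithmetic behind statement 1. It is not Spark code and does not reproduce how Spark evaluates the query. The helper name rollup_sum and the in-memory list rows are assumptions made for illustration only. It computes sum(m) for each level that a rollup over d1 and d2 produces, which also shows why a reader might expect a grand total of 4.</p>

```python
# Sample rows of the table src(d1, d2, m) from the question.
rows = [(1, "a", 1), (1, "b", 1), (2, "b", 2)]


def rollup_sum(rows, measure_index):
    """Emulate GROUP BY d1, d2 WITH ROLLUP and sum one column per group.

    A rollup over (d1, d2) aggregates at three levels:
    (d1, d2), (d1) and the grand total. A rolled-up dimension
    is reported as None, which stands in for SQL NULL.
    Hypothetical helper, not part of any Spark API.
    """
    totals = {}
    for row in rows:
        d1, d2 = row[0], row[1]
        # Each input row contributes to one group per rollup level.
        for key in ((d1, d2), (d1, None), (None, None)):
            totals[key] = totals.get(key, 0) + row[measure_index]
    return totals


# Statement 1: aggregate the measure column m (index 2).
by_measure = rollup_sum(rows, 2)
for (d1, d2), total in by_measure.items():
    print(d1, d2, total)
```

<p>In this sketch the grand-total group (both dimensions rolled up) sums m to 4 and each d1 subtotal is 2. The (NULL,0) row of statement 2 is specific to how Spark SQL treats an aggregate over a "group by" field and is not modeled here.</p>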
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_2022.html">Spark SQL and DataFrame</a></div>
</div>
</div>