doc-exports/docs/dws/dev/dws_04_0490.html
Lu, Huayi e6fa411af0 DWS DEV 830.201 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lu, Huayi <luhuayi@huawei.com>
Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
2024-05-16 07:24:04 +00:00


<a name="EN-US_TOPIC_0000001188323760"></a><a name="EN-US_TOPIC_0000001188323760"></a>
<h1 class="topictitle1">Case: Setting Partial Cluster Keys</h1>
<div id="body1536744408857"><p id="EN-US_TOPIC_0000001188323760__p8060118">You can add <strong id="EN-US_TOPIC_0000001188323760__b95117431449">PARTIAL CLUSTER KEY</strong>(<em id="EN-US_TOPIC_0000001188323760__i10363194715415">column_name</em>[,...]) to the definition of a column-store table to set one or more of its columns as partial cluster keys. By default, every 70 CUs (4.2 million rows) are then sorted by the cluster keys during data import, which narrows the value range of each of the 70 resulting CUs. If the <strong id="EN-US_TOPIC_0000001188323760__b56621737043">where</strong> condition of a query contains these columns, whole CUs can be skipped and filtering performance improves.</p>
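<p>As a quick check of the row count above, and assuming each CU holds 60,000 rows (believed to be the default <strong>MAX_BATCHROW</strong> value; verify it for your cluster version), 70 CUs correspond to the 4.2 million rows sorted at a time:</p>

```sql
-- Assumption: 60,000 rows per CU (the default MAX_BATCHROW value).
SELECT 70 * 60000 AS rows_sorted_per_batch;  -- 4200000
```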
<div class="section" id="EN-US_TOPIC_0000001188323760__section10153114193014"><h4 class="sectiontitle">Before Optimization</h4><div class="p" id="EN-US_TOPIC_0000001188323760__p186105713304">No partial cluster key is defined. The table definition and the test query are as follows:<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188323760__screen122121949103015"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">lineitem</span>
<span class="p">(</span>
<span class="n">L_ORDERKEY</span><span class="w"> </span><span class="nb">BIGINT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_PARTKEY</span><span class="w"> </span><span class="nb">BIGINT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_SUPPKEY</span><span class="w"> </span><span class="nb">BIGINT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_LINENUMBER</span><span class="w"> </span><span class="nb">BIGINT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_QUANTITY</span><span class="w"> </span><span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_EXTENDEDPRICE</span><span class="w"> </span><span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_DISCOUNT</span><span class="w"> </span><span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_TAX</span><span class="w"> </span><span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_RETURNFLAG</span><span class="w"> </span><span class="nb">CHAR</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_LINESTATUS</span><span class="w"> </span><span class="nb">CHAR</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_SHIPDATE</span><span class="w"> </span><span class="nb">DATE</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_COMMITDATE</span><span class="w"> </span><span class="nb">DATE</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_RECEIPTDATE</span><span class="w"> </span><span class="nb">DATE</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_SHIPINSTRUCT</span><span class="w"> </span><span class="nb">CHAR</span><span class="p">(</span><span class="mi">25</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_SHIPMODE</span><span class="w"> </span><span class="nb">CHAR</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_COMMENT</span><span class="w"> </span><span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">44</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">)</span>
<span class="k">with</span><span class="w"> </span><span class="p">(</span><span class="n">orientation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">column</span><span class="p">)</span>
<span class="n">distribute</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="n">hash</span><span class="p">(</span><span class="n">L_ORDERKEY</span><span class="p">);</span>
<span class="k">select</span>
<span class="k">sum</span><span class="p">(</span><span class="n">l_extendedprice</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">l_discount</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">revenue</span>
<span class="k">from</span>
<span class="n">lineitem</span>
<span class="k">where</span>
<span class="n">l_shipdate</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="s1">'1994-01-01'</span><span class="p">::</span><span class="nb">date</span>
<span class="k">and</span><span class="w"> </span><span class="n">l_shipdate</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="s1">'1994-01-01'</span><span class="p">::</span><span class="nb">date</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">interval</span><span class="w"> </span><span class="s1">'1 year'</span>
<span class="k">and</span><span class="w"> </span><span class="n">l_discount</span><span class="w"> </span><span class="k">between</span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">06</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">01</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">06</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">01</span>
<span class="k">and</span><span class="w"> </span><span class="n">l_quantity</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">24</span><span class="p">;</span>
</pre></div></td></tr></table></div>
</div>
</div>
<p id="EN-US_TOPIC_0000001188323760__p11454154434214">After the data is imported, perform the query and check the execution time.</p>
<div class="fignone" id="EN-US_TOPIC_0000001188323760__fig159933473117"><span class="figcap"><b>Figure 1 </b>Partial cluster keys not used</span><br><span><img id="EN-US_TOPIC_0000001188323760__image1059933418317" src="figure/en-us_image_0000001551738404.png"></span></div>
<div class="fignone" id="EN-US_TOPIC_0000001188323760__fig8599163453110"><span class="figcap"><b>Figure 2 </b>CU loading without partial cluster keys</span><br><span><img id="EN-US_TOPIC_0000001188323760__image6599143483112" src="figure/en-us_image_0000001602937273.png"></span></div>
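<p>One way to obtain the execution time and per-operator details is to prefix the query with <strong>EXPLAIN PERFORMANCE</strong>, which runs the statement and reports the actual plan. The following is a sketch only; the layout and level of detail of the output depend on the cluster version and settings.</p>

```sql
-- Runs the query and prints the plan with actual execution statistics.
EXPLAIN PERFORMANCE
select
sum(l_extendedprice * l_discount) as revenue
from
lineitem
where
l_shipdate >= '1994-01-01'::date
and l_shipdate < '1994-01-01'::date + interval '1 year'
and l_discount between 0.06 - 0.01 and 0.06 + 0.01
and l_quantity < 24;
```

<p>Running the same statement before and after the table definition is changed makes the scan times of the <strong>lineitem</strong> table directly comparable.</p>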
</div>
<div class="section" id="EN-US_TOPIC_0000001188323760__section15599193463116"><h4 class="sectiontitle">After Optimization</h4><p id="EN-US_TOPIC_0000001188323760__p05971347318">In the <strong id="EN-US_TOPIC_0000001188323760__b1190212094818">where</strong> condition, both the <strong id="EN-US_TOPIC_0000001188323760__b10316824819">l_shipdate</strong> and <strong id="EN-US_TOPIC_0000001188323760__b194471611154818">l_quantity</strong> columns have relatively few distinct values, so their values are suitable for min/max filtering. Therefore, modify the table definition as follows:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188323760__screen185985348318"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">lineitem</span>
<span class="p">(</span>
<span class="n">L_ORDERKEY</span><span class="w"> </span><span class="nb">BIGINT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_PARTKEY</span><span class="w"> </span><span class="nb">BIGINT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_SUPPKEY</span><span class="w"> </span><span class="nb">BIGINT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_LINENUMBER</span><span class="w"> </span><span class="nb">BIGINT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_QUANTITY</span><span class="w"> </span><span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_EXTENDEDPRICE</span><span class="w"> </span><span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_DISCOUNT</span><span class="w"> </span><span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_TAX</span><span class="w"> </span><span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_RETURNFLAG</span><span class="w"> </span><span class="nb">CHAR</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_LINESTATUS</span><span class="w"> </span><span class="nb">CHAR</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_SHIPDATE</span><span class="w"> </span><span class="nb">DATE</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_COMMITDATE</span><span class="w"> </span><span class="nb">DATE</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_RECEIPTDATE</span><span class="w"> </span><span class="nb">DATE</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_SHIPINSTRUCT</span><span class="w"> </span><span class="nb">CHAR</span><span class="p">(</span><span class="mi">25</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_SHIPMODE</span><span class="w"> </span><span class="nb">CHAR</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="n">L_COMMENT</span><span class="w"> </span><span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">44</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span>
<span class="p">,</span><span class="w"> </span><span class="k">partial</span><span class="w"> </span><span class="k">cluster</span><span class="w"> </span><span class="k">key</span><span class="p">(</span><span class="n">l_shipdate</span><span class="p">,</span><span class="w"> </span><span class="n">l_quantity</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">with</span><span class="w"> </span><span class="p">(</span><span class="n">orientation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">column</span><span class="p">)</span>
<span class="n">distribute</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="n">hash</span><span class="p">(</span><span class="n">L_ORDERKEY</span><span class="p">);</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001188323760__p10598734203117">Import the data again, perform the query, and check the execution time.</p>
<div class="fignone" id="EN-US_TOPIC_0000001188323760__fig8599334183116"><span class="figcap"><b>Figure 3 </b>Partial cluster keys used</span><br><span><img id="EN-US_TOPIC_0000001188323760__image3599134163120" src="figure/en-us_image_0000001602802473.png"></span></div>
<div class="fignone" id="EN-US_TOPIC_0000001188323760__fig45996342312"><span class="figcap"><b>Figure 4 </b>CU loading with partial cluster keys</span><br><span><img id="EN-US_TOPIC_0000001188323760__image5599113453116" src="figure/en-us_image_0000001551897872.png"></span></div>
<p id="EN-US_TOPIC_0000001188323760__p1259913413110">After partial cluster keys are used, the execution time of <strong id="EN-US_TOPIC_0000001188323760__b6267335419489">5-- CStore Scan on public.lineitem</strong> decreases by 1.2 seconds because 84 CUs are filtered out.</p>
</div>
<div class="section" id="EN-US_TOPIC_0000001188323760__section5759034104119"><h4 class="sectiontitle">Optimization</h4><ul id="EN-US_TOPIC_0000001188323760__ul660595504117"><li id="EN-US_TOPIC_0000001188323760__li1605955134112">Select partial cluster keys.<ul id="EN-US_TOPIC_0000001188323760__ul1338514552167"><li id="EN-US_TOPIC_0000001188323760__li837120133286">The following data types support cluster keys: character varying(n), varchar(n), character(n), char(n), text, nvarchar2, timestamp with time zone, timestamp without time zone, date, time without time zone, and time with time zone.</li><li id="EN-US_TOPIC_0000001188323760__li238585591611">The fewer distinct values a partial cluster key has, the better the filtering performance.</li><li id="EN-US_TOPIC_0000001188323760__li121972010318">Preferentially select columns that can filter out a larger amount of data as partial cluster keys.</li><li id="EN-US_TOPIC_0000001188323760__li675415919326">If multiple columns are selected as partial cluster keys, the columns are used in sequence to sort data. You are advised to select a maximum of three columns.</li></ul>
</li></ul>
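<p>To compare candidate columns against the guidance above, you can count their distinct values. This is a minimal sketch against the <strong>lineitem</strong> table from this case; the column aliases are illustrative, and on very large tables the query itself can take a while.</p>

```sql
-- Compare the number of distinct values of each candidate column
-- with the total number of rows in the table.
SELECT count(*)                   AS total_rows,
       count(DISTINCT l_shipdate) AS shipdate_distinct,
       count(DISTINCT l_quantity) AS quantity_distinct
FROM lineitem;
```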
</div>
<ul id="EN-US_TOPIC_0000001188323760__ul14138423425"><li id="EN-US_TOPIC_0000001188323760__li1213852174214">Modify parameters to reduce the impact of partial cluster keys on the import performance.<p id="EN-US_TOPIC_0000001188323760__p1050085854011"><a name="EN-US_TOPIC_0000001188323760__li1213852174214"></a><a name="li1213852174214"></a>When partial cluster keys are used, data is sorted as it is imported, which affects the import performance. If all the data can be sorted in memory, the keys have little impact on the import. If some data cannot be sorted in memory and is written to a temporary file for sorting, the import performance is greatly affected.</p>
<p id="EN-US_TOPIC_0000001188323760__p1850075874020">The memory used for sorting is specified by the <strong id="EN-US_TOPIC_0000001188323760__b12043594339489">psort_work_mem</strong> parameter. You can set it to a larger value so that the sorting has less impact on the import performance.</p>
<p id="EN-US_TOPIC_0000001188323760__p55004586405">The volume of data to be sorted is specified by the <strong id="EN-US_TOPIC_0000001188323760__b19965101711511">PARTIAL_CLUSTER_ROWS</strong> parameter of the table. Decreasing the value of this parameter reduces the amount of data to be sorted at a time. <strong id="EN-US_TOPIC_0000001188323760__b239621906">PARTIAL_CLUSTER_ROWS</strong> is usually used along with the <strong id="EN-US_TOPIC_0000001188323760__b199562046165213">MAX_BATCHROW</strong> parameter. The value of <strong id="EN-US_TOPIC_0000001188323760__b13516131265313">PARTIAL_CLUSTER_ROWS</strong> must be an integer multiple of the <strong id="EN-US_TOPIC_0000001188323760__b17546131813538">MAX_BATCHROW</strong> value. <strong id="EN-US_TOPIC_0000001188323760__b42881739125312">MAX_BATCHROW</strong> specifies the maximum number of rows in a CU.</p>
</li></ul>
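<p>The parameters described above could be combined as in the following sketch. The table name <strong>lineitem_small_sort</strong>, the reduced column list, and all values are illustrative assumptions; check the allowed ranges and defaults of <strong>psort_work_mem</strong>, <strong>MAX_BATCHROW</strong>, and <strong>PARTIAL_CLUSTER_ROWS</strong> for your cluster version before using them.</p>

```sql
-- Assumed session-level setting; size the value to the memory actually available.
SET psort_work_mem = '1GB';

-- Hypothetical table. PARTIAL_CLUSTER_ROWS (1,200,000) is an integer
-- multiple (20x) of MAX_BATCHROW (60,000), as the rule above requires.
CREATE TABLE lineitem_small_sort
(
L_ORDERKEY BIGINT NOT NULL
, L_QUANTITY DECIMAL(15,2) NOT NULL
, L_SHIPDATE DATE NOT NULL
, partial cluster key(l_shipdate, l_quantity)
)
with (orientation = column, max_batchrow = 60000, partial_cluster_rows = 1200000)
distribute by hash(L_ORDERKEY);
```

<p>A smaller <strong>PARTIAL_CLUSTER_ROWS</strong> value means less data is sorted at a time, but each sorted batch then spans fewer CUs, so the effect on filtering should be measured rather than assumed.</p>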
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_04_0474.html">Optimization Cases</a></div>
</div>
</div>