Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Lu, Huayi <luhuayi@huawei.com> Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
<a name="EN-US_TOPIC_0000001188163558"></a>
<h1 class="topictitle1">Selecting a Distribution Key</h1>
<div id="body8662426"><p id="EN-US_TOPIC_0000001188163558__a8bc4b1964cee4988bba61310304ea8b8">Use the following principles to select a distribution key for a hash table:</p>
<ol id="EN-US_TOPIC_0000001188163558__ol29343014911"><li id="EN-US_TOPIC_0000001188163558__li993416017495"><strong id="EN-US_TOPIC_0000001188163558__a2e01a4d39fa04d51814bf0093d04a484">The values of the distribution key should be discrete so that data is evenly distributed across the DNs.</strong> The primary key of the table is usually a good choice. For example, in a table of personal information, use the ID number column as the distribution key.</li><li id="EN-US_TOPIC_0000001188163558__li993480194918"><strong id="EN-US_TOPIC_0000001188163558__a529edd2e23f940089b97f079e83d44f2">Do not select a column that has a constant filter.</strong> For example, if some queries on the <strong id="EN-US_TOPIC_0000001188163558__b135026267641818">dwcjk</strong> table filter the <strong id="EN-US_TOPIC_0000001188163558__b68731857541818">zqdh</strong> column by a constant (for example, zqdh = '000001'), you are advised not to use <strong id="EN-US_TOPIC_0000001188163558__b44859862841818">zqdh</strong> as the distribution key, because all rows that match the filter are stored on the same DN and that DN alone has to process them.</li><li id="EN-US_TOPIC_0000001188163558__li4935110124917"><strong id="EN-US_TOPIC_0000001188163558__a5d53bae347be4959a7d42f087d6ab01a">If the preceding principles are met, select the columns used in join conditions as distribution keys</strong>, so that join tasks can be pushed down to the DNs for execution, reducing the amount of data transferred between DNs.<p id="EN-US_TOPIC_0000001188163558__a4af0ed216612409ea73522d1b1f5ae64">For a hash table, an inappropriate distribution key may cause data skew or poor I/O performance on certain DNs. Therefore, check the table to ensure that data is evenly distributed across the DNs. Run the following SQL statement to check for data skew:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188163558__s9b81a8050014459d962cffd10a0611b5"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">SELECT</span>
<span class="n">xc_node_id</span><span class="p">,</span><span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">tablename</span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">xc_node_id</span>
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">xc_node_id</span><span class="w"> </span><span class="k">DESC</span><span class="p">;</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001188163558__p173751945154918">Each <strong id="EN-US_TOPIC_0000001188163558__b16650436641818">xc_node_id</strong> value corresponds to a DN. Generally, <strong id="EN-US_TOPIC_0000001188163558__b13556896241818">a difference of more than 5% in the amount of data between DNs is regarded as data skew. If the difference exceeds 10%, choose another distribution key.</strong></p>
</li><li id="EN-US_TOPIC_0000001188163558__li796918294492">You are advised not to add a column just to serve as the distribution key, especially a new column that is filled with SEQUENCE values. (Sequences may cause performance bottlenecks and unnecessary maintenance costs.)</li></ol>
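<p>To compare the result of the data skew check with the 5% and 10% thresholds, the per-DN row counts can be reduced to a single percentage. The following is a minimal sketch, not an official procedure. It assumes that the difference is measured as the gap between the largest and smallest per-DN row counts relative to the largest count, which this topic does not define. <strong>tablename</strong> is the same placeholder as in the statement above, and the aliases <strong>per_dn</strong>, <strong>cnt</strong>, and <strong>diff_pct</strong> are illustrative names.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Sketch only: the definition of "difference" used here is an assumption.
SELECT
    max(cnt) AS max_rows,   -- row count on the fullest DN
    min(cnt) AS min_rows,   -- row count on the emptiest DN that holds rows
    round((max(cnt) - min(cnt)) * 100.0 / max(cnt), 2) AS diff_pct
FROM (
    -- Same per-DN count as the data skew check
    SELECT xc_node_id, count(1) AS cnt
    FROM tablename
    GROUP BY xc_node_id
) AS per_dn;
</pre></div></div>
<p>A DN that stores no rows of the table does not appear in the grouped result, so it is worth comparing the number of rows returned by the original check with the number of DNs in the cluster before relying on <strong>diff_pct</strong>.</p>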
<p id="EN-US_TOPIC_0000001188163558__aca377a4227184714b95dba9b277485b6">Multiple distribution columns can be selected in <span id="EN-US_TOPIC_0000001188163558__text1523288790">GaussDB(DWS)</span> to evenly distribute data.</p>
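<p>As an illustration of a multi-column distribution key, the following sketch uses the <strong>DISTRIBUTE BY HASH</strong> clause of <strong>CREATE TABLE</strong> with two columns. The table <strong>trade_detail</strong> and its columns are hypothetical and are not taken from this topic. Whether two columns distribute data better than one depends on the actual data, so treat this as a starting point rather than a recommendation.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Hypothetical table: all table and column names are illustrative.
CREATE TABLE trade_detail
(
    trade_date  DATE           NOT NULL,
    account_id  INTEGER        NOT NULL,
    amount      NUMERIC(18,2)
)
-- Both columns together form the distribution key.
DISTRIBUTE BY HASH (trade_date, account_id);
</pre></div></div>
<p>After loading data into such a table, rerun the data skew check to confirm that the chosen columns distribute the rows evenly.</p>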
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_04_0437.html">Reviewing and Modifying a Table Definition</a></div>
</div>
</div>