doc-exports/docs/dws/dev/dws_04_0436.html
Lu, Huayi e6fa411af0 DWS DEV 830.201 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lu, Huayi <luhuayi@huawei.com>
Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
2024-05-16 07:24:04 +00:00


<a name="EN-US_TOPIC_0000001233681757"></a><a name="EN-US_TOPIC_0000001233681757"></a>
<h1 class="topictitle1">Updating Statistics</h1>
<div id="body8662426"><p id="EN-US_TOPIC_0000001233681757__p78113337546">In a database, statistics are the source data that the planner uses to generate a plan. If statistics are missing or out of date, the execution plan may deteriorate seriously, leading to poor performance.</p>
<div class="section" id="EN-US_TOPIC_0000001233681757__sdbde8a9d41484527a9706162eaec0ea1"><h4 class="sectiontitle">Context</h4><p id="EN-US_TOPIC_0000001233681757__p9599144717409">The <strong id="EN-US_TOPIC_0000001233681757__b84235270611236">ANALYZE</strong> statement collects statistics about table contents in databases and stores them in the system table <strong id="EN-US_TOPIC_0000001233681757__b66748299">PG_STATISTIC</strong>. The query optimizer then uses these statistics to work out the most efficient execution plan.</p>
<p id="EN-US_TOPIC_0000001233681757__p41451454144012">After batch insertions or deletions, you are advised to run the <strong id="EN-US_TOPIC_0000001233681757__b842352706165842">ANALYZE</strong> statement on the table or the entire database to update statistics. By default, 30,000 rows are sampled (300 x 100), because the default value of the GUC parameter <strong id="EN-US_TOPIC_0000001233681757__b842352706174552">default_statistics_target</strong> is <strong id="EN-US_TOPIC_0000001233681757__b842352706174556">100</strong>. If the total number of rows in the table exceeds 1,600,000, you are advised to set <strong id="EN-US_TOPIC_0000001233681757__b842352706174638">default_statistics_target</strong> to <strong id="EN-US_TOPIC_0000001233681757__b842352706174645">-2</strong>, indicating that 2% of the rows are sampled.</p>
<p id="EN-US_TOPIC_0000001233681757__aa4f59ffe8c4e4520a777b61db2ff0dde">For an intermediate table generated during the execution of a batch script or stored procedure, you also need to run the <strong id="EN-US_TOPIC_0000001233681757__b59965843211354">ANALYZE</strong> statement.</p>
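<p>A minimal sketch of this pattern follows. The table and column names (<em>tmp_daily_orders</em>, <em>orders</em>, <em>customers</em>) are hypothetical and only illustrate where <strong>ANALYZE</strong> fits in a batch script: right after the intermediate table is populated and before later statements query it.</p>

```sql
-- Hypothetical batch step: build an intermediate table.
CREATE TABLE tmp_daily_orders AS
SELECT customer_id, SUM(amount) AS total_amount
FROM orders
WHERE order_date = CURRENT_DATE
GROUP BY customer_id;

-- Collect statistics before the intermediate table is used,
-- so that the optimizer has row estimates for it.
ANALYZE tmp_daily_orders;

-- A later step in the same script that joins the intermediate table.
SELECT c.customer_name, t.total_amount
FROM tmp_daily_orders t
JOIN customers c ON c.customer_id = t.customer_id;
```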
<p id="EN-US_TOPIC_0000001233681757__p83021639060">If a table contains multiple correlated columns and a query filters or groups on these columns, collect statistics about these columns so that the query optimizer can accurately estimate the number of rows and generate an effective execution plan.</p>
</div>
<div class="section" id="EN-US_TOPIC_0000001233681757__s2df36e544a5e4ccea6342438f9d1f6ce"><h4 class="sectiontitle">Generating Statistics</h4><p id="EN-US_TOPIC_0000001233681757__a75e1de2b9d0c4da994c9fc5fe08192e6">Run the following commands to update the statistics about a table or the entire database:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001233681757__s8328d559ffa349d3bffcc9ad313676c3"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">ANALYZE</span><span class="w"> </span><span class="n">tablename</span><span class="p">;</span><span class="w"> </span><span class="c1">--Update statistics about a table.</span>
<span class="k">ANALYZE</span><span class="p">;</span><span class="w"> </span><span class="c1">--Update statistics about the entire database.</span>
</pre></div></td></tr></table></div>
</div>
</div>
<p id="EN-US_TOPIC_0000001233681757__p93142485261">Run the following statements to perform statistics-related operations on multiple columns:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001233681757__screen2033246152712"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span>
<span class="normal">6</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">ANALYZE</span><span class="w"> </span><span class="n">tablename</span><span class="w"> </span><span class="p">((</span><span class="n">column_1</span><span class="p">,</span><span class="w"> </span><span class="n">column_2</span><span class="p">));</span><span class="w"> </span><span class="c1">--Collect statistics about column_1 and column_2 of tablename.</span>
<span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">tablename</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="k">STATISTICS</span><span class="w"> </span><span class="p">((</span><span class="n">column_1</span><span class="p">,</span><span class="w"> </span><span class="n">column_2</span><span class="p">));</span><span class="w"> </span><span class="c1">--Declare statistics about column_1 and column_2 of tablename.</span>
<span class="k">ANALYZE</span><span class="w"> </span><span class="n">tablename</span><span class="p">;</span><span class="w"> </span><span class="c1">--Collect statistics about one or more columns.</span>
<span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">tablename</span><span class="w"> </span><span class="k">DELETE</span><span class="w"> </span><span class="k">STATISTICS</span><span class="w"> </span><span class="p">((</span><span class="n">column_1</span><span class="p">,</span><span class="w"> </span><span class="n">column_2</span><span class="p">));</span><span class="w"> </span><span class="c1">--Delete statistics about column_1 and column_2 of tablename or their statistics declaration.</span>
</pre></div></td></tr></table></div>
</div>
<div class="notice" id="EN-US_TOPIC_0000001233681757__note121972410486"><span class="noticetitle"><img src="public_sys-resources/notice_3.0-en-us.png"> </span><div class="noticebody"><ul id="EN-US_TOPIC_0000001233681757__ul3627143841712"><li id="EN-US_TOPIC_0000001233681757__li166271938171714">After statistics are declared for multiple columns by running the <strong id="EN-US_TOPIC_0000001233681757__b16349578249">ALTER TABLE</strong> <em id="EN-US_TOPIC_0000001233681757__i189157082518">tablename</em> <strong id="EN-US_TOPIC_0000001233681757__b149010518250">ADD STATISTICS</strong> statement, the system collects the statistics about these columns the next time <strong id="EN-US_TOPIC_0000001233681757__b76462160273">ANALYZE</strong> is run on the table or the entire database. The declaration alone does not collect anything; to collect the statistics, run the <strong id="EN-US_TOPIC_0000001233681757__b15620312288">ANALYZE</strong> statement.</li><li id="EN-US_TOPIC_0000001233681757__li103621941141714">Use <strong id="EN-US_TOPIC_0000001233681757__b18601175115010">EXPLAIN</strong> to show the execution plan of an SQL statement. If <strong id="EN-US_TOPIC_0000001233681757__b06017514011">rows=10</strong> (the default value, which probably indicates that the table has not been analyzed) is displayed in the <strong id="EN-US_TOPIC_0000001233681757__b560125119010">SEQ SCAN</strong> output of a table, run the <strong id="EN-US_TOPIC_0000001233681757__b6601125114010">ANALYZE</strong> statement for this table.</li></ul>
</div></div>
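<p>The <strong>EXPLAIN</strong> check described in the notice above can be sketched as follows. <em>tablename</em> is the same placeholder used in the earlier examples, and the query itself is only illustrative; the exact plan output depends on your cluster and is not shown here.</p>

```sql
-- Show the plan and inspect the rows= estimate on the sequential scan of tablename.
EXPLAIN SELECT * FROM tablename;

-- If the scan reports rows=10, the table has probably not been analyzed:
-- collect statistics, then check the plan again.
ANALYZE tablename;
EXPLAIN SELECT * FROM tablename;
```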
<div class="section" id="EN-US_TOPIC_0000001233681757__section5205338131912"><h4 class="sectiontitle">Improving the Quality of Statistics</h4><p id="EN-US_TOPIC_0000001233681757__p1250062882013"><strong id="EN-US_TOPIC_0000001233681757__b182341344211">ANALYZE</strong> samples data from a table using a random sampling algorithm and calculates table data features based on the samples. The number of samples is controlled by the <strong id="EN-US_TOPIC_0000001233681757__b383979225">default_statistics_target</strong> parameter. The value of <strong id="EN-US_TOPIC_0000001233681757__b889117131124">default_statistics_target</strong> ranges from -100 to 10000, and the default value is 100.</p>
<p id="EN-US_TOPIC_0000001233681757__p19500192819206">If <strong id="EN-US_TOPIC_0000001233681757__b14525134414412">default_statistics_target</strong> &gt; 0, the number of samples is 300 x <strong id="EN-US_TOPIC_0000001233681757__b396495719218">default_statistics_target</strong>. A larger value of <strong id="EN-US_TOPIC_0000001233681757__b17971192719318">default_statistics_target</strong> therefore means more samples, more memory occupied by the samples, and a longer time to calculate statistics.</p>
<p id="EN-US_TOPIC_0000001233681757__p850010289207">If <strong id="EN-US_TOPIC_0000001233681757__b6323155012412">default_statistics_target</strong> &lt; 0, the number of samples is |<strong id="EN-US_TOPIC_0000001233681757__b399516413510">default_statistics_target</strong>|/100 x Total number of rows in the table. For example, a value of -2 samples 2% of the rows. A smaller (more negative) value of <strong id="EN-US_TOPIC_0000001233681757__b9525112016519">default_statistics_target</strong> indicates a larger number of samples. When <strong id="EN-US_TOPIC_0000001233681757__b57723163717">default_statistics_target</strong> &lt; 0, the sampled data is written to disk, so the samples do not occupy memory. However, the calculation can still take a long time because the sample size is large.</p>
<p id="EN-US_TOPIC_0000001233681757__p155021286202">When <strong id="EN-US_TOPIC_0000001233681757__b451612214811">default_statistics_target</strong> &lt; 0, the number of samples is a percentage (|<strong id="EN-US_TOPIC_0000001233681757__b4307534281">default_statistics_target</strong>|%) of the total number of rows in the table. Therefore, this sampling mode is also called percentage sampling.</p>
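<p>As a rough illustration of the two sampling modes, the following sketch changes <strong>default_statistics_target</strong> for the current session before running <strong>ANALYZE</strong>. It assumes that the parameter can be set at session level with <strong>SET</strong> in your environment; <em>tablename</em> is a placeholder, and the row counts in the comments simply apply the formulas above to a table of 1,600,000 rows.</p>

```sql
-- Fixed-size sampling (default): 300 x 100 = 30,000 sampled rows.
SET default_statistics_target = 100;
ANALYZE tablename;

-- Percentage sampling: 2% of the rows.
-- For a table of 1,600,000 rows: 2/100 x 1,600,000 = 32,000 sampled rows.
SET default_statistics_target = -2;
ANALYZE tablename;

-- Restore the configured default for the rest of the session.
RESET default_statistics_target;
```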
</div>
<div class="section" id="EN-US_TOPIC_0000001233681757__section8123134612210"><h4 class="sectiontitle">Automatic Statistics Collection</h4><p id="EN-US_TOPIC_0000001233681757__p7891087250">When the parameter <strong id="EN-US_TOPIC_0000001233681757__b1536952015107">autoanalyze</strong> is enabled, if a query reaches the optimizer and the optimizer finds that no statistics are available, statistics collection is automatically triggered to meet the optimizer's requirements.</p>
<p id="EN-US_TOPIC_0000001233681757__p14750103312513">Note: Automatic statistics collection is triggered only for complex queries that are sensitive to statistics (such as multi-table joins). Simple queries (such as point queries and single-table aggregations) do not trigger automatic statistics collection.</p>
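<p>A hedged sketch of checking this behavior follows. It assumes that your role is permitted to change <strong>autoanalyze</strong> at session level; if not, the parameter has to be changed by an administrator. The tables <em>orders</em> and <em>customers</em> are hypothetical.</p>

```sql
-- Check whether automatic statistics collection is currently enabled.
SHOW autoanalyze;

-- Assumption: the current role is allowed to change this parameter for the session.
SET autoanalyze = on;

-- A multi-table join is the kind of statistics-sensitive query that can
-- trigger automatic collection when the tables have no statistics.
SELECT c.customer_name, COUNT(*) AS order_count
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
GROUP BY c.customer_name;
```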
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_04_0430.html">SQL Optimization Guide</a></div>
</div>
</div>