doc-exports/docs/dws/dev/dws_06_0100.html
Lu, Huayi e6fa411af0 DWS DEV 830.201 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lu, Huayi <luhuayi@huawei.com>
Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
2024-05-16 07:24:04 +00:00

<a name="EN-US_TOPIC_0000001233430165"></a>
<h1 class="topictitle1">Collecting Document Statistics</h1>
<div id="body8662426"><p id="EN-US_TOPIC_0000001233430165__en-us_topic_0059778114_p334934915318">The function <strong id="EN-US_TOPIC_0000001233430165__b842352706161458">ts_stat</strong> is useful for checking your configuration and for finding stop-word candidates.</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001233430165__saf7bacc8ef794906a8b7fea5008361a2"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">ts_stat</span><span class="p">(</span><span class="n">sqlquery</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">weights</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span><span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="k">OUT</span><span class="w"> </span><span class="n">word</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span><span class="w"> </span><span class="k">OUT</span><span class="w"> </span><span class="n">ndoc</span><span class="w"> </span><span class="nb">integer</span><span class="p">,</span>
<span class="w"> </span><span class="k">OUT</span><span class="w"> </span><span class="n">nentry</span><span class="w"> </span><span class="nb">integer</span><span class="p">)</span><span class="w"> </span><span class="k">returns</span><span class="w"> </span><span class="k">setof</span><span class="w"> </span><span class="n">record</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001233430165__a28b8224c0c674f42a15d4110419f301e"><strong id="EN-US_TOPIC_0000001233430165__b84235270616159">sqlquery</strong> is a text value containing an SQL query that must return a single <strong id="EN-US_TOPIC_0000001233430165__b842352706161511">tsvector</strong> column. <strong id="EN-US_TOPIC_0000001233430165__b842352706161525">ts_stat</strong> executes the query and returns statistics about each distinct lexeme (word) contained in the <strong id="EN-US_TOPIC_0000001233430165__b842352706161529">tsvector</strong> data. The columns returned are:</p>
<ul id="EN-US_TOPIC_0000001233430165__u8f6029b26bcf4d6b96665fe1b3b59651"><li id="EN-US_TOPIC_0000001233430165__l19479245bb804f4385032d69a6059c39"><strong id="EN-US_TOPIC_0000001233430165__b842352706161545">word text</strong>: the value of a lexeme</li><li id="EN-US_TOPIC_0000001233430165__la31631b62e0549a2b0c6ea4f2c7f98f4"><strong id="EN-US_TOPIC_0000001233430165__b84235270616161">ndoc integer</strong>: the number of documents (<strong id="EN-US_TOPIC_0000001233430165__b842352706161612">tsvector</strong>s) in which the word occurs</li><li id="EN-US_TOPIC_0000001233430165__lea23f3304bcf4af3838ac5959231b9cd"><strong id="EN-US_TOPIC_0000001233430165__b842352706155026">nentry integer</strong>: the total number of occurrences of the word</li></ul>
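<p>As a rough illustration of how <strong>ndoc</strong> and <strong>nentry</strong> differ, the following sketch runs <strong>ts_stat</strong> over two literal documents instead of a table. The literal strings and the choice of the <strong>simple</strong> configuration (used so that no stop words are dropped) are assumptions made for this illustration only.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Hypothetical two-document collection built from literals.
-- 'a' occurs twice in the first document and once in the second,
-- so it should be reported with ndoc = 2 and nentry = 3.
SELECT * FROM ts_stat('SELECT to_tsvector(''simple'', ''a cat and a dog'') UNION ALL SELECT to_tsvector(''simple'', ''a cat'')')
ORDER BY nentry DESC, ndoc DESC, word;
</pre></div>
</div>
<p>A word whose <strong>ndoc</strong> is close to the total number of documents, such as <strong>a</strong> in this sketch, is a typical stop-word candidate.</p>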
<p id="EN-US_TOPIC_0000001233430165__a9213b1c4a4ea471f830183f9e516c6b9">If <strong id="EN-US_TOPIC_0000001233430165__b842352706161644">weights</strong> are supplied, only occurrences having one of those weights are counted. For example, to find the ten most frequent words in a document collection:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001233430165__s5409601f207846f1b77ca755d49be2d3"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
<span class="normal"> 2</span>
<span class="normal"> 3</span>
<span class="normal"> 4</span>
<span class="normal"> 5</span>
<span class="normal"> 6</span>
<span class="normal"> 7</span>
<span class="normal"> 8</span>
<span class="normal"> 9</span>
<span class="normal">10</span>
<span class="normal">11</span>
<span class="normal">12</span>
<span class="normal">13</span>
<span class="normal">14</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">ts_stat</span><span class="p">(</span><span class="s1">'SELECT to_tsvector(''english'', sr_reason_sk) FROM tpcds.store_returns WHERE sr_customer_sk &lt; 10'</span><span class="p">)</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">nentry</span><span class="w"> </span><span class="k">DESC</span><span class="p">,</span><span class="w"> </span><span class="n">ndoc</span><span class="w"> </span><span class="k">DESC</span><span class="p">,</span><span class="w"> </span><span class="n">word</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span>
<span class="w"> </span><span class="n">word</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">ndoc</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">nentry</span><span class="w"> </span>
<span class="c1">------+------+--------</span>
<span class="w"> </span><span class="mi">32</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2</span>
<span class="w"> </span><span class="mi">33</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2</span>
<span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="mi">10</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="mi">13</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="mi">14</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="mi">15</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="mi">17</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="mi">20</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="mi">22</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span>
<span class="p">(</span><span class="mi">10</span><span class="w"> </span><span class="k">rows</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001233430165__a3023981b11eb45839210c09bb925e530">The same query, but counting only word occurrences with weight <strong id="EN-US_TOPIC_0000001233430165__b84235270616173">A</strong> or <strong id="EN-US_TOPIC_0000001233430165__b84235270616175">B</strong>. Because the lexemes produced by <strong>to_tsvector</strong> carry only the default weight, no rows are returned in this case:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001233430165__s9d5364b9b03845398238f2042156f96e"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">ts_stat</span><span class="p">(</span><span class="s1">'SELECT to_tsvector(''english'', sr_reason_sk) FROM tpcds.store_returns WHERE sr_customer_sk &lt; 10'</span><span class="p">,</span><span class="w"> </span><span class="s1">'ab'</span><span class="p">)</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">nentry</span><span class="w"> </span><span class="k">DESC</span><span class="p">,</span><span class="w"> </span><span class="n">ndoc</span><span class="w"> </span><span class="k">DESC</span><span class="p">,</span><span class="w"> </span><span class="n">word</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span>
<span class="w"> </span><span class="n">word</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">ndoc</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">nentry</span><span class="w"> </span>
<span class="c1">------+------+--------</span>
<span class="p">(</span><span class="mi">0</span><span class="w"> </span><span class="k">rows</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
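<p>For a weight filter to return rows, the lexemes need explicit weights. The sketch below assumes that <strong>setweight</strong> is available and behaves as in PostgreSQL-compatible text search. It labels one literal document <strong>A</strong> and another <strong>D</strong>. The documents are hypothetical and are not taken from the <strong>tpcds</strong> schema.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Hypothetical data: the first tsvector is labeled A, the second D.
-- Filtering on 'ab' should count only the lexemes of the first document,
-- so 'rat' is not expected to appear in the result.
SELECT * FROM ts_stat('SELECT setweight(to_tsvector(''simple'', ''fat cat''), ''A'') UNION ALL SELECT setweight(to_tsvector(''simple'', ''fat rat''), ''D'')', 'ab')
ORDER BY nentry DESC, ndoc DESC, word;
</pre></div>
</div>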
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_06_0096.html">Additional Features</a></div>
</div>
</div>