doc-exports/docs/dws/dev/dws_06_0100.html
Lu, Huayi a24ca60074 DWS DEVELOPER 811 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Lu, Huayi <luhuayi@huawei.com>
Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
2023-01-19 13:37:49 +00:00

<a name="EN-US_TOPIC_0000001145710731"></a>
<h1 class="topictitle1">Gathering Document Statistics</h1>
<div id="body8662426"><p id="EN-US_TOPIC_0000001145710731__en-us_topic_0059778114_p334934915318">The function <strong id="EN-US_TOPIC_0000001145710731__b842352706161458">ts_stat</strong> is useful for checking your text search configuration and for finding stop-word candidates.</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001145710731__s3da22e70d6094cb591c4eb60e244e7ae"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">ts_stat</span><span class="p">(</span><span class="n">sqlquery</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">weights</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span><span class="w"> </span><span class="p">]</span><span class="w"></span>
<span class="w"> </span><span class="k">OUT</span><span class="w"> </span><span class="n">word</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span><span class="w"> </span><span class="k">OUT</span><span class="w"> </span><span class="n">ndoc</span><span class="w"> </span><span class="nb">integer</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">OUT</span><span class="w"> </span><span class="n">nentry</span><span class="w"> </span><span class="nb">integer</span><span class="p">)</span><span class="w"> </span><span class="k">returns</span><span class="w"> </span><span class="k">setof</span><span class="w"> </span><span class="n">record</span><span class="w"></span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001145710731__af447f3991f384448a73e6c052c3a13e2"><strong id="EN-US_TOPIC_0000001145710731__b84235270616159">sqlquery</strong> is a text value containing an SQL query that must return a single <strong id="EN-US_TOPIC_0000001145710731__b842352706161511">tsvector</strong> column. <strong id="EN-US_TOPIC_0000001145710731__b842352706161525">ts_stat</strong> executes the query and returns statistics about each distinct lexeme (word) contained in the <strong id="EN-US_TOPIC_0000001145710731__b842352706161529">tsvector</strong> data. The columns returned are:</p>
<ul id="EN-US_TOPIC_0000001145710731__ub360cbaa2f244a7a9f8864dc96adb954"><li id="EN-US_TOPIC_0000001145710731__l8aee49fba2cb4b7fa4a428d8b8a43a2e"><strong id="EN-US_TOPIC_0000001145710731__b842352706161545">word text</strong>: the value of a lexeme</li><li id="EN-US_TOPIC_0000001145710731__lc52b72d4852549569f1ea5f788c03b3e"><strong id="EN-US_TOPIC_0000001145710731__b84235270616161">ndoc integer</strong>: number of documents (<strong id="EN-US_TOPIC_0000001145710731__b842352706161612">tsvector</strong>s) the word occurred in</li><li id="EN-US_TOPIC_0000001145710731__laf1cf77bcc1a4cb2b4ae25437d0368be"><strong id="EN-US_TOPIC_0000001145710731__b842352706155026">nentry integer</strong>: total number of occurrences of the word </li></ul>
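<p>As a minimal sketch of how <strong>ndoc</strong> and <strong>nentry</strong> differ, the following query builds two small documents inline instead of reading a table. The sample strings and the use of the <strong>simple</strong> configuration are assumptions chosen for illustration only.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Two hypothetical documents: 'cat cat dog' and 'cat bird'.
-- The simple configuration is assumed so that no stemming or stop-word removal applies.
SELECT * FROM ts_stat('SELECT to_tsvector(''simple'', ''cat cat dog'') UNION ALL SELECT to_tsvector(''simple'', ''cat bird'')') ORDER BY word;
</pre></div>
</div>
<p>Given the column definitions above, <strong>cat</strong> would be expected to show <strong>ndoc</strong> = 2 (it appears in both documents) and <strong>nentry</strong> = 3 (three occurrences in total), while <strong>bird</strong> and <strong>dog</strong> would each show 1 for both columns.</p>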
<p id="EN-US_TOPIC_0000001145710731__a4b6b893f225d4e6c971068c07549ff41">If <strong id="EN-US_TOPIC_0000001145710731__b842352706161644">weights</strong> are supplied, only occurrences having one of those weights are counted. For example, to find the ten most frequent words in a document collection:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001145710731__sd50d235d8ad940438140f0ae9ae27954"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">ts_stat</span><span class="p">(</span><span class="s1">'SELECT to_tsvector(''english'', sr_reason_sk) FROM tpcds.store_returns WHERE sr_customer_sk &lt; 10'</span><span class="p">)</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">nentry</span><span class="w"> </span><span class="k">DESC</span><span class="p">,</span><span class="w"> </span><span class="n">ndoc</span><span class="w"> </span><span class="k">DESC</span><span class="p">,</span><span class="w"> </span><span class="n">word</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">word</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">ndoc</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">nentry</span><span class="w"> </span>
<span class="c1">------+------+--------</span>
<span class="w"> </span><span class="mi">32</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2</span><span class="w"></span>
<span class="w"> </span><span class="mi">33</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2</span><span class="w"></span>
<span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"></span>
<span class="w"> </span><span class="mi">10</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"></span>
<span class="w"> </span><span class="mi">13</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"></span>
<span class="w"> </span><span class="mi">14</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"></span>
<span class="w"> </span><span class="mi">15</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"></span>
<span class="w"> </span><span class="mi">17</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"></span>
<span class="w"> </span><span class="mi">20</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"></span>
<span class="w"> </span><span class="mi">22</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"></span>
<span class="p">(</span><span class="mi">10</span><span class="w"> </span><span class="k">rows</span><span class="p">)</span><span class="w"></span>
</pre></div></td></tr></table></div>
</div>
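<p>The same function can be used to look for stop-word candidates, that is, words that occur in a large share of the documents. The following is a sketch only: the table <strong>docs</strong>, its text column <strong>body</strong>, and the 50% threshold are hypothetical and should be adapted to the actual data. The <strong>simple</strong> configuration is assumed so that existing stop-word lists do not hide the candidates.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Hypothetical table: docs(body text).
-- List words that occur in more than half of all documents (threshold is an assumption).
SELECT word, ndoc, nentry
  FROM ts_stat('SELECT to_tsvector(''simple'', body) FROM docs')
 WHERE ndoc &gt; 0.5 * (SELECT count(*) FROM docs)
 ORDER BY ndoc DESC, nentry DESC, word;
</pre></div>
</div>
<p>Words returned by such a query are only candidates; whether they belong in a stop-word list depends on how the documents are searched.</p>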
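<p id="EN-US_TOPIC_0000001145710731__a7c7ae71e7bc54a1a89399be87bdde3a2">The same, but counting only word occurrences with weight <strong id="EN-US_TOPIC_0000001145710731__b84235270616173">A</strong> or <strong id="EN-US_TOPIC_0000001145710731__b84235270616175">B</strong>. The query returns no rows here because <strong>to_tsvector</strong> labels every lexeme with the default weight D unless <strong>setweight</strong> is applied:</p>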
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001145710731__s6c4df7573c2e46a19a02a346a3c0207c"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">ts_stat</span><span class="p">(</span><span class="s1">'SELECT to_tsvector(''english'', sr_reason_sk) FROM tpcds.store_returns WHERE sr_customer_sk &lt; 10'</span><span class="p">,</span><span class="w"> </span><span class="s1">'ab'</span><span class="p">)</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">nentry</span><span class="w"> </span><span class="k">DESC</span><span class="p">,</span><span class="w"> </span><span class="n">ndoc</span><span class="w"> </span><span class="k">DESC</span><span class="p">,</span><span class="w"> </span><span class="n">word</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">word</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">ndoc</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">nentry</span><span class="w"> </span>
<span class="c1">------+------+--------</span>
<span class="p">(</span><span class="mi">0</span><span class="w"> </span><span class="k">rows</span><span class="p">)</span><span class="w"></span>
</pre></div></td></tr></table></div>
</div>
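<p>To see the weight filter return rows, the lexemes need explicit weights. The following sketch assumes that <strong>setweight</strong> and <strong>tsvector</strong> concatenation (<strong>||</strong>) are available as in PostgreSQL-compatible text search; the sample strings are hypothetical.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- One hypothetical document: 'alpha beta' is labeled A, 'beta gamma' is labeled D.
-- Only occurrences with weight A are counted.
SELECT * FROM ts_stat('SELECT setweight(to_tsvector(''simple'', ''alpha beta''), ''A'') || setweight(to_tsvector(''simple'', ''beta gamma''), ''D'')', 'a') ORDER BY word;
</pre></div>
</div>
<p>Under these assumptions, <strong>alpha</strong> and <strong>beta</strong> would each be expected with <strong>ndoc</strong> = 1 and <strong>nentry</strong> = 1, because only the A-weighted occurrence of <strong>beta</strong> is counted, and <strong>gamma</strong> would be omitted because it has no occurrence with weight A.</p>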
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_06_0096.html">Additional Features</a></div>
</div>
</div>