doc-exports/docs/dws/dev/dws_06_0107.html
Lu, Huayi e6fa411af0 DWS DEV 830.201 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lu, Huayi <luhuayi@huawei.com>
Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
2024-05-16 07:24:04 +00:00


<a name="EN-US_TOPIC_0000001188110540"></a><a name="EN-US_TOPIC_0000001188110540"></a>
<h1 class="topictitle1">Thesaurus Dictionary</h1>
<div id="body1561195448345"><p id="EN-US_TOPIC_0000001188110540__p61561752184216">A thesaurus dictionary (sometimes abbreviated as TZ) is a collection of words that includes information about the relationships between words and phrases, such as broader terms (BT), narrower terms (NT), preferred terms, non-preferred terms, and related terms. A thesaurus dictionary replaces all non-preferred terms with one preferred term and, optionally, preserves the original terms for indexing as well. It is an extension of the synonym dictionary with added phrase support.</p>
<div class="section" id="EN-US_TOPIC_0000001188110540__section62562319454"><h4 class="sectiontitle">Precautions</h4><ul id="EN-US_TOPIC_0000001188110540__ul12908102613454"><li id="EN-US_TOPIC_0000001188110540__li846731517503">A thesaurus dictionary can recognize phrases. Therefore, it must remember its state and interact with the parser to decide whether to handle the next token or stop accumulating tokens. Configure the token types assigned to a thesaurus dictionary carefully. For example, if the thesaurus dictionary is assigned to handle only the <strong id="EN-US_TOPIC_0000001188110540__b178589376575">asciiword</strong> token type, a thesaurus dictionary definition like <strong id="EN-US_TOPIC_0000001188110540__b12471646155713">one 7</strong> will not work, because the token 7 is of type <strong id="EN-US_TOPIC_0000001188110540__b148765541578">uint</strong>, which is not assigned to the thesaurus dictionary.</li><li id="EN-US_TOPIC_0000001188110540__li1680016287457">Thesauruses are applied during indexing, so any change in the parameters of a thesaurus dictionary requires reindexing. For most other dictionary types, small changes such as adding or removing stop words do not force reindexing.</li></ul>
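<p>To check which token types the parser assigns to the words of a candidate thesaurus phrase, and therefore which types must be mapped to the thesaurus dictionary, you can run the phrase through the <strong>ts_debug</strong> function. The following is a minimal sketch that assumes the built-in <strong>english</strong> configuration. The phrase <strong>one 7</strong> is the hypothetical definition from the precaution above, and the <strong>ALTER MAPPING</strong> statement assumes that <strong>thesaurus_astro</strong> has already been created as described in Procedure and actually contains phrases with numbers.</p>

```sql
-- Show the token type (alias) that the parser assigns to each token.
-- With the default parser, 'one' should be reported as asciiword and '7' as uint.
SELECT alias, token, dictionaries
FROM ts_debug('english', 'one 7');

-- Hypothetical: if thesaurus_astro contained a phrase such as "one 7",
-- uint tokens would also have to be routed to it. simple is kept as the
-- fallback dictionary for numbers that the thesaurus does not recognize.
ALTER TEXT SEARCH CONFIGURATION english
    ALTER MAPPING FOR uint WITH thesaurus_astro, simple;
```

Listing the thesaurus first means it gets the first chance to accumulate a phrase; tokens it does not recognize fall through to the next dictionary in the list.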
</div>
<div class="section" id="EN-US_TOPIC_0000001188110540__section1031795115131"><h4 class="sectiontitle">Procedure</h4><ol id="EN-US_TOPIC_0000001188110540__ol1563682614114"><li id="EN-US_TOPIC_0000001188110540__li86631231174119"><span>Create a TZ named <strong id="EN-US_TOPIC_0000001188110540__b614502265">thesaurus_astro</strong>.</span><p><div class="p" id="EN-US_TOPIC_0000001188110540__p239763717414"><strong id="EN-US_TOPIC_0000001188110540__b12852143285">thesaurus_astro</strong> is a simple astronomical TZ that defines two astronomical phrases. In each line of its dictionary file, the phrase to the left of the colon is replaced by the term to the right of the colon.<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188110540__screen042463111202"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">supernovae</span><span class="w"> </span><span class="n">stars</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">sn</span><span class="w"> </span>
<span class="n">crab</span><span class="w"> </span><span class="n">nebulae</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">crab</span>
</pre></div></td></tr></table></div>
</div>
</div>
<p id="EN-US_TOPIC_0000001188110540__p192191827862">Run the following statement to create the TZ:</p>
<div class="notice" id="EN-US_TOPIC_0000001188110540__note29448236569"><span class="noticetitle"><img src="public_sys-resources/notice_3.0-en-us.png"> </span><div class="noticebody"><p id="EN-US_TOPIC_0000001188110540__p1794462317569">Hard-coded or plaintext AK and SK pose security risks. For security purposes, encrypt your AK and SK and store them in a configuration file or environment variables.</p>
</div></div>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188110540__screen133702144519"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span>
<span class="normal">6</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">SEARCH</span><span class="w"> </span><span class="k">DICTIONARY</span><span class="w"> </span><span class="n">thesaurus_astro</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">TEMPLATE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">thesaurus</span><span class="p">,</span>
<span class="w"> </span><span class="n">DictFile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">thesaurus_astro</span><span class="p">,</span>
<span class="w"> </span><span class="k">Dictionary</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pg_catalog</span><span class="p">.</span><span class="n">english_stem</span><span class="p">,</span>
<span class="w"> </span><span class="n">FILEPATH</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'obs://bucket_name/path accesskey=ak secretkey=sk region=rg'</span>
<span class="p">);</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001188110540__p1816975081418">The full name of the thesaurus dictionary file is <strong id="EN-US_TOPIC_0000001188110540__b1262718213455">thesaurus_astro.ths</strong>, and the file is stored in the <strong id="EN-US_TOPIC_0000001188110540__b17747113917448">obs://bucket_name/path</strong> directory on OBS. The <strong>accesskey</strong>, <strong>secretkey</strong>, and <strong>region</strong> options in the <strong>FILEPATH</strong> string supply the credentials and region used to access that directory. <strong id="EN-US_TOPIC_0000001188110540__b32391345133214">pg_catalog.english_stem</strong> is the subdictionary (a <strong id="EN-US_TOPIC_0000001188110540__b1169514103341">Snowball</strong> English stemmer) used for input normalization. The subdictionary has its own configuration (for example, stop words), which is not shown here. For details about the syntax and parameters for creating a TZ, see <a href="dws_06_0183.html">CREATE TEXT SEARCH DICTIONARY</a>.</p>
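</p></li><li id="EN-US_TOPIC_0000001188110540__li156362266415"><span>Bind the TZ to the desired token types in the <strong>english</strong> text search configuration. Because <strong>thesaurus_astro</strong> is listed before <strong>english_stem</strong>, it is consulted first, and tokens that it does not recognize are passed on to <strong>english_stem</strong>.</span><p><div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188110540__screen345912558500"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>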
<span class="normal">2</span>
<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">ALTER</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">SEARCH</span><span class="w"> </span><span class="n">CONFIGURATION</span><span class="w"> </span><span class="n">english</span>
<span class="w"> </span><span class="k">ALTER</span><span class="w"> </span><span class="n">MAPPING</span><span class="w"> </span><span class="k">FOR</span><span class="w"> </span><span class="n">asciiword</span><span class="p">,</span><span class="w"> </span><span class="n">asciihword</span><span class="p">,</span><span class="w"> </span><span class="n">hword_asciipart</span>
<span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="n">thesaurus_astro</span><span class="p">,</span><span class="w"> </span><span class="n">english_stem</span><span class="p">;</span>
</pre></div></td></tr></table></div>
</div>
</p></li><li id="EN-US_TOPIC_0000001188110540__li76368266411"><span>Use the TZ.</span><p><ul id="EN-US_TOPIC_0000001188110540__ul186562012411"><li id="EN-US_TOPIC_0000001188110540__li1877564114519">Test the TZ.<div class="p" id="EN-US_TOPIC_0000001188110540__p242419514514"><a name="EN-US_TOPIC_0000001188110540__li1877564114519"></a><a name="li1877564114519"></a>The <strong id="EN-US_TOPIC_0000001188110540__b15483203273912">ts_lexize</strong> function is not useful for testing a TZ because it treats its input as a single token, so it cannot recognize a multi-word phrase. Instead, use the <strong id="EN-US_TOPIC_0000001188110540__b1957375194019">plainto_tsquery</strong>, <strong id="EN-US_TOPIC_0000001188110540__b3887255144018">to_tsvector</strong>, or <strong id="EN-US_TOPIC_0000001188110540__b124172002418">to_tsquery</strong> function, each of which breaks its input string into multiple tokens.<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188110540__screen165281749182514"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
<span class="normal"> 2</span>
<span class="normal"> 3</span>
<span class="normal"> 4</span>
<span class="normal"> 5</span>
<span class="normal"> 6</span>
<span class="normal"> 7</span>
<span class="normal"> 8</span>
<span class="normal"> 9</span>
<span class="normal">10</span>
<span class="normal">11</span>
<span class="normal">12</span>
<span class="normal">13</span>
<span class="normal">14</span>
<span class="normal">15</span>
<span class="normal">16</span>
<span class="normal">17</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="n">plainto_tsquery</span><span class="p">(</span><span class="s1">'english'</span><span class="p">,</span><span class="s1">'supernova star'</span><span class="p">);</span>
<span class="w"> </span><span class="n">plainto_tsquery</span><span class="w"> </span>
<span class="c1">-----------------</span>
<span class="w"> </span><span class="s1">'sn'</span>
<span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="k">row</span><span class="p">)</span>

<span class="k">SELECT</span><span class="w"> </span><span class="n">to_tsvector</span><span class="p">(</span><span class="s1">'english'</span><span class="p">,</span><span class="s1">'supernova star'</span><span class="p">);</span>
<span class="w"> </span><span class="n">to_tsvector</span><span class="w"> </span>
<span class="c1">-------------</span>
<span class="w"> </span><span class="s1">'sn'</span><span class="p">:</span><span class="mi">1</span>
<span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="k">row</span><span class="p">)</span>

<span class="k">SELECT</span><span class="w"> </span><span class="n">to_tsquery</span><span class="p">(</span><span class="s1">'english'</span><span class="p">,</span><span class="s1">'''supernova star'''</span><span class="p">);</span>
<span class="w"> </span><span class="n">to_tsquery</span><span class="w"> </span>
<span class="c1">------------</span>
<span class="w"> </span><span class="s1">'sn'</span>
<span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="k">row</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
</div>
<p id="EN-US_TOPIC_0000001188110540__p47655114141"><strong id="EN-US_TOPIC_0000001188110540__b694001520422">supernova star</strong> matches <strong id="EN-US_TOPIC_0000001188110540__b16368112610424">supernovae stars</strong> in <strong id="EN-US_TOPIC_0000001188110540__b7762332154217">thesaurus_astro</strong> because the <strong id="EN-US_TOPIC_0000001188110540__b0957655144210">english_stem</strong> stemmer is specified as the subdictionary in the <strong id="EN-US_TOPIC_0000001188110540__b1625817296443">thesaurus_astro</strong> definition. The stemmer normalizes both the input and the thesaurus entries: it removes the <strong id="EN-US_TOPIC_0000001188110540__b45069471436">e</strong> from supernovae and the <strong id="EN-US_TOPIC_0000001188110540__b157655019438">s</strong> from stars, so both phrases reduce to the same lexemes.</p>
</li><li id="EN-US_TOPIC_0000001188110540__li114866934112">To also index the original phrase, include it in the right-hand part of the definition in the dictionary file <strong>thesaurus_astro.ths</strong>:<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188110540__screen101864317208"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">supernovae</span><span class="w"> </span><span class="n">stars</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">sn</span><span class="w"> </span><span class="n">supernovae</span><span class="w"> </span><span class="n">stars</span>
</pre></div></td></tr></table></div>
</div>
<p>Then run <strong>ALTER TEXT SEARCH DICTIONARY</strong> to load the updated file, and test the TZ again. In this example, the updated file is read from the local directory <strong>file:///home/dicts/</strong>.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span>
<span class="normal">6</span>
<span class="normal">7</span>
<span class="normal">8</span>
<span class="normal">9</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">ALTER</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">SEARCH</span><span class="w"> </span><span class="k">DICTIONARY</span><span class="w"> </span><span class="n">thesaurus_astro</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">DictFile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">thesaurus_astro</span><span class="p">,</span>
<span class="w"> </span><span class="n">FILEPATH</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'file:///home/dicts/'</span><span class="p">);</span>

<span class="k">SELECT</span><span class="w"> </span><span class="n">plainto_tsquery</span><span class="p">(</span><span class="s1">'english'</span><span class="p">,</span><span class="s1">'supernova star'</span><span class="p">);</span>
<span class="w"> </span><span class="n">plainto_tsquery</span><span class="w"> </span>
<span class="c1">-----------------------------</span>
<span class="w"> </span><span class="s1">'sn'</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="s1">'supernova'</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="s1">'star'</span>
<span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="k">row</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
</li></ul>
</p></li></ol>
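<p>Because the thesaurus dictionary file changed in the last step, any existing index built with the <strong>english</strong> configuration must be rebuilt, as described in Precautions. The following minimal sketch assumes a hypothetical table <strong>astro_docs</strong> with a text column <strong>body</strong> and a GIN index on its <strong>tsvector</strong>; these names are illustrative only and should be adapted to your own schema.</p>

```sql
-- Hypothetical table and full-text index, for illustration only.
CREATE TABLE astro_docs (id int, body text);
CREATE INDEX astro_docs_body_idx
    ON astro_docs USING gin (to_tsvector('english', body));

-- Rebuild the index after the thesaurus changes so that the indexed lexemes
-- match what the updated thesaurus now produces.
REINDEX INDEX astro_docs_body_idx;

-- Query with the same configuration so that the query phrase is normalized
-- by the same thesaurus and stemmer as the indexed text.
SELECT id
FROM astro_docs
WHERE to_tsvector('english', body) @@ plainto_tsquery('english', 'supernova star');
```

Using the same configuration name in the index expression and in the query keeps both sides of the match normalized identically.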
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_06_0102.html">Dictionaries</a></div>
</div>
</div>