doc-exports/docs/dws/dev/dws_06_0110.html
Lu, Huayi e6fa411af0 DWS DEV 830.201 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lu, Huayi <luhuayi@huawei.com>
Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
2024-05-16 07:24:04 +00:00

<a name="EN-US_TOPIC_0000001188270538"></a>
<h1 class="topictitle1">Text Search Configuration Example</h1>
<div id="body1561534483449"><p id="EN-US_TOPIC_0000001188270538__p39950155319">A text search configuration specifies the following components required to convert a document into a <strong id="EN-US_TOPIC_0000001188270538__b733223717450">tsvector</strong>:</p>
<ul id="EN-US_TOPIC_0000001188270538__ul16432193525219"><li id="EN-US_TOPIC_0000001188270538__li37982015315">A parser, which breaks text into tokens.</li><li id="EN-US_TOPIC_0000001188270538__li164321835105214">A list of dictionaries, which converts each token into a lexeme.</li></ul>
<p id="EN-US_TOPIC_0000001188270538__p8060118">Each time the <strong id="EN-US_TOPIC_0000001188270538__b9357174411162">to_tsvector</strong> or <strong id="EN-US_TOPIC_0000001188270538__b344185431614">to_tsquery</strong> function is invoked, a text search configuration is required to specify how the text is processed. The GUC parameter <strong id="EN-US_TOPIC_0000001188270538__b83051401489">default_text_search_config</strong> specifies the default text search configuration, which is used if the function call does not explicitly specify one.</p>
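<p>As a minimal sketch of this behavior, the following statements contrast an explicitly specified configuration with the default one. The sample sentence is illustrative only, and the result of the statement that omits the configuration depends on the current value of <strong>default_text_search_config</strong> in your session.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Explicitly specify the predefined configuration english.
SELECT to_tsvector('english', 'The quick brown foxes jumped');

-- Omit the configuration; the one named by default_text_search_config is used.
SHOW default_text_search_config;
SELECT to_tsvector('The quick brown foxes jumped');

-- The same rule applies to to_tsquery.
SELECT to_tsquery('english', 'foxes &amp; jumped');
</pre></div></div>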
<p id="EN-US_TOPIC_0000001188270538__p2161716193716"><span id="EN-US_TOPIC_0000001188270538__text1625964664">GaussDB(DWS)</span> provides some predefined text search configurations. You can also create user-defined text search configurations. In addition, gsql provides several meta-commands that display information about text search objects, which makes these objects easier to manage. For details, see "Meta-Command Reference" in the <em id="EN-US_TOPIC_0000001188270538__i029275019166">Tool Guide</em>.</p>
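<p>The following sketch lists text search objects from a gsql session. It assumes that your gsql version supports the psql-style <strong>\dF</strong> family of meta-commands. Only <strong>\dF+</strong> appears elsewhere in this topic, so confirm the others in the <em>Tool Guide</em> before relying on them. Comments are placed on separate lines because text that follows a meta-command on the same line is treated as its argument.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- List text search configurations.
\dF
-- List text search dictionaries.
\dFd
-- List text search parsers.
\dFp
-- List text search templates.
\dFt
</pre></div></div>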
<div class="section" id="EN-US_TOPIC_0000001188270538__section2016620211300"><h4 class="sectiontitle">Procedure</h4><ol id="EN-US_TOPIC_0000001188270538__ol482416413018"><li id="EN-US_TOPIC_0000001188270538__li2082454118013"><span>Create a text search configuration <strong id="EN-US_TOPIC_0000001188270538__b1396852113118">ts_conf</strong> by copying the predefined text search configuration <strong id="EN-US_TOPIC_0000001188270538__b1672710485347">english</strong>.</span><p><div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188270538__screen158854278533"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">SEARCH</span><span class="w"> </span><span class="n">CONFIGURATION</span><span class="w"> </span><span class="n">ts_conf</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="k">COPY</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pg_catalog</span><span class="p">.</span><span class="n">english</span><span class="w"> </span><span class="p">);</span>
<span class="k">CREATE</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">SEARCH</span><span class="w"> </span><span class="n">CONFIGURATION</span>
</pre></div></td></tr></table></div>
</div>
</p></li><li id="EN-US_TOPIC_0000001188270538__li13665215714"><span>Create a <strong id="EN-US_TOPIC_0000001188270538__b1655625313173">Synonym</strong> dictionary.</span><p><div class="p" id="EN-US_TOPIC_0000001188270538__p1174133112116">Assume that the definition file <strong id="EN-US_TOPIC_0000001188270538__b108531337163517">pg_dict.syn</strong> of the <strong id="EN-US_TOPIC_0000001188270538__b91247544353">Synonym</strong> dictionary contains the following contents:<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188270538__screen1590115511111"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">postgres</span><span class="w"> </span><span class="n">pg</span><span class="w"> </span>
<span class="n">pgsql</span><span class="w"> </span><span class="n">pg</span><span class="w"> </span>
<span class="n">postgresql</span><span class="w"> </span><span class="n">pg</span>
</pre></div></td></tr></table></div>
</div>
</div>
<p id="EN-US_TOPIC_0000001188270538__p649322635418">Run the following statement to create the <strong id="EN-US_TOPIC_0000001188270538__b1782029131819">Synonym</strong> dictionary:</p>
<div class="notice" id="EN-US_TOPIC_0000001188270538__note8911201315239"><span class="noticetitle"><img src="public_sys-resources/notice_3.0-en-us.png"> </span><div class="noticebody"><p id="EN-US_TOPIC_0000001188270538__p13912121322315">Hard-coded or plaintext AK and SK are risky. For security purposes, encrypt your AK and SK and store them in the configuration file or environment variables.</p>
</div></div>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188270538__screen205297242611"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">SEARCH</span><span class="w"> </span><span class="k">DICTIONARY</span><span class="w"> </span><span class="n">pg_dict</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">TEMPLATE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">synonym</span><span class="p">,</span>
<span class="w"> </span><span class="n">SYNONYMS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pg_dict</span><span class="p">,</span>
<span class="w"> </span><span class="n">FILEPATH</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'obs://bucket01/obs.xxx.xxx.com accesskey=xxxxx secretkey=xxxxx region=xx-xx-xx'</span>
<span class="w"> </span><span class="p">);</span>
</pre></div></td></tr></table></div>
</div>
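<div class="p">One way to check the new dictionary is to call <strong>ts_lexize</strong> on single words. This is a sketch that assumes <strong>pg_dict</strong> was created successfully from the definition file shown above. In that case, each of the three listed words is expected to map to <strong>pg</strong>, and a word that is not in the file (such as the illustrative word <strong>database</strong>) is expected to return NULL, so that the token is passed to the next dictionary in the list.
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Words listed in pg_dict.syn.
SELECT ts_lexize('pg_dict', 'postgres');
SELECT ts_lexize('pg_dict', 'pgsql');
SELECT ts_lexize('pg_dict', 'postgresql');

-- A word that is not listed in pg_dict.syn.
SELECT ts_lexize('pg_dict', 'database');
</pre></div></div>
</div>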
</p></li><li id="EN-US_TOPIC_0000001188270538__li1965315010218"><span>Create an <strong id="EN-US_TOPIC_0000001188270538__b5404102215319">Ispell</strong> dictionary <strong id="EN-US_TOPIC_0000001188270538__b17891011372">english_ispell</strong> (the dictionary definition files come from an open-source dictionary).</span><p><div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188270538__screen47831721181510"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span>
<span class="normal">6</span>
<span class="normal">7</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">SEARCH</span><span class="w"> </span><span class="k">DICTIONARY</span><span class="w"> </span><span class="n">english_ispell</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="k">TEMPLATE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ispell</span><span class="p">,</span>
<span class="w"> </span><span class="n">DictFile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">english</span><span class="p">,</span>
<span class="w"> </span><span class="n">AffFile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">english</span><span class="p">,</span>
<span class="w"> </span><span class="n">StopWords</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">english</span><span class="p">,</span>
<span class="w"> </span><span class="n">FILEPATH</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'obs://bucket01/obs.xxx.xxx.com accesskey=xxxxx secretkey=xxxxx region=xx-xx-xx'</span>
<span class="p">);</span>
</pre></div></td></tr></table></div>
</div>
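<div class="p">The Ispell dictionary can be checked in the same way. The sample words below are illustrative assumptions. The actual results depend on the contents of the dictionary, affix, and stop word files that you uploaded, so no output is shown here.
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- An inflected form that an Ispell dictionary typically reduces to its base form.
SELECT ts_lexize('english_ispell', 'stories');

-- A common word that is likely to be in the stop word file.
SELECT ts_lexize('english_ispell', 'the');
</pre></div></div>
</div>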
</p></li><li id="EN-US_TOPIC_0000001188270538__li1963517426316"><span>Modify the text search configuration <strong id="EN-US_TOPIC_0000001188270538__b9922821389">ts_conf</strong> and change the dictionary list for tokens of certain types. For details about token types, see <a href="dws_06_0101.html">Text Search Parser</a>.</span><p><div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188270538__screen15561519315"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">ALTER</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">SEARCH</span><span class="w"> </span><span class="n">CONFIGURATION</span><span class="w"> </span><span class="n">ts_conf</span>
<span class="w"> </span><span class="k">ALTER</span><span class="w"> </span><span class="n">MAPPING</span><span class="w"> </span><span class="k">FOR</span><span class="w"> </span><span class="n">asciiword</span><span class="p">,</span><span class="w"> </span><span class="n">asciihword</span><span class="p">,</span><span class="w"> </span><span class="n">hword_asciipart</span><span class="p">,</span>
<span class="w"> </span><span class="n">word</span><span class="p">,</span><span class="w"> </span><span class="n">hword</span><span class="p">,</span><span class="w"> </span><span class="n">hword_part</span>
<span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="n">pg_dict</span><span class="p">,</span><span class="w"> </span><span class="n">english_ispell</span><span class="p">,</span><span class="w"> </span><span class="n">english_stem</span><span class="p">;</span>
</pre></div></td></tr></table></div>
</div>
</p></li><li id="EN-US_TOPIC_0000001188270538__li1496721514219"><span>Drop the mappings for tokens of certain types in the text search configuration so that these tokens are neither indexed nor searched.</span><p><div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188270538__screen077132543517"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">ALTER</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">SEARCH</span><span class="w"> </span><span class="n">CONFIGURATION</span><span class="w"> </span><span class="n">ts_conf</span>
<span class="w"> </span><span class="k">DROP</span><span class="w"> </span><span class="n">MAPPING</span><span class="w"> </span><span class="k">FOR</span><span class="w"> </span><span class="n">email</span><span class="p">,</span><span class="w"> </span><span class="n">url</span><span class="p">,</span><span class="w"> </span><span class="n">url_path</span><span class="p">,</span><span class="w"> </span><span class="n">sfloat</span><span class="p">,</span><span class="w"> </span><span class="nb">float</span><span class="p">;</span>
</pre></div></td></tr></table></div>
</div>
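<div class="p">To see the effect of the dropped mappings, you can convert a sample text that contains a token of one of these types. This is a sketch with an illustrative sentence and address. After the mappings are dropped, the e-mail address should no longer appear in the resulting tsvector, while ordinary words are still processed by the dictionaries mapped earlier.
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- The e-mail token has no mapping in ts_conf, so it is expected to be skipped.
SELECT to_tsvector('ts_conf', 'Contact user@example.com about the PostgreSQL upgrade');
</pre></div></div>
</div>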
</p></li><li id="EN-US_TOPIC_0000001188270538__li12448020827"><span>Use the text search debugging function ts_debug() to test the text search configuration <strong id="EN-US_TOPIC_0000001188270538__b117844118429">ts_conf</strong>.</span><p><div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188270538__screen2079291854711"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">ts_debug</span><span class="p">(</span><span class="s1">'ts_conf'</span><span class="p">,</span><span class="w"> </span><span class="s1">'</span>
<span class="s1">PostgreSQL, the highly scalable, SQL compliant, open source object-relational</span>
<span class="s1">database management system, is now undergoing beta testing of the next</span>
<span class="s1">version of our software.</span>
<span class="s1">'</span><span class="p">);</span>
</pre></div></td></tr></table></div>
</div>
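<div class="p">The full <strong>ts_debug</strong> output can be wide. A narrower variant, sketched below with a shortened sample text, selects only the token type, the token, the dictionary that recognized it, and the resulting lexemes, and it filters out whitespace tokens. The column names follow the standard <strong>ts_debug</strong> result set, and the filter assumes that whitespace is reported under the alias <strong>blank</strong>.
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>SELECT alias, token, dictionary, lexemes
FROM ts_debug('ts_conf', 'PostgreSQL, the highly scalable database')
WHERE alias &lt;&gt; 'blank';
</pre></div></div>
</div>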
</p></li><li id="EN-US_TOPIC_0000001188270538__li3737192719211"><span>View the mappings of <strong id="EN-US_TOPIC_0000001188270538__b3487369436">ts_conf</strong>, and then set it as the default text search configuration. This setting is valid only for the current session.</span><p><div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188270538__screen84221551195120"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
<span class="normal"> 2</span>
<span class="normal"> 3</span>
<span class="normal"> 4</span>
<span class="normal"> 5</span>
<span class="normal"> 6</span>
<span class="normal"> 7</span>
<span class="normal"> 8</span>
<span class="normal"> 9</span>
<span class="normal">10</span>
<span class="normal">11</span>
<span class="normal">12</span>
<span class="normal">13</span>
<span class="normal">14</span>
<span class="normal">15</span>
<span class="normal">16</span>
<span class="normal">17</span>
<span class="normal">18</span>
<span class="normal">19</span>
<span class="normal">20</span>
<span class="normal">21</span>
<span class="normal">22</span>
<span class="normal">23</span>
<span class="normal">24</span>
<span class="normal">25</span>
<span class="normal">26</span>
<span class="normal">27</span></pre></div></td><td class="code"><div><pre><span></span><span class="err">\</span><span class="n">dF</span><span class="o">+</span><span class="w"> </span><span class="n">ts_conf</span>
<span class="w"> </span><span class="nb">Text</span><span class="w"> </span><span class="k">search</span><span class="w"> </span><span class="n">configuration</span><span class="w"> </span><span class="ss">&quot;public.ts_conf&quot;</span>
<span class="n">Parser</span><span class="p">:</span><span class="w"> </span><span class="ss">&quot;pg_catalog.default&quot;</span>
<span class="w"> </span><span class="n">Token</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Dictionaries</span><span class="w"> </span>
<span class="c1">-----------------+-------------------------------------</span>
<span class="w"> </span><span class="n">asciihword</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">pg_dict</span><span class="p">,</span><span class="n">english_ispell</span><span class="p">,</span><span class="n">english_stem</span>
<span class="w"> </span><span class="n">asciiword</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">pg_dict</span><span class="p">,</span><span class="n">english_ispell</span><span class="p">,</span><span class="n">english_stem</span>
<span class="w"> </span><span class="n">file</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">simple</span>
<span class="w"> </span><span class="k">host</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">simple</span>
<span class="w"> </span><span class="n">hword</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">pg_dict</span><span class="p">,</span><span class="n">english_ispell</span><span class="p">,</span><span class="n">english_stem</span>
<span class="w"> </span><span class="n">hword_asciipart</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">pg_dict</span><span class="p">,</span><span class="n">english_ispell</span><span class="p">,</span><span class="n">english_stem</span>
<span class="w"> </span><span class="n">hword_numpart</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">simple</span>
<span class="w"> </span><span class="n">hword_part</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">pg_dict</span><span class="p">,</span><span class="n">english_ispell</span><span class="p">,</span><span class="n">english_stem</span>
<span class="w"> </span><span class="nb">int</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">simple</span>
<span class="w"> </span><span class="n">numhword</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">simple</span>
<span class="w"> </span><span class="n">numword</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">simple</span>
<span class="w"> </span><span class="n">uint</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">simple</span>
<span class="w"> </span><span class="k">version</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">simple</span>
<span class="w"> </span><span class="n">word</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">pg_dict</span><span class="p">,</span><span class="n">english_ispell</span><span class="p">,</span><span class="n">english_stem</span>
<span class="k">SET</span><span class="w"> </span><span class="n">default_text_search_config</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'public.ts_conf'</span><span class="p">;</span>
<span class="k">SET</span>
<span class="k">SHOW</span><span class="w"> </span><span class="n">default_text_search_config</span><span class="p">;</span>
<span class="w"> </span><span class="n">default_text_search_config</span><span class="w"> </span>
<span class="c1">----------------------------</span>
<span class="w"> </span><span class="k">public</span><span class="p">.</span><span class="n">ts_conf</span>
<span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="k">row</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
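<div class="p">Once <strong>ts_conf</strong> is the session default, text search functions that omit the configuration use it. In the sketch below, the sample sentence is illustrative. Because <strong>pg_dict</strong> maps both <strong>postgresql</strong> and <strong>pgsql</strong> to <strong>pg</strong>, the match is expected to succeed, assuming that the dictionaries were created as described in the preceding steps.
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Both sides are normalized by ts_conf because no configuration is specified.
SELECT to_tsvector('PostgreSQL is open source') @@ to_tsquery('pgsql');

-- Optional: restore the previous default for the rest of the session.
RESET default_text_search_config;
</pre></div></div>
</div>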
</p></li></ol>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_06_0081.html">Full Text Search</a></div>
</div>
</div>