forked from docs/doc-exports
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Lu, Huayi <luhuayi@huawei.com> Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
69 lines
13 KiB
HTML
69 lines
13 KiB
HTML
<a name="EN-US_TOPIC_0000001233628635"></a><a name="EN-US_TOPIC_0000001233628635"></a>
|
|
|
|
<h1 class="topictitle1">Creating a Gin Index</h1>
|
|
<div id="body8662426"><p id="EN-US_TOPIC_0000001233628635__en-us_topic_0059778028_p485373917112">You can create a <strong id="EN-US_TOPIC_0000001233628635__b84235270615179">GIN</strong> index to speed up text searches:</p>
|
|
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001233628635__sbe39577906ed4f4f87f2a25732993e95"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">pgweb_idx_1</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">tsearch</span><span class="p">.</span><span class="n">pgweb</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">gin</span><span class="p">(</span><span class="n">to_tsvector</span><span class="p">(</span><span class="s1">'english'</span><span class="p">,</span><span class="w"> </span><span class="n">body</span><span class="p">));</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="EN-US_TOPIC_0000001233628635__p7937118122215">The to_tsvector() function accepts one or two augments.</p>
|
|
<p id="EN-US_TOPIC_0000001233628635__p2848164384315">If the one-augment version of the index is used, the system will use the configuration specified by <strong id="EN-US_TOPIC_0000001233628635__b15614183510146">default_text_search_config</strong> by default.</p>
|
|
<p id="EN-US_TOPIC_0000001233628635__p78485438432">To create a Gin index, the two-augment version must be used, otherwise the index content may be inconsistent. Only text search functions that specify a configuration name can be used in expression indexes. Index content is not affected by <strong id="EN-US_TOPIC_0000001233628635__b15351434239211">default_text_search_config</strong>, because different entries could contain <strong id="EN-US_TOPIC_0000001233628635__b15629869479211">tsvector</strong>s that were created with different text search configurations, and there would be no way to guess which was which. It would be impossible to dump and restore such an index correctly.</p>
|
|
<p id="EN-US_TOPIC_0000001233628635__p68481443114317">Because the two-argument version of <strong id="EN-US_TOPIC_0000001233628635__b381768489211">to_tsvector</strong> was used in the index above, only a query reference that uses the two-argument version of <strong id="EN-US_TOPIC_0000001233628635__b10385574799211">to_tsvector</strong> with the same configuration name will use that index. That is, <strong id="EN-US_TOPIC_0000001233628635__b1760757369211">WHERE to_tsvector('english', body) @@ 'a & b'</strong> can use the index, but <strong id="EN-US_TOPIC_0000001233628635__b21125012549211">WHERE to_tsvector(body) @@ 'a & b'</strong> cannot. This ensures that an index will be used only with the same configuration used to create the index entries.</p>
|
|
<p id="EN-US_TOPIC_0000001233628635__p15848154313430">More complex expression indexes can be set up when the configuration name of the index is specified by another column. For example:</p>
|
|
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001233628635__sea5cf34e148348d98d52cc69c48aa258"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">pgweb_idx_2</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">tsearch</span><span class="p">.</span><span class="n">pgweb</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">gin</span><span class="p">(</span><span class="n">to_tsvector</span><span class="p">(</span><span class="s1">'zhparser'</span><span class="p">,</span><span class="w"> </span><span class="n">body</span><span class="p">));</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<div class="note" id="EN-US_TOPIC_0000001233628635__note33933179437"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="EN-US_TOPIC_0000001233628635__p15638155191716">In this example, zhparser supports only the UTF-8 or GBK database encoding format. If the SQL_ASCII encoding is used, an error will be reported.</p>
|
|
</div></div>
|
|
<p id="EN-US_TOPIC_0000001233628635__ad260542d03cf4fbbb18a8c19f1ea4599">where <strong id="EN-US_TOPIC_0000001233628635__b842352706154158">body</strong> is a column in the <strong id="EN-US_TOPIC_0000001233628635__b87903412215">pgweb</strong> table. This allows mixed configurations in the same index while recording which configuration was used for each index entry. This can be useful when the document collection contains documents in different languages. Again, queries that are meant to use the index must be phrased to match, for example, <strong id="EN-US_TOPIC_0000001233628635__b842352706154259">WHERE to_tsvector(config_name, body) @@ 'a & b'</strong> must match <strong id="EN-US_TOPIC_0000001233628635__b842352706154411">to_tsvector</strong> in the index.</p>
|
|
<p id="EN-US_TOPIC_0000001233628635__a2235e8b0d6a14c00b03f18359cc7e436">Indexes can even concatenate columns:</p>
|
|
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001233628635__s1768adcd847743a9ac23e4a75820bb38"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">pgweb_idx_3</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">tsearch</span><span class="p">.</span><span class="n">pgweb</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">gin</span><span class="p">(</span><span class="n">to_tsvector</span><span class="p">(</span><span class="s1">'english'</span><span class="p">,</span><span class="w"> </span><span class="n">title</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="s1">' '</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">body</span><span class="p">));</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="EN-US_TOPIC_0000001233628635__a0efe3255075f43f99e4fee05908b9954">Another approach is to create a separate <strong id="EN-US_TOPIC_0000001233628635__b84235270615451">tsvector</strong> column to hold the output of <strong id="EN-US_TOPIC_0000001233628635__b842352706154459">to_tsvector</strong>. This example is a concatenation of <strong id="EN-US_TOPIC_0000001233628635__b842352706154528">title</strong> and <strong id="EN-US_TOPIC_0000001233628635__b842352706154530">body</strong>, using <strong id="EN-US_TOPIC_0000001233628635__b842352706154516">coalesce</strong> to ensure that one column will still be indexed when the other is <strong id="EN-US_TOPIC_0000001233628635__b842352706154519">NULL</strong>:</p>
|
|
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001233628635__s6c346ea516894f71ad4fd4b2874d1fc1"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
|
|
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">tsearch</span><span class="p">.</span><span class="n">pgweb</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="k">COLUMN</span><span class="w"> </span><span class="n">textsearchable_index_col</span><span class="w"> </span><span class="n">tsvector</span><span class="p">;</span>
|
|
<span class="k">UPDATE</span><span class="w"> </span><span class="n">tsearch</span><span class="p">.</span><span class="n">pgweb</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">textsearchable_index_col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">to_tsvector</span><span class="p">(</span><span class="s1">'english'</span><span class="p">,</span><span class="w"> </span><span class="k">coalesce</span><span class="p">(</span><span class="n">title</span><span class="p">,</span><span class="s1">''</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="s1">' '</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="k">coalesce</span><span class="p">(</span><span class="n">body</span><span class="p">,</span><span class="s1">''</span><span class="p">));</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="EN-US_TOPIC_0000001233628635__a6e0f2a5d38db4888aed48d065f24b88d">Then, create a GIN index to speed up the search:</p>
|
|
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001233628635__s9c0d87e507c447c7ac0c94723b05f1aa"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">textsearch_idx_4</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">tsearch</span><span class="p">.</span><span class="n">pgweb</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">gin</span><span class="p">(</span><span class="n">textsearchable_index_col</span><span class="p">);</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="EN-US_TOPIC_0000001233628635__a2cce59a7258e4a899d373c65dd263db9">Now you are ready to perform a fast full text search:</p>
|
|
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001233628635__s018c311459274d6da4b8a43da3b06d72"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
|
|
<span class="normal"> 2</span>
|
|
<span class="normal"> 3</span>
|
|
<span class="normal"> 4</span>
|
|
<span class="normal"> 5</span>
|
|
<span class="normal"> 6</span>
|
|
<span class="normal"> 7</span>
|
|
<span class="normal"> 8</span>
|
|
<span class="normal"> 9</span>
|
|
<span class="normal">10</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="n">title</span><span class="w"> </span>
|
|
<span class="k">FROM</span><span class="w"> </span><span class="n">tsearch</span><span class="p">.</span><span class="n">pgweb</span><span class="w"> </span>
|
|
<span class="k">WHERE</span><span class="w"> </span><span class="n">textsearchable_index_col</span><span class="w"> </span><span class="o">@@</span><span class="w"> </span><span class="n">to_tsquery</span><span class="p">(</span><span class="s1">'science & Computer'</span><span class="p">)</span><span class="w"> </span>
|
|
<span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">last_mod_date</span><span class="w"> </span><span class="k">DESC</span><span class="w"> </span>
|
|
<span class="k">LIMIT</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span><span class="w"> </span>
|
|
|
|
<span class="w"> </span><span class="n">title</span><span class="w"> </span>
|
|
<span class="c1">--------</span>
|
|
<span class="w"> </span><span class="n">Computer</span><span class="w"> </span><span class="n">science</span>
|
|
<span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="k">rows</span><span class="p">)</span>
|
|
</pre></div></td></tr></table></div>
|
|
|
|
</div>
|
|
<p id="EN-US_TOPIC_0000001233628635__a5a7e4d5ff5a74dbab1ea13f552986e8a">One advantage of the separate-column approach over an expression index is that it is unnecessary to explicitly specify the text search configuration in queries in order to use the index. As shown in the preceding example, the query can depend on <strong id="EN-US_TOPIC_0000001233628635__b842352706155046">default_text_search_config</strong>. Another advantage is that searches will be faster, since it will not be necessary to redo the <strong id="EN-US_TOPIC_0000001233628635__b84235270615512">to_tsvector</strong> calls to verify index matches. The expression-index approach is simpler to set up, however, and it requires less disk space since the <strong id="EN-US_TOPIC_0000001233628635__b842352706155126">tsvector</strong> representation is not stored explicitly.</p>
|
|
</div>
|
|
<div>
|
|
<div class="familylinks">
|
|
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_06_0087.html">Searching for Texts in Database Tables</a></div>
|
|
</div>
|
|
</div>
|
|
|