doc-exports/docs/dws/dev/dws_06_0097.html
Lu, Huayi e6fa411af0 DWS DEV 830.201 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lu, Huayi <luhuayi@huawei.com>
Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
2024-05-16 07:24:04 +00:00


<a name="EN-US_TOPIC_0000001188270498"></a><a name="EN-US_TOPIC_0000001188270498"></a>
<h1 class="topictitle1">Manipulating tsvector</h1>
<div id="body8662426"><p id="EN-US_TOPIC_0000001188270498__en-us_topic_0059779038_p928422017215"><span id="EN-US_TOPIC_0000001188270498__text1551525989">GaussDB(DWS)</span> provides functions and operators for manipulating documents that are already in tsvector form.</p>
<ul id="EN-US_TOPIC_0000001188270498__u8f3116334d3a466c87e6a7d1d6ad9cd7"><li id="EN-US_TOPIC_0000001188270498__l4fb9a2486ccc4758aa4ef670feeaada1">tsvector || tsvector<p id="EN-US_TOPIC_0000001188270498__a7ba3308b32e34bb2866964b7b2c04a73"><a name="EN-US_TOPIC_0000001188270498__l4fb9a2486ccc4758aa4ef670feeaada1"></a><a name="l4fb9a2486ccc4758aa4ef670feeaada1"></a>The tsvector concatenation operator returns a new tsvector that combines the lexemes and positional information of its two tsvector arguments. Positions and weight labels are retained during concatenation. Positions in the right-hand tsvector are offset by the largest position in the left-hand tsvector, so the result is nearly equivalent to running <strong id="EN-US_TOPIC_0000001188270498__b84235270616628">to_tsvector</strong> on the concatenation of the two original document strings. (The equivalence is not exact: stop words removed from the end of the left-hand document do not affect the result, whereas with textual concatenation they would have shifted the positions of the lexemes from the right-hand document.)</p>
<p id="EN-US_TOPIC_0000001188270498__a85682878bd5344fbb25231b94d1943c9">One advantage of concatenating in tsvector form, rather than concatenating the text before applying <strong id="EN-US_TOPIC_0000001188270498__b7525229530">to_tsvector</strong>, is that different sections of the document can be parsed with different configurations. Also, because the <strong id="EN-US_TOPIC_0000001188270498__en-us_topic_0085033154_b84235270616753">setweight</strong> function marks all lexemes of a given tsvector with the same weight, you must parse each part of the document and apply <strong id="EN-US_TOPIC_0000001188270498__en-us_topic_0085033154_b8423527061681">setweight</strong> to it before concatenating if different parts are to carry different weights.</p>
</li><li id="EN-US_TOPIC_0000001188270498__l0daa2a6211744f6e93a6d1795e89d954">setweight(vector tsvector, weight "char") returns tsvector<p id="EN-US_TOPIC_0000001188270498__ab256b531386c43789c84c2d1bac2c032"><a name="EN-US_TOPIC_0000001188270498__l0daa2a6211744f6e93a6d1795e89d954"></a><a name="l0daa2a6211744f6e93a6d1795e89d954"></a><strong id="EN-US_TOPIC_0000001188270498__en-us_topic_0085033154_b84235270616820">setweight</strong> returns a copy of the input tsvector in which every position has been labeled with the given weight, either <strong id="EN-US_TOPIC_0000001188270498__b84235270616837">A</strong>, <strong id="EN-US_TOPIC_0000001188270498__b84235270616840">B</strong>, <strong id="EN-US_TOPIC_0000001188270498__b84235270616841">C</strong>, or <strong id="EN-US_TOPIC_0000001188270498__b84235270616844">D</strong>. (<strong id="EN-US_TOPIC_0000001188270498__b84235270616848">D</strong> is the default for new tsvectors and as such is not displayed on output.) These labels are retained when tsvectors are concatenated, allowing words from different parts of a document to be weighted differently by ranking functions.</p>
<div class="notice" id="EN-US_TOPIC_0000001188270498__n2ec70095fc7346c7b25c5fa4a41bce17"><span class="noticetitle"><img src="public_sys-resources/notice_3.0-en-us.png"> </span><div class="noticebody"><p id="EN-US_TOPIC_0000001188270498__a833e1cc7939348ad9408f8e8e1d340af">Weight labels apply to positions, not lexemes. If the input tsvector has been stripped of positions, <strong id="EN-US_TOPIC_0000001188270498__b84235270616930">setweight</strong> does nothing.</p>
</div></div>
</li><li id="EN-US_TOPIC_0000001188270498__led887f097d244ea8a2ff822ff105060c">length(vector tsvector) returns integer<p id="EN-US_TOPIC_0000001188270498__abe3a6153cf984caabe98980bc832f3c7"><a name="EN-US_TOPIC_0000001188270498__led887f097d244ea8a2ff822ff105060c"></a><a name="led887f097d244ea8a2ff822ff105060c"></a>Returns the number of lexemes stored in the vector.</p>
</li><li id="EN-US_TOPIC_0000001188270498__le553a5896e564ade899b13f31ce9bfbc">strip(vector tsvector) returns tsvector<p id="EN-US_TOPIC_0000001188270498__ac4dc095df9d342dd9a2a345cb50d8e31"><a name="EN-US_TOPIC_0000001188270498__le553a5896e564ade899b13f31ce9bfbc"></a><a name="le553a5896e564ade899b13f31ce9bfbc"></a>Returns a tsvector that lists the same lexemes as the given tsvector but lacks any position or weight information. The result is much less useful than an unstripped tsvector for relevance ranking, but it is usually much smaller.</p>
</li></ul>
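<p>The position arithmetic of the tsvector concatenation operator can be illustrated with a small Python model. This is a hypothetical sketch, not the GaussDB(DWS) implementation: it assumes a tsvector can be represented as a dictionary that maps each lexeme to a sorted list of (position, weight) pairs, and the names TsVector, max_position, and concat are illustrative only. It deliberately ignores details such as position limits.</p>

```python
from typing import Dict, List, Tuple

# Hypothetical model of a tsvector: lexeme -> sorted (position, weight) pairs.
# A lexeme whose list is empty has no positions (as after strip).
TsVector = Dict[str, List[Tuple[int, str]]]


def max_position(vec: TsVector) -> int:
    """Largest position mentioned in vec, or 0 if it has none."""
    return max((pos for entries in vec.values() for pos, _ in entries), default=0)


def concat(left: TsVector, right: TsVector) -> TsVector:
    """Model of left || right: right-hand positions are shifted by the
    largest left-hand position, and weight labels are carried over as-is."""
    offset = max_position(left)
    result: TsVector = {lex: list(entries) for lex, entries in left.items()}
    for lex, entries in right.items():
        shifted = [(pos + offset, weight) for pos, weight in entries]
        # Lexemes present on both sides get the union of their positions.
        result[lex] = sorted(result.get(lex, []) + shifted)
    return result


if __name__ == "__main__":
    # Modeled on the expression 'a:1 b:2'::tsvector || 'c:1 d:2 b:3'::tsvector
    left = {"a": [(1, "D")], "b": [(2, "D")]}
    right = {"c": [(1, "D")], "d": [(2, "D")], "b": [(3, "D")]}
    print(concat(left, right))
    # {'a': [(1, 'D')], 'b': [(2, 'D'), (5, 'D')], 'c': [(3, 'D')], 'd': [(4, 'D')]}

    # A title already weighted A, followed by a body at the default weight D.
    title = {"fat": [(1, "A")], "cat": [(2, "A")]}
    body = {"cat": [(1, "D")], "sat": [(2, "D")]}
    print(concat(title, body))
    # {'fat': [(1, 'A')], 'cat': [(2, 'A'), (3, 'D')], 'sat': [(4, 'D')]}
```

<p>In this model, lexeme b ends up at positions 2 and 5 because its right-hand position 3 is shifted by 2, the largest left-hand position. The second example keeps weight A on the title positions and weight D on the body positions, which is why each part has to be weighted before the two are concatenated.</p>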
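<p>The behavior of setweight, length, and strip can be sketched with a small hypothetical Python model. It assumes a tsvector is a dictionary from each lexeme to a list of (position, weight) pairs, where an empty list means the lexeme has no positions. The function names mirror the SQL functions described above, but the code is only an illustration and not the GaussDB(DWS) implementation; for simplicity it accepts only the four uppercase labels A, B, C, and D.</p>

```python
from typing import Dict, List, Tuple

# Hypothetical model: lexeme -> list of (position, weight); empty list = no positions.
TsVector = Dict[str, List[Tuple[int, str]]]

VALID_WEIGHTS = {"A", "B", "C", "D"}


def setweight(vec: TsVector, weight: str) -> TsVector:
    """Copy of vec with every position relabeled. Weights live only on
    positions, so lexemes without positions are copied unchanged."""
    if weight not in VALID_WEIGHTS:
        raise ValueError(f"weight must be one of A, B, C, D, got {weight!r}")
    return {lex: [(pos, weight) for pos, _ in entries] for lex, entries in vec.items()}


def length(vec: TsVector) -> int:
    """Number of lexemes, regardless of how many positions each one has."""
    return len(vec)


def strip(vec: TsVector) -> TsVector:
    """Same lexemes, with all position and weight information dropped."""
    return {lex: [] for lex in vec}


if __name__ == "__main__":
    vec = {"fat": [(2, "D")], "cat": [(3, "D"), (7, "D")]}
    print(setweight(vec, "A"))         # {'fat': [(2, 'A')], 'cat': [(3, 'A'), (7, 'A')]}
    print(length(vec))                 # 2
    print(length(strip(vec)))          # 2
    print(setweight(strip(vec), "A"))  # {'fat': [], 'cat': []}
```

<p>The last call shows why setweight does nothing on a stripped tsvector: in this model weights are stored only alongside positions, so once strip has removed the positions there is nothing left to label. Stripping does not change the set of lexemes, so length returns the same value before and after strip.</p>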
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_06_0096.html">Additional Features</a></div>
</div>
</div>