doc-exports/docs/dws/dev/dws_06_0095.html
Lu, Huayi e6fa411af0 DWS DEV 830.201 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lu, Huayi <luhuayi@huawei.com>
Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
2024-05-16 07:24:04 +00:00

<a name="EN-US_TOPIC_0000001233708647"></a>
<h1 class="topictitle1">Highlighting Results</h1>
<div id="body8662426"><p id="EN-US_TOPIC_0000001233708647__en-us_topic_0059778945_p152227101524">When presenting search results, it is helpful to show part of each document and how it relates to the query. Search engines usually show fragments of the document with the search terms marked. <span id="EN-US_TOPIC_0000001233708647__text1482301294">GaussDB(DWS)</span> provides the <strong id="EN-US_TOPIC_0000001233708647__b1499164010432">ts_headline</strong> function for this purpose.</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001233708647__s81f397fd76044c91802fc474c7f1e724"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">ts_headline</span><span class="p">([</span><span class="w"> </span><span class="n">config</span><span class="w"> </span><span class="n">regconfig</span><span class="p">,</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="n">document</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span><span class="w"> </span><span class="n">query</span><span class="w"> </span><span class="n">tsquery</span><span class="w"> </span><span class="p">[,</span><span class="w"> </span><span class="k">options</span><span class="w"> </span><span class="nb">text</span><span class="w"> </span><span class="p">])</span><span class="w"> </span><span class="k">returns</span><span class="w"> </span><span class="nb">text</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001233708647__a55087febd1444dec9c9587d03b0fcb44"><strong id="EN-US_TOPIC_0000001233708647__b84235270615107">ts_headline</strong> accepts a document along with a query, and returns an excerpt from the document in which terms from the query are highlighted. The configuration to be used to parse the document can be specified by <strong id="EN-US_TOPIC_0000001233708647__b842352706151021">config</strong>. If <strong id="EN-US_TOPIC_0000001233708647__b842352706151033">config</strong> is omitted, the <strong id="EN-US_TOPIC_0000001233708647__b842352706151037">default_text_search_config</strong> configuration is used.</p>
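<p>The effect of the optional <strong>config</strong> argument can be sketched as follows. The sample sentence is made up for illustration, and switching the session to <strong>pg_catalog.english</strong> is an assumption about the desired default, not a required step. Under that assumption, both calls should parse the document with the same configuration.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Check which configuration is used when config is omitted.
SHOW default_text_search_config;

-- Assumption: use the built-in English configuration for this session.
SET default_text_search_config = 'pg_catalog.english';

-- config omitted: default_text_search_config is used.
SELECT ts_headline('Full text search finds documents that match a query.',
                   to_tsquery('search &amp; query'));

-- config given explicitly.
SELECT ts_headline('english',
                   'Full text search finds documents that match a query.',
                   to_tsquery('english', 'search &amp; query'));
</pre></div>
</div>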
<p id="EN-US_TOPIC_0000001233708647__a11f53198d7fb4a1e83929faa7cc935e4">If an options string is specified, it must consist of a comma-separated list of one or more <strong id="EN-US_TOPIC_0000001233708647__b842352706151057">option=value</strong> pairs. The available options are:</p>
<ul id="EN-US_TOPIC_0000001233708647__u4c4185014119447da426654aaae47d39"><li id="EN-US_TOPIC_0000001233708647__lc1dd14dbe09a4ba68ffc532e5856e14e"><strong id="EN-US_TOPIC_0000001233708647__b842352706151120">StartSel</strong>, <strong id="EN-US_TOPIC_0000001233708647__b842352706151122">StopSel</strong>: The strings with which to delimit query words appearing in the document, to distinguish them from other excerpted words. You must double-quote these strings if they contain spaces or commas.</li><li id="EN-US_TOPIC_0000001233708647__lb62a860a9fb448c7911601f6cfd7de3d"><strong id="EN-US_TOPIC_0000001233708647__b406685480151156">MaxWords</strong>, <strong id="EN-US_TOPIC_0000001233708647__b1109039259151156">MinWords</strong>: These numbers determine the longest and shortest headlines to output.</li></ul>
<ul id="EN-US_TOPIC_0000001233708647__u272f396138f64e7d91bb2a451e67d5fb"><li id="EN-US_TOPIC_0000001233708647__l002e99725cf24cbc950d78a6ed7d97d8"><strong id="EN-US_TOPIC_0000001233708647__b842352706151226">ShortWord</strong>: Words of this length or less will be dropped at the start and end of a headline. The default value of three eliminates common English articles.</li></ul>
<ul id="EN-US_TOPIC_0000001233708647__u232c5cf5f6a74706b0fa92baf062c6fd"><li id="EN-US_TOPIC_0000001233708647__lfa3a74659c3f4c0e80e535c4943914d9"><strong id="EN-US_TOPIC_0000001233708647__b842352706151310">HighlightAll</strong>: Boolean flag. If the value is <strong id="EN-US_TOPIC_0000001233708647__b131070482515">true</strong>, the entire document is used as the excerpt, and the values of <strong>MaxWords</strong>, <strong>MinWords</strong>, and <strong>ShortWord</strong> are ignored.</li></ul>
<ul id="EN-US_TOPIC_0000001233708647__u7a763dff66bd4176b7fd81380b930e4e"><li id="EN-US_TOPIC_0000001233708647__ldb51a6de6eb94d479adc9c1eda681268"><strong id="EN-US_TOPIC_0000001233708647__en-us_topic_0058965961_b842352706151420">MaxFragments</strong>: Maximum number of text excerpts or fragments to display. The default value of zero selects a non-fragment-oriented headline generation method. A value greater than zero selects fragment-based headline generation. This method finds text fragments with as many query words as possible and stretches those fragments around the query words. As a result, query words are close to the middle of each fragment and have words on each side. Each fragment contains at most <strong id="EN-US_TOPIC_0000001233708647__b1351527185214">MaxWords</strong> words, and words of length <strong id="EN-US_TOPIC_0000001233708647__b187491529135210">ShortWord</strong> or less are dropped at the start and end of each fragment. If not all query words are found in the document, a single fragment of the first <strong id="EN-US_TOPIC_0000001233708647__b842352706152644">MinWords</strong> words in the document is displayed.</li></ul>
<ul id="EN-US_TOPIC_0000001233708647__u4bf83115959f4a9fa8b3e028d4f72da6"><li id="EN-US_TOPIC_0000001233708647__l5b7c6250b29e460485a3f32acb6eb0d1"><strong id="EN-US_TOPIC_0000001233708647__b842352706152656">FragmentDelimiter</strong>: When more than one fragment is displayed, the fragments will be separated by this string.</li></ul>
<p id="EN-US_TOPIC_0000001233708647__a3e4f084e7a3b4808b11505d416caa382">Any unspecified options receive these defaults:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001233708647__s3ff294c593424a96a459e0a6c553e581"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">StartSel</span><span class="o">=&lt;</span><span class="n">b</span><span class="o">&gt;</span><span class="p">,</span><span class="w"> </span><span class="n">StopSel</span><span class="o">=&lt;/</span><span class="n">b</span><span class="o">&gt;</span><span class="p">,</span>
<span class="n">MaxWords</span><span class="o">=</span><span class="mi">35</span><span class="p">,</span><span class="w"> </span><span class="n">MinWords</span><span class="o">=</span><span class="mi">15</span><span class="p">,</span><span class="w"> </span><span class="n">ShortWord</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="n">HighlightAll</span><span class="o">=</span><span class="k">FALSE</span><span class="p">,</span>
<span class="n">MaxFragments</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">FragmentDelimiter</span><span class="o">=</span><span class="ss">&quot; ... &quot;</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001233708647__af6a35175f89d4436b2d1f8bafd2894a6">For example:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001233708647__s17f4462ca4f249a989b1d9cf7cf8763c"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
<span class="normal"> 2</span>
<span class="normal"> 3</span>
<span class="normal"> 4</span>
<span class="normal"> 5</span>
<span class="normal"> 6</span>
<span class="normal"> 7</span>
<span class="normal"> 8</span>
<span class="normal"> 9</span>
<span class="normal">10</span>
<span class="normal">11</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="n">ts_headline</span><span class="p">(</span><span class="s1">'english'</span><span class="p">,</span><span class="s1">'The most common type of search is to find all documents containing given query terms and return them in order of their similarity to the query.'</span><span class="p">,</span><span class="n">to_tsquery</span><span class="p">(</span><span class="s1">'english'</span><span class="p">,</span><span class="w"> </span><span class="s1">'query &amp; similarity'</span><span class="p">));</span>
<span class="w"> </span><span class="n">ts_headline</span>
<span class="c1">--------------------------------------------------------------------------------------------------------------</span>
<span class="w"> </span><span class="n">containing</span><span class="w"> </span><span class="n">given</span><span class="w"> </span><span class="o">&lt;</span><span class="n">b</span><span class="o">&gt;</span><span class="n">query</span><span class="o">&lt;/</span><span class="n">b</span><span class="o">&gt;</span><span class="w"> </span><span class="n">terms</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">them</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="k">order</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="n">their</span><span class="w"> </span><span class="o">&lt;</span><span class="n">b</span><span class="o">&gt;</span><span class="n">similarity</span><span class="o">&lt;/</span><span class="n">b</span><span class="o">&gt;</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="o">&lt;</span><span class="n">b</span><span class="o">&gt;</span><span class="n">query</span><span class="o">&lt;/</span><span class="n">b</span><span class="o">&gt;</span><span class="p">.</span>
<span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="k">row</span><span class="p">)</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">ts_headline</span><span class="p">(</span><span class="s1">'english'</span><span class="p">,</span><span class="s1">'The most common type of search is to find all documents containing given query terms and return them in order of their similarity to the query.'</span><span class="p">,</span><span class="n">to_tsquery</span><span class="p">(</span><span class="s1">'english'</span><span class="p">,</span><span class="w"> </span><span class="s1">'query &amp; similarity'</span><span class="p">),</span><span class="s1">'StartSel = &lt;, StopSel = &gt;'</span><span class="p">);</span>
<span class="w"> </span><span class="n">ts_headline</span>
<span class="c1">-----------------------------------------------------------------------------------------------</span>
<span class="w"> </span><span class="n">containing</span><span class="w"> </span><span class="n">given</span><span class="w"> </span><span class="o">&lt;</span><span class="n">query</span><span class="o">&gt;</span><span class="w"> </span><span class="n">terms</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">them</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="k">order</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="n">their</span><span class="w"> </span><span class="o">&lt;</span><span class="n">similarity</span><span class="o">&gt;</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="o">&lt;</span><span class="n">query</span><span class="o">&gt;</span><span class="p">.</span>
<span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="k">row</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
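<p>The other options can be combined in the same options string. The following is a minimal sketch that reuses the sample sentence from this section. The specific option values and the <strong>[[ ]]</strong> delimiters are arbitrary choices for illustration. Output is omitted because the exact excerpt depends on the headline generation method.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Fragment-based headline: at most 2 fragments of at most 10 words each.
-- FragmentDelimiter is double-quoted because it contains spaces.
SELECT ts_headline('english',
  'The most common type of search is to find all documents containing given query terms and return them in order of their similarity to the query.',
  to_tsquery('english', 'query &amp; similarity'),
  'MaxFragments=2, MaxWords=10, MinWords=5, FragmentDelimiter=" ... "');

-- Custom markers that contain spaces must also be double-quoted.
SELECT ts_headline('english',
  'The most common type of search is to find all documents containing given query terms and return them in order of their similarity to the query.',
  to_tsquery('english', 'query &amp; similarity'),
  'StartSel="[[ ", StopSel=" ]]"');

-- HighlightAll: the whole document is used as the excerpt.
SELECT ts_headline('english',
  'The most common type of search is to find all documents containing given query terms and return them in order of their similarity to the query.',
  to_tsquery('english', 'query &amp; similarity'),
  'HighlightAll=true');
</pre></div>
</div>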
<p id="EN-US_TOPIC_0000001233708647__adff221c6d1504e2196ddfe321a04803b"><strong id="EN-US_TOPIC_0000001233708647__b84235270615312">ts_headline</strong> uses the original document, not a <strong id="EN-US_TOPIC_0000001233708647__b842352706153117">tsvector</strong> summary, so it can be slow and should be used with care.</p>
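<p>One way to limit this cost is to rank and trim the matches in a subquery and call <strong>ts_headline</strong> only on the rows that will actually be displayed. The sketch below assumes a hypothetical table <strong>docs</strong> with columns <strong>id</strong>, <strong>body</strong> (the original text), and <strong>body_tsv</strong> (a precomputed <strong>tsvector</strong>). None of these names come from this document.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Hypothetical schema: docs(id, body text, body_tsv tsvector).
-- The subquery matches and ranks on the tsvector column and keeps 10 rows;
-- ts_headline then parses the original text of those 10 rows only.
SELECT id, ts_headline('english', body, q) AS headline, rank
FROM (
    SELECT id, body, q, ts_rank_cd(body_tsv, q) AS rank
    FROM docs, to_tsquery('english', 'query &amp; similarity') AS q
    WHERE body_tsv @@ q
    ORDER BY rank DESC
    LIMIT 10
) AS top_matches
ORDER BY rank DESC;
</pre></div>
</div>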
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_06_0091.html">Controlling Text Search</a></div>
</div>
</div>