doc-exports/docs/dws/dev/dws_06_0113.html
Lu, Huayi e6fa411af0 DWS DEV 830.201 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lu, Huayi <luhuayi@huawei.com>
Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
2024-05-16 07:24:04 +00:00

<a name="EN-US_TOPIC_0000001188110558"></a>
<h1 class="topictitle1">Testing a Parser</h1>
<div id="body8662426"><p id="EN-US_TOPIC_0000001188110558__a2105cb99b3724de5889c5023b11ae870">The <strong id="EN-US_TOPIC_0000001188110558__b842352706185911">ts_parse</strong> and <strong>ts_token_type</strong> functions allow direct testing of a text search parser.</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188110558__s909467d319a04280b4810393501f4f2c"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">ts_parse</span><span class="p">(</span><span class="n">parser_name</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span><span class="w"> </span><span class="n">document</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span>
<span class="w"> </span><span class="k">OUT</span><span class="w"> </span><span class="n">tokid</span><span class="w"> </span><span class="nb">integer</span><span class="p">,</span><span class="w"> </span><span class="k">OUT</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">returns</span><span class="w"> </span><span class="k">setof</span><span class="w"> </span><span class="n">record</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001188110558__aa6b6695032944e3b9362bb2fe3ef2f05"><strong id="EN-US_TOPIC_0000001188110558__b842352706185959">ts_parse</strong> parses the given <strong id="EN-US_TOPIC_0000001188110558__b8423527061903">document</strong> and returns a series of records, one for each token produced by parsing. Each record includes a <strong id="EN-US_TOPIC_0000001188110558__b84235270619111">tokid</strong> showing the assigned token type and a <strong id="EN-US_TOPIC_0000001188110558__b84235270619118">token</strong> which is the text of the token. For example:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188110558__s2a51d76ab59842f3885a3e4aacc5e104"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">ts_parse</span><span class="p">(</span><span class="s1">'default'</span><span class="p">,</span><span class="w"> </span><span class="s1">'123 - a number'</span><span class="p">);</span>
<span class="w"> </span><span class="n">tokid</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">token</span>
<span class="c1">-------+--------</span>
<span class="w">    </span><span class="mi">22</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">123</span>
<span class="w">    </span><span class="mi">12</span><span class="w"> </span><span class="o">|</span>
<span class="w">    </span><span class="mi">12</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">-</span>
<span class="w">     </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">a</span>
<span class="w">    </span><span class="mi">12</span><span class="w"> </span><span class="o">|</span>
<span class="w">     </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nb">number</span>
<span class="p">(</span><span class="mi">6</span><span class="w"> </span><span class="k">rows</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
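<p>To focus on the meaningful tokens, the parser output can be filtered like any other row set. The following is a minimal sketch, assuming the <strong>default</strong> parser, in which <strong>tokid</strong> 12 is the <strong>blank</strong> type (the spaces and the standalone hyphen in the output above). It is an illustration, not a prescribed usage:</p>
<div class="codecoloring" codetype="Sql"><pre>-- Keep only non-blank tokens; tokid 12 is the "blank" type in the default parser.
SELECT tokid, token
FROM ts_parse('default', '123 - a number')
WHERE tokid != 12;
</pre>
</div>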
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188110558__sec58b4a85814469e883b237c185511d2"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="n">ts_token_type</span><span class="p">(</span><span class="n">parser_name</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span><span class="w"> </span><span class="k">OUT</span><span class="w"> </span><span class="n">tokid</span><span class="w"> </span><span class="nb">integer</span><span class="p">,</span>
<span class="w"> </span><span class="k">OUT</span><span class="w"> </span><span class="k">alias</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span><span class="w"> </span><span class="k">OUT</span><span class="w"> </span><span class="n">description</span><span class="w"> </span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">returns</span><span class="w"> </span><span class="k">setof</span><span class="w"> </span><span class="n">record</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001188110558__a361e4e7780d940d8932ae87996796d12"><strong id="EN-US_TOPIC_0000001188110558__b84235270619338">ts_token_type</strong> returns a table that describes each token type the specified parser can recognize. For each token type, the table gives the integer <strong id="EN-US_TOPIC_0000001188110558__b84235270619549">tokid</strong> that the parser uses to label tokens of that type, the <strong id="EN-US_TOPIC_0000001188110558__b84235270619446">alias</strong> that names the token type in configuration commands, and a short description. For example:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188110558__s1a7998d6bc4e41229ce8b155fdbb97a3"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">ts_token_type</span><span class="p">(</span><span class="s1">'default'</span><span class="p">);</span>
<span class="w"> </span><span class="n">tokid</span><span class="w"> </span><span class="o">|</span><span class="w">      </span><span class="k">alias</span><span class="w">      </span><span class="o">|</span><span class="w">               </span><span class="n">description</span><span class="w"> </span>
<span class="c1">-------+-----------------+------------------------------------------</span>
<span class="w">     </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">asciiword</span><span class="w">       </span><span class="o">|</span><span class="w"> </span><span class="n">Word</span><span class="p">,</span><span class="w"> </span><span class="k">all</span><span class="w"> </span><span class="n">ASCII</span>
<span class="w">     </span><span class="mi">2</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">word</span><span class="w">            </span><span class="o">|</span><span class="w"> </span><span class="n">Word</span><span class="p">,</span><span class="w"> </span><span class="k">all</span><span class="w"> </span><span class="n">letters</span>
<span class="w">     </span><span class="mi">3</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">numword</span><span class="w">         </span><span class="o">|</span><span class="w"> </span><span class="n">Word</span><span class="p">,</span><span class="w"> </span><span class="n">letters</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">digits</span>
<span class="w">     </span><span class="mi">4</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">email</span><span class="w">           </span><span class="o">|</span><span class="w"> </span><span class="n">Email</span><span class="w"> </span><span class="n">address</span>
<span class="w">     </span><span class="mi">5</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">url</span><span class="w">             </span><span class="o">|</span><span class="w"> </span><span class="n">URL</span>
<span class="w">     </span><span class="mi">6</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">host</span><span class="w">            </span><span class="o">|</span><span class="w"> </span><span class="k">Host</span>
<span class="w">     </span><span class="mi">7</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">sfloat</span><span class="w">          </span><span class="o">|</span><span class="w"> </span><span class="n">Scientific</span><span class="w"> </span><span class="n">notation</span>
<span class="w">     </span><span class="mi">8</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">version</span><span class="w">         </span><span class="o">|</span><span class="w"> </span><span class="k">Version</span><span class="w"> </span><span class="nb">number</span>
<span class="w">     </span><span class="mi">9</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">hword_numpart</span><span class="w">   </span><span class="o">|</span><span class="w"> </span><span class="n">Hyphenated</span><span class="w"> </span><span class="n">word</span><span class="w"> </span><span class="n">part</span><span class="p">,</span><span class="w"> </span><span class="n">letters</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">digits</span>
<span class="w">    </span><span class="mi">10</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">hword_part</span><span class="w">      </span><span class="o">|</span><span class="w"> </span><span class="n">Hyphenated</span><span class="w"> </span><span class="n">word</span><span class="w"> </span><span class="n">part</span><span class="p">,</span><span class="w"> </span><span class="k">all</span><span class="w"> </span><span class="n">letters</span>
<span class="w">    </span><span class="mi">11</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">hword_asciipart</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Hyphenated</span><span class="w"> </span><span class="n">word</span><span class="w"> </span><span class="n">part</span><span class="p">,</span><span class="w"> </span><span class="k">all</span><span class="w"> </span><span class="n">ASCII</span>
<span class="w">    </span><span class="mi">12</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">blank</span><span class="w">           </span><span class="o">|</span><span class="w"> </span><span class="k">Space</span><span class="w"> </span><span class="n">symbols</span>
<span class="w">    </span><span class="mi">13</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">tag</span><span class="w">             </span><span class="o">|</span><span class="w"> </span><span class="n">XML</span><span class="w"> </span><span class="n">tag</span>
<span class="w">    </span><span class="mi">14</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">protocol</span><span class="w">        </span><span class="o">|</span><span class="w"> </span><span class="n">Protocol</span><span class="w"> </span><span class="n">head</span>
<span class="w">    </span><span class="mi">15</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">numhword</span><span class="w">        </span><span class="o">|</span><span class="w"> </span><span class="n">Hyphenated</span><span class="w"> </span><span class="n">word</span><span class="p">,</span><span class="w"> </span><span class="n">letters</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">digits</span>
<span class="w">    </span><span class="mi">16</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">asciihword</span><span class="w">      </span><span class="o">|</span><span class="w"> </span><span class="n">Hyphenated</span><span class="w"> </span><span class="n">word</span><span class="p">,</span><span class="w"> </span><span class="k">all</span><span class="w"> </span><span class="n">ASCII</span>
<span class="w">    </span><span class="mi">17</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">hword</span><span class="w">           </span><span class="o">|</span><span class="w"> </span><span class="n">Hyphenated</span><span class="w"> </span><span class="n">word</span><span class="p">,</span><span class="w"> </span><span class="k">all</span><span class="w"> </span><span class="n">letters</span>
<span class="w">    </span><span class="mi">18</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">url_path</span><span class="w">        </span><span class="o">|</span><span class="w"> </span><span class="n">URL</span><span class="w"> </span><span class="n">path</span>
<span class="w">    </span><span class="mi">19</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">file</span><span class="w">            </span><span class="o">|</span><span class="w"> </span><span class="n">File</span><span class="w"> </span><span class="k">or</span><span class="w"> </span><span class="n">path</span><span class="w"> </span><span class="n">name</span>
<span class="w">    </span><span class="mi">20</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nb">float</span><span class="w">           </span><span class="o">|</span><span class="w"> </span><span class="nb">Decimal</span><span class="w"> </span><span class="n">notation</span>
<span class="w">    </span><span class="mi">21</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nb">int</span><span class="w">             </span><span class="o">|</span><span class="w"> </span><span class="n">Signed</span><span class="w"> </span><span class="nb">integer</span>
<span class="w">    </span><span class="mi">22</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">uint</span><span class="w">            </span><span class="o">|</span><span class="w"> </span><span class="n">Unsigned</span><span class="w"> </span><span class="nb">integer</span>
<span class="w">    </span><span class="mi">23</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">entity</span><span class="w">          </span><span class="o">|</span><span class="w"> </span><span class="n">XML</span><span class="w"> </span><span class="n">entity</span>
<span class="p">(</span><span class="mi">23</span><span class="w"> </span><span class="k">rows</span><span class="p">)</span>
</pre></div></td></tr></table></div>
</div>
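<p>The two functions can be combined so that each parsed token is shown with its type alias rather than a bare number. The following is a minimal sketch, assuming the <strong>default</strong> parser and the sample document used earlier; the table aliases <strong>p</strong> and <strong>t</strong> are illustrative names only:</p>
<div class="codecoloring" codetype="Sql"><pre>-- Label each token of the sample document with the alias and description of its type.
SELECT p.tokid, t.alias, t.description, p.token
FROM ts_parse('default', '123 - a number') AS p
JOIN ts_token_type('default') AS t ON t.tokid = p.tokid;
</pre>
</div>
<p>The join does not guarantee that rows are returned in document order, so this form is best suited to checking which token types a document produces.</p>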
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_06_0111.html">Testing and Debugging Text Search</a></div>
</div>
</div>