doc-exports/docs/dws/dev/dws_06_0084.html
Lu, Huayi e6fa411af0 DWS DEV 830.201 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lu, Huayi <luhuayi@huawei.com>
Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
2024-05-16 07:24:04 +00:00

<a name="EN-US_TOPIC_0000001233510115"></a>
<h1 class="topictitle1">What Is a Document?</h1>
<div id="body8662426"><p id="EN-US_TOPIC_0000001233510115__en-us_topic_0059777736_p650710510117">A document is the unit of searching in a full-text search system; for example, a magazine article or an email message. The text search engine must be able to parse documents and store associations of lexemes (keywords) with their parent document. Later, these associations are used to search for documents that contain the query words.</p>
<p id="EN-US_TOPIC_0000001233510115__adb13274623254b67ae2d02b48919db8d">For searches within <span id="EN-US_TOPIC_0000001233510115__text178020191">GaussDB(DWS)</span>, a document is normally a textual column within a row of a database table, or possibly a combination (concatenation) of such columns, perhaps stored in several tables or obtained dynamically. In other words, a document can be assembled from different parts for indexing and need not be stored anywhere as a whole. For example:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001233510115__sd7cd1c1e4a814b6cab6caf290e9e5415"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="n">d_dow</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="s1">'-'</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">d_dom</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="s1">'-'</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">d_fy_week_seq</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">identify_serials</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">tpcds</span><span class="p">.</span><span class="n">date_dim</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">d_fy_week_seq</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="n">identify_serials</span><span class="w"> </span>
<span class="c1">------------------</span>
<span class="w"> </span><span class="mi">5</span><span class="o">-</span><span class="mi">6</span><span class="o">-</span><span class="mi">1</span>
<span class="w"> </span><span class="mi">0</span><span class="o">-</span><span class="mi">8</span><span class="o">-</span><span class="mi">1</span>
<span class="w"> </span><span class="mi">2</span><span class="o">-</span><span class="mi">3</span><span class="o">-</span><span class="mi">1</span>
<span class="w"> </span><span class="mi">3</span><span class="o">-</span><span class="mi">4</span><span class="o">-</span><span class="mi">1</span>
<span class="w"> </span><span class="mi">4</span><span class="o">-</span><span class="mi">5</span><span class="o">-</span><span class="mi">1</span>
<span class="w"> </span><span class="mi">1</span><span class="o">-</span><span class="mi">2</span><span class="o">-</span><span class="mi">1</span>
<span class="w"> </span><span class="mi">6</span><span class="o">-</span><span class="mi">7</span><span class="o">-</span><span class="mi">1</span>
<span class="p">(</span><span class="mi">7</span><span class="w"> </span><span class="k">rows</span><span class="p">)</span><span class="w"> </span>
</pre></div></td></tr></table></div>
</div>
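<p>A document can also be assembled from columns in more than one table. The following is a minimal sketch under stated assumptions: the tables <strong>messages</strong> and <strong>docs</strong>, their columns, and the sample rows are hypothetical and are not part of the TPC-DS schema used above.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Hypothetical tables: message metadata and message bodies are stored separately.
CREATE TABLE messages (mid int, title text, author text, summary text);
CREATE TABLE docs (did int, body text);

INSERT INTO messages VALUES (12, 'Quarterly report', 'J. Smith', 'Sales summary');
INSERT INTO docs VALUES (12, 'Sales grew in every region.');

-- Build one searchable document per message by joining the two tables
-- and concatenating their text columns.
SELECT m.title || ' ' || m.author || ' ' || m.summary || ' ' || d.body AS document
FROM messages m
JOIN docs d ON m.mid = d.did
WHERE m.mid = 12;
</pre></div></div>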
<div class="notice" id="EN-US_TOPIC_0000001233510115__n296788f89a6d4b2b9ce15f1504c3bb27"><span class="noticetitle"><img src="public_sys-resources/notice_3.0-en-us.png"> </span><div class="noticebody"><p id="EN-US_TOPIC_0000001233510115__a402756696df44b04ac96028fda43bfe4">In practice, queries like this example should use <strong id="EN-US_TOPIC_0000001233510115__b842352706143326">coalesce</strong> to prevent a single <strong id="EN-US_TOPIC_0000001233510115__b842352706143331">NULL</strong> attribute from causing a <strong id="EN-US_TOPIC_0000001233510115__b842352706143334">NULL</strong> result for the whole document.</p>
</div></div>
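<p>As a sketch of the advice in the notice, the example query could be rewritten as follows. The explicit casts and the placeholder string <strong>'NA'</strong> are assumptions made for illustration. A non-empty placeholder is used here because an empty string may itself be treated as <strong>NULL</strong> in some database compatibility modes.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Replace each NULL part with a placeholder so that one NULL column
-- does not make the whole concatenated document NULL.
SELECT coalesce(CAST(d_dow AS text), 'NA') || '-' ||
       coalesce(CAST(d_dom AS text), 'NA') || '-' ||
       coalesce(CAST(d_fy_week_seq AS text), 'NA') AS identify_serials
FROM tpcds.date_dim
WHERE d_fy_week_seq = 1;
</pre></div></div>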
<p id="EN-US_TOPIC_0000001233510115__aba5b4f05c9fe476fb7f367a452b17ede">Another possibility is to store the documents as simple text files in the file system. In this case, the database can be used to store the full-text index and to execute searches, and a unique identifier can be used to retrieve each document from the file system. However, retrieving files from outside the database requires system administrator permissions or special function support, so this is usually less convenient than keeping all the data inside the database. Also, keeping everything inside the database allows easy access to document metadata to assist in indexing and display.</p>
<p id="EN-US_TOPIC_0000001233510115__aa2dcba7ac9704a92a0dac4fb10e7dd8e">For text search purposes, each document must be reduced to the preprocessed <strong id="EN-US_TOPIC_0000001233510115__b842352706143617">tsvector</strong> format. Searching and relevance-based ranking are performed entirely on the <strong id="EN-US_TOPIC_0000001233510115__b842352706143631">tsvector</strong> representation of a document. The original text is retrieved only when the document has been selected for display to a user. We therefore often speak of the <strong id="EN-US_TOPIC_0000001233510115__b84235270614377">tsvector</strong> as being the document, but it is only a compact representation of the full document.</p>
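<p>The reduction to <strong>tsvector</strong> can be sketched with the <strong>to_tsvector</strong> and <strong>to_tsquery</strong> functions. The sample sentence and the choice of the <strong>english</strong> text search configuration are assumptions made for illustration.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Convert a document to its tsvector form: normalized lexemes with their positions.
SELECT to_tsvector('english', 'a fat cat sat on a mat');

-- Search against the tsvector form; the @@ operator returns true on a match.
SELECT to_tsvector('english', 'a fat cat sat on a mat') @@ to_tsquery('english', 'cat &amp; mat') AS matched;
</pre></div></div>
<p>Only the <strong>tsvector</strong> value takes part in the match. The original sentence would be fetched separately if it needed to be displayed.</p>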
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_06_0082.html">Introduction</a></div>
</div>
</div>