A dictionary is used to define stop words, that is, words to be ignored in full-text retrieval.
A dictionary can also be used to normalize words so that different derived forms of the same word will match. A normalized word is called a lexeme.
In addition to improving retrieval quality, normalization and removal of stop words can reduce the size of the tsvector representation of a document, thereby improving performance. Normalization and stop-word removal do not always have linguistic meaning; they usually depend on application semantics. Users can therefore define normalization and removal rules in dictionary definition files to suit their application.
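To make the size claim concrete, here is a toy Python model (not GaussDB(DWS) code) of a tsvector-like structure that maps each lexeme to the positions where it occurs. The stop-word set and the `naive_normalize` suffix stripping are hypothetical stand-ins for real dictionaries. As in PostgreSQL's tsvector, positions still count the stop words that were dropped.

```python
# Toy model: a tsvector-like map from lexeme to the positions where it occurs.
# STOP_WORDS and naive_normalize are hypothetical stand-ins for real dictionaries.
STOP_WORDS = {"a", "and", "in", "of", "the"}


def naive_normalize(token: str) -> str:
    """Lower-case and strip one trailing 's' (crude; not Ispell or Snowball)."""
    word = token.lower()
    if len(word) > 3 and word.endswith("s"):
        word = word[:-1]
    return word


def to_vector(text: str, normalize: bool) -> dict[str, list[int]]:
    """Split on whitespace and record each (optionally normalized) token's positions."""
    vector: dict[str, list[int]] = {}
    for pos, token in enumerate(text.split(), start=1):
        if normalize:
            if token.lower() in STOP_WORDS:
                continue  # stop word dropped; later positions are unchanged
            token = naive_normalize(token)
        vector.setdefault(token, []).append(pos)
    return vector


text = "The stars and the Star of the galaxy"
raw = to_vector(text, normalize=False)
normalized = to_vector(text, normalize=True)
print(len(raw), raw)
print(len(normalized), normalized)
```

With normalization and stop-word removal, the seven distinct entries of the raw map collapse to two: `{'star': [2, 5], 'galaxy': [8]}`.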
A dictionary is a program that receives a token as input and returns:
An array of lexemes if the input token is known to the dictionary (note that one token can produce more than one lexeme).
An empty array if the input token is known to the dictionary but is a stop word.
NULL if the dictionary does not recognize the input token.
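The return contract of a dictionary can be sketched as a Python function. This is a conceptual model only: `toy_lexize` and its vocabulary are hypothetical. A list of lexemes means the token is known, an empty list means it is a known stop word, and `None` stands for the case where the dictionary does not recognize the token at all (NULL in SQL terms), which is what allows the next dictionary in a list to try it.

```python
from typing import Optional

# Hypothetical vocabulary, for illustration only. A real dictionary would be
# built from a dictionary template and its definition files.
_KNOWN: dict[str, list[str]] = {
    "stars": ["star"],
    "bookcases": ["bookcase", "book", "case"],  # one token, several lexemes
    "the": [],                                   # known, but a stop word
}


def toy_lexize(token: str) -> Optional[list[str]]:
    """Return lexemes, [] for a stop word, or None if the token is unknown."""
    lexemes = _KNOWN.get(token.lower())
    return None if lexemes is None else list(lexemes)  # copy: callers may mutate


print(toy_lexize("Stars"))      # ['star']
print(toy_lexize("bookcases"))  # ['bookcase', 'book', 'case']
print(toy_lexize("The"))        # []
print(toy_lexize("quasar"))     # None
```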
GaussDB(DWS) provides predefined dictionaries for many languages and also provides five predefined dictionary templates: Simple, Synonym, Thesaurus, Ispell, and Snowball. These templates can be used to create new dictionaries with custom parameters.
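As an illustration of the behavior a template fixes (while parameters such as the stop-word list vary per dictionary), the following Python model mimics the Simple template, assuming it behaves like PostgreSQL's `simple` template: the token is lower-cased and checked against a stop-word list; a stop word yields an empty array; otherwise the lower-cased word is returned as the lexeme or, if the dictionary is configured not to accept non-stop words, the token is reported as unrecognized. `make_simple_dict` and its `accept` flag are hypothetical names for that behavior, not a GaussDB(DWS) API.

```python
from typing import Callable, Optional


def make_simple_dict(
    stop_words: set[str], accept: bool = True
) -> Callable[[str], Optional[list[str]]]:
    """Model of a Simple-template dictionary (assumed PostgreSQL-style behavior)."""
    stops = {w.lower() for w in stop_words}

    def lexize(token: str) -> Optional[list[str]]:
        word = token.lower()
        if word in stops:
            return []       # stop word: token is discarded
        if accept:
            return [word]   # lower-cased token becomes the lexeme
        return None         # report as unrecognized so the next dictionary can try

    return lexize


simple = make_simple_dict({"a", "the"})
passthrough = make_simple_dict({"a", "the"}, accept=False)
print(simple("YeS"), simple("The"))            # ['yes'] []
print(passthrough("YeS"), passthrough("The"))  # None []
```

The `accept=False` variant is what lets a Simple dictionary sit in the middle of a dictionary list (removing stop words only) instead of answering for every token.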
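When using full-text retrieval, a text search configuration specifies a parser and, for each token type the parser can return, an ordered list of dictionaries. When the parser emits a token of that type, the dictionaries in the list are consulted in turn until one of them recognizes it. A token recognized as a stop word, or recognized by no dictionary, is discarded and is neither indexed nor searched for. You are advised to place the narrowest, most specific dictionary first, then more general ones, and to finish with a very general dictionary, such as a Snowball stemmer or a Simple dictionary, that recognizes everything. For example, an astronomy-specific configuration (astro_en) can map the asciiword token type to a synonym dictionary of astronomical terms (astro_syn), a general English Ispell dictionary (english_ispell), and a Snowball English stemmer (english_stem):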
ALTER TEXT SEARCH CONFIGURATION astro_en
    ADD MAPPING FOR asciiword WITH astro_syn, english_ispell, english_stem;
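The ALTER TEXT SEARCH CONFIGURATION statement assigns an ordered list of dictionaries to the asciiword token type. Below is a minimal Python sketch of how such a list is consulted, assuming the PostgreSQL-style rule that the first dictionary returning a non-NULL result decides the outcome. `astro_syn`, `english_ispell` and `english_stem` here are toy stand-ins with made-up vocabularies, not the real dictionaries.

```python
from typing import Callable, Optional

Lexize = Callable[[str], Optional[list[str]]]


def lexize_with_chain(token: str, chain: list[Lexize]) -> list[str]:
    """Consult dictionaries in order; the first non-None answer decides."""
    for dictionary in chain:
        result = dictionary(token)
        if result is not None:
            return result  # lexemes, or [] if it is a stop word
    return []  # no dictionary recognized the token: it is discarded


def _lookup(table: dict[str, list[str]], token: str) -> Optional[list[str]]:
    hit = table.get(token.lower())
    return None if hit is None else list(hit)


# Hypothetical stand-ins for astro_syn, english_ispell and english_stem.
def astro_syn(token: str) -> Optional[list[str]]:
    return _lookup({"supernovae": ["sn"], "supernovas": ["sn"]}, token)


def english_ispell(token: str) -> Optional[list[str]]:
    return _lookup({"stars": ["star"], "the": []}, token)


def english_stem(token: str) -> Optional[list[str]]:
    return [token.lower()]  # crude catch-all: recognizes every token


chain = [astro_syn, english_ispell, english_stem]
for word in ["Supernovae", "stars", "the", "quasar"]:
    print(word, lexize_with_chain(word, chain))
```

Because the catch-all stand-in recognizes everything, it belongs last: placed first, it would answer for every token and the synonym and Ispell dictionaries would never be consulted.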
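A filtering dictionary, which replaces the input token with a modified token and passes it on to the next dictionary in the list, can be placed anywhere in the list except at the end, where it would be useless. Filtering dictionaries are useful for partially normalizing words to simplify the task of later dictionaries.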
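A minimal Python sketch of a filtering dictionary inside a dictionary list. The `Filtered` marker, `unaccent` and `crude_stem` are hypothetical stand-ins for the mechanism a filtering dictionary uses to hand a modified token to the next dictionary; stripping accents is just one example of partial normalization.

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass(frozen=True)
class Filtered:
    """Hypothetical marker: replace the token and keep consulting the list."""
    token: str


Result = Union[list[str], Filtered, None]


def run_chain(token: str, chain: list[Callable[[str], Result]]) -> list[str]:
    for dictionary in chain:
        result = dictionary(token)
        if result is None:
            continue               # not recognized: try the next dictionary
        if isinstance(result, Filtered):
            token = result.token   # later dictionaries see the modified token
            continue
        return result              # lexemes, or [] for a stop word
    return []                      # nothing recognized it (a trailing filter is wasted)


def unaccent(token: str) -> Result:
    """Filtering dictionary: strip diacritics, never a final answer."""
    decomposed = unicodedata.normalize("NFKD", token)
    return Filtered("".join(c for c in decomposed if not unicodedata.combining(c)))


def crude_stem(token: str) -> Optional[list[str]]:
    """Catch-all stand-in for a stemmer: lower-case, drop one trailing 's'."""
    word = token.lower()
    return [word[:-1]] if len(word) > 3 and word.endswith("s") else [word]


word = "\u00c9toiles"                           # "Étoiles"
print(run_chain(word, [unaccent, crude_stem]))  # ['etoile']
print(run_chain(word, [crude_stem, unaccent]))  # ['étoile']: filter never reached
print(run_chain(word, [unaccent]))              # []: filter at the end is useless
```

The three calls show why placement matters: the filter only helps when a later dictionary finishes the job with the modified token.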