Overview

A dictionary is used to define stop words, that is, words to be ignored in full-text retrieval.

A dictionary can also be used to normalize words so that different derived forms of the same word will match. A normalized word is called a lexeme.

In addition to improving retrieval quality, normalization and removal of stop words can reduce the size of the tsvector representation of a document, thereby improving performance. Normalization and removal of stop words do not always have linguistic meaning. Users can define normalization and removal rules in dictionary definition files based on application environments.

A dictionary is a program that receives a token as input and returns:

GaussDB(DWS) provides predefined dictionaries for many languages and also provides five predefined dictionary templates, Simple, Synonym, Thesaurus, Ispell, and Snowball. These templates can be used to create new dictionaries with custom parameters.

When using full-text retrieval, you are advised to: