Before you create a vector index that uses the IVF_GRAPH or IVF_GRAPH_PQ algorithm (see Creating a Vector Index), you must pre-build and register the center point vectors.
The IVF_GRAPH and IVF_GRAPH_PQ vector index acceleration algorithms are suited to ultra-large-scale data. Both narrow the query range by dividing the vector space into subspaces through clustering or random sampling. Before pre-building, obtain all center point vectors by clustering or random sampling.
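The two ways of obtaining center point vectors mentioned above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names `sample_centers` and `kmeans_centers`, the toy data, and the choice of Lloyd's k-means with squared Euclidean distance are all hypothetical and not part of the product. In practice you would run your own clustering tool over a representative sample of your vectors.

```python
import random


def sample_centers(vectors, k, seed=0):
    """Pick k distinct vectors at random to serve as center points."""
    rng = random.Random(seed)
    return rng.sample(vectors, k)


def kmeans_centers(vectors, k, iterations=10, seed=0):
    """Refine randomly chosen centers with a few rounds of Lloyd's k-means."""
    centers = [list(c) for c in sample_centers(vectors, k, seed)]
    for _ in range(iterations):
        # Assign every vector to its nearest center (squared Euclidean distance).
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(v, centers[i])),
            )
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        for i, group in enumerate(groups):
            if group:
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers


# Toy 2-dimensional data with two obvious clusters (illustrative only).
data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers = kmeans_centers(data, k=2)
```

Random sampling is cheaper and needs no training pass. Clustering costs more up front but tends to place centers where the data is dense.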
Then pre-build the center point vectors into a GRAPH or GRAPH_PQ index and register that index with the Elasticsearch cluster, so that all nodes in the cluster can share the index file. Reusing the center index across shards reduces the training overhead and the number of center index queries, which improves write and query performance.
PUT my_dict
{
  "settings": {
    "index": {
      "vector": true
    },
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "vector",
        "dimension": 2,
        "indexing": true,
        "algorithm": "GRAPH",
        "metric": "euclidean"
      }
    }
  }
}
Write the center point vectors obtained through sampling or clustering into the my_dict index. For details, see Importing Vector Data.
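As a rough sketch of this step, the snippet below turns a list of center point vectors into a newline-delimited body for the standard Elasticsearch `_bulk` API. The helper name `build_bulk_body` and the sample vectors are hypothetical. It assumes the field name `my_vector` from the my_dict mapping, and it assumes that bulk writes are acceptable for your cluster. Refer to Importing Vector Data for the supported procedure.

```python
import json


def build_bulk_body(centers, field="my_vector"):
    """Build an NDJSON bulk body: one action line plus one document per center."""
    lines = []
    for center in centers:
        lines.append(json.dumps({"index": {}}))     # action line, ID auto-generated
        lines.append(json.dumps({field: center}))   # document holding the vector
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical 2-dimensional center points, matching "dimension": 2 in my_dict.
centers = [[1.0, 1.0], [2.0, 2.0]]
body = build_bulk_body(centers)
print(body)
```

The resulting text would be sent as the body of `POST my_dict/_bulk` with the `Content-Type: application/x-ndjson` header.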
Register the my_dict index as a Dict object identified by a globally unique name (dict_name).
PUT _vector/register/my_dict
{
  "dict_name": "my_dict"
}
Create the IVF_GRAPH or IVF_GRAPH_PQ index. You do not need to specify the dimension or metric. Specify only the registered dictionary name.
PUT my_index
{
  "settings": {
    "index": {
      "vector": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "vector",
        "indexing": true,
        "algorithm": "IVF_GRAPH",
        "dict_name": "my_dict",
        "offload_ivf": false
      }
    }
  }
}
| Parameter | Description |
|---|---|
| dict_name | Specifies the name of the center point index that this index depends on. The vector dimension and metric of this index are the same as those of the Dict index. |
| offload_ivf | Offloads the IVF inverted index implemented by the underlying index to Elasticsearch. This reduces non-heap memory usage and the overhead of write and merge operations, but query performance also deteriorates. You can use the default value. Value: true or false. Default value: false. |
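To show how the requests above fit together, here is a minimal sketch that assembles them in order as plain data, without sending anything. The function name `build_requests` and its parameters are hypothetical. The request paths and body fields are copied from the examples in this section. The sketch only switches the algorithm name for IVF_GRAPH_PQ, so any additional parameters that algorithm may need are not covered.

```python
import json


def build_requests(dict_index, dict_name, target_index, dimension,
                   algorithm="IVF_GRAPH", offload_ivf=False):
    """Return (method, path, body) tuples for the pre-build and register flow."""
    if algorithm not in ("IVF_GRAPH", "IVF_GRAPH_PQ"):
        raise ValueError("only IVF_GRAPH and IVF_GRAPH_PQ need a registered dict")
    # IVF_GRAPH pairs with a GRAPH center index, IVF_GRAPH_PQ with GRAPH_PQ.
    center_algorithm = "GRAPH" if algorithm == "IVF_GRAPH" else "GRAPH_PQ"
    create_dict = {
        "settings": {"index": {"vector": True},
                     "number_of_shards": 1, "number_of_replicas": 0},
        "mappings": {"properties": {"my_vector": {
            "type": "vector", "dimension": dimension, "indexing": True,
            "algorithm": center_algorithm, "metric": "euclidean"}}},
    }
    register = {"dict_name": dict_name}
    # The target index omits dimension and metric; it inherits them via dict_name.
    create_target = {
        "settings": {"index": {"vector": True}},
        "mappings": {"properties": {"my_vector": {
            "type": "vector", "indexing": True, "algorithm": algorithm,
            "dict_name": dict_name, "offload_ivf": offload_ivf}}},
    }
    return [
        ("PUT", dict_index, create_dict),
        # Write the center point vectors into dict_index before registering it.
        ("PUT", f"_vector/register/{dict_index}", register),
        ("PUT", target_index, create_target),
    ]


requests_plan = build_requests("my_dict", "my_dict", "my_index", dimension=2)
for method, path, body in requests_plan:
    print(method, path)
    print(json.dumps(body, indent=2))
```

Keeping the requests as data makes the order explicit: create the center point index, write the center vectors, register the index, then create the index that references `dict_name`.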