Create an index named my_index that contains a vector field my_vector and a text field my_label. The vector field builds a graph index (the GRAPH algorithm) and uses Euclidean distance to measure similarity.
PUT my_index
{
  "settings": {
    "index": {
      "vector": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "vector",
        "dimension": 2,
        "indexing": true,
        "algorithm": "GRAPH",
        "metric": "euclidean"
      },
      "my_label": {
        "type": "text"
      }
    }
  }
}
| Type | Parameter | Description |
|---|---|---|
| Index settings parameters | vector | To use a vector index, set this parameter to true. |
| Field mappings parameters | type | Field type, for example, vector. |
| | dimension | Vector dimension. Value range: [1, 4096] |
| | indexing | Whether to enable vector index acceleration. The value can be true or false. Default value: false |
| | algorithm | Index algorithm. This parameter is valid only when indexing is set to true. Supported values include GRAPH, IVF_GRAPH, and IVF_GRAPH_PQ. Default value: GRAPH. NOTE: If IVF_GRAPH or IVF_GRAPH_PQ is specified, you need to pre-build and register a center point index. For details, see (Optional) Pre-Building and Registering a Center Point Vector. |
| | | If indexing is set to true, CSS provides optional parameters for vector search to achieve higher query performance or precision. |
| | metric | Method of calculating the distance between vectors. Supported values include euclidean (Euclidean distance). Default value: euclidean |
| | dim_type | Type of the vector dimension value. The value can be binary or float (default). |
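The request body for PUT my_index can also be assembled in code before it is sent. The following is a minimal sketch, not part of CSS: the helper name build_vector_index_body is hypothetical, it only builds a dictionary and sends nothing, and it checks just the one constraint stated in the table (dimension in [1, 4096]).

```python
import json


def build_vector_index_body(dimension, indexing=True, algorithm="GRAPH", metric="euclidean"):
    """Build the body for PUT my_index (hypothetical helper, not a CSS API)."""
    # The table gives the dimension range as [1, 4096]; bool is rejected explicitly
    # because Python treats True/False as integers.
    if isinstance(dimension, bool) or not isinstance(dimension, int):
        raise ValueError("dimension must be an integer")
    if not 1 <= dimension <= 4096:
        raise ValueError("dimension must be in [1, 4096]")

    vector_field = {"type": "vector", "dimension": dimension, "indexing": indexing}
    if indexing:
        # algorithm is valid only when indexing is set to true
        vector_field["algorithm"] = algorithm
    vector_field["metric"] = metric

    return {
        "settings": {"index": {"vector": True}},
        "mappings": {
            "properties": {
                "my_vector": vector_field,
                "my_label": {"type": "text"},
            }
        },
    }


body = build_vector_index_body(dimension=2)
print(json.dumps(body, indent=2))
```

With dimension=2 and the defaults, the printed JSON has the same fields and values as the example request for my_index.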
| Type | Parameter | Description |
|---|---|---|
| Graph index configuration parameters | neighbors | Number of neighbors of each vector in a graph index. The default value is 64. A larger value indicates higher query precision. A larger index results in a slower build and query speed. Value range: [10, 255] |
| | shrink | Cropping coefficient during HNSW build. The default value is 1.0f. Value range: (0.1, 10) |
| | scaling | Scaling ratio of the upper-layer graph nodes during HNSW build. The default value is 50. Value range: (0, 128] |
| | efc | Queue size of the neighboring node during HNSW build. The default value is 200. A larger value indicates a higher precision and slower build speed. Value range: (0, 100000] |
| | max_scan_num | Maximum number of nodes that can be scanned. The default value is 10000. A larger value indicates a higher precision and slower indexing speed. Value range: (0, 1000000] |
| PQ index configuration parameters | centroid_num | Number of cluster centroids of each fragment. The default value is 255. Value range: (0, 65535] |
| | fragment_num | Number of fragments. The default value is 0. The plug-in automatically sets the number of fragments based on the vector length. Value range: [0, 4096] |
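The value ranges in this table mix open and closed bounds, for example (0.1, 10) for shrink and [10, 255] for neighbors, which is easy to get wrong by one. The sketch below encodes the ranges exactly as listed and reports which values fall outside them. The names RANGES and out_of_range are hypothetical and exist only for illustration; the check runs locally and is not something CSS provides.

```python
# (low, high, low_inclusive, high_inclusive), copied from the table above
RANGES = {
    "neighbors": (10, 255, True, True),          # [10, 255]
    "shrink": (0.1, 10, False, False),           # (0.1, 10)
    "scaling": (0, 128, False, True),            # (0, 128]
    "efc": (0, 100000, False, True),             # (0, 100000]
    "max_scan_num": (0, 1000000, False, True),   # (0, 1000000]
    "centroid_num": (0, 65535, False, True),     # (0, 65535]
    "fragment_num": (0, 4096, True, True),       # [0, 4096]
}


def out_of_range(params):
    """Return the names of parameters whose values violate the documented range."""
    bad = []
    for name, value in params.items():
        low, high, low_inclusive, high_inclusive = RANGES[name]
        above_low = value >= low if low_inclusive else value > low
        below_high = value <= high if high_inclusive else value < high
        if not (above_low and below_high):
            bad.append(name)
    return bad


# Default values from the table; all of them are within range.
defaults = {
    "neighbors": 64, "shrink": 1.0, "scaling": 50, "efc": 200,
    "max_scan_num": 10000, "centroid_num": 255, "fragment_num": 0,
}
print(out_of_range(defaults))                         # → []
print(out_of_range({"neighbors": 5, "shrink": 10}))   # → ['neighbors', 'shrink']
```

Note that shrink set to 10 is rejected because the upper bound is open, while fragment_num set to 0 is accepted because the lower bound is closed.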
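Run the following commands to import vector data. When writing vector data to the my_index index, you need to specify the vector field name and vector data. The first request writes a single document, and the second writes several documents in one bulk request.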
POST my_index/_doc
{
  "my_vector": [1.0, 2.0]
}

POST my_index/_bulk
{"index": {}}
{"my_vector": [1.0, 2.0], "my_label": "red"}
{"index": {}}
{"my_vector": [2.0, 2.0], "my_label": "green"}
{"index": {}}
{"my_vector": [2.0, 3.0], "my_label": "red"}
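The bulk request body is newline-delimited JSON: each document is preceded by an action line, every JSON object sits on its own line, and the body must end with a newline. As a rough sketch of how such a body could be generated from a list of records, the hypothetical helper build_bulk_body below produces the same payload as the example. It only builds the string; sending it to the cluster is left out.

```python
import json


def build_bulk_body(docs):
    """Return an NDJSON body for POST my_index/_bulk (hypothetical helper)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # action line: index the next document
        lines.append(json.dumps(doc))            # source line: the document itself
    # The bulk API requires the final line to end with a newline character.
    return "\n".join(lines) + "\n"


docs = [
    {"my_vector": [1.0, 2.0], "my_label": "red"},
    {"my_vector": [2.0, 2.0], "my_label": "green"},
    {"my_vector": [2.0, 3.0], "my_label": "red"},
]
bulk_body = build_bulk_body(docs)
print(bulk_body, end="")
```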
| Parameter | Description |
|---|---|
| native.cache.circuit_breaker.enabled | Whether to enable the circuit breaker for off-heap memory. Default value: true |
| native.cache.circuit_breaker.cpu.limit | Upper limit of off-heap memory usage of the vector index, expressed as a percentage of the memory that remains after the heap is allocated. For example, if the overall memory of a host is 128 GB and the heap memory occupies 31 GB, the default upper limit of the off-heap memory usage is 43.65 GB, that is, (128 - 31) x 45%. If the off-heap memory usage exceeds its upper limit, the circuit breaker will be triggered. Default value: 45% |
| native.cache.expire.enabled | Whether to enable the cache expiration policy. If this parameter is set to true, some cache items that have not been accessed for a long time will be cleared. Value: true or false. Default value: false |
| native.cache.expire.time | Expiration time. Default value: 24h |
| native.vector.index_threads | Number of threads used for creating underlying indexes. Each shard uses multiple threads. Set a relatively small value to avoid resource contention caused by too many build threads running at once. Default value: 4 |
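The circuit breaker limit in the table is plain arithmetic: the percentage applies to the memory left over after the heap is subtracted from the host total. A small sketch of that calculation follows; the function name off_heap_limit_gb is hypothetical and the figures are the ones from the example in the table.

```python
def off_heap_limit_gb(total_memory_gb, heap_gb, limit_percent=45.0):
    """Off-heap limit = (total memory - heap memory) x limit percentage."""
    return (total_memory_gb - heap_gb) * limit_percent / 100.0


# Example from the table: 128 GB host, 31 GB heap, default limit of 45%.
limit = off_heap_limit_gb(128, 31)
print(f"{limit:.2f} GB")  # → 43.65 GB
```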