Overview
Vector search indexes in Firebolt enable fast similarity search across high-dimensional vector embeddings using the HNSW (Hierarchical Navigable Small World) algorithm. These indexes are designed for use cases like semantic search, recommendation systems, and AI applications where you need to find vectors that are similar (i.e., close in distance) to a query vector. Unlike traditional exact search methods, vector search indexes provide approximate nearest neighbor (ANN) results, trading some precision for significantly faster query performance. This approach is well suited for machine learning applications where finding the top-k most similar items matters more than guaranteeing mathematically exact results.

Note: Vector search indexes provide approximate nearest neighbor results, not exact matches. The quality depends on index parameters and dataset characteristics.
Syntax
Create a vector search index
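The statement below is a sketch of how such an index might be created. The exact clause layout is an assumption pieced together from the parameter table that follows, and all object names are illustrative; consult the Firebolt SQL reference for the authoritative syntax.

```sql
-- Hypothetical statement shape; see the parameter table below for each knob.
CREATE VECTOR SEARCH INDEX my_embeddings_idx      -- <index_name>
ON documents (embedding)                          -- <table_name> (<column_name>)
USING vector_cosine_ops                           -- <distance_metric>
WITH (dimension = 256, m = 16, ef_construction = 128, quantization = 'bf16');
```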
Multiple vector search indexes can be created per table and column. The only requirement is a unique name per index; otherwise, indexes can reference the same column and share the same configuration.

Parameters
| Parameter | Description |
|---|---|
| `<index_name>` | A unique name for the vector search index. |
| `<table_name>` | The name of the table on which the index is created. |
| `<column_name>` | The name of the column that holds the embeddings to be indexed. |
| `<distance_metric>` | The distance operation used to compute the distance between vectors. Supported metrics: `vector_cosine_ops`, `vector_ip_ops`, `vector_l2sq_ops`. See the distance metric section for more information. |
| `<dimension>` | The number of dimensions in the vector embeddings. This is enforced during ingest. |
| `m` (optional) | The maximum number of connections per node in the HNSW graph. Default is 16. See the connectivity section for more information. |
| `ef_construction` (optional) | The size of the dynamic candidate list during index construction. Default is 128. See the ef_construction section for more information. |
| `quantization` (optional) | The quantization method for compressing vectors. Default is `'bf16'`. Supported values: `'bf16'`, `'f16'`, `'f32'`, `'f64'`, `'i8'`. See the quantization section for more information. |
Firebolt builds vector search indexes per tablet and maintains or recreates them when the table is updated (i.e., on insert, update, delete, and vacuum), ensuring they stay up to date for queries.
Use a vector search index
To use a vector search index, you must explicitly reference it by name via the `vector_search()` table-valued function (TVF).

Named parameter syntax is supported since version 4.29.
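A minimal call in positional form, following the shape shown in the limitations section below; the index name and vector are illustrative:

```sql
-- Find the 10 nearest neighbors of the target vector, exploring 64 candidate
-- nodes during the search (ef_search). The target vector literal must match
-- the index's dimension; it is abbreviated here.
SELECT *
FROM vector_search(INDEX my_embeddings_idx, [0.1, 0.2, 0.3]::double[], 10, 64);
```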
Parameters
| Parameter | Description |
|---|---|
| `<index_name>` | The name of the vector search index to use for finding the nearest vectors. |
| `<target_vector>` | The target vector for which the closest vectors should be found. Must be a scalar value, including scalar subqueries or literals. Before version 4.29, the target vector must be a literal that is explicitly cast to `ARRAY(DOUBLE)`. |
| `<top_k>` | The number of nearest vectors to return from the index. |
| `ef_search` (optional) | A hyperparameter that controls the quality and recall of the search. Default is 64. See ef_search for more details. Mandatory before version 4.29. |
| `load_strategy` (optional) | Specifies how the index is loaded and managed in memory. Default is `'in_memory'`. Available since version 4.29. Supported values:<br>`'in_memory'`: The entire index is loaded into memory and cached, providing optimal performance for repeated queries regardless of the target vector, as the full index structure is already available. If the engine does not have enough memory to hold the entire index, eviction and reloading from disk may occur, impacting performance. For best results, ensure the index fits within the engine's in-memory vector index cache.<br>`'disk'`: The index is read from disk and kept in memory on a best-effort basis. Under memory pressure, the index may be evicted and must be reloaded from disk as needed. Different target vectors may require loading different parts of the index, so performance can vary depending on workload and query patterns. |
Drop a vector search index
Dropping a vector search index via `DROP INDEX <index_name>` is a pure metadata operation and does not free up space at the storage level. We recommend running `VACUUM` on the table after the index has been dropped.
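For example (index and table names are illustrative):

```sql
DROP INDEX my_embeddings_idx;  -- metadata-only: storage space is not reclaimed yet
VACUUM documents;              -- reclaims the space previously occupied by the index
```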
Alter a vector search index
The only alter operation supported on a vector search index is `RENAME TO`:
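A sketch of the rename, assuming the usual `ALTER INDEX ... RENAME TO` statement form; both names are illustrative:

```sql
ALTER INDEX my_embeddings_idx RENAME TO document_embeddings_idx;
```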
Limitations
Creating and using vector search indexes currently has several limitations that are planned to be addressed in future releases.

- The embedding column on which the vector search index is created must be of the data type `ARRAY([real,float] NOT NULL) NOT NULL`:
  - Only `real` and `float`/`double` are supported as the array's nested type.
  - Nullability is not supported, neither inner nor outer.
- Creating vector search indexes is only supported on empty tables. This limitation was removed in version 4.29; more information about how indexes behave on populated tables can be found in the section on index creation on populated tables below.
  - The index must be created before data is inserted into the table.
  - Creating an index on a populated table will fail.
- Once created, the index configuration (e.g., dimension, ef_construction, etc.) cannot be changed. The index must be dropped and recreated if vector dimensions change.
- Before version 4.29, the target vector in the `vector_search()` TVF must be an array literal cast to `::DOUBLE[]`, e.g., `vector_search(INDEX <index_name>, [0.1, 0.2, ..., 0.256]::double[], 10, 16)`.
Performance & Observability
Achieving optimal performance for vector search queries involves two points of optimization: the index and the table. The index can be configured to favor either precision or performance, which impacts build time and memory usage. An index optimized for search performance yields faster lookup times for the closest k tuples. However, the index only provides row numbers of the tuples that must then be loaded from the table (i.e., pointers to where the data is stored). Optimizing the table for data access is therefore another critical point of optimization, outlined below.

Engine Sizing
For optimal search performance, the whole vector index must fit into memory. Once a part of an index is loaded into memory, it is cached and kept in memory to allow fast search times for subsequent queries. To keep the index fully in memory, the engine must be sized with enough main memory. To determine how much memory you need, check the size of the index via `information_schema.indexes`:
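For example (the index name is illustrative; selecting all columns avoids guessing at the exact schema of the view, though the `index_name` filter column is an assumption):

```sql
SELECT *
FROM information_schema.indexes
WHERE index_name = 'my_embeddings_idx';
```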
Note that the whole index must fit into the in-memory cache reserved for vector search indexes, which is limited to 70% of the engine's memory. Therefore, your engine should have roughly 1.5x the index's size as main memory. For example, if the index is reported to have a size of 250 GiB, you should choose an engine with at least 350 GiB; a 3M storage-optimized engine would be a good choice (3 × 128 GiB main memory).
Via `information_schema.engine_query_history` or the `EXPLAIN (ANALYZE)` option, you can observe (1) how many parts of the index were fetched from S3 and (2) how many parts of the index had to be read from disk. For optimal performance, both of these numbers should be 0.
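For a single query, one way to inspect this is to run it under `EXPLAIN (ANALYZE)`; the index name and vector below are illustrative, and the exact metric names in the output may vary by version:

```sql
-- Executes the query and reports runtime metrics, including index I/O.
EXPLAIN (ANALYZE)
SELECT *
FROM vector_search(INDEX my_embeddings_idx, [0.1, 0.2, 0.3]::double[], 10, 64);
```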
Distance Metric
Three different distance metrics are available for use in vector search indexes to determine the distance between vectors:

- `vector_cosine_ops`: cosine distance
- `vector_ip_ops`: inner product
- `vector_l2sq_ops`: squared L2 (Euclidean) distance
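For reference, the standard textbook definitions of these metrics for vectors $a$ and $b$ are sketched below; negating the inner product to treat it as a distance is a common convention, not something specified here:

$$
d_{\text{cosine}}(a,b) = 1 - \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert},
\qquad
d_{\text{ip}}(a,b) = -\,a \cdot b,
\qquad
d_{\ell_2^2}(a,b) = \sum_i (a_i - b_i)^2
$$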
M (Connectivity)
The `M` parameter during vector search index creation defines the number of edges each vertex in the graph structure has, that is, the number of nearest neighbors that each inserted vector connects to.
Higher `M` values improve search quality but increase memory usage and index build time. The impact on index search time is minimal.
- Memory usage scales approximately linearly with the `M` factor.
- Insert time per tuple scales approximately linearly with the `M` factor, as each insertion requires more comparisons to establish links.
- A larger `M` value can make a big difference in recall performance, since it reduces the chance of the search getting trapped in local minima.
- A larger `M` value can speed up search, as it can allow faster traversal through the graph's "shortcuts" (this depends on how well the graph is structured during construction).
- A larger dataset may require a higher `M` value for the expected recall, but will incur higher memory costs.
EF_CONSTRUCTION
The `ef_construction` parameter defines the quality of inserts into the index. The higher the value, the more nodes are explored during an insert to find the nearest neighbors, which leads to a higher-quality graph and better recall.
- Increases build time.
- Memory usage is not affected.
- Has a negligible effect on search time.
- A higher `EF_CONSTRUCTION` value should achieve higher recall at lower `M` and `EF_SEARCH` values.
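For example, a recall-oriented configuration might raise both parameters relative to their defaults. This reuses the hypothetical statement shape from the creation sketch above, and all names are illustrative:

```sql
-- Higher m and ef_construction: longer build time and (for m) more memory,
-- in exchange for better recall at query time.
CREATE VECTOR SEARCH INDEX my_recall_idx
ON documents (embedding)
USING vector_cosine_ops
WITH (dimension = 256, m = 32, ef_construction = 256, quantization = 'bf16');
```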
EF_SEARCH
The `EF_SEARCH` parameter defines the quality of search on the index. The higher the value, the more nodes are explored during the search to find the nearest neighbors, which improves overall recall.
It is the only parameter that can be changed after the index is created, as it applies only to the scope of the individual query that uses the index.
- Memory usage is not affected.
- A larger `EF_SEARCH` value can make a significant difference in recall performance.
Quantization
Quantization is the process of converting high-precision data into a lower-precision, discrete representation. The quantization setting defines which internal, lower-precision representation the high-precision input data is converted to (i.e., which data type is used in the index to store the vectors). A smaller data type requires less memory but may impact the quality of the index and thus recall performance. This is particularly relevant when vector clusters are very dense, as precision loss in the floating-point representation will decrease recall. Supported types are:

- `bf16`: 16-bit (brain) floating point developed by Google Brain, optimized for fast, large-scale numeric tasks where preserving range is more important than fine-grained precision.
- `f16`: 16-bit floating point.
- `f32`: 32-bit floating point, equal to the SQL type `real`.
- `f64`: 64-bit floating point, equal to the SQL type `double precision`.
- `i8`: 8-bit integer (this quantization is only supported with cosine-like metrics).
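As a rough sizing illustration (our own back-of-the-envelope arithmetic, ignoring the HNSW graph overhead), using the benchmark dataset from the end of this page: storing 435,000,000 vectors of dimension 256 takes about 435M × 256 × 4 B ≈ 445 GB of raw vector storage at `f32`, roughly half of that (≈ 222 GB) at `bf16` or `f16`, and roughly a quarter (≈ 111 GB) at `i8`.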
Table index_granularity
The table's `index_granularity` setting defines the maximum number of rows per granule, which directly impacts how data is retrieved. The default granularity is approximately 8,000 rows per granule. The top k closest vectors are likely not stored in the same granule despite being semantically close to each other, so decreasing the `index_granularity` can improve performance. Our experiments have shown that decreasing the `index_granularity` to 128 resulted in ~50x fewer scanned rows and an overall query performance boost of 40%.
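A sketch of setting this at table creation time; the placement of the option is an assumption, so check the `CREATE TABLE` reference for the exact syntax:

```sql
CREATE TABLE documents (
    id BIGINT NOT NULL,
    embedding ARRAY(REAL NOT NULL) NOT NULL  -- data type required by vector search indexes
)
WITH (index_granularity = 128);  -- smaller granules: fewer scanned rows per top-k lookup
```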
Index creation on populated tables
Creating indexes on populated tables is only supported starting with version 4.29. Tablets that already exist when the index is created are not indexed retroactively, but they remain searchable via the `vector_search()` TVF. During query execution, a hybrid search is performed that accounts for both indexed and non-indexed tablets.
This can result in noticeable performance impacts, as non-indexed data must be fully scanned.
To ensure all data is properly indexed after creating an index on a populated table, you must manually execute `VACUUM (REINDEX = TRUE)`.
VACUUM and REINDEX
To ensure all data is properly indexed, run `VACUUM (REINDEX = TRUE)`. This command updates all existing tablets that are not yet indexed.
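For example (the table name is illustrative, and its placement relative to the option follows the command as written above as an assumption):

```sql
VACUUM documents (REINDEX = TRUE);  -- indexes all tablets not yet covered by the index
```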
Regular execution of `VACUUM` is recommended to ensure that indexes are merged and optimized. This is particularly important for workloads with frequent small data ingests (trickle ingests), as query performance can suffer significantly if the base table contains too many tablets.
Benchmarks
Vector search indexes trade some precision for significantly faster query performance. The internal lookup structure that enables the fast search performance is complex and requires a lot of computation to build. This means that inserts into tables that hold a vector index take much more time than inserts into tables without one. In our testing, we experienced the following latencies:

Setup:
- 5M storage-optimized engine
- 435,000,000 embeddings of dimension 256
- 430 GiB uncompressed data

Results:
- Insert: ~3 hours (the index is built as part of the insert)
- Cold `LIMIT 1000` query: ~37 seconds (loads the vector index from cloud storage onto the engine and caches it)
- Hot `LIMIT 1000` query: ~0.3 seconds (vector indexes are cached)