Vector search indexes are still in private preview. To stay on the safe side, we recommend the following guidelines:
- Use a dedicated database to create and use vector search indexes. If anything breaks, dropping the database performs a full cleanup.
- Don’t use older or other engine versions to execute DML operations (e.g., inserts or drops) against tables that hold vector search indexes.
Overview
Vector search indexes in Firebolt enable fast similarity search across high-dimensional vector embeddings using the HNSW (Hierarchical Navigable Small World) algorithm. These indexes are designed for use cases like semantic search, recommendation systems, and AI applications where you need to find vectors that are similar (i.e., close in distance) to a query vector. Unlike traditional exact search methods, vector search indexes provide approximate nearest neighbor (ANN) search results, trading off some precision for significantly faster query performance. This approach is well-suited for machine learning applications where finding the top-k most similar items is more important than guaranteeing mathematically exact results.

Vector search indexes provide approximate nearest neighbor results, not exact matches. The result quality depends on index parameters and dataset characteristics.
Syntax
Create a vector search index
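The following syntax sketch is assembled from the parameter table below. The `CREATE VECTOR SEARCH INDEX ... WITH (...)` clause layout is an assumption, and the table, column, and index names are hypothetical; consult the SQL reference for the authoritative syntax.

```sql
-- Sketch only: the statement stem and WITH clause layout are assumptions.
CREATE VECTOR SEARCH INDEX my_embeddings_idx
ON documents (embedding)
WITH (
    distance_metric = vector_cosine_ops, -- or vector_ip_ops / vector_l2sq_ops
    dimension = 256,                     -- enforced during ingest
    m = 16,                              -- optional, default 16
    ef_construction = 128,               -- optional, default 128
    quantization = 'bf16'                -- optional, default 'bf16'
);
```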
Multiple vector search indexes can be created per table and column. The only requirement is a unique name per index; otherwise, indexes can reference the same column and have the same configuration.

Parameters
Parameter | Description |
---|---|
`<index_name>` | A unique name for the vector search index. |
`<table_name>` | The name of the table on which the index is created. |
`<column_name>` | The name of the column that holds the embeddings which should be indexed. |
`<distance_metric>` | The distance operation used to compute the distance between vectors. Supported: `vector_cosine_ops`, `vector_ip_ops`, `vector_l2sq_ops`. See section distance metric for more information. |
`<dimension>` | The number of dimensions in the vector embeddings. This is enforced during ingest. |
`m` (optional) | The maximum number of connections per node in the HNSW graph. Default is `16`. See section connectivity for more information. |
`ef_construction` (optional) | The size of the dynamic candidate list during index construction. Default is `128`. See section ef_construction for more information. |
`quantization` (optional) | The quantization method for compressing vectors. Default is `'bf16'`. Supported: `'bf16'`, `'f16'`, `'f32'`, `'f64'`. See section quantization for more information. |
Firebolt builds vector search indexes per tablet and maintains/recreates them when the table is updated (i.e., insert, update, vacuum, delete), ensuring they stay up to date for queries.
Use a vector search index
To use a vector search index, you must explicitly reference it by name via the `vector_search()` table-valued function (TVF).
Parameters
Parameter | Description |
---|---|
`<index_name>` | The name of the index that will be used to find the closest vectors. |
`<target_vector_literal>` | The target vector for which the closest vectors should be found. |
`<top_k>` | The number of closest vectors that are returned from the index. Default is `10`. |
`<ef_search>` | Hyperparameter controlling the quality of the search. See ef_search for more information. |
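For example, to retrieve the 10 closest vectors with `ef_search = 64` (the index name and vector values are hypothetical; note the `::DOUBLE[]` cast required by the limitations below):

```sql
SELECT *
FROM vector_search(INDEX my_embeddings_idx, [0.1, 0.2, 0.3, 0.4]::DOUBLE[], 10, 64);
```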
Drop a vector search index
Dropping a vector search index via `DROP INDEX <index_name>` is a pure metadata operation and will not free up space at the storage level. We recommend running `VACUUM` on the table after the index has been dropped.
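For example (the index and table names are hypothetical):

```sql
DROP INDEX my_embeddings_idx;  -- metadata-only; storage is not freed yet
VACUUM documents;              -- reclaims the space still held by the dropped index
```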
Alter a vector search index
The only alter operation supported on a vector search index is `RENAME TO`:
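A minimal sketch, assuming the rename follows the usual `ALTER INDEX ... RENAME TO` form (the statement stem and names are assumptions):

```sql
ALTER INDEX my_embeddings_idx RENAME TO document_embeddings_idx;
```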
Limitations
Creating and using vector search indexes currently has several limitations that are planned to be addressed in future releases.

- The embedding column on which the vector search index is created must be of the following data type: `ARRAY([real,float] NOT NULL) NOT NULL`.
  - Only `real` and `float/double` are supported as the array’s nested type.
  - Nullability is not supported, neither inner nor outer nullability.
- Creating vector search indexes is only supported on empty tables (see the sketch after this list).
  - The index must be created before data is inserted into the table.
  - Creating an index on a populated table will fail.
- Once created, the index configuration (e.g., dimension, ef_construction, etc.) cannot be changed. The index must be dropped and recreated if vector dimensions change.
- The target vector in the `vector_search()` TVF must be an array literal cast to `::DOUBLE[]`. E.g., `vector_search(INDEX <index_name>, [0.1, 0.2, ..., 0.256]::double[], 10, 16)`.
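A minimal sketch of the required ordering and column type, using hypothetical table, column, and index names (the CREATE statement follows the assumed layout from the creation section above):

```sql
-- 1. The embedding column must be a non-nullable array of non-nullable
--    REALs (or DOUBLEs).
CREATE TABLE documents (
    id        BIGINT NOT NULL,
    embedding ARRAY(REAL NOT NULL) NOT NULL
);

-- 2. Create the index while the table is still empty; creating it on a
--    populated table will fail.
CREATE VECTOR SEARCH INDEX my_embeddings_idx
ON documents (embedding)
WITH (distance_metric = vector_cosine_ops, dimension = 4);

-- 3. Only then ingest data; the dimension (here 4) is enforced during insert.
INSERT INTO documents VALUES (1, [0.1, 0.2, 0.3, 0.4]);
```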
Performance & Observability
Achieving optimal performance for vector search queries using vector indexes involves two points of optimization: the index and the table. The index can be configured in different ways to favor either precision or performance; this impacts build time and memory usage. An index optimized for better search performance results in faster lookup times for the closest K tuples. However, the index only provides row numbers of the tuples that need to be loaded from the table (i.e., pointers to where the data is stored). Optimizing the table for data access is therefore another critical point of optimization, outlined below.

Engine Sizing
For optimal search performance, the whole vector index must fit into memory. Once a part of an index is loaded into memory, it is cached and kept in memory to allow fast search times for subsequent queries. To ensure that the index can be kept fully in memory, the engine must be sized to have enough main memory. To determine how much memory you need, check the size of the index via `information_schema.indexes`.

More precisely, the whole index must fit into the in-memory cache reserved for vector search indexes, which is limited to 70% of the engine’s memory. Your engine should therefore have roughly 1.5x the index’s size as main memory (the hard lower bound is the index size divided by 0.7, i.e., about 1.43x; the 1.5x guidance adds a small buffer). E.g., if the index is reported to have a size of 250 GiB, you should choose an engine with at least ~360 GiB of main memory - a 3M storage optimized engine would be a good choice (3x 128 GiB = 384 GiB main memory).
Via `information_schema.engine_query_history` or the `EXPLAIN (ANALYZE)` option, you can observe (1) how many parts of the index were fetched from S3 and (2) how many parts of the index had to be read from disk. For optimal performance, you want both of these numbers to be 0.
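For example, you can check the reported index size and then verify cache behavior for a search query (the index name and vector are hypothetical; selecting all columns avoids guessing at exact column names):

```sql
-- How large is the index? Use this to size the engine's main memory.
SELECT *
FROM information_schema.indexes;

-- Were index parts fetched from S3 or read from disk? Ideally both are 0.
EXPLAIN (ANALYZE)
SELECT *
FROM vector_search(INDEX my_embeddings_idx, [0.1, 0.2, 0.3, 0.4]::DOUBLE[], 10, 64);
```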
Distance Metric
Three different distance metrics are available for use in vector search indexes to determine the distance between vectors:

- `vector_cosine_ops`: cosine distance
- `vector_ip_ops`: inner product
- `vector_l2sq_ops`: squared L2 (Euclidean) distance
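For reference, the standard definitions for vectors a and b are shown below; note that the exact convention Firebolt uses to turn the inner product into a distance (e.g., negation) is an assumption here:

$$
d_{\text{cosine}}(a,b) = 1 - \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
\qquad
d_{\text{l2sq}}(a,b) = \sum_i (a_i - b_i)^2
\qquad
d_{\text{ip}}(a,b) = -\,a \cdot b
$$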
M (Connectivity)
The `M` parameter during vector search index creation defines the number of edges each vertex in the graph structure has - that is, the number of nearest neighbors that each inserted vector will connect to. Higher `M` values improve search quality but increase memory usage and index build time; the impact on index search time is minimal.

- Memory usage scales approximately linearly with the `M` factor.
- Insert time per tuple scales approximately linearly with the `M` factor, since each insertion requires more comparisons to establish links.
- A larger `M` value can make a big difference in recall performance, as it reduces the chance of the search getting trapped in local minima.
- A larger `M` value can speed up search, as it can allow faster traversal through the graph’s “shortcuts” (this depends on how well the graph is structured during construction).
- A larger dataset may require a higher `M` value to reach the expected recall, but will incur higher memory costs.
EF_CONSTRUCTION
The `ef_construction` parameter defines the quality of inserts into the index. The higher the value, the more nodes are explored during an insert to find the nearest neighbors, which leads to a higher-quality graph and better recall.

- Increases build time.
- Memory usage is not affected.
- Has a negligible effect on search time.
- A higher `ef_construction` value should achieve higher recall at lower `M` and `ef_search` values.
EF_SEARCH
The `ef_search` parameter defines the quality of search on the index. The higher the value, the more nodes are explored during the search to find the nearest neighbors, which improves the overall recall performance. It is the only parameter that can be changed after the index is created, as it only applies to the scope of the individual query that uses the index (see the example below).

- Memory usage is not affected.
- A larger `ef_search` value can make a significant difference in recall performance.
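Because `ef_search` is the fourth argument of the `vector_search()` TVF, the same index can serve both fast and high-recall lookups on a per-query basis (index name and vector are hypothetical):

```sql
-- Lower ef_search: fewer nodes explored, faster, potentially lower recall.
SELECT * FROM vector_search(INDEX my_embeddings_idx, [0.1, 0.2, 0.3, 0.4]::DOUBLE[], 10, 16);

-- Higher ef_search: more nodes explored, better recall.
SELECT * FROM vector_search(INDEX my_embeddings_idx, [0.1, 0.2, 0.3, 0.4]::DOUBLE[], 10, 256);
```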
Quantization
Quantization is the process of converting high-precision data into a lower-precision, discrete representation. The quantization setting defines which internal, lower-precision representation the high-precision input data is converted to (i.e., which data type is used in the index to store the vectors). A smaller data type requires less memory but may impact the quality of the index and thus recall performance. This is particularly relevant when vector clusters are very dense, as precision loss in the floating-point representation will decrease recall. Supported types are:

- `bf16`: 16-bit (brain) floating point developed by Google Brain. It is optimized for fast, large-scale numeric tasks where preserving the range is more important than fine-grained precision.
- `f16`: 16-bit floating point.
- `f32`: 32-bit floating point, equal to the SQL type `real`.
- `f64`: 64-bit floating point, equal to the SQL type `double precision`.
Table index_granularity
The table’s `index_granularity` defines the maximum number of rows per granule, which directly impacts how data is retrieved. The default granularity is approximately 8,000 rows per granule. The top K closest vectors are likely not stored in the same granule despite being semantically close to each other, so decreasing the `index_granularity` can improve performance. Our experiments have shown that decreasing the `index_granularity` to `128` resulted in ~50x fewer scanned rows and an overall query performance boost of 40%.
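A sketch of setting a smaller granularity at table creation time; the `WITH (index_granularity = ...)` clause is an assumption, so check the `CREATE TABLE` reference for the exact option name and placement:

```sql
-- Assumed clause; the default granularity is approximately 8,000 rows per granule.
CREATE TABLE documents (
    id        BIGINT NOT NULL,
    embedding ARRAY(REAL NOT NULL) NOT NULL
) WITH (index_granularity = 128);
```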
Benchmarks
Vector search indexes trade some precision for significantly faster query performance. The internal lookup structure that enables the fast search performance is complex and requires a lot of computation to build. This means that inserts into tables which hold a vector index take much more time than inserts into tables without one. In our testing we experienced the following latencies:

Setup:
- 5M storage optimized engine
- 435,000,000 embeddings of dimension 256
- 430 GiB uncompressed data

Results:
- Insert: ~3 hours (the index is built as part of the insert)
- Cold `LIMIT 1000` query: ~37 seconds (loads the vector index from cloud storage onto the engine and caches it)
- Hot `LIMIT 1000` query: ~0.3 seconds (vector indexes are cached)