Overview
Vector search indexes in Firebolt enable fast similarity search across high-dimensional vector embeddings using the HNSW (Hierarchical Navigable Small World) algorithm. These indexes are designed for use cases like semantic search, recommendation systems, and AI applications where you need to find vectors that are similar (i.e., close in distance) to a query vector. Unlike traditional exact search methods, vector search indexes provide approximate nearest neighbor (ANN) results, trading some precision for significantly faster query performance. This approach is well suited for machine learning applications where finding the top-k most similar items matters more than guaranteeing mathematically exact results.

Note: Vector search indexes provide approximate nearest neighbor results, not exact matches. The quality depends on index parameters and dataset characteristics.
Syntax
Create a vector search index
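The statement below is a sketch of how such an index might be created. The exact clause layout is an assumption pieced together from the parameter table that follows, and all object names are illustrative; consult the Firebolt SQL reference for the authoritative syntax.

```sql
-- Hypothetical statement shape; see the parameter table below for each knob.
CREATE VECTOR SEARCH INDEX my_embeddings_idx      -- <index_name>
ON documents (embedding)                          -- <table_name> (<column_name>)
USING vector_cosine_ops                           -- <distance_metric>
WITH (dimension = 256, m = 16, ef_construction = 128, quantization = 'bf16');
```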
Multiple vector search indexes can be created per table and column. The only requirement is a unique name per index; otherwise, indexes can reference the same column and share the same configuration.

Parameters
| Parameter | Description |
|---|---|
| `<index_name>` | A unique name for the vector search index. |
| `<table_name>` | The name of the table on which the index is created. |
| `<column_name>` | The name of the column that holds the embeddings to be indexed. |
| `<distance_metric>` | The distance operation used to compute the distance between vectors. Supported metrics: `vector_cosine_ops`, `vector_ip_ops`, `vector_l2sq_ops`. See the distance metric section for more information. |
| `<dimension>` | The number of dimensions in the vector embeddings. This is enforced during ingest. |
| `m` (optional) | The maximum number of connections per node in the HNSW graph. Default is 16. See the connectivity section for more information. |
| `ef_construction` (optional) | The size of the dynamic candidate list during index construction. Default is 128. See the ef_construction section for more information. |
| `quantization` (optional) | The quantization method for compressing vectors. Default is `'bf16'`. Supported values: `'bf16'`, `'f16'`, `'f32'`, `'f64'`, `'i8'`. See the quantization section for more information. |
Firebolt builds vector search indexes per tablet and maintains or recreates them when the table is updated (i.e., on insert, update, delete, and vacuum), ensuring they stay up to date for queries.
Use a vector search index
To use a vector search index, you must explicitly reference it by name via the `vector_search()` table-valued function (TVF).

Named parameter syntax is supported since version 4.29.
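A minimal call in positional form, following the shape shown in the limitations section below; the index name and vector are illustrative:

```sql
-- Find the 10 nearest neighbors of the target vector, exploring 64 candidate
-- nodes during the search (ef_search). The target vector literal must match
-- the index's dimension; it is abbreviated here.
SELECT *
FROM vector_search(INDEX my_embeddings_idx, [0.1, 0.2, 0.3]::double[], 10, 64);
```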
Parameters
| Parameter | Description |
|---|---|
| `<index_name>` | The name of the vector search index to use for finding the nearest vectors. |
| `<target_vector>` | The target vector for which the closest vectors should be found. Must be a scalar value, including scalar subqueries or literals. Before version 4.29, the target vector must be a literal that is explicitly cast to `ARRAY(DOUBLE)`. |
| `<top_k>` | The number of nearest vectors to return from the index. |
| `ef_search` (optional) | A hyperparameter that controls the quality and recall of the search. Default is 64. See ef_search for more details. Mandatory before version 4.29. |
| `load_strategy` (optional) | Specifies how the index is loaded and managed in memory. Default is `'in_memory'`. Available since version 4.29. Supported values:<br>`'in_memory'`: The entire index is loaded into memory and cached, providing optimal performance for repeated queries regardless of the target vector, as the full index structure is already available. If the engine does not have enough memory to hold the entire index, eviction and reloading from disk may occur, impacting performance. For best results, ensure the index fits within the engine's in-memory vector index cache.<br>`'disk'`: The index is read from disk and kept in memory on a best-effort basis. Under memory pressure, the index may be evicted and must be reloaded from disk as needed. Different target vectors may require loading different parts of the index, so performance can vary depending on workload and query patterns. |
Drop a vector search index
Dropping a vector search index via `DROP INDEX <index_name>` is a pure metadata operation and does not free up space at the storage level. We recommend running `VACUUM` on the table after the index has been dropped.
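For example (index and table names are illustrative):

```sql
DROP INDEX my_embeddings_idx;  -- metadata-only: storage space is not reclaimed yet
VACUUM documents;              -- reclaims the space previously occupied by the index
```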
Alter a vector search index
The only alter operation supported on a vector search index is `RENAME TO`:
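A sketch of the rename, assuming the usual `ALTER INDEX ... RENAME TO` statement form; both names are illustrative:

```sql
ALTER INDEX my_embeddings_idx RENAME TO document_embeddings_idx;
```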
Limitations
Creating and using vector search indexes currently has several limitations that are planned to be addressed in future releases.

- The embedding column on which the vector search index is created must be of the data type `ARRAY([real,float] NOT NULL) NOT NULL`:
  - Only `real` and `float`/`double` are supported as the array's nested type.
  - Nullability is not supported, neither inner nor outer.
- Creating vector search indexes is only supported on empty tables. This limitation was removed in version 4.29; more information about how indexes behave on populated tables can be found in the section on index creation on populated tables below.
  - The index must be created before data is inserted into the table.
  - Creating an index on a populated table will fail.
- Once created, the index configuration (e.g., dimension, ef_construction, etc.) cannot be changed. The index must be dropped and recreated if vector dimensions change.
- Before version 4.29, the target vector in the `vector_search()` TVF must be an array literal cast to `::DOUBLE[]`, e.g., `vector_search(INDEX <index_name>, [0.1, 0.2, ..., 0.256]::double[], 10, 16)`.
Performance & Observability
Achieving optimal performance for vector search queries involves two points of optimization: the index and the table. The index can be configured to favor either precision or performance, which impacts build time and memory usage. An index optimized for search performance yields faster lookup times for the closest k tuples. However, the index only provides row numbers of the tuples that must then be loaded from the table (i.e., pointers to where the data is stored). Optimizing the table for data access is therefore another critical point of optimization, outlined below.

Engine Sizing
For optimal search performance, the whole vector index must fit into memory. Once a part of an index is loaded into memory, it is cached and kept in memory to allow fast search times for subsequent queries. To keep the index fully in memory, the engine must be sized with enough main memory. To determine how much memory you need, check the size of the index via `information_schema.indexes`:
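For example (the index name is illustrative; selecting all columns avoids guessing at the exact schema of the view, though the `index_name` filter column is an assumption):

```sql
SELECT *
FROM information_schema.indexes
WHERE index_name = 'my_embeddings_idx';
```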
Note that the whole index must fit into the in-memory cache reserved for vector search indexes, which is limited to 70% of the engine's memory. Therefore, your engine should have roughly 1.5x the index's size as main memory. For example, if the index is reported to have a size of 250 GiB, you should choose an engine with at least 350 GiB; a 3M storage-optimized engine would be a good choice (3 × 128 GiB main memory).
Via `information_schema.engine_query_history` or the `EXPLAIN (ANALYZE)` option, you can observe (1) how many parts of the index were fetched from S3 and (2) how many parts of the index had to be read from disk. For optimal performance, both of these numbers should be 0.
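For a single query, one way to inspect this is to run it under `EXPLAIN (ANALYZE)`; the index name and vector below are illustrative, and the exact metric names in the output may vary by version:

```sql
-- Executes the query and reports runtime metrics, including index I/O.
EXPLAIN (ANALYZE)
SELECT *
FROM vector_search(INDEX my_embeddings_idx, [0.1, 0.2, 0.3]::double[], 10, 64);
```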
Distance Metric
Three different distance metrics are available for use in vector search indexes to determine the distance between vectors:

- `vector_cosine_ops`: cosine distance
- `vector_ip_ops`: inner product
- `vector_l2sq_ops`: squared L2 (Euclidean) distance
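For reference, the standard textbook definitions of these metrics for vectors $a$ and $b$ are sketched below; negating the inner product to treat it as a distance is a common convention, not something specified here:

$$
d_{\text{cosine}}(a,b) = 1 - \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert},
\qquad
d_{\text{ip}}(a,b) = -\,a \cdot b,
\qquad
d_{\ell_2^2}(a,b) = \sum_i (a_i - b_i)^2
$$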
M (Connectivity)
The `M` parameter during vector search index creation defines the number of edges each vertex in the graph structure has, that is, the number of nearest neighbors that each inserted vector connects to.
Higher `M` values improve search quality but increase memory usage and index build time. The impact on index search time is minimal.
- Memory usage scales approximately linearly with the `M` factor.
- Insert time per tuple scales approximately linearly with the `M` factor, as each insertion requires more comparisons to establish links.
- A larger `M` value can make a big difference in recall performance, since it reduces the chance of the search getting trapped in local minima.
- A larger `M` value can speed up search, as it can allow faster traversal through the graph's "shortcuts" (this depends on how well the graph is structured during construction).
- A larger dataset may require a higher `M` value for the expected recall, but will incur higher memory costs.
EF_CONSTRUCTION
The `ef_construction` parameter defines the quality of inserts into the index. The higher the value, the more nodes are explored during an insert to find the nearest neighbors, which leads to a higher-quality graph and better recall.
- Increases build time.
- Memory usage is not affected.
- Has a negligible effect on search time.
- A higher `EF_CONSTRUCTION` value should achieve higher recall at lower `M` and `EF_SEARCH` values.
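For example, a recall-oriented configuration might raise both parameters relative to their defaults. This reuses the hypothetical statement shape from the creation sketch above, and all names are illustrative:

```sql
-- Higher m and ef_construction: longer build time and (for m) more memory,
-- in exchange for better recall at query time.
CREATE VECTOR SEARCH INDEX my_recall_idx
ON documents (embedding)
USING vector_cosine_ops
WITH (dimension = 256, m = 32, ef_construction = 256, quantization = 'bf16');
```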
EF_SEARCH
The `EF_SEARCH` parameter defines the quality of search on the index. The higher the value, the more nodes are explored during the search to find the nearest neighbors, which improves overall recall.
It is the only parameter that can be changed after the index is created, as it applies only to the scope of the individual query that uses the index.
- Memory usage is not affected.
- A larger `EF_SEARCH` value can make a significant difference in recall performance.
Quantization
Quantization is the process of converting high-precision data into a lower-precision, discrete representation. The quantization setting defines which internal, lower-precision representation the high-precision input data is converted to (i.e., which data type is used in the index to store the vectors). A smaller data type requires less memory but may impact the quality of the index and thus recall performance. This is particularly relevant when vector clusters are very dense, as precision loss in the floating-point representation will decrease recall. Supported types are:

- `bf16`: 16-bit (brain) floating point developed by Google Brain, optimized for fast, large-scale numeric tasks where preserving range is more important than fine-grained precision.
- `f16`: 16-bit floating point.
- `f32`: 32-bit floating point, equal to the SQL type `real`.
- `f64`: 64-bit floating point, equal to the SQL type `double precision`.
- `i8`: 8-bit integer (this quantization is only supported with cosine-like metrics).
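As a rough sizing illustration (our own back-of-the-envelope arithmetic, ignoring the HNSW graph overhead), using the benchmark dataset from the end of this page: storing 435,000,000 vectors of dimension 256 takes about 435M × 256 × 4 B ≈ 445 GB of raw vector storage at `f32`, roughly half of that (≈ 222 GB) at `bf16` or `f16`, and roughly a quarter (≈ 111 GB) at `i8`.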
Table index_granularity
The table's `index_granularity` setting defines the maximum number of rows per granule, which directly impacts how data is retrieved. The default granularity is approximately 8,000 rows per granule. The top k closest vectors are likely not stored in the same granule despite being semantically close to each other, so decreasing the `index_granularity` can improve performance. Our experiments have shown that decreasing the `index_granularity` to 128 resulted in ~50x fewer scanned rows and an overall query performance boost of 40%.
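A sketch of setting this at table creation time; the placement of the option is an assumption, so check the `CREATE TABLE` reference for the exact syntax:

```sql
CREATE TABLE documents (
    id BIGINT NOT NULL,
    embedding ARRAY(REAL NOT NULL) NOT NULL  -- data type required by vector search indexes
)
WITH (index_granularity = 128);  -- smaller granules: fewer scanned rows per top-k lookup
```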
Index creation on populated tables
Creating indexes on populated tables is only supported starting with version 4.29. Tablets that already exist when the index is created are not indexed retroactively, but they remain searchable via the `vector_search()` TVF. During query execution, a hybrid search is performed that accounts for both indexed and non-indexed tablets.
This can result in noticeable performance impacts, as non-indexed data must be fully scanned.
To ensure all data is properly indexed after creating an index on a populated table, you must manually execute `VACUUM (REINDEX = TRUE)`.
VACUUM and REINDEX
To ensure all data is properly indexed, run `VACUUM (REINDEX = TRUE)`. This command updates all existing tablets that are not yet indexed.
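For example (the table name is illustrative, and its placement relative to the option follows the command as written above as an assumption):

```sql
VACUUM documents (REINDEX = TRUE);  -- indexes all tablets not yet covered by the index
```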
Regular execution of `VACUUM` is recommended to ensure that indexes are merged and optimized. This is particularly important for workloads with frequent small data ingests (trickle ingests), as query performance can suffer significantly if the base table contains too many tablets.
Benchmarks
Vector search indexes trade some precision for significantly faster query performance. The internal lookup structure that enables the fast search performance is complex and requires a lot of computation to build. This means that inserts into tables that hold a vector index take much more time than inserts into tables without one. In our testing, we experienced the following latencies:

Setup:
- 5M storage-optimized engine
- 435,000,000 embeddings of dimension 256
- 430 GiB uncompressed data

Results:
- Insert: ~3 hours (the index is built as part of the insert)
- Cold `LIMIT 1000` query: ~37 seconds (loads the vector index from cloud storage onto the engine and caches it)
- Hot `LIMIT 1000` query: ~0.3 seconds (vector indexes are cached)