vectorlite是SQLite的運行時加載擴展,它可以基於HNSWLIB的快速向量搜索,並且可以在Windows,MacOS和Linux上使用。它提供具有SQL接口的快速矢量搜索功能,並使用SQLite驅動程序在每種語言上運行。
有關該項目的動力和背景,請在此處檢查。
以下是在sqlite cli shell中使用它的示例:
-- Load vectorlite
.load path / to / vectorlite.[so|dll|dylib]
-- shows vectorlite version and build info.
select vectorlite_info();
-- Calculate vector l2(squared) distance
select vector_distance(vector_from_json( ' [1,2,3] ' ), vector_from_json( ' [3,4,5] ' ), ' l2 ' );
-- Create a virtual table named my_table with one vector column my_embedding with dimention of 3
create virtual table my_table using vectorlite(my_embedding float32[ 3 ], hnsw(max_elements = 100 ));
-- Insert vectors into my_table. rowid can be used to relate to a vector's metadata stored elsewhere, e.g. another table.
insert into my_table(rowid, my_embedding) values ( 0 , vector_from_json( ' [1,2,3] ' ));
insert into my_table(rowid, my_embedding) values ( 1 , vector_from_json( ' [2,3,4] ' ));
insert into my_table(rowid, my_embedding) values ( 2 , vector_from_json( ' [7,7,7] ' ));
-- Find 2 approximate nearest neighbors of vector [3,4,5] with distances
select rowid, distance from my_table where knn_search(my_embedding, knn_param(vector_from_json( ' [3,4,5] ' ), 2 ));
-- Find the nearest neighbor of vector [3,4,5] among vectors with rowid 0 and 1. (requires sqlite_version>=3.38)
-- It is called metadata filter in vectorlite, because you could get rowid set beforehand based on vectors' metadata and then perform vector search.
-- Metadata filter is pushed down to the underlying index when traversing the HNSW graph.
select rowid, distance from my_table where knn_search(my_embedding, knn_param(vector_from_json( ' [3,4,5] ' ), 1 )) and rowid in ( 0 , 1 ) ;
當前,VectorLite是針對Windows-X64,Linux-X64,MacOS-X64,MacOS-ARM64預編譯的,並以Python Wheels和NPM軟件包的形式分發。可以簡單地安裝:
# For python
pip install vectorlite-py
# for nodejs
npm i vectorlite對於其他語言, vectorlite.[so|dll|dylib]可以從輪子中為您的平台提取,因為 *.WHL文件實際上是zip存檔。
VectorLite目前正在Beta中。可能會發生破壞的變化。
vector_from_json()和vector_to_json()向量JSON SERDE支持。 vectorlite提供以下API。請注意,VectorLite目前正在Beta中。可能會發生破壞的變化。
以下功能可以在任何情況下使用。
vectorlite_info() -- prints version info and some compile time info. e.g. Is SSE, AVX enabled.
vector_from_json(json_string) -- converts a json array of type TEXT into BLOB(a c-style float32 array)
vector_to_json(vector_blob) -- converts a vector of type BLOB(c-style float32 array) into a json array of type TEXT
vector_distance(vector_blob1, vector_blob2, distance_type_str) -- calculate vector distance between two vectors, distance_type_str could be 'l2', 'cosine', 'ip' 實際上,可以使用vector_distance輕鬆實施蠻力搜索,從而返回100%準確的搜索結果:
-- use a normal sqlite table
create table my_table (rowid integer primary key , embedding blob);
-- insert
insert into my_table(rowid, embedding) values ( 0 , {your_embedding});
-- search for 10 nearest neighbors using l2 squared distance
select rowid from my_table order by vector_distance({query_vector}, embedding, ' l2 ' ) asc limit 10
Vectorlite的核心是虛擬表模塊,該模塊用於保持矢量索引和比蠻力方法更快的速度,而成本不准確。可以使用:
-- Required fields: table_name, vector_name, dimension, max_elements
-- Optional fields:
-- 1. distance_type: defaults to l2
-- 2. ef_construction: defaults to 200
-- 3. M: defaults to 16
-- 4. random_seed: defaults to 100
-- 5. allow_replace_deleted: defaults to true
-- 6. index_file_path: no default value. If not provided, the table will be memory-only. If provided, vectorlite will try to load index from the file and save to it when db connection is closed.
create virtual table {table_name} using vectorlite({vector_name} float32[{dimension}] {distance_type}, hnsw(max_elements = {max_elements}, {ef_construction = 200 }, {M = 16 }, {random_seed = 100 }, {allow_replace_deleted = true}), {index_file_path});您可以插入,更新並刪除矢量光錶,就像它是普通的SQLite表一樣。
-- rowid is required during insertion, because rowid is used to connect the vector to its metadata stored elsewhere. Auto-generating rowid doesn't makes sense.
insert into my_vectorlite_table(rowid, vector_name) values ({your_rowid}, {vector_blob});
-- Note: update and delete statements that uses rowid filter require sqlite3_version >= 3.38 to run.
update my_vectorlite_table set vector_name = {new_vector_blob} where rowid = {your_rowid};
delete from my_vectorlite_table where rowid = {your_rowid};僅在查詢vectorlite表時才使用以下功能
-- returns knn_parameter that will be passed to knn_search().
-- vector_blob: vector to search
-- k: how many nearest neighbors to search for
-- ef: optional. A HNSW parameter that controls speed-accuracy trade-off. Defaults to 10 at first. If set to another value x, it will remain x if not specified again in another query within a single db connection.
knn_param(vector_blob, k, ef)
-- Should only be used in the `where clause` in a `select` statement to tell vectorlite to speed up the query using HNSW index
-- vector_name should match the vectorlite table's definition
-- knn_parameter is usually constructed using knn_param()
knn_search(vector_name, knn_parameter)
-- An example of vector search query. `distance` is an implicit column of a vectorlite table.
select rowid, distance from my_vectorlite_table where knn_search(vector_name, knn_param({vector_blob}, {k}))
-- An example of vector search query with pushed-down metadata(rowid) filter, requires sqlite_version >= 3.38 to run.
select rowid, distance from my_vectorlite_table where knn_search(vector_name, knn_param({vector_blob}, {k})) and rowid in ( 1 , 2 , 3 , 4 , 5 )請注意,僅使用小型數據集(帶有3000或20000個向量),因為使用較大的數據集對SQLite-VEC進行基準測試是不公平的。 SQLite-VEC僅使用蠻力,該蠻力不會與大數據集進行擴展,而vectorlite使用ANN(大約最近的鄰居),該數據以不准確的成本為不准確的成本將其縮放到大型數據集中。
基準如何完成:
select rowid from my_table order by vector_distance(query_vector, embedding, 'l2') limit 10是基於基礎的基線,以查看hnsw speed速度向量Query Query specte Query Query。tl; dr:
可以在基准文件夾中找到基準代碼,該文件夾可以用作如何通過調整HNSW參數來提高場景的召回率的示例。
處理3000個向量時(這是一個相當小的數據集):


Using local vectorlite: ../build/release/vectorlite/vectorlite.so
Benchmarking using 3000 randomly vectors. 100 10-nearest neighbor queries will be performed on each case.
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━┳━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┓
┃ distance ┃ vector ┃ ef ┃ ┃ ef ┃ insert_time ┃ search_time ┃ recall ┃
┃ type ┃ dimension ┃ construction ┃ M ┃ search ┃ per vector ┃ per query ┃ rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━╇━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━┩
│ l2 │ 128 │ 100 │ 30 │ 10 │ 62.41 us │ 12.96 us │ 56.40% │
│ l2 │ 128 │ 100 │ 30 │ 50 │ 62.41 us │ 42.95 us │ 93.30% │
│ l2 │ 128 │ 100 │ 30 │ 100 │ 62.41 us │ 62.06 us │ 99.40% │
│ l2 │ 512 │ 100 │ 30 │ 10 │ 146.40 us │ 38.05 us │ 46.60% │
│ l2 │ 512 │ 100 │ 30 │ 50 │ 146.40 us │ 95.96 us │ 86.50% │
│ l2 │ 512 │ 100 │ 30 │ 100 │ 146.40 us │ 148.46 us │ 96.70% │
│ l2 │ 1536 │ 100 │ 30 │ 10 │ 463.56 us │ 124.51 us │ 38.10% │
│ l2 │ 1536 │ 100 │ 30 │ 50 │ 463.56 us │ 355.70 us │ 78.50% │
│ l2 │ 1536 │ 100 │ 30 │ 100 │ 463.56 us │ 547.84 us │ 92.70% │
│ l2 │ 3000 │ 100 │ 30 │ 10 │ 1323.25 us │ 391.57 us │ 36.60% │
│ l2 │ 3000 │ 100 │ 30 │ 50 │ 1323.25 us │ 1041.37 us │ 78.60% │
│ l2 │ 3000 │ 100 │ 30 │ 100 │ 1323.25 us │ 1443.10 us │ 93.10% │
│ cosine │ 128 │ 100 │ 30 │ 10 │ 59.75 us │ 15.27 us │ 58.30% │
│ cosine │ 128 │ 100 │ 30 │ 50 │ 59.75 us │ 36.72 us │ 94.60% │
│ cosine │ 128 │ 100 │ 30 │ 100 │ 59.75 us │ 63.67 us │ 99.30% │
│ cosine │ 512 │ 100 │ 30 │ 10 │ 148.19 us │ 36.98 us │ 51.00% │
│ cosine │ 512 │ 100 │ 30 │ 50 │ 148.19 us │ 102.46 us │ 88.10% │
│ cosine │ 512 │ 100 │ 30 │ 100 │ 148.19 us │ 143.41 us │ 96.90% │
│ cosine │ 1536 │ 100 │ 30 │ 10 │ 427.21 us │ 106.94 us │ 42.10% │
│ cosine │ 1536 │ 100 │ 30 │ 50 │ 427.21 us │ 285.50 us │ 83.30% │
│ cosine │ 1536 │ 100 │ 30 │ 100 │ 427.21 us │ 441.66 us │ 95.60% │
│ cosine │ 3000 │ 100 │ 30 │ 10 │ 970.17 us │ 289.00 us │ 42.20% │
│ cosine │ 3000 │ 100 │ 30 │ 50 │ 970.17 us │ 848.03 us │ 83.90% │
│ cosine │ 3000 │ 100 │ 30 │ 100 │ 970.17 us │ 1250.29 us │ 95.60% │
└──────────┴───────────┴──────────────┴────┴────────┴─────────────┴─────────────┴────────┘
Bencharmk hnswlib as comparison.
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━┳━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┓
┃ distance ┃ vector ┃ ef ┃ ┃ ef ┃ insert_time ┃ search_time ┃ recall ┃
┃ type ┃ dimension ┃ construction ┃ M ┃ search ┃ per vector ┃ per query ┃ rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━╇━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━┩
│ l2 │ 128 │ 100 │ 30 │ 10 │ 12.84 us │ 12.83 us │ 56.90% │
│ l2 │ 128 │ 100 │ 30 │ 50 │ 12.84 us │ 41.93 us │ 93.60% │
│ l2 │ 128 │ 100 │ 30 │ 100 │ 12.84 us │ 65.84 us │ 99.40% │
│ l2 │ 512 │ 100 │ 30 │ 10 │ 29.34 us │ 47.37 us │ 47.00% │
│ l2 │ 512 │ 100 │ 30 │ 50 │ 29.34 us │ 126.29 us │ 86.40% │
│ l2 │ 512 │ 100 │ 30 │ 100 │ 29.34 us │ 198.30 us │ 96.80% │
│ l2 │ 1536 │ 100 │ 30 │ 10 │ 90.05 us │ 149.35 us │ 37.20% │
│ l2 │ 1536 │ 100 │ 30 │ 50 │ 90.05 us │ 431.53 us │ 78.00% │
│ l2 │ 1536 │ 100 │ 30 │ 100 │ 90.05 us │ 765.03 us │ 92.50% │
│ l2 │ 3000 │ 100 │ 30 │ 10 │ 388.87 us │ 708.98 us │ 36.30% │
│ l2 │ 3000 │ 100 │ 30 │ 50 │ 388.87 us │ 1666.87 us │ 78.90% │
│ l2 │ 3000 │ 100 │ 30 │ 100 │ 388.87 us │ 2489.98 us │ 93.40% │
│ cosine │ 128 │ 100 │ 30 │ 10 │ 10.90 us │ 11.14 us │ 58.10% │
│ cosine │ 128 │ 100 │ 30 │ 50 │ 10.90 us │ 37.39 us │ 94.30% │
│ cosine │ 128 │ 100 │ 30 │ 100 │ 10.90 us │ 62.45 us │ 99.40% │
│ cosine │ 512 │ 100 │ 30 │ 10 │ 25.46 us │ 38.92 us │ 50.70% │
│ cosine │ 512 │ 100 │ 30 │ 50 │ 25.46 us │ 109.84 us │ 87.90% │
│ cosine │ 512 │ 100 │ 30 │ 100 │ 25.46 us │ 151.00 us │ 97.10% │
│ cosine │ 1536 │ 100 │ 30 │ 10 │ 77.53 us │ 119.48 us │ 42.00% │
│ cosine │ 1536 │ 100 │ 30 │ 50 │ 77.53 us │ 340.78 us │ 84.00% │
│ cosine │ 1536 │ 100 │ 30 │ 100 │ 77.53 us │ 510.02 us │ 95.50% │
│ cosine │ 3000 │ 100 │ 30 │ 10 │ 234.79 us │ 453.12 us │ 43.20% │
│ cosine │ 3000 │ 100 │ 30 │ 50 │ 234.79 us │ 1380.79 us │ 83.80% │
│ cosine │ 3000 │ 100 │ 30 │ 100 │ 234.79 us │ 1520.92 us │ 95.70% │
└──────────┴───────────┴──────────────┴────┴────────┴─────────────┴─────────────┴────────┘
Bencharmk vectorlite brute force(select rowid from my_table order by vector_distance(query_vector, embedding, 'l2')) as comparison.
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ distance ┃ vector ┃ insert_time ┃ search_time ┃ recall ┃
┃ type ┃ dimension ┃ per vector ┃ per query ┃ rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━┩
│ l2 │ 128 │ 2.38 us │ 299.14 us │ 100.00% │
│ l2 │ 512 │ 3.69 us │ 571.19 us │ 100.00% │
│ l2 │ 1536 │ 4.86 us │ 2237.64 us │ 100.00% │
│ l2 │ 3000 │ 7.69 us │ 5135.63 us │ 100.00% │
└──────────┴───────────┴─────────────┴─────────────┴─────────┘
Bencharmk sqlite_vss as comparison.
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ distance ┃ vector ┃ insert_time ┃ search_time ┃ recall ┃
┃ type ┃ dimension ┃ per vector ┃ per query ┃ rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━┩
│ l2 │ 128 │ 395.24 us │ 2508.52 us │ 99.90% │
│ l2 │ 512 │ 2824.89 us │ 1530.77 us │ 100.00% │
│ l2 │ 1536 │ 8931.72 us │ 1602.36 us │ 100.00% │
│ l2 │ 3000 │ 17498.60 us │ 3142.38 us │ 100.00% │
└──────────┴───────────┴─────────────┴─────────────┴─────────┘
Bencharmk sqlite_vec as comparison.
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ distance ┃ vector ┃ insert_time ┃ search_time ┃ recall ┃
┃ type ┃ dimension ┃ per vector ┃ per query ┃ rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━┩
│ l2 │ 128 │ 10.21 us │ 202.05 us │ 100.00% │
│ l2 │ 512 │ 14.43 us │ 989.64 us │ 100.00% │
│ l2 │ 1536 │ 31.68 us │ 3856.08 us │ 100.00% │
│ l2 │ 3000 │ 59.94 us │ 9503.91 us │ 100.00% │
└──────────┴───────────┴─────────────┴─────────────┴─────────┘
在與20000個向量打交道時
請注意:


Using local vectorlite: ../build/release/vectorlite/vectorlite.so
Benchmarking using 20000 randomly vectors. 100 10-neariest neighbor queries will be performed on each case.
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━┳━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┓
┃ distance ┃ vector ┃ ef ┃ ┃ ef ┃ insert_time ┃ search_time ┃ recall ┃
┃ type ┃ dimension ┃ construction ┃ M ┃ search ┃ per vector ┃ per query ┃ rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━╇━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━┩
│ l2 │ 128 │ 100 │ 30 │ 10 │ 187.41 us │ 46.58 us │ 29.10% │
│ l2 │ 128 │ 100 │ 30 │ 50 │ 187.41 us │ 95.16 us │ 70.20% │
│ l2 │ 128 │ 100 │ 30 │ 100 │ 187.41 us │ 179.51 us │ 85.70% │
│ l2 │ 512 │ 100 │ 30 │ 10 │ 820.80 us │ 105.80 us │ 18.10% │
│ l2 │ 512 │ 100 │ 30 │ 50 │ 820.80 us │ 361.83 us │ 50.40% │
│ l2 │ 512 │ 100 │ 30 │ 100 │ 820.80 us │ 628.88 us │ 67.00% │
│ l2 │ 1536 │ 100 │ 30 │ 10 │ 2665.31 us │ 292.39 us │ 13.70% │
│ l2 │ 1536 │ 100 │ 30 │ 50 │ 2665.31 us │ 1069.47 us │ 42.40% │
│ l2 │ 1536 │ 100 │ 30 │ 100 │ 2665.31 us │ 1744.79 us │ 59.50% │
│ l2 │ 3000 │ 100 │ 30 │ 10 │ 5236.76 us │ 558.56 us │ 13.80% │
│ l2 │ 3000 │ 100 │ 30 │ 50 │ 5236.76 us │ 1787.83 us │ 39.30% │
│ l2 │ 3000 │ 100 │ 30 │ 100 │ 5236.76 us │ 3039.94 us │ 56.60% │
│ cosine │ 128 │ 100 │ 30 │ 10 │ 164.31 us │ 25.35 us │ 34.70% │
│ cosine │ 128 │ 100 │ 30 │ 50 │ 164.31 us │ 78.33 us │ 71.20% │
│ cosine │ 128 │ 100 │ 30 │ 100 │ 164.31 us │ 133.75 us │ 87.60% │
│ cosine │ 512 │ 100 │ 30 │ 10 │ 711.35 us │ 100.90 us │ 19.00% │
│ cosine │ 512 │ 100 │ 30 │ 50 │ 711.35 us │ 406.08 us │ 51.10% │
│ cosine │ 512 │ 100 │ 30 │ 100 │ 711.35 us │ 582.51 us │ 71.50% │
│ cosine │ 1536 │ 100 │ 30 │ 10 │ 2263.96 us │ 283.88 us │ 22.60% │
│ cosine │ 1536 │ 100 │ 30 │ 50 │ 2263.96 us │ 919.98 us │ 54.50% │
│ cosine │ 1536 │ 100 │ 30 │ 100 │ 2263.96 us │ 1674.77 us │ 72.40% │
│ cosine │ 3000 │ 100 │ 30 │ 10 │ 4541.09 us │ 566.31 us │ 19.80% │
│ cosine │ 3000 │ 100 │ 30 │ 50 │ 4541.09 us │ 1672.82 us │ 49.30% │
│ cosine │ 3000 │ 100 │ 30 │ 100 │ 4541.09 us │ 2855.43 us │ 65.40% │
└──────────┴───────────┴──────────────┴────┴────────┴─────────────┴─────────────┴────────┘
Bencharmk hnswlib as comparison.
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━┳━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┓
┃ distance ┃ vector ┃ ef ┃ ┃ ef ┃ insert_time ┃ search_time ┃ recall ┃
┃ type ┃ dimension ┃ construction ┃ M ┃ search ┃ per vector ┃ per query ┃ rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━╇━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━┩
│ l2 │ 128 │ 100 │ 30 │ 10 │ 23.06 us │ 39.96 us │ 29.60% │
│ l2 │ 128 │ 100 │ 30 │ 50 │ 23.06 us │ 75.02 us │ 69.80% │
│ l2 │ 128 │ 100 │ 30 │ 100 │ 23.06 us │ 160.01 us │ 85.40% │
│ l2 │ 512 │ 100 │ 30 │ 10 │ 146.58 us │ 167.31 us │ 18.10% │
│ l2 │ 512 │ 100 │ 30 │ 50 │ 146.58 us │ 392.12 us │ 50.80% │
│ l2 │ 512 │ 100 │ 30 │ 100 │ 146.58 us │ 781.50 us │ 67.20% │
│ l2 │ 1536 │ 100 │ 30 │ 10 │ 657.41 us │ 298.71 us │ 12.70% │
│ l2 │ 1536 │ 100 │ 30 │ 50 │ 657.41 us │ 1031.61 us │ 40.60% │
│ l2 │ 1536 │ 100 │ 30 │ 100 │ 657.41 us │ 1764.34 us │ 57.90% │
│ l2 │ 3000 │ 100 │ 30 │ 10 │ 1842.77 us │ 852.88 us │ 13.80% │
│ l2 │ 3000 │ 100 │ 30 │ 50 │ 1842.77 us │ 2905.57 us │ 39.60% │
│ l2 │ 3000 │ 100 │ 30 │ 100 │ 1842.77 us │ 4936.35 us │ 56.50% │
│ cosine │ 128 │ 100 │ 30 │ 10 │ 19.25 us │ 23.27 us │ 34.20% │
│ cosine │ 128 │ 100 │ 30 │ 50 │ 19.25 us │ 72.66 us │ 71.40% │
│ cosine │ 128 │ 100 │ 30 │ 100 │ 19.25 us │ 134.11 us │ 87.60% │
│ cosine │ 512 │ 100 │ 30 │ 10 │ 112.80 us │ 106.90 us │ 22.70% │
│ cosine │ 512 │ 100 │ 30 │ 50 │ 112.80 us │ 341.23 us │ 54.20% │
│ cosine │ 512 │ 100 │ 30 │ 100 │ 112.80 us │ 609.93 us │ 72.40% │
│ cosine │ 1536 │ 100 │ 30 │ 10 │ 615.04 us │ 268.00 us │ 22.50% │
│ cosine │ 1536 │ 100 │ 30 │ 50 │ 615.04 us │ 898.82 us │ 54.00% │
│ cosine │ 1536 │ 100 │ 30 │ 100 │ 615.04 us │ 1557.51 us │ 71.90% │
│ cosine │ 3000 │ 100 │ 30 │ 10 │ 1425.49 us │ 546.18 us │ 20.60% │
│ cosine │ 3000 │ 100 │ 30 │ 50 │ 1425.49 us │ 2008.53 us │ 49.20% │
│ cosine │ 3000 │ 100 │ 30 │ 100 │ 1425.49 us │ 3106.51 us │ 65.00% │
└──────────┴───────────┴──────────────┴────┴────────┴─────────────┴─────────────┴────────┘
Bencharmk vectorlite brute force(select rowid from my_table order by vector_distance(query_vector, embedding, 'l2')) as comparison.
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ distance ┃ vector ┃ insert_time ┃ search_time ┃ recall ┃
┃ type ┃ dimension ┃ per vector ┃ per query ┃ rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━┩
│ l2 │ 128 │ 0.93 us │ 2039.69 us │ 100.00% │
│ l2 │ 512 │ 2.73 us │ 7177.23 us │ 100.00% │
│ l2 │ 1536 │ 4.64 us │ 17163.25 us │ 100.00% │
│ l2 │ 3000 │ 6.62 us │ 25378.79 us │ 100.00% │
└──────────┴───────────┴─────────────┴─────────────┴─────────┘
Bencharmk sqlite_vec as comparison.
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ distance ┃ vector ┃ insert_time ┃ search_time ┃ recall ┃
┃ type ┃ dimension ┃ per vector ┃ per query ┃ rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━┩
│ l2 │ 128 │ 3.49 us │ 1560.17 us │ 100.00% │
│ l2 │ 512 │ 6.73 us │ 7778.39 us │ 100.00% │
│ l2 │ 1536 │ 17.13 us │ 26344.76 us │ 100.00% │
│ l2 │ 3000 │ 35.30 us │ 60652.58 us │ 100.00% │
└──────────┴───────────┴─────────────┴─────────────┴─────────┘
最快入門的方法是使用Python安裝矢量線。
# Note: vectorlite-py not vectorlite. vectorlite is another project.
pip install vectorlite-py apsw numpy vectorlite的元數據過濾器需要sqlite> = 3.38。 Python的內置sqlite模塊通常使用舊的SQLite版本構建。因此,此處將apsw用作SQLITE驅動程序,因為它提供了與最新SQLite的綁定。如果不需要元數據過濾器支持,則vectorlite仍然可以使用舊的SQLite版本。以下是使用vectorlite的最小示例。也可以在示例文件夾中找到。
import vectorlite_py
import apsw
import numpy as np
"""
Quick start of using vectorlite extension.
"""
conn = apsw . Connection ( ':memory:' )
conn . enable_load_extension ( True ) # enable extension loading
conn . load_extension ( vectorlite_py . vectorlite_path ()) # load vectorlite
cursor = conn . cursor ()
# check if vectorlite is loaded
print ( cursor . execute ( 'select vectorlite_info()' ). fetchall ())
# Vector distance calculation
for distance_type in [ 'l2' , 'cosine' , 'ip' ]:
v1 = "[1, 2, 3]"
v2 = "[4, 5, 6]"
# Note vector_from_json can be used to convert a JSON string to a vector
distance = cursor . execute ( f'select vector_distance(vector_from_json(?), vector_from_json(?), " { distance_type } ")' , ( v1 , v2 )). fetchone ()
print ( f' { distance_type } distance between { v1 } and { v2 } is { distance [ 0 ] } ' )
# generate some test data
DIM = 32 # dimension of the vectors
NUM_ELEMENTS = 10000 # number of vectors
data = np . float32 ( np . random . random (( NUM_ELEMENTS , DIM ))) # Only float32 vectors are supported by vectorlite for now
# Create a virtual table using vectorlite using l2 distance (default distance type) and default HNSW parameters
cursor . execute ( f'create virtual table my_table using vectorlite(my_embedding float32[ { DIM } ], hnsw(max_elements= { NUM_ELEMENTS } ))' )
# Vector distance type can be explicitly set to cosine using:
# cursor.execute(f'create virtual table my_table using vectorlite(my_embedding float32[{DIM}] cosine, hnsw(max_elements={NUM_ELEMENTS}))')
# Insert the test data into the virtual table. Note that the rowid MUST be explicitly set when inserting vectors and cannot be auto-generated.
# The rowid is used to uniquely identify a vector and serve as a "foreign key" to relate to the vector's metadata.
# Vectorlite takes vectors in raw bytes, so a numpy vector need to be converted to bytes before inserting into the table.
cursor . executemany ( 'insert into my_table(rowid, my_embedding) values (?, ?)' , [( i , data [ i ]. tobytes ()) for i in range ( NUM_ELEMENTS )])
# Query the virtual table to get the vector at rowid 12345. Note the vector needs to be converted back to json using vector_to_json() to be human-readable.
result = cursor . execute ( 'select vector_to_json(my_embedding) from my_table where rowid = 1234' ). fetchone ()
print ( f'vector at rowid 1234: { result [ 0 ] } ' )
# Find 10 approximate nearest neighbors of data[0] and there distances from data[0].
# knn_search() is used to tell vectorlite to do a vector search.
# knn_param(V, K, ef) is used to pass the query vector V, the number of nearest neighbors K to find and an optional ef parameter to tune the performance of the search.
# If ef is not specified, ef defaults to 10. For more info on ef, please check https://github.com/nmslib/hnswlib/blob/v0.8.0/ALGO_PARAMS.md
result = cursor . execute ( 'select rowid, distance from my_table where knn_search(my_embedding, knn_param(?, 10))' , [ data [ 0 ]. tobytes ()]). fetchall ()
print ( f'10 nearest neighbors of row 0 is { result } ' )
# Find 10 approximate nearest neighbors of the first embedding in vectors with rowid within [1000, 2000) using metadata(rowid) filtering.
rowids = ',' . join ([ str ( rowid ) for rowid in range ( 1000 , 2000 )])
result = cursor . execute ( f'select rowid, distance from my_table where knn_search(my_embedding, knn_param(?, 10)) and rowid in ( { rowids } )' , [ data [ 0 ]. tobytes ()]). fetchall ()
print ( f'10 nearest neighbors of row 0 in vectors with rowid within [1000, 2000) is { result } ' )
conn . close ()可以在示例和綁定/python/vectorlite_py/test文件夾中找到更多示例。
如果您想為自己的平台貢獻或編譯矢量線,則可以按照以下說明進行構建。
git clone --recurse-submodules [email protected]:1yefuwang1/vectorlite.git
python3 bootstrap_vcpkg.py
# install dependencies for running python tests
python3 -m pip install -r requirements-dev.txt
sh build.sh # for debug build
sh build_release.sh # for release build
vecorlite.[so|dll|dylib]可以在build/release/vectorlite或build/dev/vectorlite文件夾中找到
python3 -m build -w
vectorlite_py車輪可以在dist文件夾中找到
select rowid from my_table where rowid in ( 1 , 2 , 3 ) and rowid in ( 2 , 3 , 4 ) # multiple rowid constraints
select rowid, distance from my_table where knn_search(my_embedding, knn_param(vector_from_json( ' [1,2,3] ' ), 10 )) and rowid in ( 1 , 2 , 3 ) and rowid in ( 3 , 4 ) # multiple rowid constraints
select rowid, distance from my_table where knn_search(my_embedding, knn_param(vector_from_json( ' [1,2,3] ' ), 10 )) and knn_search(my_embedding, knn_param(vector_from_json( ' [1,2,3] ' ), 10 )) # multiple vector constraints
但是,可以將多個約束與or組合,因為查詢將多次搜索基礎HNSW索引。結果將由sqlite結合。以下查詢將起作用。
select rowid from my_table where rowid in ( 1 , 2 , 3 ) or rowid in ( 2 , 3 , 4 )
select rowid, distance from my_table where knn_search(my_embedding, knn_param(vector_from_json( ' [1,2,3] ' ), 10 )) and rowid in ( 1 , 2 , 3 ) or rowid in ( 3 , 4 )
select rowid, distance from my_table where knn_search(my_embedding, knn_param(vector_from_json( ' [1,2,3] ' ), 10 )) or knn_search(my_embedding, knn_param(vector_from_json( ' [1,2,3] ' ), 10 )) [0, min(max value of size_t, max value of int64_t)] 。原因是ROWID在HNSW索引中用作labeltype ,該索引具有類型size_t (通常取決於平台為32位或64位)。sqlite模塊通常是由舊版本構建的。如果要使用元數據過濾器,請使用較新的SQLITE綁定,例如apsw 。沒有Rowid Fitler的Knn_Search()仍然適用於舊的SQLite3。該項目受到以下項目的極大啟發