VectorLite adalah ekstensi yang dapat dimuat runtime untuk SQLite yang memungkinkan pencarian vektor cepat berdasarkan HNSWLIB dan berfungsi pada Windows, MacOS dan Linux. Ini memberikan kemampuan pencarian vektor cepat dengan antarmuka SQL dan berjalan pada setiap bahasa dengan driver SQLite.
Untuk motivasi dan latar belakang proyek ini, silakan periksa di sini.
Di bawah ini adalah contoh menggunakannya di SQLite CLI Shell:
-- Load vectorlite
.load path / to / vectorlite.[so|dll|dylib]
-- shows vectorlite version and build info.
select vectorlite_info();
-- Calculate vector l2(squared) distance
select vector_distance(vector_from_json( ' [1,2,3] ' ), vector_from_json( ' [3,4,5] ' ), ' l2 ' );
-- Create a virtual table named my_table with one vector column my_embedding with dimention of 3
create virtual table my_table using vectorlite(my_embedding float32[ 3 ], hnsw(max_elements = 100 ));
-- Insert vectors into my_table. rowid can be used to relate to a vector's metadata stored elsewhere, e.g. another table.
insert into my_table(rowid, my_embedding) values ( 0 , vector_from_json( ' [1,2,3] ' ));
insert into my_table(rowid, my_embedding) values ( 1 , vector_from_json( ' [2,3,4] ' ));
insert into my_table(rowid, my_embedding) values ( 2 , vector_from_json( ' [7,7,7] ' ));
-- Find 2 approximate nearest neighbors of vector [3,4,5] with distances
select rowid, distance from my_table where knn_search(my_embedding, knn_param(vector_from_json( ' [3,4,5] ' ), 2 ));
-- Find the nearest neighbor of vector [3,4,5] among vectors with rowid 0 and 1. (requires sqlite_version>=3.38)
-- It is called metadata filter in vectorlite, because you could get rowid set beforehand based on vectors' metadata and then perform vector search.
-- Metadata filter is pushed down to the underlying index when traversing the HNSW graph.
select rowid, distance from my_table where knn_search(my_embedding, knn_param(vector_from_json( ' [3,4,5] ' ), 1 )) and rowid in ( 0 , 1 ) ;
Saat ini, Vectorlite telah dikompilasi sebelumnya untuk Windows-X64, Linux-X64, MacOS-X64, MacOS-Arm64 dan didistribusikan sebagai roda Python dan paket NPM. Itu dapat diinstal hanya dengan:
# For python
pip install vectorlite-py
# for nodejs
npm i vectorlite Untuk bahasa lain, vectorlite.[so|dll|dylib] dapat diekstraksi dari roda untuk platform Anda, mengingat bahwa file *.whl sebenarnya adalah arsip zip.
Vectorlite saat ini dalam beta. Mungkin ada perubahan.
vector_from_json() dan vector_to_json() . Vectorlite menyediakan API berikut. Harap perhatikan Vectorlite saat ini dalam beta. Mungkin ada perubahan.
Fungsi berikut dapat digunakan dalam konteks apa pun.
vectorlite_info() -- prints version info and some compile time info. e.g. Is SSE, AVX enabled.
vector_from_json(json_string) -- converts a json array of type TEXT into BLOB(a c-style float32 array)
vector_to_json(vector_blob) -- converts a vector of type BLOB(c-style float32 array) into a json array of type TEXT
vector_distance(vector_blob1, vector_blob2, distance_type_str) -- calculate vector distance between two vectors, distance_type_str could be 'l2', 'cosine', 'ip' Bahkan, seseorang dapat dengan mudah menerapkan pencarian brute force menggunakan vector_distance , yang mengembalikan 100% hasil pencarian yang akurat:
-- use a normal sqlite table
create table my_table (rowid integer primary key , embedding blob);
-- insert
insert into my_table(rowid, embedding) values ( 0 , {your_embedding});
-- search for 10 nearest neighbors using l2 squared distance
select rowid from my_table order by vector_distance({query_vector}, embedding, ' l2 ' ) asc limit 10
Inti dari VectorLite adalah modul tabel virtual, yang digunakan untuk memegang indeks vektor dan jauh lebih cepat daripada pendekatan brute force dengan biaya tidak 100% akurat. Tabel Vectorlite dapat dibuat menggunakan:
-- Required fields: table_name, vector_name, dimension, max_elements
-- Optional fields:
-- 1. distance_type: defaults to l2
-- 2. ef_construction: defaults to 200
-- 3. M: defaults to 16
-- 4. random_seed: defaults to 100
-- 5. allow_replace_deleted: defaults to true
-- 6. index_file_path: no default value. If not provided, the table will be memory-only. If provided, vectorlite will try to load index from the file and save to it when db connection is closed.
create virtual table {table_name} using vectorlite({vector_name} float32[{dimension}] {distance_type}, hnsw(max_elements = {max_elements}, {ef_construction = 200 }, {M = 16 }, {random_seed = 100 }, {allow_replace_deleted = true}), {index_file_path});Anda dapat memasukkan, memperbarui, dan menghapus tabel VectorLite seolah -olah itu adalah tabel SQLite normal.
-- rowid is required during insertion, because rowid is used to connect the vector to its metadata stored elsewhere. Auto-generating rowid doesn't makes sense.
insert into my_vectorlite_table(rowid, vector_name) values ({your_rowid}, {vector_blob});
-- Note: update and delete statements that uses rowid filter require sqlite3_version >= 3.38 to run.
update my_vectorlite_table set vector_name = {new_vector_blob} where rowid = {your_rowid};
delete from my_vectorlite_table where rowid = {your_rowid};Fungsi berikut hanya boleh digunakan saat menanyakan tabel vektorlit
-- returns knn_parameter that will be passed to knn_search().
-- vector_blob: vector to search
-- k: how many nearest neighbors to search for
-- ef: optional. A HNSW parameter that controls speed-accuracy trade-off. Defaults to 10 at first. If set to another value x, it will remain x if not specified again in another query within a single db connection.
knn_param(vector_blob, k, ef)
-- Should only be used in the `where clause` in a `select` statement to tell vectorlite to speed up the query using HNSW index
-- vector_name should match the vectorlite table's definition
-- knn_parameter is usually constructed using knn_param()
knn_search(vector_name, knn_parameter)
-- An example of vector search query. `distance` is an implicit column of a vectorlite table.
select rowid, distance from my_vectorlite_table where knn_search(vector_name, knn_param({vector_blob}, {k}))
-- An example of vector search query with pushed-down metadata(rowid) filter, requires sqlite_version >= 3.38 to run.
select rowid, distance from my_vectorlite_table where knn_search(vector_name, knn_param({vector_blob}, {k})) and rowid in ( 1 , 2 , 3 , 4 , 5 )Harap dicatat hanya kumpulan data kecil (dengan vektor 3000 atau 20000) yang digunakan karena tidak adil untuk membandingkan SQLite-VEC menggunakan dataset yang lebih besar. SQLite-VEC hanya menggunakan brute-force, yang tidak skala dengan set data besar, sementara Vectorlite menggunakan Ann (perkiraan tetangga terdekat), yang skala ke set data besar dengan biaya tidak 100% akurat.
Bagaimana tolok ukur dilakukan:
select rowid from my_table order by vector_distance(query_vector, embedding, 'l2') limit 10 ) dibandingkan sebagai garis dasar untuk melihat berapa banyak hnsw mempercepat query vektor.TL; DR:
Kode benchmark dapat ditemukan di folder benchmark, yang dapat digunakan sebagai contoh bagaimana meningkatkan laju penarikan untuk skenario Anda dengan menyetel parameter HNSW.
Saat berhadapan dengan 3000 vektor (yang merupakan dataset yang cukup kecil):


Using local vectorlite: ../build/release/vectorlite/vectorlite.so
Benchmarking using 3000 randomly vectors. 100 10-nearest neighbor queries will be performed on each case.
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━┳━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┓
┃ distance ┃ vector ┃ ef ┃ ┃ ef ┃ insert_time ┃ search_time ┃ recall ┃
┃ type ┃ dimension ┃ construction ┃ M ┃ search ┃ per vector ┃ per query ┃ rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━╇━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━┩
│ l2 │ 128 │ 100 │ 30 │ 10 │ 62.41 us │ 12.96 us │ 56.40% │
│ l2 │ 128 │ 100 │ 30 │ 50 │ 62.41 us │ 42.95 us │ 93.30% │
│ l2 │ 128 │ 100 │ 30 │ 100 │ 62.41 us │ 62.06 us │ 99.40% │
│ l2 │ 512 │ 100 │ 30 │ 10 │ 146.40 us │ 38.05 us │ 46.60% │
│ l2 │ 512 │ 100 │ 30 │ 50 │ 146.40 us │ 95.96 us │ 86.50% │
│ l2 │ 512 │ 100 │ 30 │ 100 │ 146.40 us │ 148.46 us │ 96.70% │
│ l2 │ 1536 │ 100 │ 30 │ 10 │ 463.56 us │ 124.51 us │ 38.10% │
│ l2 │ 1536 │ 100 │ 30 │ 50 │ 463.56 us │ 355.70 us │ 78.50% │
│ l2 │ 1536 │ 100 │ 30 │ 100 │ 463.56 us │ 547.84 us │ 92.70% │
│ l2 │ 3000 │ 100 │ 30 │ 10 │ 1323.25 us │ 391.57 us │ 36.60% │
│ l2 │ 3000 │ 100 │ 30 │ 50 │ 1323.25 us │ 1041.37 us │ 78.60% │
│ l2 │ 3000 │ 100 │ 30 │ 100 │ 1323.25 us │ 1443.10 us │ 93.10% │
│ cosine │ 128 │ 100 │ 30 │ 10 │ 59.75 us │ 15.27 us │ 58.30% │
│ cosine │ 128 │ 100 │ 30 │ 50 │ 59.75 us │ 36.72 us │ 94.60% │
│ cosine │ 128 │ 100 │ 30 │ 100 │ 59.75 us │ 63.67 us │ 99.30% │
│ cosine │ 512 │ 100 │ 30 │ 10 │ 148.19 us │ 36.98 us │ 51.00% │
│ cosine │ 512 │ 100 │ 30 │ 50 │ 148.19 us │ 102.46 us │ 88.10% │
│ cosine │ 512 │ 100 │ 30 │ 100 │ 148.19 us │ 143.41 us │ 96.90% │
│ cosine │ 1536 │ 100 │ 30 │ 10 │ 427.21 us │ 106.94 us │ 42.10% │
│ cosine │ 1536 │ 100 │ 30 │ 50 │ 427.21 us │ 285.50 us │ 83.30% │
│ cosine │ 1536 │ 100 │ 30 │ 100 │ 427.21 us │ 441.66 us │ 95.60% │
│ cosine │ 3000 │ 100 │ 30 │ 10 │ 970.17 us │ 289.00 us │ 42.20% │
│ cosine │ 3000 │ 100 │ 30 │ 50 │ 970.17 us │ 848.03 us │ 83.90% │
│ cosine │ 3000 │ 100 │ 30 │ 100 │ 970.17 us │ 1250.29 us │ 95.60% │
└──────────┴───────────┴──────────────┴────┴────────┴─────────────┴─────────────┴────────┘
Bencharmk hnswlib as comparison.
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━┳━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┓
┃ distance ┃ vector ┃ ef ┃ ┃ ef ┃ insert_time ┃ search_time ┃ recall ┃
┃ type ┃ dimension ┃ construction ┃ M ┃ search ┃ per vector ┃ per query ┃ rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━╇━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━┩
│ l2 │ 128 │ 100 │ 30 │ 10 │ 12.84 us │ 12.83 us │ 56.90% │
│ l2 │ 128 │ 100 │ 30 │ 50 │ 12.84 us │ 41.93 us │ 93.60% │
│ l2 │ 128 │ 100 │ 30 │ 100 │ 12.84 us │ 65.84 us │ 99.40% │
│ l2 │ 512 │ 100 │ 30 │ 10 │ 29.34 us │ 47.37 us │ 47.00% │
│ l2 │ 512 │ 100 │ 30 │ 50 │ 29.34 us │ 126.29 us │ 86.40% │
│ l2 │ 512 │ 100 │ 30 │ 100 │ 29.34 us │ 198.30 us │ 96.80% │
│ l2 │ 1536 │ 100 │ 30 │ 10 │ 90.05 us │ 149.35 us │ 37.20% │
│ l2 │ 1536 │ 100 │ 30 │ 50 │ 90.05 us │ 431.53 us │ 78.00% │
│ l2 │ 1536 │ 100 │ 30 │ 100 │ 90.05 us │ 765.03 us │ 92.50% │
│ l2 │ 3000 │ 100 │ 30 │ 10 │ 388.87 us │ 708.98 us │ 36.30% │
│ l2 │ 3000 │ 100 │ 30 │ 50 │ 388.87 us │ 1666.87 us │ 78.90% │
│ l2 │ 3000 │ 100 │ 30 │ 100 │ 388.87 us │ 2489.98 us │ 93.40% │
│ cosine │ 128 │ 100 │ 30 │ 10 │ 10.90 us │ 11.14 us │ 58.10% │
│ cosine │ 128 │ 100 │ 30 │ 50 │ 10.90 us │ 37.39 us │ 94.30% │
│ cosine │ 128 │ 100 │ 30 │ 100 │ 10.90 us │ 62.45 us │ 99.40% │
│ cosine │ 512 │ 100 │ 30 │ 10 │ 25.46 us │ 38.92 us │ 50.70% │
│ cosine │ 512 │ 100 │ 30 │ 50 │ 25.46 us │ 109.84 us │ 87.90% │
│ cosine │ 512 │ 100 │ 30 │ 100 │ 25.46 us │ 151.00 us │ 97.10% │
│ cosine │ 1536 │ 100 │ 30 │ 10 │ 77.53 us │ 119.48 us │ 42.00% │
│ cosine │ 1536 │ 100 │ 30 │ 50 │ 77.53 us │ 340.78 us │ 84.00% │
│ cosine │ 1536 │ 100 │ 30 │ 100 │ 77.53 us │ 510.02 us │ 95.50% │
│ cosine │ 3000 │ 100 │ 30 │ 10 │ 234.79 us │ 453.12 us │ 43.20% │
│ cosine │ 3000 │ 100 │ 30 │ 50 │ 234.79 us │ 1380.79 us │ 83.80% │
│ cosine │ 3000 │ 100 │ 30 │ 100 │ 234.79 us │ 1520.92 us │ 95.70% │
└──────────┴───────────┴──────────────┴────┴────────┴─────────────┴─────────────┴────────┘
Bencharmk vectorlite brute force(select rowid from my_table order by vector_distance(query_vector, embedding, 'l2')) as comparison.
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ distance ┃ vector ┃ insert_time ┃ search_time ┃ recall ┃
┃ type ┃ dimension ┃ per vector ┃ per query ┃ rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━┩
│ l2 │ 128 │ 2.38 us │ 299.14 us │ 100.00% │
│ l2 │ 512 │ 3.69 us │ 571.19 us │ 100.00% │
│ l2 │ 1536 │ 4.86 us │ 2237.64 us │ 100.00% │
│ l2 │ 3000 │ 7.69 us │ 5135.63 us │ 100.00% │
└──────────┴───────────┴─────────────┴─────────────┴─────────┘
Bencharmk sqlite_vss as comparison.
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ distance ┃ vector ┃ insert_time ┃ search_time ┃ recall ┃
┃ type ┃ dimension ┃ per vector ┃ per query ┃ rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━┩
│ l2 │ 128 │ 395.24 us │ 2508.52 us │ 99.90% │
│ l2 │ 512 │ 2824.89 us │ 1530.77 us │ 100.00% │
│ l2 │ 1536 │ 8931.72 us │ 1602.36 us │ 100.00% │
│ l2 │ 3000 │ 17498.60 us │ 3142.38 us │ 100.00% │
└──────────┴───────────┴─────────────┴─────────────┴─────────┘
Bencharmk sqlite_vec as comparison.
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ distance ┃ vector ┃ insert_time ┃ search_time ┃ recall ┃
┃ type ┃ dimension ┃ per vector ┃ per query ┃ rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━┩
│ l2 │ 128 │ 10.21 us │ 202.05 us │ 100.00% │
│ l2 │ 512 │ 14.43 us │ 989.64 us │ 100.00% │
│ l2 │ 1536 │ 31.68 us │ 3856.08 us │ 100.00% │
│ l2 │ 3000 │ 59.94 us │ 9503.91 us │ 100.00% │
└──────────┴───────────┴─────────────┴─────────────┴─────────┘
Saat berhadapan dengan vektor 20000,
Harap dicatat:


Using local vectorlite: ../build/release/vectorlite/vectorlite.so
Benchmarking using 20000 randomly vectors. 100 10-neariest neighbor queries will be performed on each case.
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━┳━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┓
┃ distance ┃ vector ┃ ef ┃ ┃ ef ┃ insert_time ┃ search_time ┃ recall ┃
┃ type ┃ dimension ┃ construction ┃ M ┃ search ┃ per vector ┃ per query ┃ rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━╇━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━┩
│ l2 │ 128 │ 100 │ 30 │ 10 │ 187.41 us │ 46.58 us │ 29.10% │
│ l2 │ 128 │ 100 │ 30 │ 50 │ 187.41 us │ 95.16 us │ 70.20% │
│ l2 │ 128 │ 100 │ 30 │ 100 │ 187.41 us │ 179.51 us │ 85.70% │
│ l2 │ 512 │ 100 │ 30 │ 10 │ 820.80 us │ 105.80 us │ 18.10% │
│ l2 │ 512 │ 100 │ 30 │ 50 │ 820.80 us │ 361.83 us │ 50.40% │
│ l2 │ 512 │ 100 │ 30 │ 100 │ 820.80 us │ 628.88 us │ 67.00% │
│ l2 │ 1536 │ 100 │ 30 │ 10 │ 2665.31 us │ 292.39 us │ 13.70% │
│ l2 │ 1536 │ 100 │ 30 │ 50 │ 2665.31 us │ 1069.47 us │ 42.40% │
│ l2 │ 1536 │ 100 │ 30 │ 100 │ 2665.31 us │ 1744.79 us │ 59.50% │
│ l2 │ 3000 │ 100 │ 30 │ 10 │ 5236.76 us │ 558.56 us │ 13.80% │
│ l2 │ 3000 │ 100 │ 30 │ 50 │ 5236.76 us │ 1787.83 us │ 39.30% │
│ l2 │ 3000 │ 100 │ 30 │ 100 │ 5236.76 us │ 3039.94 us │ 56.60% │
│ cosine │ 128 │ 100 │ 30 │ 10 │ 164.31 us │ 25.35 us │ 34.70% │
│ cosine │ 128 │ 100 │ 30 │ 50 │ 164.31 us │ 78.33 us │ 71.20% │
│ cosine │ 128 │ 100 │ 30 │ 100 │ 164.31 us │ 133.75 us │ 87.60% │
│ cosine │ 512 │ 100 │ 30 │ 10 │ 711.35 us │ 100.90 us │ 19.00% │
│ cosine │ 512 │ 100 │ 30 │ 50 │ 711.35 us │ 406.08 us │ 51.10% │
│ cosine │ 512 │ 100 │ 30 │ 100 │ 711.35 us │ 582.51 us │ 71.50% │
│ cosine │ 1536 │ 100 │ 30 │ 10 │ 2263.96 us │ 283.88 us │ 22.60% │
│ cosine │ 1536 │ 100 │ 30 │ 50 │ 2263.96 us │ 919.98 us │ 54.50% │
│ cosine │ 1536 │ 100 │ 30 │ 100 │ 2263.96 us │ 1674.77 us │ 72.40% │
│ cosine │ 3000 │ 100 │ 30 │ 10 │ 4541.09 us │ 566.31 us │ 19.80% │
│ cosine │ 3000 │ 100 │ 30 │ 50 │ 4541.09 us │ 1672.82 us │ 49.30% │
│ cosine │ 3000 │ 100 │ 30 │ 100 │ 4541.09 us │ 2855.43 us │ 65.40% │
└──────────┴───────────┴──────────────┴────┴────────┴─────────────┴─────────────┴────────┘
Bencharmk hnswlib as comparison.
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━┳━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┓
┃ distance ┃ vector ┃ ef ┃ ┃ ef ┃ insert_time ┃ search_time ┃ recall ┃
┃ type ┃ dimension ┃ construction ┃ M ┃ search ┃ per vector ┃ per query ┃ rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━╇━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━┩
│ l2 │ 128 │ 100 │ 30 │ 10 │ 23.06 us │ 39.96 us │ 29.60% │
│ l2 │ 128 │ 100 │ 30 │ 50 │ 23.06 us │ 75.02 us │ 69.80% │
│ l2 │ 128 │ 100 │ 30 │ 100 │ 23.06 us │ 160.01 us │ 85.40% │
│ l2 │ 512 │ 100 │ 30 │ 10 │ 146.58 us │ 167.31 us │ 18.10% │
│ l2 │ 512 │ 100 │ 30 │ 50 │ 146.58 us │ 392.12 us │ 50.80% │
│ l2 │ 512 │ 100 │ 30 │ 100 │ 146.58 us │ 781.50 us │ 67.20% │
│ l2 │ 1536 │ 100 │ 30 │ 10 │ 657.41 us │ 298.71 us │ 12.70% │
│ l2 │ 1536 │ 100 │ 30 │ 50 │ 657.41 us │ 1031.61 us │ 40.60% │
│ l2 │ 1536 │ 100 │ 30 │ 100 │ 657.41 us │ 1764.34 us │ 57.90% │
│ l2 │ 3000 │ 100 │ 30 │ 10 │ 1842.77 us │ 852.88 us │ 13.80% │
│ l2 │ 3000 │ 100 │ 30 │ 50 │ 1842.77 us │ 2905.57 us │ 39.60% │
│ l2 │ 3000 │ 100 │ 30 │ 100 │ 1842.77 us │ 4936.35 us │ 56.50% │
│ cosine │ 128 │ 100 │ 30 │ 10 │ 19.25 us │ 23.27 us │ 34.20% │
│ cosine │ 128 │ 100 │ 30 │ 50 │ 19.25 us │ 72.66 us │ 71.40% │
│ cosine │ 128 │ 100 │ 30 │ 100 │ 19.25 us │ 134.11 us │ 87.60% │
│ cosine │ 512 │ 100 │ 30 │ 10 │ 112.80 us │ 106.90 us │ 22.70% │
│ cosine │ 512 │ 100 │ 30 │ 50 │ 112.80 us │ 341.23 us │ 54.20% │
│ cosine │ 512 │ 100 │ 30 │ 100 │ 112.80 us │ 609.93 us │ 72.40% │
│ cosine │ 1536 │ 100 │ 30 │ 10 │ 615.04 us │ 268.00 us │ 22.50% │
│ cosine │ 1536 │ 100 │ 30 │ 50 │ 615.04 us │ 898.82 us │ 54.00% │
│ cosine │ 1536 │ 100 │ 30 │ 100 │ 615.04 us │ 1557.51 us │ 71.90% │
│ cosine │ 3000 │ 100 │ 30 │ 10 │ 1425.49 us │ 546.18 us │ 20.60% │
│ cosine │ 3000 │ 100 │ 30 │ 50 │ 1425.49 us │ 2008.53 us │ 49.20% │
│ cosine │ 3000 │ 100 │ 30 │ 100 │ 1425.49 us │ 3106.51 us │ 65.00% │
└──────────┴───────────┴──────────────┴────┴────────┴─────────────┴─────────────┴────────┘
Bencharmk vectorlite brute force(select rowid from my_table order by vector_distance(query_vector, embedding, 'l2')) as comparison.
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ distance ┃ vector ┃ insert_time ┃ search_time ┃ recall ┃
┃ type ┃ dimension ┃ per vector ┃ per query ┃ rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━┩
│ l2 │ 128 │ 0.93 us │ 2039.69 us │ 100.00% │
│ l2 │ 512 │ 2.73 us │ 7177.23 us │ 100.00% │
│ l2 │ 1536 │ 4.64 us │ 17163.25 us │ 100.00% │
│ l2 │ 3000 │ 6.62 us │ 25378.79 us │ 100.00% │
└──────────┴───────────┴─────────────┴─────────────┴─────────┘
Bencharmk sqlite_vec as comparison.
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ distance ┃ vector ┃ insert_time ┃ search_time ┃ recall ┃
┃ type ┃ dimension ┃ per vector ┃ per query ┃ rate ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━┩
│ l2 │ 128 │ 3.49 us │ 1560.17 us │ 100.00% │
│ l2 │ 512 │ 6.73 us │ 7778.39 us │ 100.00% │
│ l2 │ 1536 │ 17.13 us │ 26344.76 us │ 100.00% │
│ l2 │ 3000 │ 35.30 us │ 60652.58 us │ 100.00% │
└──────────┴───────────┴─────────────┴─────────────┴─────────┘
Cara tercepat untuk memulai adalah dengan menginstal Vectorlite menggunakan Python.
# Note: vectorlite-py not vectorlite. vectorlite is another project.
pip install vectorlite-py apsw numpy Fitur filter metadata VectorLite membutuhkan sqlite> = 3.38. Modul sqlite Builtin Python biasanya dibangun dengan versi SQLite lama. Jadi apsw digunakan di sini sebagai driver SQLite, karena memberikan binding ke SQLite terbaru. Vectorlite masih berfungsi dengan versi SQLite lama jika dukungan filter metadata tidak diperlukan. Di bawah ini adalah contoh minimal menggunakan vectorlite. Itu juga dapat ditemukan di folder Contoh.
import vectorlite_py
import apsw
import numpy as np
"""
Quick start of using vectorlite extension.
"""
conn = apsw . Connection ( ':memory:' )
conn . enable_load_extension ( True ) # enable extension loading
conn . load_extension ( vectorlite_py . vectorlite_path ()) # load vectorlite
cursor = conn . cursor ()
# check if vectorlite is loaded
print ( cursor . execute ( 'select vectorlite_info()' ). fetchall ())
# Vector distance calculation
for distance_type in [ 'l2' , 'cosine' , 'ip' ]:
v1 = "[1, 2, 3]"
v2 = "[4, 5, 6]"
# Note vector_from_json can be used to convert a JSON string to a vector
distance = cursor . execute ( f'select vector_distance(vector_from_json(?), vector_from_json(?), " { distance_type } ")' , ( v1 , v2 )). fetchone ()
print ( f' { distance_type } distance between { v1 } and { v2 } is { distance [ 0 ] } ' )
# generate some test data
DIM = 32 # dimension of the vectors
NUM_ELEMENTS = 10000 # number of vectors
data = np . float32 ( np . random . random (( NUM_ELEMENTS , DIM ))) # Only float32 vectors are supported by vectorlite for now
# Create a virtual table using vectorlite using l2 distance (default distance type) and default HNSW parameters
cursor . execute ( f'create virtual table my_table using vectorlite(my_embedding float32[ { DIM } ], hnsw(max_elements= { NUM_ELEMENTS } ))' )
# Vector distance type can be explicitly set to cosine using:
# cursor.execute(f'create virtual table my_table using vectorlite(my_embedding float32[{DIM}] cosine, hnsw(max_elements={NUM_ELEMENTS}))')
# Insert the test data into the virtual table. Note that the rowid MUST be explicitly set when inserting vectors and cannot be auto-generated.
# The rowid is used to uniquely identify a vector and serve as a "foreign key" to relate to the vector's metadata.
# Vectorlite takes vectors in raw bytes, so a numpy vector need to be converted to bytes before inserting into the table.
cursor . executemany ( 'insert into my_table(rowid, my_embedding) values (?, ?)' , [( i , data [ i ]. tobytes ()) for i in range ( NUM_ELEMENTS )])
# Query the virtual table to get the vector at rowid 12345. Note the vector needs to be converted back to json using vector_to_json() to be human-readable.
result = cursor . execute ( 'select vector_to_json(my_embedding) from my_table where rowid = 1234' ). fetchone ()
print ( f'vector at rowid 1234: { result [ 0 ] } ' )
# Find 10 approximate nearest neighbors of data[0] and there distances from data[0].
# knn_search() is used to tell vectorlite to do a vector search.
# knn_param(V, K, ef) is used to pass the query vector V, the number of nearest neighbors K to find and an optional ef parameter to tune the performance of the search.
# If ef is not specified, ef defaults to 10. For more info on ef, please check https://github.com/nmslib/hnswlib/blob/v0.8.0/ALGO_PARAMS.md
result = cursor . execute ( 'select rowid, distance from my_table where knn_search(my_embedding, knn_param(?, 10))' , [ data [ 0 ]. tobytes ()]). fetchall ()
print ( f'10 nearest neighbors of row 0 is { result } ' )
# Find 10 approximate nearest neighbors of the first embedding in vectors with rowid within [1000, 2000) using metadata(rowid) filtering.
rowids = ',' . join ([ str ( rowid ) for rowid in range ( 1000 , 2000 )])
result = cursor . execute ( f'select rowid, distance from my_table where knn_search(my_embedding, knn_param(?, 10)) and rowid in ( { rowids } )' , [ data [ 0 ]. tobytes ()]). fetchall ()
print ( f'10 nearest neighbors of row 0 in vectors with rowid within [1000, 2000) is { result } ' )
conn . close ()Lebih banyak contoh dapat ditemukan dalam contoh dan binding/python/vectorlite_py/test folder.
Jika Anda ingin berkontribusi atau menyusun vektorlite untuk platform Anda sendiri, Anda dapat mengikuti instruksi berikut untuk membangunnya.
git clone --recurse-submodules [email protected]:1yefuwang1/vectorlite.git
python3 bootstrap_vcpkg.py
# install dependencies for running python tests
python3 -m pip install -r requirements-dev.txt
sh build.sh # for debug build
sh build_release.sh # for release build
vecorlite.[so|dll|dylib] dapat ditemukan di folder build/release/vectorlite atau build/dev/vectorlite
python3 -m build -w
roda vectorlite_py dapat ditemukan di folder dist
select rowid from my_table where rowid in ( 1 , 2 , 3 ) and rowid in ( 2 , 3 , 4 ) # multiple rowid constraints
select rowid, distance from my_table where knn_search(my_embedding, knn_param(vector_from_json( ' [1,2,3] ' ), 10 )) and rowid in ( 1 , 2 , 3 ) and rowid in ( 3 , 4 ) # multiple rowid constraints
select rowid, distance from my_table where knn_search(my_embedding, knn_param(vector_from_json( ' [1,2,3] ' ), 10 )) and knn_search(my_embedding, knn_param(vector_from_json( ' [1,2,3] ' ), 10 )) # multiple vector constraints
Namun, beberapa kendala dapat dikombinasikan dengan or , karena kueri akan mencari indeks HNSW yang mendasarinya beberapa kali. Hasilnya akan dikonsumsi oleh SQLite. Pertanyaan berikut akan berhasil.
select rowid from my_table where rowid in ( 1 , 2 , 3 ) or rowid in ( 2 , 3 , 4 )
select rowid, distance from my_table where knn_search(my_embedding, knn_param(vector_from_json( ' [1,2,3] ' ), 10 )) and rowid in ( 1 , 2 , 3 ) or rowid in ( 3 , 4 )
select rowid, distance from my_table where knn_search(my_embedding, knn_param(vector_from_json( ' [1,2,3] ' ), 10 )) or knn_search(my_embedding, knn_param(vector_from_json( ' [1,2,3] ' ), 10 )) [0, min(max value of size_t, max value of int64_t)] . Alasannya adalah RowID digunakan sebagai labeltype dalam indeks HNSW, yang memiliki tipe size_t (biasanya 32-bit atau 64-bit tergantung pada platform).sqlite bawaan Python biasanya dibangun dengan versi lama. Silakan gunakan pengikatan SQLite yang lebih baru seperti apsw jika Anda ingin menggunakan filter metadata. knn_search () tanpa Rowid Fitler masih berfungsi untuk SQLite3 lama.Proyek ini sangat terinspirasi oleh proyek berikut