
Database vektor ramah server tanpa server di tangan Anda.
Paket flechasdb adalah perpustakaan inti dari sistem flechasdb yang ditulis dalam karat.
Sistem FLECHASDB bertujuan untuk menjadi database vektor yang sangat cocok di lingkungan tanpa server. Keyakinan sistem FLECHASDB sederhana; Tidak memerlukan server khusus terus berjalan .
*: disediakan oleh paket lain flechasdb-s3 .
Belum ada peti yang diterbitkan. Harap tambahkan baris berikut ke file Cargo.toml Anda:
[ dependencies ]
flechasdb = { git = " https://github.com/codemonger-io/flechasdb.git " }Berikut adalah exmple membangun database vektor dari vektor yang dihasilkan secara acak.
use rand :: Rng ;
use flechasdb :: db :: build :: {
DatabaseBuilder ,
proto :: serialize_database ,
} ;
use flechasdb :: io :: LocalFileSystem ;
use flechasdb :: vector :: BlockVectorSet ;
fn main ( ) {
const M : usize = 100000 ; // number of vectors
const N : usize = 1536 ; // vector size
const D : usize = 12 ; // number of subvector divisions
const P : usize = 100 ; // number of partitions
const C : usize = 256 ; // number of clusters for product quantization
let time = std :: time :: Instant :: now ( ) ;
let mut data : Vec < f32 > = Vec :: with_capacity ( M * N ) ;
unsafe { data . set_len ( M * N ) ; }
let mut rng = rand :: thread_rng ( ) ;
rng . fill ( & mut data [ .. ] ) ;
let vs = BlockVectorSet :: chunk ( data , N . try_into ( ) . unwrap ( ) ) . unwrap ( ) ;
println ! ( "prepared data in {} s" , time . elapsed ( ) . as_secs_f32 ( ) ) ;
let time = std :: time :: Instant :: now ( ) ;
let mut db = DatabaseBuilder :: new ( vs )
. with_partitions ( P . try_into ( ) . unwrap ( ) )
. with_divisions ( D . try_into ( ) . unwrap ( ) )
. with_clusters ( C . try_into ( ) . unwrap ( ) )
. build ( )
. unwrap ( ) ;
println ! ( "built database in {} s" , time . elapsed ( ) . as_secs_f32 ( ) ) ;
for i in 0 .. M {
db . set_attribute_at ( i , ( "datum_id" , i as u64 ) ) . unwrap ( ) ;
}
let time = std :: time :: Instant :: now ( ) ;
serialize_database ( & db , & mut LocalFileSystem :: new ( "testdb" ) ) . unwrap ( ) ;
println ! ( "serialized database in {} s" , time . elapsed ( ) . as_secs_f32 ( ) ) ;
} Anda dapat menemukan contoh lengkap dalam folder examples/build-random .
FYI: Butuh beberapa saat di mesin saya (Apple M1 Pro, 32GB RAM, 1TB SSD).
prepared data in 0.9123601 s
built database in 906.51526 s
serialized database in 0.14329213 s
Berikut adalah contoh memuat database vektor dan menanyakan vektor yang dihasilkan secara acak untuk tetangga k-nearest (K-NN).
use rand :: Rng ;
use std :: env :: args ;
use std :: path :: Path ;
use flechasdb :: db :: stored :: { Database , LoadDatabase } ;
use flechasdb :: io :: LocalFileSystem ;
fn main ( ) {
const K : usize = 10 ; // k-nearest neighbors
const NPROBE : usize = 5 ; // number of partitions to query
let time = std :: time :: Instant :: now ( ) ;
let db_path = args ( ) . nth ( 1 ) . expect ( "no db path given" ) ;
let db_path = Path :: new ( & db_path ) ;
let db = Database :: < f32 , _ > :: load_database (
LocalFileSystem :: new ( db_path . parent ( ) . unwrap ( ) ) ,
db_path . file_name ( ) . unwrap ( ) . to_str ( ) . unwrap ( ) ,
) . unwrap ( ) ;
println ! ( "loaded database in {} s" , time . elapsed ( ) . as_secs_f32 ( ) ) ;
let mut qv : Vec < f32 > = Vec :: with_capacity ( db . vector_size ( ) ) ;
unsafe { qv . set_len ( db . vector_size ( ) ) ; }
let mut rng = rand :: thread_rng ( ) ;
rng . fill ( & mut qv [ .. ] ) ;
for r in 0 .. 2 { // second round should run faster
let time = std :: time :: Instant :: now ( ) ;
let results = db . query (
& qv ,
K . try_into ( ) . unwrap ( ) ,
NPROBE . try_into ( ) . unwrap ( ) ,
) . unwrap ( ) ;
println ! ( "[{}] queried k-NN in {} s" , r , time . elapsed ( ) . as_secs_f32 ( ) ) ;
let time = std :: time :: Instant :: now ( ) ;
for ( i , result ) in results . into_iter ( ) . enumerate ( ) {
// getting attributes will incur additional disk reads
let attr = result . get_attribute ( "datum_id" ) . unwrap ( ) ;
println ! (
" t {}: partition={}, approx. distance²={}, datum_id={:?}" ,
i ,
result . partition_index ,
result . squared_distance ,
attr ,
) ;
}
println ! (
"[{}] printed results in {} s" ,
r ,
time . elapsed ( ) . as_secs_f32 ( ) ,
) ;
}
} Anda dapat menemukan contoh lengkap dalam folder examples/query-sync .
FYI: Output pada mesin saya (Apple M1 Pro, 32GB RAM, 1TB SSD):
loaded database in 0.000142083 s
[0] queried k-NN in 0.0078015 s
0: partition=95, approx. distance²=126.23533, datum_id=Some(Uint64(90884))
1: partition=29, approx. distance²=127.76597, datum_id=Some(Uint64(30864))
2: partition=95, approx. distance²=127.80611, datum_id=Some(Uint64(75236))
3: partition=56, approx. distance²=127.808174, datum_id=Some(Uint64(27890))
4: partition=25, approx. distance²=127.85459, datum_id=Some(Uint64(16417))
5: partition=95, approx. distance²=127.977425, datum_id=Some(Uint64(70910))
6: partition=25, approx. distance²=128.06209, datum_id=Some(Uint64(3237))
7: partition=95, approx. distance²=128.22603, datum_id=Some(Uint64(41942))
8: partition=79, approx. distance²=128.26906, datum_id=Some(Uint64(89799))
9: partition=25, approx. distance²=128.27995, datum_id=Some(Uint64(6593))
[0] printed results in 0.003392833 s
[1] queried k-NN in 0.001475625 s
0: partition=95, approx. distance²=126.23533, datum_id=Some(Uint64(90884))
1: partition=29, approx. distance²=127.76597, datum_id=Some(Uint64(30864))
2: partition=95, approx. distance²=127.80611, datum_id=Some(Uint64(75236))
3: partition=56, approx. distance²=127.808174, datum_id=Some(Uint64(27890))
4: partition=25, approx. distance²=127.85459, datum_id=Some(Uint64(16417))
5: partition=95, approx. distance²=127.977425, datum_id=Some(Uint64(70910))
6: partition=25, approx. distance²=128.06209, datum_id=Some(Uint64(3237))
7: partition=95, approx. distance²=128.22603, datum_id=Some(Uint64(41942))
8: partition=79, approx. distance²=128.26906, datum_id=Some(Uint64(89799))
9: partition=25, approx. distance²=128.27995, datum_id=Some(Uint64(6593))
[1] printed results in 0.0000215 s
Berikut adalah contoh memuat database vektor secara asinkron dan menanyakan vektor yang dihasilkan secara acak untuk K-NN.
use rand :: Rng ;
use std :: env :: args ;
use std :: path :: Path ;
use flechasdb :: asyncdb :: io :: LocalFileSystem ;
use flechasdb :: asyncdb :: stored :: { Database , LoadDatabase } ;
# [ tokio :: main ]
async fn main ( ) {
const K : usize = 10 ; // k-nearest neighbors
const NPROBE : usize = 5 ; // number of partitions to search
let time = std :: time :: Instant :: now ( ) ;
let db_path = args ( ) . nth ( 1 ) . expect ( "missing db path" ) ;
let db_path = Path :: new ( & db_path ) ;
let db = Database :: < f32 , _ > :: load_database (
LocalFileSystem :: new ( db_path . parent ( ) . unwrap ( ) ) ,
db_path . file_name ( ) . unwrap ( ) . to_str ( ) . unwrap ( ) ,
) . await . unwrap ( ) ;
println ! ( "loaded database in {} s" , time . elapsed ( ) . as_secs_f32 ( ) ) ;
let mut qv = Vec :: with_capacity ( db . vector_size ( ) ) ;
unsafe { qv . set_len ( db . vector_size ( ) ) ; }
let mut rng = rand :: thread_rng ( ) ;
rng . fill ( & mut qv [ .. ] ) ;
for r in 0 .. 2 { // second round should run faster
let time = std :: time :: Instant :: now ( ) ;
let results = db . query (
& qv ,
K . try_into ( ) . unwrap ( ) ,
NPROBE . try_into ( ) . unwrap ( ) ,
) . await . unwrap ( ) ;
println ! ( "[{}] queried k-NN in {} s" , r , time . elapsed ( ) . as_secs_f32 ( ) ) ;
let time = std :: time :: Instant :: now ( ) ;
for ( i , result ) in results . into_iter ( ) . enumerate ( ) {
// getting attributes will incur additional disk reads
let attr = result . get_attribute ( "datum_id" ) . await . unwrap ( ) ;
println ! (
" t {}: partition={}, approx. distance²={}, datum_id={:?}" ,
i ,
result . partition_index ,
result . squared_distance ,
attr ,
) ;
}
println ! (
"[{}] printed results in {} s" ,
r ,
time . elapsed ( ) . as_secs_f32 ( ) ,
) ;
}
} Contoh lengkapnya ada di folder examples/query-async .
FYI: Output pada mesin saya (Apple M1 Pro, 32GB RAM, 1TB SSD):
loaded database in 0.000170959 s
[0] queried k-NN in 0.008041208 s
0: partition=67, approx. distance²=128.50703, datum_id=Some(Uint64(69632))
1: partition=9, approx. distance²=129.98079, datum_id=Some(Uint64(73093))
2: partition=9, approx. distance²=130.10867, datum_id=Some(Uint64(7536))
3: partition=20, approx. distance²=130.29523, datum_id=Some(Uint64(67750))
4: partition=67, approx. distance²=130.71976, datum_id=Some(Uint64(77054))
5: partition=9, approx. distance²=130.80556, datum_id=Some(Uint64(93180))
6: partition=9, approx. distance²=130.90681, datum_id=Some(Uint64(22473))
7: partition=9, approx. distance²=130.94006, datum_id=Some(Uint64(40167))
8: partition=67, approx. distance²=130.9795, datum_id=Some(Uint64(8590))
9: partition=9, approx. distance²=131.03018, datum_id=Some(Uint64(53138))
[0] printed results in 0.00194175 s
[1] queried k-NN in 0.000789417 s
0: partition=67, approx. distance²=128.50703, datum_id=Some(Uint64(69632))
1: partition=9, approx. distance²=129.98079, datum_id=Some(Uint64(73093))
2: partition=9, approx. distance²=130.10867, datum_id=Some(Uint64(7536))
3: partition=20, approx. distance²=130.29523, datum_id=Some(Uint64(67750))
4: partition=67, approx. distance²=130.71976, datum_id=Some(Uint64(77054))
5: partition=9, approx. distance²=130.80556, datum_id=Some(Uint64(93180))
6: partition=9, approx. distance²=130.90681, datum_id=Some(Uint64(22473))
7: partition=9, approx. distance²=130.94006, datum_id=Some(Uint64(40167))
8: partition=67, approx. distance²=130.9795, datum_id=Some(Uint64(8590))
9: partition=9, approx. distance²=131.03018, datum_id=Some(Uint64(53138))
[1] printed results in 0.000011084 s
Ada tolok ukur pada data yang lebih realistis.
https://codemonger-io.github.io/flechasdb/api/flechasdb/
flechasdb mengimplementasikan IndexIVFPQ yang dijelaskan dalam artikel ini.
flechasdb mengimplementasikan K-Means ++ untuk menginisialisasi centroid untuk pengelompokan k-means näive.
Tbd
cargo buildcargo doc --lib --no-deps --releasePinecone
Database vektor yang dikelola sepenuhnya.
Milvus
Database vektor open-source dengan banyak fitur.
LANDINGB
Salah satu fitur mereka juga tanpa server , dan intinya ditulis dengan karat!
Mit
Bahan berikut oleh Codemonger dilisensikan di bawah CC BY-SA 4.0: