vectin
1.0.0
Vectin is a simple vector store, built from scratch, for text embeddings and similarity search.
It stores embedding vectors and can persist them to disk.
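A store like this keeps each text alongside its embedding vector, ranks stored entries by similarity to a query vector, and can write everything to disk. The following is a minimal sketch of those three ideas, assuming cosine similarity for ranking and a JSON file for persistence. TinyVectorStore, cosine, save and load are hypothetical names used for illustration only; they are not Vectin's API.

```python
import json
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical store of (text, vector) pairs with JSON persistence (not Vectin's API)."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, List[float]]] = []

    def insert(self, text: str, vector: List[float]) -> None:
        # Keep the original text next to its vector so search can return it.
        self.items.append((text, vector))

    def search(self, query: List[float], k: int = 3) -> List[Tuple[str, float]]:
        # Score every stored vector and keep the k best; Python's sort is stable,
        # so entries with equal scores keep their insertion order.
        scored = [(text, cosine(query, vec)) for text, vec in self.items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

    def save(self, path: str) -> None:
        # Write all (text, vector) pairs as JSON so the store can be rebuilt later.
        with open(path, "w", encoding="utf-8") as f:
            json.dump([[text, vec] for text, vec in self.items], f)

    @classmethod
    def load(cls, path: str) -> "TinyVectorStore":
        # Rebuild a store from a file written by save().
        store = cls()
        with open(path, encoding="utf-8") as f:
            for text, vec in json.load(f):
                store.insert(text, vec)
        return store
```

For example, after inserting [1.0, 0.0] and [1.0, 1.0], a search for [1.0, 0.0] ranks the first vector at 1.0 and the second at about 0.707. JSON keeps the file human-readable; a binary format would be more compact for large stores.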
from vectin import Vectin  # import path assumed from the package and class names
# Token Limit
max_tokens_count = 768
vectin = Vectin(
    name="short_term_logs",
    max_tokens=max_tokens_count,
    persist_data=True,
    data_storage_path=None,
)
# Build Corpus
corpus = [
    "Machine learning and artificial intelligence are gaining popularity.",
    "Python is widely used in data science and web development.",
    "Data science involves analyzing large datasets to extract meaningful insights.",
    "Artificial intelligence is transforming various industries.",
    "Data Science is the most popular field of study in 2021.",
]
# Preprocess
corpus = vectin.chunk_sentences_to_max_tokens(sentences=corpus, max_tokens=max_tokens_count)
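The method name suggests that chunk_sentences_to_max_tokens keeps every chunk within max_tokens, and since the sentences in the example output come back intact, short sentences appear to pass through unchanged. The sketch below illustrates that behavior under two assumptions: tokens are approximated by whitespace-separated words (a real tokenizer will count differently), and over-long sentences are split rather than merged with neighbors. chunk_to_max_tokens is a hypothetical name, not Vectin's implementation.

```python
from typing import List


def chunk_to_max_tokens(sentences: List[str], max_tokens: int) -> List[str]:
    """Hypothetical chunker: split each sentence into pieces of at most max_tokens words.

    Tokens are approximated by whitespace-separated words, which is an assumption.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    chunks: List[str] = []
    for sentence in sentences:
        words = sentence.split()
        # A sentence within the limit comes back as a single chunk
        # (with runs of whitespace collapsed to single spaces).
        for start in range(0, len(words), max_tokens):
            chunks.append(" ".join(words[start:start + max_tokens]))
    return chunks
```

With a limit of 768 words, every sentence in the example corpus stays whole; with a limit of 2, "a b c d e" becomes "a b", "c d" and "e".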
# Encode
corpus_vectors = vectin.encode(corpus)
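The README does not describe which encoding vectin.encode produces. As one illustration of turning text into vectors from scratch, the sketch below builds bag-of-words count vectors over the corpus vocabulary. tokenize, build_vocabulary and encode are hypothetical helpers chosen for this example; a learned embedding model would work very differently.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase and keep runs of letters and digits, dropping punctuation.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(corpus: List[str]) -> Dict[str, int]:
    # Hypothetical helper: map each distinct token to a column index,
    # in order of first appearance.
    vocab: Dict[str, int] = {}
    for text in corpus:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(texts: List[str], vocab: Dict[str, int]) -> List[List[float]]:
    # Hypothetical helper: one count vector per text;
    # tokens outside the vocabulary are ignored.
    vectors: List[List[float]] = []
    for text in texts:
        vec = [0.0] * len(vocab)
        for token in tokenize(text):
            index = vocab.get(token)
            if index is not None:
                vec[index] += 1.0
        vectors.append(vec)
    return vectors
```

Because the vocabulary is fixed by the corpus, a query has to be encoded with the same vocabulary so that its vector lines up column by column with the stored ones.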
# Save
vectin.insert_vectors(corpus_vectors)
# Similarity Search
query = "data science"
result = vectin.similarity_search(query)
print(result)
# Disk Persistence
vectin.save_to_disk()
Output of print(result):
[['Python is widely used in data science and web development.', '0.447214'],
 ['Data science involves analyzing large datasets to extract meaningful insights.', '0.447214'],
 ['Data Science is the most popular field of study in 2021.', '0.426401']]
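The scores in the output above are consistent with plain bag-of-words cosine similarity, assuming words are lowercased, punctuation is dropped and each word occurrence counts once. This is an observation about the printed numbers, not a statement of how Vectin actually computes them. The query "data science" has two words, so its vector q has norm √2. The Python sentence and the "Data science involves..." sentence each have 10 distinct words ("datasets" does not match "data" at the word level), and the "Data Science is..." sentence has 11; each shares exactly two words with the query:

$$
\cos(q, d) = \frac{q \cdot d}{\lVert q \rVert \, \lVert d \rVert},
\qquad
\frac{2}{\sqrt{2}\,\sqrt{10}} = \frac{1}{\sqrt{5}} \approx 0.447214,
\qquad
\frac{2}{\sqrt{2}\,\sqrt{11}} = \sqrt{\tfrac{2}{11}} \approx 0.426401
$$

The two remaining sentences share no word with the query, so their score under this reading would be 0, which fits their absence from the top three.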