vector_lake
1.0.0
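VectorLake is a robust vector database designed for low-maintenance, cost-efficient storage and approximate nearest-neighbor (ANN) querying of vector data of any size, distributed across S3 files.
Inspired by the article "Which Vector Database Should I Use? A Comparison Cheatsheet".
VectorLake was created as a trade-off: it minimizes database maintenance and cost while still providing custom data partitioning strategies.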
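Native big data support: designed specifically to handle large datasets, which makes it well suited to big data projects.
Vector data handling: stores and queries high-dimensional vectors, as commonly used for embeddings in machine learning projects.
Efficient search: efficient nearest-neighbor search, well suited to querying similar vectors in high-dimensional spaces.
Data persistence: supports persistence on disk, network volumes, and S3, allowing long-term storage and retrieval of indexed data.
Customizable partitioning: a trade-off design that minimizes database maintenance and cost while providing custom data partitioning strategies.
Native support for LLM agents.
Feature store for experiment data.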
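For context on the "efficient search" feature above: an exact nearest-neighbor search is a linear scan over every stored vector, and approximate (ANN) indexes trade a little accuracy to avoid that scan. The following is a minimal baseline sketch in plain Python. It does not use or reflect VectorLake's API; the function names `euclidean` and `exact_nearest` are hypothetical and exist only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_nearest(query, vectors, k=1):
    """Return the indices of the k stored vectors closest to `query`.

    This is a brute-force linear scan: every stored vector is compared
    with the query, so the cost grows with the size of the dataset.
    """
    ranked = sorted(range(len(vectors)), key=lambda i: euclidean(query, vectors[i]))
    return ranked[:k]


vectors = [[0.0, 0.0], [1.0, 1.0], [0.9, 0.8], [5.0, 5.0]]
print(exact_nearest([1.0, 0.9], vectors, k=2))  # prints [1, 2]
```

A scan like this is a useful correctness reference when evaluating an approximate index on a small sample.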
To get started with VectorLake, install the package using pip:

pip install vector_lake

import numpy as np
from vector_lake import VectorLake

# Create (or open) a database stored in an S3 bucket
db = VectorLake(location="s3://vector-lake", dimension=5, approx_shards=243)
N = 100  # number of vectors, for example
D = 5  # dimensionality of each vector
embeddings = np.random.rand(N, D)

# Add each vector along with its metadata and source document
for em in embeddings:
    db.add(em, metadata={}, document="some document")
db.persist()

# Re-initialize from the same location to test that the data was persisted
db = VectorLake(location="s3://vector-lake", dimension=5, approx_shards=243)
db.query([0.56325391, 0.1500543, 0.88579166, 0.73536349, 0.7719873])

Custom partitioning
Partition data by a custom category:
import numpy as np
from vector_lake.core.index import Partition

if __name__ == "__main__":
    db = Partition(location="s3://vector-lake", partition_key="feature", dimension=5)
    N = 100  # number of vectors, for example
    D = 5  # dimensionality of each vector
    embeddings = np.random.rand(N, D)

    for em in embeddings:
        db.add(em, metadata={}, document="some document")
    db.persist()

    # Re-initialize to test persistence. The source passed key="feature" here;
    # partition_key is assumed so that both constructor calls match.
    db = Partition(location="s3://vector-lake", partition_key="feature", dimension=5)
    print(db.buckets)  # inspect the buckets of this partition
    db.query([0.56325391, 0.1500543, 0.88579166, 0.73536349, 0.7719873])

To persist to a local disk or a mounted network volume instead of S3, pass a filesystem path as the location:

import numpy as np
from vector_lake import VectorLake

# Create (or open) a database stored under a local or mounted path
db = VectorLake(location="/mnt/db", dimension=5, approx_shards=243)
N = 100  # number of vectors, for example
D = 5  # dimensionality of each vector
embeddings = np.random.rand(N, D)

for em in embeddings:
    db.add(em, metadata={}, document="some document")
db.persist()

# Re-initialize from the same path to test that the data was persisted
db = VectorLake(location="/mnt/db", dimension=5, approx_shards=243)
db.query([0.56325391, 0.1500543, 0.88579166, 0.73536349, 0.7719873])

VectorLake can also be used as a LangChain vector store:

from langchain.document_loaders import TextLoader
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from vector_lake.langchain import VectorLakeStore

# Load the document to index
loader = TextLoader("Readme.md")
documents = loader.load()

# Split it into chunks
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# Create the open-source embedding function
embedding = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
db = VectorLakeStore.from_documents(documents=docs, embedding=embedding)

# Run a similarity search and print the best match
query = "What is Vector Lake?"
docs = db.similarity_search(query)
print(docs[0].page_content)

VectorLake gives you the functionality of a simple, resilient vector database with very easy setup and low operational overhead. With it, you have a lightweight and reliable distributed vector store.
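VectorLake uses Hierarchical Navigable Small World (HNSW) graphs to partition data across all vector data shards. This ensures that every modification to the system stays aligned with vector distance. You can learn more about the design here.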
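The idea of keeping shards aligned with vector distance can be illustrated with a much simpler scheme than HNSW. The sketch below is not VectorLake's implementation: it swaps in nearest-pivot routing, where each vector is stored in the shard whose pivot point is closest, and a query only searches the shards nearest to it. The class `PivotShards`, its methods, and the `probe` parameter are all hypothetical names chosen for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotShards:
    """Toy sharded store: each vector lives in the shard of its nearest pivot."""

    def __init__(self, pivots):
        self.pivots = pivots
        self.shards = {i: [] for i in range(len(pivots))}

    def _route(self, vector, n=1):
        """Return the indices of the n pivots closest to `vector`."""
        order = sorted(
            range(len(self.pivots)),
            key=lambda i: euclidean(vector, self.pivots[i]),
        )
        return order[:n]

    def add(self, vector, document):
        """Store the vector in the shard of its nearest pivot; return the shard id."""
        shard_id = self._route(vector)[0]
        self.shards[shard_id].append((vector, document))
        return shard_id

    def query(self, vector, k=1, probe=1):
        """Search only the `probe` shards whose pivots are nearest to the query."""
        candidates = []
        for shard_id in self._route(vector, probe):
            candidates.extend(self.shards[shard_id])
        candidates.sort(key=lambda item: euclidean(vector, item[0]))
        return [doc for _, doc in candidates[:k]]


store = PivotShards(pivots=[[0.0, 0.0], [10.0, 10.0]])
store.add([0.5, 0.2], "near origin")
store.add([9.5, 9.9], "far corner")
store.add([1.0, 1.5], "also near origin")
print(store.query([0.9, 1.2], k=2))  # prints ['also near origin', 'near origin']
```

Because similar vectors land in the same shard, a query touches only a small part of the data. Raising `probe` searches more shards, trading speed for recall near shard boundaries.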
TBD
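Contributions to VectorLake are welcome! If you would like to contribute, please follow these steps:
Before contributing, please read the contributing guidelines.
VectorLake is released under the MIT License.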