A lightweight vector database designed for small projects.
Features
Performance
More than 10x faster than NumPy-based vector operations.
It is currently compatible with g++ or clang++.
You may need to modify the compile_config parameter in the VectorDatabase initialization to inject your own compile commands.
To make it work with other compilers, you may need to change the tiny_vectordb.jit module.
pip install tiny_vectordb

Good to go!
The package emits some compiled files into the source directory, and pip uninstall may not remove them automatically. To uninstall the package completely, first run the following command manually:
python -c "import tiny_vectordb; tiny_vectordb.cleanup()"

After that, you can safely uninstall the package with:
pip uninstall tiny_vectordb

from tiny_vectordb import VectorDatabase
collection_configs = [
    {
        "name": "hello",
        "dimension": 256,
    },
    {
        "name": "world",
        "dimension": 1000,
    }
]
database = VectorDatabase("test.db", collection_configs)
collection = database["hello"]
# add vectors
collection.setBlock(
    ["id1", "id2"],  # ids
    [[1] * 256, [2] * 256]  # vectors
)
# search for nearest vectors
search_ids, search_scores = collection.search([1.9] * 256)

For more usage, see example.py.
Design notes:
No NumPy arrays are used in the database, because I want it to be as lightweight as possible, and lists of numbers are easier to convert to JSON for communication over HTTP.
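As a small illustration of the JSON point, plain lists of ids and vectors can be serialized with the standard library alone. This is only a sketch: the payload field names ("ids", "vectors") are made up for the example and are not part of tiny_vectordb.

```python
import json

# Plain Python lists, in the same shape that setBlock accepts in the usage example.
ids = ["id1", "id2"]
vectors = [[1] * 4, [2] * 4]  # shortened to 4 dimensions for readability

# Hypothetical payload layout; the field names are assumptions for this sketch.
payload = json.dumps({"ids": ids, "vectors": vectors})

# The receiving side gets ordinary lists back, with no array conversion step.
decoded = json.loads(payload)
print(decoded["ids"], decoded["vectors"])
```

A NumPy array would first need a call such as tolist() before json.dumps could handle it, which is the extra step that plain lists avoid.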
The data are always stored in contiguous memory to ensure the best search performance.
Additions and deletions are therefore best done in batches, as they involve memory reallocation.
Here are some useful functions for batch operations:
class VectorCollection(Generic[NumVar]):
    def addBlock(self, ids: list[str], vectors: list[list[NumVar]]) -> None: ...
    def setBlock(self, ids: list[str], vectors: list[list[NumVar]]) -> None: ...
    def deleteBlock(self, ids: list[str]) -> None: ...
    def getBlock(self, ids: list[str]) -> list[list[NumVar]]: ...
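To show why batching matters when data lives in one contiguous buffer, here is a toy pure-Python sketch. It is not how tiny_vectordb is implemented: the FlatStore class and its add_block, delete_block, and get_block methods are hypothetical names that only mimic the idea of block operations on a flat buffer.

```python
from array import array


class FlatStore:
    """Toy store that keeps all vectors in one flat, contiguous buffer (hypothetical)."""

    def __init__(self, dimension: int) -> None:
        self.dimension = dimension
        self.ids: list[str] = []
        self.data = array("d")  # contiguous buffer of doubles

    def add_block(self, ids: list[str], vectors: list[list[float]]) -> None:
        if len(ids) != len(vectors):
            raise ValueError("ids and vectors must have the same length")
        flat = array("d")
        for vec in vectors:
            if len(vec) != self.dimension:
                raise ValueError("vector has the wrong dimension")
            flat.extend(vec)
        # The main buffer grows once for the whole batch, not once per vector.
        self.ids.extend(ids)
        self.data.extend(flat)

    def delete_block(self, ids: list[str]) -> None:
        drop = set(ids)
        kept_ids: list[str] = []
        kept = array("d")
        d = self.dimension
        # The buffer is rebuilt once, no matter how many ids are removed.
        for i, vid in enumerate(self.ids):
            if vid not in drop:
                kept_ids.append(vid)
                kept.extend(self.data[i * d:(i + 1) * d])
        self.ids, self.data = kept_ids, kept

    def get_block(self, ids: list[str]) -> list[list[float]]:
        index = {vid: i for i, vid in enumerate(self.ids)}
        d = self.dimension
        return [list(self.data[index[v] * d:(index[v] + 1) * d]) for v in ids]


store = FlatStore(dimension=3)
store.add_block(["a", "b", "c"], [[1, 2, 3], [4, 5, 6], [7, 8, 9]])
store.delete_block(["b"])
remaining = store.get_block(["a", "c"])
print(remaining)
```

Deleting ids one at a time with this layout would rebuild the buffer on every call, whereas a single batched call rebuilds it once.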