build vector db from scratch
1.0.0
How to create a vector store from scratch
The commands below assume a Linux server (the Docker repository used is the Ubuntu one).
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
The advantage of this Docker image is that pgvector comes preinstalled, which saves time.
version: "3.8"
services:
  db:
    image: ankane/pgvector:latest
    container_name: local_pgdb
    restart: always
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: POSTGRES_USER
      POSTGRES_PASSWORD: POSTGRES_PASSWORD
    volumes:
      - local_pgdata:/var/lib/postgresql/data
  pgadmin:
    image: dpage/pgadmin4
    container_name: pgadmin4_container
    restart: always
    ports:
      - "8888:80"
    environment:
      PGADMIN_DEFAULT_EMAIL: PGADMIN_DEFAULT_EMAIL
      PGADMIN_DEFAULT_PASSWORD: PGADMIN_DEFAULT_PASSWORD
    volumes:
      - pgadmin-data:/var/lib/pgadmin
volumes:
  local_pgdata:
  pgadmin-data:
docker compose up -d
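# The user passed to -U must match POSTGRES_USER from the compose file (labadmin here).
docker exec -it local_pgdb psql -U labadmin -c 'CREATE EXTENSION IF NOT EXISTS vector;'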
Both containers (PostgreSQL with pgvector, and pgAdmin) are now installed and running.
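The connection string that the Python code further down passes to PGVector has to match the user, password, and port configured in the compose file. The following is a minimal sketch of one way to assemble it; the helper name `build_connection_url` and the password `s3cret@pw` are hypothetical, and `labadmin` is simply the user from the `psql` command above.

```python
from typing import Optional
from urllib.parse import quote_plus


def build_connection_url(
    user: str,
    password: str,
    host: str = "localhost",
    port: int = 5432,
    database: Optional[str] = None,
) -> str:
    """Return a SQLAlchemy-style URL for the psycopg (version 3) driver."""
    # The official postgres image names the default database after
    # POSTGRES_USER when POSTGRES_DB is not set, so fall back to the user.
    database = database or user
    # Percent-encode the credentials so characters such as '@' stay valid.
    return (
        f"postgresql+psycopg://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical credentials; substitute the values from your compose file.
connection = build_connection_url("labadmin", "s3cret@pw")
print(connection)
```

Encoding the password matters because an unescaped `@` or `/` would otherwise be read as part of the URL structure.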

!pip install --quiet -U langchain_cohere
!pip install --quiet -U langchain_postgres
You can use any embedding model (for example, one from Hugging Face); see the LangChain documentation for how to do that.
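What makes embedding models interchangeable is a small interface: LangChain's `Embeddings` base class defines `embed_documents` and `embed_query`. The sketch below illustrates that shape with a toy, dependency-free class; `ToyHashEmbeddings` is a hypothetical name, and word hashing is only a stand-in for a real model, so its vectors carry no semantic meaning.

```python
import hashlib
import math
from typing import List


class ToyHashEmbeddings:
    """Hypothetical stand-in that hashes words into a small fixed-size vector."""

    def __init__(self, dim: int = 16) -> None:
        self.dim = dim

    def _embed(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for word in text.lower().split():
            # A stable hash keeps the result identical across runs.
            digest = hashlib.md5(word.encode("utf-8")).digest()
            vec[digest[0] % self.dim] += 1.0
        # Normalize to unit length; fall back to 1.0 for empty input.
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed a batch of documents, one vector per text."""
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        """Embed a single query string."""
        return self._embed(text)


toy = ToyHashEmbeddings()
vectors = toy.embed_documents(["ducks in the pond", "apples at the market"])
print(len(vectors), len(vectors[0]))
```

In practice a provider class, such as `HuggingFaceEmbeddings` from the `langchain_huggingface` package, would take the place of this toy class.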
from langchain_cohere import CohereEmbeddings
from langchain_core.documents import Document
from langchain_postgres import PGVector
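# See the docker compose file above for launching a Postgres instance with pgvector enabled.
# The port matches the 5432 published by that compose file; replace the langchain
# user, password, and database name with the values from your own setup.
connection = "postgresql+psycopg://langchain:langchain@localhost:5432/langchain"  # Uses psycopg3!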
collection_name = "my_docs"
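# CohereEmbeddings reads the API key from the COHERE_API_KEY environment variable.
# Newer langchain_cohere releases may also require an explicit model= argument.
embeddings = CohereEmbeddings()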
vectorstore = PGVector(
    embeddings=embeddings,
    collection_name=collection_name,
    connection=connection,
    use_jsonb=True,
)
docs = [
    Document(
        page_content="there are cats in the pond",
        metadata={"id": 1, "location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="ducks are also found in the pond",
        metadata={"id": 2, "location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="fresh apples are available at the market",
        metadata={"id": 3, "location": "market", "topic": "food"},
    ),
]
vectorstore.add_documents(docs, ids=[doc.metadata["id"] for doc in docs])
vectorstore.similarity_search("cat", k=1)
>>> [Document(page_content='there are cats in the pond', metadata={'id': 1, 'topic': 'animals', 'location': 'pond'})]
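Conceptually, `similarity_search` embeds the query and returns the `k` stored documents whose vectors are closest to it. The sketch below reproduces that ranking in plain Python, assuming cosine distance as the metric; the three-dimensional vectors are made up for illustration and are not real embeddings, and `top_k` is a hypothetical helper rather than part of `langchain_postgres`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    stored: List[Tuple[str, Sequence[float]]],
    k: int = 1,
) -> List[str]:
    """Return the texts of the k stored vectors nearest to the query."""
    ranked = sorted(stored, key=lambda item: cosine_distance(query, item[1]))
    return [text for text, _ in ranked[:k]]


# Made-up vectors; real embeddings have hundreds of dimensions.
stored = [
    ("there are cats in the pond", [0.9, 0.1, 0.0]),
    ("ducks are also found in the pond", [0.6, 0.4, 0.1]),
    ("fresh apples are available at the market", [0.0, 0.2, 0.9]),
]
query_vector = [1.0, 0.0, 0.0]  # hypothetical embedding of "cat"
print(top_k(query_vector, stored, k=1))
```

A database-backed store performs the same comparison inside PostgreSQL, which avoids loading every vector into Python.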