VectorChord

Effortlessly host 100 million 768-dimensional vectors (250GB+) on an AWS i4i.xlarge instance ($250/month), with 4 vCPUs and 32GB of RAM, using VectorChord.


VectorChord (vchord) is a PostgreSQL extension designed for scalable, high-performance, and disk-efficient vector similarity search. It is the successor to pgvecto.rs.

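With VectorChord, you can store 400,000 vectors for just $1, which enables significant savings: 6x more vectors than Pinecone's optimized storage and 26x more than pgvector/pgvecto.rs for the same price¹. For further insights, check out our launch blog post.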
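The $1 figure is consistent with the headline numbers at the top of this README: 100 million vectors on an instance costing about $250 per month. A quick sketch of that arithmetic follows. The function name is illustrative, and the per-dollar figures for the alternatives are derived only from the 6x and 26x ratios claimed above, not measured here.

```python
def vectors_per_dollar(total_vectors: int, monthly_cost_usd: float) -> float:
    """Vectors stored per dollar of monthly instance cost."""
    return total_vectors / monthly_cost_usd


# Figures quoted in this README: 100M vectors on a ~$250/month i4i.xlarge.
vchord = vectors_per_dollar(100_000_000, 250)

# Implied capacity per dollar for the alternatives, derived from the
# claimed 6x and 26x ratios (an assumption, not an independent measurement).
pinecone_implied = vchord / 6
pgvector_implied = vchord / 26

print(f"VectorChord: {vchord:,.0f} vectors per $1")
print(f"Pinecone (implied): {pinecone_implied:,.0f} vectors per $1")
print(f"pgvector/pgvecto.rs (implied): {pgvector_implied:,.0f} vectors per $1")
```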

Features

VectorChord introduces remarkable enhancements over pgvecto.rs and pgvector:

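⚡ Enhanced Performance: Up to 5x faster queries, 16x higher insert throughput, and 16x quicker index building² compared to pgvector's HNSW implementation.

Affordable Vector Search: Query 100M 768-dimensional vectors using just 32GB of memory, achieving 35ms P50 latency at 95% top-10 recall, helping you keep infrastructure costs down while maintaining high search quality.

Seamless Integration: Fully compatible with pgvector data types and syntax while providing optimal defaults out of the box, with no manual parameter tuning needed. Just drop in VectorChord for enhanced performance.

External Index Build: Leverage IVF to build indexes externally (e.g., on a GPU) for faster KMeans clustering, combined with RaBitQ³ compression to store vectors efficiently while maintaining search quality through automatic reranking.

Long Vector Support: Store and search vectors with up to 65,535 dimensions, so you can easily use the best high-dimensional models, such as text-embedding-3-large.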

Quick Start

For new users, we recommend using the Docker image to get started quickly.

docker run \
  --name vectorchord-demo \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -p 5432:5432 \
  -d tensorchord/vchord-postgres:pg17-v0.1.0

Then you can connect to the database using the psql command-line tool. The default username is postgres, and the default password is mysecretpassword.

psql -h localhost -p 5432 -U postgres

Run the following SQL to ensure the extension is enabled.

CREATE EXTENSION IF NOT EXISTS vchord CASCADE;

Also make sure vchord.so is added to shared_preload_libraries in postgresql.conf.

-- Add vchord to shared_preload_libraries (takes effect after a restart) --
ALTER SYSTEM SET shared_preload_libraries = 'vchord.so';

To create a VectorChord RaBitQ (vchordrq) index, you can use the following SQL.

-- Set residual_quantization to true and spherical_centroids to false for L2 distance --
CREATE INDEX ON gist_train USING vchordrq (embedding vector_l2_ops) WITH (options = $$
residual_quantization = true
[build.internal]
lists = [4096]
spherical_centroids = false
$$);


-- Set residual_quantization to false and spherical_centroids to true for cos/dot distance --
CREATE INDEX ON laion USING vchordrq (embedding vector_cos_ops) WITH (options = $$
residual_quantization = false
[build.internal]
lists = [4096]
spherical_centroids = true
$$);
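The two statements above differ only in the operator class and in the two boolean options. As a minimal sketch, assuming you generate the DDL from application code, a small helper can keep that pairing consistent. The function name `build_vchordrq_index_sql` is hypothetical and not part of VectorChord; it covers only the L2 and cosine cases shown above, and it interpolates identifiers directly, so it should only be used with trusted table and column names.

```python
def build_vchordrq_index_sql(table: str, column: str, metric: str, lists: int = 4096) -> str:
    """Return a CREATE INDEX statement that follows the two templates above.

    metric: "l2" for L2 distance, "cos" for cosine distance.
    """
    if metric == "l2":
        opclass, residual, spherical = "vector_l2_ops", "true", "false"
    elif metric == "cos":
        opclass, residual, spherical = "vector_cos_ops", "false", "true"
    else:
        raise ValueError(f"unsupported metric: {metric!r}")
    return (
        f"CREATE INDEX ON {table} USING vchordrq ({column} {opclass}) WITH (options = $$\n"
        f"residual_quantization = {residual}\n"
        "[build.internal]\n"
        f"lists = [{lists}]\n"
        f"spherical_centroids = {spherical}\n"
        "$$);"
    )


print(build_vchordrq_index_sql("gist_train", "embedding", "l2"))
```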

Documentation

Query

The query statement is exactly the same as in pgvector. VectorChord supports any filter operation and WHERE/JOIN clauses, like pgvecto.rs with VBASE.

SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;

The supported distance functions are:

<-> - L2 distance
<#> - (negative) inner product
<=> - cosine distance

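For reference, the three operators correspond to the following formulas. This is a plain-Python sketch of the definitions only, useful for sanity-checking query results on small vectors; it is not how the extension computes distances internally. The function names are illustrative.

```python
import math


def l2_distance(a, b):
    """<-> : Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a, b):
    """<#> : inner product, negated so that smaller means more similar."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """<=> : one minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [3, 1, 2]
row = [1, 2, 3]
print(l2_distance(query, row))
print(negative_inner_product(query, row))
print(cosine_distance(query, row))
```

Because `<#>` returns the negated inner product, `ORDER BY ... ASC` ranks the most similar rows first for all three operators.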
Query Performance Tuning

You can fine-tune search performance by adjusting the probes and epsilon parameters:

-- Set probes to control the number of lists scanned.
-- Recommended range: 3%–10% of the total `lists` value.
SET vchordrq.probes = 100;

-- Set epsilon to control the reranking precision.
-- Larger value means more rerank for higher recall rate.
-- Don't change it unless you only have limited memory.
-- Recommended range: 1.0–1.9. Default value is 1.9.
SET vchordrq.epsilon = 1.9;

-- vchordrq relies on a projection matrix to optimize performance.
-- Add your vector dimensions to the `prewarm_dim` list to reduce latency.
-- If this is not configured, the first query will have higher latency as the matrix is generated on demand.
-- Default value: '64,128,256,384,512,768,1024,1536'
-- Note: This setting requires a database restart to take effect.
ALTER SYSTEM SET vchordrq.prewarm_dim = '64,128,256,384,512,768,1024,1536';
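To make the "3%–10% of `lists`" guidance concrete, the sketch below computes that range for a given `lists` value using integer arithmetic. The helper name is hypothetical; the percentages are the ones recommended in the comments above.

```python
def recommended_probes_range(lists: int, low_pct: int = 3, high_pct: int = 10):
    """Return (min_probes, max_probes) as low_pct%..high_pct% of `lists`."""
    if lists < 1:
        raise ValueError("lists must be a positive integer")
    low = max(1, -(-lists * low_pct // 100))   # ceiling division
    high = max(low, lists * high_pct // 100)   # floor division
    return low, high


low, high = recommended_probes_range(4096)
print(f"-- lists = 4096: try probes between {low} and {high}")
print(f"SET vchordrq.probes = {low};")
```

Starting at the low end and raising `probes` until recall is acceptable keeps the number of scanned lists, and therefore latency, as small as possible.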

The following PostgreSQL settings also help:

-- If using SSDs, set `effective_io_concurrency` to 200 for faster disk I/O.
SET effective_io_concurrency = 200;

-- Disable JIT (Just-In-Time Compilation) as it offers minimal benefit (1–2%)
-- and adds overhead for single-query workloads.
SET jit = off;

-- Allocate at least 25% of total memory to `shared_buffers`.
-- For disk-heavy workloads, you can increase this to up to 90% of total memory.
-- With network storage, you may also want to disable swap to avoid I/O hangs.
-- Note: A restart is required for this setting to take effect.
ALTER SYSTEM SET shared_buffers = '8GB';
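The `8GB` value above matches 25% of the 32GB instance mentioned at the top of this README. As a small sketch, assuming you size `shared_buffers` from total RAM in whole gigabytes, the calculation looks like this (the helper name is illustrative):

```python
def shared_buffers_gb(total_memory_gb: int, fraction: float = 0.25) -> int:
    """Whole gigabytes to give shared_buffers for a fraction of total RAM."""
    if not 0 < fraction <= 0.9:
        raise ValueError("fraction should be in (0, 0.9], per the guidance above")
    return max(1, int(total_memory_gb * fraction))


statement = f"ALTER SYSTEM SET shared_buffers = '{shared_buffers_gb(32)}GB';"
print(statement)
```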

Index Prewarm

To prewarm the index, you can use the following SQL. It significantly improves performance when memory is limited.

-- vchordrq_prewarm(index_name::regclass) to prewarm the index into the shared buffer
SELECT vchordrq_prewarm('gist_train_embedding_idx'::regclass);

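Index Build Time

Index building can be parallelized, and with external centroid precomputation the total time is limited primarily by disk speed. Optimize parallelism using the following settings: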

-- Set this to the number of CPU cores available for parallel operations.
SET max_parallel_maintenance_workers = 8;
SET max_parallel_workers = 8;

-- Adjust the total number of worker processes.
-- Note: A restart is required for this setting to take effect.
ALTER SYSTEM SET max_worker_processes = 8;
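If the settings are generated by a provisioning script, the core count can be read from the machine rather than hard-coded. The sketch below assumes the script runs on the database host; the helper name is hypothetical. Note that PostgreSQL caps `max_parallel_workers` at `max_worker_processes`, so the restart-requiring setting has to be in place for the other two to take full effect.

```python
import os


def parallel_build_settings(cores=None):
    """Return the three statements above, sized to the available CPU cores."""
    n = cores or os.cpu_count() or 1
    return [
        f"SET max_parallel_maintenance_workers = {n};",
        f"SET max_parallel_workers = {n};",
        f"ALTER SYSTEM SET max_worker_processes = {n};  -- requires a restart",
    ]


for statement in parallel_build_settings():
    print(statement)
```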

Indexing Progress

You can check the indexing progress by querying the pg_stat_progress_create_index view.

SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS "%" FROM pg_stat_progress_create_index;
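The percentage expression guards against division by zero with `nullif`: when `blocks_total` is 0, the result is NULL instead of an error. A minimal Python equivalent of the same calculation, for use in a monitoring script that reads `blocks_done` and `blocks_total` itself, could look like this (the function name is illustrative):

```python
from decimal import Decimal, ROUND_HALF_UP


def progress_percent(blocks_done: int, blocks_total: int):
    """Percent complete rounded to one decimal place, or None if total is 0."""
    if blocks_total == 0:
        return None  # mirrors nullif(blocks_total, 0) producing NULL
    percent = Decimal(100) * Decimal(blocks_done) / Decimal(blocks_total)
    return percent.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


print(progress_percent(1, 3))
print(progress_percent(0, 0))
```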

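External Index Precomputation

Unlike a pure SQL build, external index precomputation first performs clustering outside the database and then inserts the centroids into a PostgreSQL table. Although it is more complicated, an external build is much faster on larger datasets (>5M vectors).

To get started, cluster your vectors using faiss, scikit-learn, or any other clustering library.

The centroids should be stored in a table of any name with 3 columns:

id (integer): the id of each centroid; must be unique
parent (integer, nullable): the parent id of each centroid; should be NULL for normal clustering
vector (vector): the centroid itself, as a pgvector vector type

An example could look like this: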

-- Create table of centroids
CREATE TABLE public.centroids (id integer NOT NULL UNIQUE, parent integer, vector vector(768));
-- Insert centroids into it ("..." stands for the remaining components of each 768-dimensional vector)
INSERT INTO public.centroids (id, parent, vector) VALUES (1, NULL, '[0.1, 0.2, 0.3, ..., 0.768]');
INSERT INTO public.centroids (id, parent, vector) VALUES (2, NULL, '[0.4, 0.5, 0.6, ..., 0.768]');
INSERT INTO public.centroids (id, parent, vector) VALUES (3, NULL, '[0.7, 0.8, 0.9, ..., 0.768]');
-- ...

-- Create index using the centroid table
CREATE INDEX ON gist_train USING vchordrq (embedding vector_l2_ops) WITH (options = $$
[build.external]
table = 'public.centroids'
$$);
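Writing the INSERT statements by hand is impractical for thousands of centroids. The sketch below shows one way to turn a list of centroid vectors into statements for the three-column table described above. It assumes the centroids are already available as lists of floats (for example, converted from the output of your clustering library); the helper name and the toy 3-dimensional vectors are illustrative, and the table's `vector(n)` dimension must match your real data.

```python
def centroid_insert_statements(centroids, table="public.centroids"):
    """Build one INSERT per centroid, with sequential ids and a NULL parent."""
    statements = []
    for centroid_id, vector in enumerate(centroids, start=1):
        # pgvector's text format: comma-separated values in square brackets.
        literal = "[" + ",".join(repr(float(x)) for x in vector) + "]"
        statements.append(
            f"INSERT INTO {table} (id, parent, vector) "
            f"VALUES ({centroid_id}, NULL, '{literal}');"
        )
    return statements


# Toy input standing in for real clustering output.
toy_centroids = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
for statement in centroid_insert_statements(toy_centroids):
    print(statement)
```

For large centroid sets, a bulk-loading path such as COPY is likely to be faster than individual INSERT statements; the string form here is only meant to show the expected row shape.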

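To simplify the workflow, we provide end-to-end scripts for external index precomputation; see the scripts.

Installing From Source

Install pgrx according to pgrx's instructions.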

cargo install --locked cargo-pgrx
cargo pgrx init --pg17 $(which pg_config) # To init with system postgres, with pg_config in PATH
cargo pgrx install --release --sudo # To install the extension into the system postgres with sudo

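Limitations

KMeans Clustering: The built-in KMeans clustering relies on a multi-threaded in-memory build and may require substantial memory. We strongly recommend using external centroid precomputation for efficient index construction.

License

This software is licensed under a dual license model:

GNU Affero General Public License v3 (AGPLv3): You may use, modify, and distribute this software under the terms of the AGPLv3.
Elastic License v2 (ELv2): You may also use, modify, and distribute this software under the Elastic License v2, which has specific restrictions.

You may choose either license based on your needs. We welcome any commercial collaboration or support, so please send us any questions or requests regarding the licenses.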

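¹ Based on the MyScale benchmark with 768-dimensional vectors and 95% recall.
² Based on the MyScale benchmark with 768-dimensional vectors. Please check out our blog post for more details.
³ Gao, Jianyang, and Cheng Long. "RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search." Proceedings of the ACM on Management of Data 2.3 (2024): 1-27.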
