VectorChord

Effortlessly host 100 million 768-dimensional vectors (250GB+) on an AWS i4i.xlarge instance ($250/month), with 4 vCPUs and 32GB of RAM, using VectorChord.

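VectorChord (vchord) is a PostgreSQL extension designed for scalable, high-performance, and disk-efficient vector similarity search. It is the successor to pgvecto.rs.

With VectorChord, you can store 400,000 vectors for $1, which translates into significant savings: 6x more vectors than Pinecone's optimized storage and 26x more than pgvector/pgvecto.rs for the same price¹. For further insights, check out our launch blog post.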
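The $1 figure appears to follow from the hosting example at the top of this README: 100 million vectors on an instance that costs $250 per month. The short Python sketch below reproduces that arithmetic. Treating the instance price as the only cost is an assumption made here, not something the benchmark states, and vectors_per_dollar is a name introduced for illustration.

```python
# Back-of-the-envelope check of the "400,000 vectors per $1" figure.
# Assumption: the only cost counted is the $250/month instance price.
VECTORS_HOSTED = 100_000_000   # 100 million 768-dimensional vectors
MONTHLY_COST_USD = 250         # AWS i4i.xlarge price quoted in this README


def vectors_per_dollar(vectors: int, monthly_cost_usd: int) -> int:
    """Return how many hosted vectors one dollar per month pays for."""
    return vectors // monthly_cost_usd


print(vectors_per_dollar(VECTORS_HOSTED, MONTHLY_COST_USD))
```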

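Features

VectorChord introduces remarkable enhancements over pgvecto.rs and pgvector:

⚡ Enhanced Performance: Up to 5x faster queries, 16x higher insert throughput, and 16x faster² index building compared to pgvector's HNSW implementation.

Affordable Vector Search: Query 100M 768-dimensional vectors using just 32GB of memory, achieving 35ms P50 latency at 95% top-10 recall, which keeps infrastructure costs down while maintaining high search quality.

Seamless Integration: Fully compatible with pgvector data types and syntax, with optimal defaults out of the box and no manual parameter tuning needed. Just drop in VectorChord for enhanced performance.

External Index Build: Leverage IVF to build indexes externally (e.g., on a GPU) for faster KMeans clustering, combined with RaBitQ³ compression to store vectors efficiently while maintaining search quality through automatic reranking.

Long Vector Support: Store and search vectors of up to 65,535 dimensions, making it easy to use the best high-dimensional models such as text-embedding-3-large.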

Quick Start

For new users, we recommend using the Docker image to get started quickly.

docker run \
  --name vectorchord-demo \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -p 5432:5432 \
  -d tensorchord/vchord-postgres:pg17-v0.1.0

Then you can connect to the database using the psql command-line tool. The default username is postgres, and the default password is mysecretpassword.

psql -h localhost -p 5432 -U postgres

Run the following SQL to ensure the extension is enabled.

CREATE EXTENSION IF NOT EXISTS vchord CASCADE;

Also make sure that vchord.so is listed in shared_preload_libraries in postgresql.conf.

-- Add vchord to shared_preload_libraries (takes effect after a server restart) --
ALTER SYSTEM SET shared_preload_libraries = 'vchord.so';

To create a VectorChord RaBitQ (vchordrq) index, you can use the following SQL.

-- Set residual_quantization to true and spherical_centroids to false for L2 distance --
CREATE INDEX ON gist_train USING vchordrq (embedding vector_l2_ops) WITH (options = $$
residual_quantization = true
[build.internal]
lists = [4096]
spherical_centroids = false
$$);

-- Set residual_quantization to false and spherical_centroids to true for cos/dot distance --
CREATE INDEX ON laion USING vchordrq (embedding vector_cos_ops) WITH (options = $$
residual_quantization = false
[build.internal]
lists = [4096]
spherical_centroids = true
$$);
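The pairing of residual_quantization and spherical_centroids with the distance metric is easy to get backwards. As a minimal sketch, the Python helper below renders the statement from a metric name. The function vchordrq_index_sql and the METRIC_SETTINGS table are hypothetical names introduced here for illustration, and only the two operator classes shown above are covered.

```python
# Hypothetical helper (not part of VectorChord): renders a CREATE INDEX
# statement using the option pairing described in the SQL comments above.
METRIC_SETTINGS = {
    # metric: (operator class, residual_quantization, spherical_centroids)
    "l2": ("vector_l2_ops", True, False),
    "cos": ("vector_cos_ops", False, True),
}


def vchordrq_index_sql(table: str, column: str, metric: str, lists: int) -> str:
    """Build a vchordrq CREATE INDEX statement for the given metric.

    Table and column names are interpolated as-is, so pass trusted
    identifiers only.
    """
    opclass, residual, spherical = METRIC_SETTINGS[metric]
    return (
        f"CREATE INDEX ON {table} USING vchordrq ({column} {opclass}) "
        "WITH (options = $$\n"
        f"residual_quantization = {str(residual).lower()}\n"
        "[build.internal]\n"
        f"lists = [{lists}]\n"
        f"spherical_centroids = {str(spherical).lower()}\n"
        "$$);"
    )


print(vchordrq_index_sql("gist_train", "embedding", "l2", 4096))
```

Keeping the pairing in one lookup table means a new index only needs the metric name, which reduces the chance of mixing the two flags.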

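Documentation

Query

The query statement is exactly the same as in pgvector. VectorChord supports any filter operation and WHERE/JOIN clauses, as pgvecto.rs does with VBASE.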

SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;

Supported distance functions are:

<-> - L2 distance
<#> - (negative) inner product
<=> - cosine distance
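For readers who want to see what each operator orders by, here is a minimal pure-Python sketch of the three quantities. It follows the usual pgvector definitions (Euclidean distance, negated dot product, and one minus cosine similarity); the function names are illustrative and are not part of any extension API.

```python
import math


def l2_distance(a, b):
    """Euclidean distance, the quantity ordered by <->."""
    return math.dist(a, b)


def negative_inner_product(a, b):
    """Negated dot product, the quantity ordered by <#>.

    The sign is flipped so that an ascending ORDER BY returns the
    largest inner products first.
    """
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity, the quantity ordered by <=>."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [3, 1, 2]
print(l2_distance(query, [3, 1, 2]))
print(negative_inner_product(query, [1, 0, 0]))
print(cosine_distance([1, 0], [0, 1]))
```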

Query Performance Tuning

You can fine-tune search performance by adjusting the probes and epsilon parameters:

-- Set probes to control the number of lists scanned.
-- Recommended range: 3%–10% of the total `lists` value.
SET vchordrq.probes = 100;

-- Set epsilon to control the reranking precision.
-- A larger value means more reranking and a higher recall rate.
-- Don't change it unless you only have limited memory.
-- Recommended range: 1.0–1.9. Default value is 1.9.
SET vchordrq.epsilon = 1.9;

-- vchordrq relies on a projection matrix to optimize performance.
-- Add your vector dimensions to the `prewarm_dim` list to reduce latency.
-- If this is not configured, the first query will have higher latency as the matrix is generated on demand.
-- Default value: '64,128,256,384,512,768,1024,1536'
-- Note: This setting requires a database restart to take effect.
ALTER SYSTEM SET vchordrq.prewarm_dim = '64,128,256,384,512,768,1024,1536';
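The recommended probes range is expressed as a percentage of lists, so the concrete numbers depend on how the index was built. The sketch below turns the 3%–10% guideline into integers; recommended_probes is a hypothetical helper, and rounding the lower bound up and the upper bound down is a choice made here rather than a documented rule.

```python
import math


def recommended_probes(lists: int) -> tuple[int, int]:
    """Return the (low, high) probes range, i.e. 3% to 10% of `lists`."""
    low = max(1, math.ceil(lists * 3 / 100))   # round the 3% bound up
    high = max(low, lists * 10 // 100)         # round the 10% bound down
    return low, high


low, high = recommended_probes(4096)
print(f"SET vchordrq.probes = {low};  -- upper end of the range: {high}")
```

For the lists = [4096] index used earlier, this gives a range of 123 to 409. Starting at the low end and raising probes until recall is acceptable is one reasonable way to use it.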

And for PostgreSQL's own settings:

-- If using SSDs, set `effective_io_concurrency` to 200 for faster disk I/O.
SET effective_io_concurrency = 200;

-- Disable JIT (Just-In-Time Compilation) as it offers minimal benefit (1–2%)
-- and adds overhead for single-query workloads.
SET jit = off;

-- Allocate at least 25% of total memory to `shared_buffers`.
-- For disk-heavy workloads, you can increase this to up to 90% of total memory.
-- With network storage, you may also want to disable swap to avoid I/O hangs.
-- Note: A restart is required for this setting to take effect.
ALTER SYSTEM SET shared_buffers = '8GB';
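The 8GB value above matches 25% of the 32GB instance described at the top of this README. As a small sketch, the helper below derives the setting for other machine sizes; shared_buffers_sql is a hypothetical name, and expressing the result in megabytes is a choice made here to avoid fractional gigabytes.

```python
def shared_buffers_sql(total_ram_gb: int, percent: int = 25) -> str:
    """Render the shared_buffers setting as a share of total RAM.

    `percent` follows the guidance above: at least 25, at most 90.
    """
    if not 25 <= percent <= 90:
        raise ValueError("percent should be between 25 and 90")
    megabytes = total_ram_gb * 1024 * percent // 100
    return f"ALTER SYSTEM SET shared_buffers = '{megabytes}MB';"


print(shared_buffers_sql(32))        # 25% of 32GB, i.e. 8192MB (8GB)
print(shared_buffers_sql(32, 90))    # disk-heavy workload
```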

Index Prewarm

To prewarm the index, you can use the following SQL. It significantly improves performance when memory is limited.

-- vchordrq_prewarm(index_name::regclass) to prewarm the index into the shared buffer
SELECT vchordrq_prewarm('gist_train_embedding_idx'::regclass);

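Index Build Time

Index building can be parallelized, and with external centroid precomputation the total time is primarily limited by disk speed. Optimize parallelism using the following settings: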

-- Set this to the number of CPU cores available for parallel operations.
SET max_parallel_maintenance_workers = 8;
SET max_parallel_workers = 8;

-- Adjust the total number of worker processes.
-- Note: A restart is required for this setting to take effect.
ALTER SYSTEM SET max_worker_processes = 8;
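Since these values are meant to track the number of CPU cores, they can be generated for the machine at hand. The sketch below is one way to do that with the standard library; parallel_build_sql is a hypothetical helper, and using the same value for all three settings simply mirrors the example above.

```python
import os


def parallel_build_sql(cores: int) -> list[str]:
    """Render the three parallelism settings for a given core count."""
    return [
        f"SET max_parallel_maintenance_workers = {cores};",
        f"SET max_parallel_workers = {cores};",
        # This one only takes effect after a server restart.
        f"ALTER SYSTEM SET max_worker_processes = {cores};",
    ]


# os.cpu_count() may return None, so fall back to a single core.
for statement in parallel_build_sql(os.cpu_count() or 1):
    print(statement)
```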

Indexing Progress

You can check the indexing progress by querying the pg_stat_progress_create_index view.

SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS "%" FROM pg_stat_progress_create_index;
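The nullif call guards against division by zero while blocks_total is still 0, in which case the percentage is NULL. The same calculation in Python, as an illustrative sketch with a hypothetical function name, looks like this:

```python
def progress_percent(blocks_done: int, blocks_total: int):
    """Mirror round(100.0 * blocks_done / nullif(blocks_total, 0), 1).

    Returns None when blocks_total is 0, as the SQL returns NULL.
    Python rounds binary floats, so exact ties may round differently
    from PostgreSQL's numeric type.
    """
    if not blocks_total:
        return None
    return round(100.0 * blocks_done / blocks_total, 1)


print(progress_percent(50, 200))
print(progress_percent(0, 0))
```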

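External Index Precomputation

Unlike the pure SQL approach, external index precomputation first performs clustering outside the database and then inserts the centroids into a PostgreSQL table. Although it is more involved, an external build is much faster on larger datasets (>5M vectors).

To get started, cluster your vectors using faiss, scikit-learn, or any other clustering library.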
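To make the clustering step concrete, here is a deliberately small, dependency-free sketch of Lloyd's k-means in Python. It is a stand-in for illustration only: for real datasets the libraries named above are the practical choice, and the function name kmeans, its parameters, and the toy data are all assumptions made here.

```python
import random


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    # Start from k distinct input vectors chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        # Assignment step: group each vector with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(
                range(k),
                key=lambda i: sum((x - c) ** 2 for x, c in zip(v, centroids[i])),
            )
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(sorted(kmeans(data, k=2)))
```

The returned centroids are what gets loaded into the centroid table; the number of clusters k plays the role that lists plays in the internal build.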

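The centroids should be stored in a table of any name with 3 columns:

id (integer): the id of each centroid; must be unique
parent (integer, nullable): the parent id of each centroid; should be NULL for normal clustering
vector (vector): the representation of each centroid, as a pgvector vector type

An example could look like this: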

-- Create table of centroids
CREATE TABLE public.centroids (id integer NOT NULL UNIQUE, parent integer, vector vector(768));
-- Insert centroids into it (pgvector literals use square brackets)
INSERT INTO public.centroids (id, parent, vector) VALUES (1, NULL, '[0.1, 0.2, 0.3, ..., 0.768]');
INSERT INTO public.centroids (id, parent, vector) VALUES (2, NULL, '[0.4, 0.5, 0.6, ..., 0.768]');
INSERT INTO public.centroids (id, parent, vector) VALUES (3, NULL, '[0.7, 0.8, 0.9, ..., 0.768]');
-- ...

-- Create index using the centroid table
CREATE INDEX ON gist_train USING vchordrq (embedding vector_l2_ops) WITH (options = $$
[build.external]
table = 'public.centroids'
$$);
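Writing hundreds or thousands of INSERT statements by hand is impractical, so the centroid rows are usually generated. The sketch below shows one way to do it, assuming pgvector's square-bracket text form for vector values and flat (non-hierarchical) clustering with parent set to NULL. The helper names vector_literal and centroid_inserts are hypothetical, and the table name is interpolated as-is, so it should come from a trusted source.

```python
def vector_literal(values):
    """Format floats in pgvector's text form, e.g. [0.1,0.2]."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def centroid_inserts(centroids, table="public.centroids"):
    """Yield one INSERT per centroid; ids start at 1 and parent is NULL."""
    for centroid_id, values in enumerate(centroids, start=1):
        yield (
            f"INSERT INTO {table} (id, parent, vector) "
            f"VALUES ({centroid_id}, NULL, '{vector_literal(values)}');"
        )


for statement in centroid_inserts([[0.1, 0.2], [0.4, 0.5]]):
    print(statement)
```

For large centroid sets, a bulk-loading path such as COPY is likely a better fit than individual statements; the generator form is used here only to keep the example short.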

To simplify the workflow, we provide end-to-end scripts for external index precomputation; see the scripts.

Installing From Source

Install pgrx according to pgrx's instructions.

cargo install --locked cargo-pgrx
cargo pgrx init --pg17 $(which pg_config) # To init with system postgres, with pg_config in PATH
cargo pgrx install --release --sudo # To install the extension into the system postgres with sudo

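Limitations

KMeans Clustering: The built-in KMeans clustering relies on a multi-threaded, in-memory build and may require substantial memory. We strongly recommend using external centroid precomputation for efficient index construction.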

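License

This software is licensed under a dual license model:

GNU Affero General Public License v3 (AGPLv3): You may use, modify, and distribute this software under the terms of the AGPLv3.
Elastic License v2 (ELv2): You may also use, modify, and distribute this software under the Elastic License v2, which has specific restrictions.

You may choose either license based on your needs. We welcome any commercial collaboration or support, so please contact us with any questions or requests regarding the licenses.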

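¹ Based on the MyScale benchmark with 768-dimensional vectors and 95% recall.
² Based on the MyScale benchmark with 768-dimensional vectors. Please check out our blog post for more details.
³ Gao, Jianyang, and Cheng Long. "RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search." Proceedings of the ACM on Management of Data 2.3 (2024): 1-27.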
