Vector Search with FAISS
1.0.0
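This project implements an efficient similarity search system using embeddings, FAISS indexing, and product quantization, with a custom index and k-means implementation. It lets you find similar lectures based on their text content, enabling fast retrieval and recommendation of lectures.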
Clone the repository
git clone https://github.com/bariscamli/Vector-Search-with-FAISS.git
cd Vector-Search-with-FAISS
Create a virtual environment (optional but recommended)
python -m venv venv
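source venv/bin/activate  # On Windows use `venv\Scripts\activate`
Install the dependencies
pip install -r requirements.txt
Lecture data: Place your lecture texts in the file specified by LECTURE_FILE in config.py. Each line should contain one lecture.
Query data: Place your query texts in the file specified by QUERY_FILE in config.py. Each line should contain one query.
Example format for lectures.txt: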
Introduction to Machine Learning
Advanced Topics in Deep Learning
Statistical Methods in Data Science
...
Example format for queries.txt:
Basics of Neural Networks
Regression Analysis Techniques
Clustering Algorithms Overview
...
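The one-item-per-line format above can be read with a few lines of Python. This is a minimal sketch, not the repository's loader: the function name `load_lines` is hypothetical, and skipping blank lines is an assumption about how stray empty lines should be handled.

```python
from pathlib import Path


def load_lines(path):
    """Return the non-empty lines of a UTF-8 text file, one entry per line."""
    text = Path(path).read_text(encoding="utf-8")
    # Strip surrounding whitespace and drop blank lines (an assumption,
    # the source only says that each line holds one lecture or query).
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Under these assumptions, `load_lines(LECTURE_FILE)` and `load_lines(QUERY_FILE)` would each return a list of strings ready to be embedded.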
All configuration is managed through the config.py file. Key parameters include:
File Paths
- LECTURE_FILE: Path to the lecture data file.
- QUERY_FILE: Path to the query data file.
Embedding Model
- EMBEDDING_MODEL_NAME: Name or path of the embedding model to use.
- BATCH_SIZE: Batch size for computing embeddings.
FAISS Parameters
- FAISS_EFSEARCH_VALUES: List of efSearch values for performance evaluation.
Quantization Parameters
- PQ_M: Number of sub-vector quantizers.
- PQ_NBITS: Number of bits per sub-vector.
- KMEANS_MAX_ITER: Maximum iterations for k-means during PQ training.
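To make the parameters above concrete, here is a hypothetical sketch of what such a config.py could look like. The parameter names come from the list above; every value, including the model name, is an illustrative placeholder rather than the project's actual default. The helper `check_pq_settings` is also hypothetical and only illustrates the usual product-quantization constraint that the embedding dimension must be divisible by the number of sub-vector quantizers.

```python
# Hypothetical config.py sketch. Names follow the parameter list;
# all values are illustrative placeholders, not the project's defaults.

# File paths
LECTURE_FILE = "lectures.txt"
QUERY_FILE = "queries.txt"

# Embedding model (assumed example; any model name or local path could be used)
EMBEDDING_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
BATCH_SIZE = 32

# FAISS parameters: efSearch values to sweep during evaluation
FAISS_EFSEARCH_VALUES = [16, 32, 64, 128]

# Quantization parameters
PQ_M = 8              # number of sub-vector quantizers
PQ_NBITS = 8          # bits per sub-vector code
KMEANS_MAX_ITER = 20  # k-means iterations during PQ training


def check_pq_settings(embedding_dim, m=PQ_M, nbits=PQ_NBITS):
    """Hypothetical helper: return (sub-vector length, centroids per quantizer)."""
    if embedding_dim % m != 0:
        raise ValueError("embedding_dim must be divisible by PQ_M")
    return embedding_dim // m, 2 ** nbits
```

With these placeholder values and a 384-dimensional embedding, `check_pq_settings(384)` gives sub-vectors of length 48 and 256 centroids per sub-quantizer, so each vector is stored as 8 one-byte codes.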
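Run the main script to execute the full pipeline:
python main.py
Data loading and preprocessing
Embedding computation
- Loads the embedding model specified by EMBEDDING_MODEL_NAME.
Baseline computation
FAISS index construction and evaluation
- Evaluates the index across the configured efSearch values.
Performance visualization
Quantization
- Uses the custom index (CustomIndexPQ).
Example search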
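The evaluation step compares approximate search results against the baseline. The source does not state which metric is used, so the sketch below assumes recall@k, a common choice: the fraction of the exact top-k neighbours that the approximate index also returns. The function name `recall_at_k` and the list-of-lists input format are hypothetical.

```python
def recall_at_k(exact_ids, approx_ids, k):
    """Mean fraction of each query's exact top-k ids found in the approximate top-k.

    exact_ids and approx_ids hold one list of neighbour ids per query,
    ordered from nearest to farthest.
    """
    if len(exact_ids) != len(approx_ids):
        raise ValueError("one result list per query is required")
    total = 0.0
    for exact, approx in zip(exact_ids, approx_ids):
        truth = set(exact[:k])
        # Count how many true neighbours the approximate search recovered.
        total += len(truth & set(approx[:k])) / len(truth)
    return total / len(exact_ids)
```

Under this assumption, one recall value would be computed for each entry in FAISS_EFSEARCH_VALUES and plotted against search time in the visualization step.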

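The quantization step can be illustrated with a toy product quantizer. This is a minimal NumPy sketch under stated assumptions, not the repository's CustomIndexPQ: the names `kmeans` and `ToyPQ` are hypothetical, distances are squared L2, and search uses asymmetric distance computation (the query stays unquantized). The arguments `m`, `nbits`, and `max_iter` play the roles of PQ_M, PQ_NBITS, and KMEANS_MAX_ITER.

```python
import numpy as np


def kmeans(x, k, max_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, dim) array of centroids."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct training points (needs len(x) >= k).
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Squared L2 distance from every point to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


class ToyPQ:
    """Toy product quantizer: m sub-quantizers with 2**nbits centroids each."""

    def __init__(self, dim, m, nbits, max_iter=20):
        if dim % m != 0:
            raise ValueError("dim must be divisible by m")
        if nbits > 8:
            raise ValueError("this sketch stores codes as uint8, so nbits <= 8")
        self.m, self.k, self.ds = m, 2 ** nbits, dim // m
        self.max_iter = max_iter
        self.codebooks = None  # shape (m, k, ds) after train()
        self.codes = None      # shape (n, m) after add()

    def train(self, x):
        # Split each vector into m sub-vectors and cluster each subspace separately.
        parts = x.reshape(len(x), self.m, self.ds)
        self.codebooks = np.stack(
            [kmeans(parts[:, i], self.k, self.max_iter, seed=i) for i in range(self.m)]
        )

    def encode(self, x):
        parts = x.reshape(len(x), self.m, self.ds)
        codes = np.empty((len(x), self.m), dtype=np.uint8)
        for i in range(self.m):
            d = ((parts[:, i, None, :] - self.codebooks[i][None]) ** 2).sum(axis=2)
            codes[:, i] = d.argmin(axis=1)  # id of the nearest centroid
        return codes

    def add(self, x):
        self.codes = self.encode(x)

    def search(self, query, topk):
        # table[i, c] = squared distance from query sub-vector i to centroid c.
        qparts = query.reshape(self.m, self.ds)
        table = ((qparts[:, None, :] - self.codebooks) ** 2).sum(axis=2)
        # Look up and sum the per-subspace distances for every stored code.
        dists = table[np.arange(self.m), self.codes].sum(axis=1)
        order = np.argsort(dists)[:topk]
        return order, dists[order]
```

A typical call sequence would be `pq = ToyPQ(dim, m, nbits)`, then `pq.train(embeddings)`, `pq.add(embeddings)`, and `pq.search(query_embedding, topk)`. The returned distances are approximations, since each stored vector is represented only by its centroid ids.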
- numpy
- matplotlib
- faiss (install with pip install faiss-cpu, or faiss-gpu if a GPU is available)
- logging
- transformers