This project implements an efficient similarity search system for lecture content using embeddings, FAISS, and Product Quantization, with custom index and k-means implementations. It finds lectures with similar textual content, enabling quick retrieval and recommendation of lectures.
Clone the Repository
git clone https://github.com/bariscamli/Vector-Search-with-FAISS.git
cd Vector-Search-with-FAISS
Create a Virtual Environment (Optional but Recommended)
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
Install Dependencies
pip install -r requirements.txt
Lecture Data: Place your lecture texts in the file specified by LECTURE_FILE in config.py. Each line should contain one lecture.
Query Data: Place your query texts in a file specified by QUERY_FILE in config.py. Each line should contain one query.
Example format for lectures.txt:
Introduction to Machine Learning
Advanced Topics in Deep Learning
Statistical Methods in Data Science
...
Example format for queries.txt:
Basics of Neural Networks
Regression Analysis Techniques
Clustering Algorithms Overview
...
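The one-item-per-line format above can be read with a few lines of Python. This is a minimal sketch, not the project's own loader: the function name `load_lines` is hypothetical, and skipping blank lines is an assumption about how the files should be treated.

```python
from pathlib import Path


def load_lines(path):
    """Return the non-empty lines of a text file, stripped of surrounding whitespace."""
    text = Path(path).read_text(encoding="utf-8")
    # Assumption: blank lines carry no lecture or query and can be dropped.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

With this helper, `load_lines("lectures.txt")` would return one string per lecture, in file order, so a search result index maps directly back to a line of the file.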
All configurations are managed through the config.py file. Key parameters include:
File Paths
- LECTURE_FILE: Path to the lecture data file.
- QUERY_FILE: Path to the query data file.
Embedding Model
- EMBEDDING_MODEL_NAME: Name or path of the embedding model to use.
- BATCH_SIZE: Batch size for computing embeddings.
FAISS Parameters
- FAISS_EFSEARCH_VALUES: List of efSearch values (the HNSW search-time candidate list size) to compare during performance evaluation.
Quantization Parameters
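- PQ_M: Number of sub-vectors each embedding is split into, with one quantizer per sub-vector.
- PQ_NBITS: Number of bits per sub-vector code, so each sub-quantizer has 2^PQ_NBITS centroids.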
- KMEANS_MAX_ITER: Maximum iterations for k-means during PQ training.
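To show how the three quantization parameters interact, here is a minimal NumPy sketch of Product Quantization trained with plain k-means. It is not the repository's CustomIndexPQ: the function names (`kmeans`, `train_pq`, `encode_pq`, `search_pq`) are hypothetical, squared L2 distance is assumed, and the embedding dimension is assumed to be divisible by the number of sub-vectors.

```python
import numpy as np


def kmeans(x, k, max_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, dim) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Squared distance from every point to every centroid, shape (n, k).
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return centroids


def train_pq(x, m, nbits, max_iter=20):
    """Train one codebook per sub-vector; result shape (m, 2**nbits, dim // m)."""
    dim = x.shape[1]
    assert dim % m == 0, "dimension must be divisible by m"
    sub = dim // m
    return np.stack([
        kmeans(x[:, i * sub:(i + 1) * sub], 2 ** nbits, max_iter, seed=i)
        for i in range(m)
    ])


def encode_pq(x, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    m, _, sub = codebooks.shape
    codes = np.empty((len(x), m), dtype=np.uint8)  # valid while nbits <= 8
    for i in range(m):
        part = x[:, i * sub:(i + 1) * sub]
        dists = ((part[:, None, :] - codebooks[i][None, :, :]) ** 2).sum(axis=2)
        codes[:, i] = dists.argmin(axis=1)
    return codes


def search_pq(query, codes, codebooks, top_k=3):
    """Asymmetric distance search: exact query against quantized vectors."""
    m, _, sub = codebooks.shape
    # table[i, j] = squared distance from query sub-vector i to centroid j.
    table = ((query.reshape(m, 1, sub) - codebooks) ** 2).sum(axis=2)
    approx = table[np.arange(m), codes].sum(axis=1)
    return np.argsort(approx)[:top_k]


# Random vectors stand in for real lecture embeddings.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(200, 16)).astype(np.float32)
codebooks = train_pq(vectors, m=4, nbits=4, max_iter=10)
codes = encode_pq(vectors, codebooks)
print(codes.shape)  # (200, 4)
```

In this sketch `m`, `nbits`, and `max_iter` play the roles of PQ_M, PQ_NBITS, and KMEANS_MAX_ITER. Each 16-dimensional float32 vector (64 bytes) is stored as 4 one-byte codes, which is where the memory saving comes from, at the cost of approximate distances.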
Run the main script to execute the full pipeline:
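python main.py
The script runs the following steps:
- Data Loading and Preprocessing
- Embedding Computation: embeddings are computed with the model specified by EMBEDDING_MODEL_NAME.
- Baseline Computation
- FAISS Index Building and Evaluation: the index is evaluated across the configured efSearch values.
- Performance Visualization
- Quantization: a custom Product Quantization index (CustomIndexPQ) is created.
- Example Search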
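The baseline and evaluation steps can be sketched as a sweep over efSearch values that compares an approximate index against exact search. This is an illustration under stated assumptions, not the code in main.py: it assumes an HNSW index with L2 distance, and the names `recall_at_k`, `sweep_efsearch`, and `hnsw_m` are hypothetical.

```python
import numpy as np


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k neighbours that the approximate search also found."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids))
    return hits / np.asarray(exact_ids).size


def sweep_efsearch(xb, xq, ef_values, k=5, hnsw_m=32):
    """Return {efSearch: recall@k} for an HNSW index, measured against exact search."""
    import faiss  # imported lazily so recall_at_k works without FAISS installed

    xb = np.ascontiguousarray(xb, dtype=np.float32)
    xq = np.ascontiguousarray(xq, dtype=np.float32)
    d = xb.shape[1]

    # Baseline: brute-force L2 search gives the ground-truth neighbours.
    exact = faiss.IndexFlatL2(d)
    exact.add(xb)
    _, exact_ids = exact.search(xq, k)

    # Approximate index: HNSW graph over the same vectors.
    index = faiss.IndexHNSWFlat(d, hnsw_m)
    index.add(xb)

    results = {}
    for ef in ef_values:
        index.hnsw.efSearch = ef  # larger values search more candidates
        _, ids = index.search(xq, k)
        results[ef] = recall_at_k(ids, exact_ids)
    return results
```

Calling `sweep_efsearch(lecture_embeddings, query_embeddings, FAISS_EFSEARCH_VALUES)` would give one recall figure per efSearch value, which is the kind of data a recall-versus-speed plot needs. Timing each search call alongside the recall is a natural extension.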

Dependencies
- numpy
- matplotlib
- faiss (install via pip install faiss-cpu, or faiss-gpu if you have a GPU)
- logging
- transformers (if using Hugging Face models)