Context-based document search
1.0.0
This project provides a system for performing context-based search across documents stored in a vector database. Using OpenAI's embedding models and Chroma, this tool allows you to efficiently search through a collection of text documents and retrieve the most relevant results based on a given query.
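To make the idea of embedding-based retrieval concrete, here is a minimal sketch of how documents can be ranked against a query once both are represented as vectors. It assumes cosine similarity as the relevance measure and uses hand-made three-dimensional vectors in place of real OpenAI embeddings; the metric actually used depends on how the Chroma collection is configured, and the names `cosine_similarity`, `rank_documents`, `toy_docs`, and `toy_query` are illustrative only and not part of this project.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document names sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored]


# Hand-made toy vectors (hypothetical values, not model output).
toy_docs = {
    "engineer.txt": [0.9, 0.1, 0.0],
    "designer.txt": [0.1, 0.9, 0.0],
    "accountant.txt": [0.0, 0.1, 0.9],
}
toy_query = [0.8, 0.2, 0.0]

ranking = rank_documents(toy_query, toy_docs)
print(ranking)
```

In the real tool the vectors come from an embedding model and the nearest-neighbour lookup is handled by the vector database, so none of this arithmetic has to be written by hand.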
Python 3.7 or higher
OpenAI API key
Install the required packages by running:
pip install -r requirements.txt
To set up the project from a fresh clone inside a virtual environment, run:
git clone https://github.com/your-username/contextual-documents-search.git
cd contextual-documents-search
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Set your OpenAI API key, for example in a .env file that load_dotenv() can find:
OPENAI_API_KEY=your_openai_api_key
Prepare a directory of .txt files you want to search through and place them in the ./resumes folder, or specify a different directory in the code.
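Before building the vector store, it can help to confirm which .txt files will actually be picked up from the chosen directory. The sketch below is one way to do that with the standard library only. `load_text_files` is a hypothetical helper, not part of this project, and it assumes UTF-8 encoded files located directly in the directory (no subfolders). The demo uses a temporary directory so it runs anywhere.

```python
import tempfile
from pathlib import Path


def load_text_files(directory):
    """Return a {filename: text} mapping for every .txt file in `directory`."""
    folder = Path(directory)
    if not folder.is_dir():
        raise FileNotFoundError(f"Directory not found: {folder}")
    texts = {}
    # Only top-level *.txt files are read; other extensions are skipped.
    for path in sorted(folder.glob("*.txt")):
        texts[path.name] = path.read_text(encoding="utf-8")
    return texts


# Demo on a throwaway directory so the example does not depend on ./resumes.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "alice.txt").write_text("Software engineer", encoding="utf-8")
    Path(tmp, "notes.md").write_text("ignored", encoding="utf-8")
    demo = load_text_files(tmp)

print(demo)
```

Calling `load_text_files("./resumes")` in the project folder would list the files available for indexing, and an empty result is a quick hint that the directory path is wrong.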
In your main script, instantiate the VectorDBHandler class and call load_or_create_db() to initialize the vector store.
from dotenv import load_dotenv
from vector_db_handler import VectorDBHandler
# Load environment variables
load_dotenv()
# Set up directory paths and collection name
files_directory = "./resumes"
persist_directory = "./vector_db"
collection_name = "resumes_collection"
# Initialize the vector database handler
vector_db_handler = VectorDBHandler(files_directory, persist_directory, collection_name)
# Load or create the vector store database
vector_db_handler.load_or_create_db()
# Define the query for the search
query = "I am looking for a software engineer with OpenAI hard skill."
docs = vector_db_handler.query_vector_store(query)
# Output the top result
if docs:
    print("Top matching document:")
    print(docs[0].page_content)
else:
    print("No matching documents found.")
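The example above prints only the first match. If several results are of interest, a small formatting helper such as the following sketch may be useful. It assumes that the query returns a list ordered from most to least relevant and that each item exposes a `page_content` attribute, as the example suggests. `format_results` is a hypothetical helper, and `SimpleNamespace` objects stand in for real query results so the snippet runs without an API key.

```python
from types import SimpleNamespace


def format_results(docs, k=3, width=80):
    """Return one summary line per document for the first k results."""
    lines = []
    for rank, doc in enumerate(docs[:k], start=1):
        # Collapse whitespace so multi-line documents fit on one line.
        preview = " ".join(doc.page_content.split())
        if len(preview) > width:
            preview = preview[: width - 3] + "..."
        lines.append(f"{rank}. {preview}")
    return lines


# Stand-in objects with a page_content attribute (not real query results).
sample_docs = [
    SimpleNamespace(page_content="Software engineer\nwith OpenAI API experience"),
    SimpleNamespace(page_content="Graphic designer"),
]

summary = format_results(sample_docs, k=2)
print("\n".join(summary))
```

With the real handler, `format_results(docs)` could replace the two `print` calls in the `if docs:` branch.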