This project came from a simple idea: what if you could provide an entire codebase to an LLM instead of just small pieces? Most coding assistants, such as Copilot-style tools, work on a limited scope, but I wanted something that could handle the full context of a project.
By integrating the full codebase with Retrieval-Augmented Generation (RAG), this POC aims to improve the quality and relevance of code suggestions. The goal is to see how having the complete code available for real-time querying can enhance productivity.
CodeRAG is an AI-powered code retrieval and augmentation tool that leverages OpenAI's models (such as gpt-4 or gpt-3.5-turbo) for real-time codebase querying, indexing, and improvement. This project integrates a Retrieval-Augmented Generation (RAG) system to help developers seamlessly search through code, receive suggestions, and implement improvements.
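To make the retrieve-then-augment idea concrete, here is a minimal sketch of the flow, not CodeRAG's actual implementation. It assumes each code snippet has already been turned into an embedding vector, ranks snippets by cosine similarity to a query vector, and pastes the best matches into a prompt. The three-dimensional vectors, the placeholder snippet text, and the names `INDEX`, `retrieve`, and `build_prompt` are all hypothetical; the real tool would obtain embeddings from an OpenAI embedding model and search them with FAISS rather than with this pure-Python loop.

```python
import math

# Toy index of (file path, snippet text, embedding vector).
# The vectors and snippet text are made up for illustration only.
INDEX = [
    ("coderag/search.py", "# vector search helpers", [0.9, 0.1, 0.0]),
    ("coderag/config.py", "# configuration values", [0.1, 0.9, 0.0]),
    ("app.py", "# Streamlit UI entry point", [0.0, 0.2, 0.9]),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k=2):
    """Return the k index entries most similar to the query vector."""
    ranked = sorted(index, key=lambda entry: cosine(query_vec, entry[2]), reverse=True)
    return ranked[:k]


def build_prompt(question, hits):
    """Prepend the retrieved snippets to the user's question."""
    context = "\n\n".join(f"# File: {path}\n{snippet}" for path, snippet, _ in hits)
    return f"Relevant code:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # A query vector that points toward the search-related snippet.
    hits = retrieve([1.0, 0.0, 0.0], INDEX, k=2)
    print(build_prompt("How does code search work?", hits))
```

The resulting prompt string is what would be sent to the chat model, so the model answers with the most relevant parts of the codebase in view instead of the whole repository.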
.env file for API keys, model selection, and directories.
git clone https://github.com/yourusername/CodeRAG.git
cd CodeRAG
Create a virtual environment (recommended):
python3 -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
Install required packages:
pip install -r requirements.txt
Create a .env file in the root of the project and add the following variables:
OPENAI_API_KEY=your_openai_api_key
OPENAI_EMBEDDING_MODEL=text-embedding-ada-002
OPENAI_CHAT_MODEL=gpt-4o
WATCHED_DIR=path_to_your_code_directory
FAISS_INDEX_FILE=path_to_faiss_index
EMBEDDING_DIM=1536 # Modify if you're using a different embedding model
Start the Backend:
To start the backend (indexing, embeddings, and monitoring):
python main.py
Start the Frontend:
To launch the Streamlit UI:
streamlit run app.py
main.py: The main script to run the application.
prompt_flow.py: Handles querying OpenAI's API and manages the search and conversational history.
coderag/config.py: Stores configuration and environment variables.
coderag/search.py: Manages vector database (FAISS) searches for relevant code snippets.
.env: Holds environment-specific settings (OpenAI API keys, model configuration, etc.).
requirements.txt: Lists the Python dependencies needed to run the project.
Feel free to fork this repository, open issues, and submit pull requests.
Create a feature branch (`git checkout -b feature/your-feature`).
Commit your changes (`git commit -am 'Add new feature'`).
Push the branch (`git push origin feature/your-feature`).
This project is licensed under the Apache License. See the LICENSE file for details.