
Reading Time: ~10 minutes
Building Art Deco RAG ChatBot using PulseJet Github Repo: https://github.com/Jet-Engine/art-deco-chatbot
This blogpost can be read from the following links:
Large Language Models (LLMs) have significantly advanced, improving their ability to answer a broad array of questions. However, they still encounter challenges, particularly with specific or recent information, often resulting in inaccuracies or "hallucinations." To address these issues, the Retrieval Augmented Generation (RAG) approach integrates a document retrieval step into the response generation process. This approach uses a corpus of documents and employs vector databases for efficient retrieval, enhancing the accuracy and reliability of LLM responses through three key steps:
Vector databases facilitate quick similarity searches and efficient data management, making RAG a powerful solution for enhancing LLM capabilities.
The Art Deco era, spanning the roaring 1920s to the 1940s, left a dazzling legacy in architecture. Despite the capabilities of models like Meta's Llama3.1, their responses can be unreliable, especially for nuanced or detailed queries specific to Art Deco. Our goal with the Art Deco ChatBot is to use RAG to improve the quality of responses about Art Deco architecture, comparing these with those generated by traditional LLMs in both quality and time efficiency.
By designing the Art Deco ChatBot, we also aim to show how a complex RAG system can be built. You can access the complete code at the Art Deco ChatBot GitHub repository. By examining the code and reading this README, you will learn:
Ollama is a program that facilitates running LLM models easily on local machines.
ollama pull llama3.1 (LLM that will be used for RAG)ollama pull nomic-embed-text (embedding model that will be used for RAG)In this project, we not only aim to write code to show how RAG can be done but also to compare and benchmark results of RAG with queries to different LLMs. Some of these LLMs cannot be run locally (like GPT-4o), while others are compute-heavy and are run on cloud services (like Llama3.1:70b on Groq).
LiteLLM provides a unified interface to query different LLMs, making our code cleaner and more readable. Checking out the LiteLLM Python library is recommended but not required for this project.
Get your API keys from OpenAI and Groq to use them in the project. Be aware that you may be billed for using these services. While the Groq API can be used for free at the time of writing, the OpenAI API is not free.
PulseJet is a high-performance vector database that enables efficient storage and retrieval of document embeddings. To set up PulseJet:
pip install pulsejetdocker run --name pulsejet_container -p 47044-47045:47044-47045 jetngine/pulsejetNote: You can skip the first step since pulsejet is already included in the requirements.txt file.
Check PulseJet Docs for details about running Pulsejet Docker images and using the pulsejet Python library for vector database operations.
Install all necessary dependencies by running:
pip install -r requirements.txt
This project was developed using a
condaenvironment withPython 3.11.
As we have not tested the project in different environments, we recommend adhering to this configuration for optimal performance and compatibility.
The Art Deco ChatBot uses two YAML files for configuration: config.template.yaml and secrets.yaml. Here's a detailed breakdown of each section:
Create a secrets.yaml file with your API keys:
#api_keys:
openai_key: "your_openai_key_here"
groq_key: "your_groq_key_here"#models:
main_model: "llama3.1"
embed_model: "nomic-embed-text"
#vector_db:
vector_db: "pulsejet"
#pulsejet:
pulsejet_location: "remote"
pulsejet_collection_name: "art-deco"
#paths:
rag_files_path: "rag_files/"
questions_file_path: "evaluation/questions.csv"
evaluation_path: "evaluation/"
rag_prompt_path: "evaluation/rag_prompt.txt"
metrics_file_path: "evaluation/metrics.json"
#embeddings:
embeddings_file_path: "embeddings_data/all_embeddings_HSNW.h5"
use_precalculated_embeddings: true
#llm_models:
all_models:
gpt-4o: "gpt-4o"
groq-llama3.1-8b: "groq/llama-3.1-8b-instant"
groq-llama3.1-70b: "groq/llama-3.1-70b-versatile"
ollama-llama3.1: "ollama/llama3.1"
ollama-llama3.1-70b: "ollama/llama3.1:70b"
selected_models:
- "gpt-4o"
- "groq-llama3.1-70b"
- "ollama-llama3.1"
#rag_parameters:
sentences_per_chunk: 10
chunk_overlap: 2
file_extension: ".txt"Here's a detailed explanation of each section:
true, the system will load embeddings from the specified file. When false, it will generate new embeddings and save them to this file.Ensure you update these configuration files with your specific settings before running the project. Adjusting the RAG parameters can significantly impact the performance and accuracy of the RAG system. Experimentation with different values may be necessary to find the optimal configuration for your specific use case and document set.
wiki-bot.pyThis step is optional since the content files of all scraped articles from Wikipedia are available in the https://huggingface.co/datasets/JetEngine/Art_Deco_USA_DS.
You can download this dataset and copy all text files from it into the rag_files directory. If you plan to use pre-calculated embeddings, which will be explained in the next section, you don't actually need to download this dataset.
There is no need to repeat the scraping process. You could skip reading rest of this section if you are not interested in data scraping process.
Our initial step involves gathering knowledge about Art-Deco architecture. We focus on U.S. structures, given their prominence in the Art-Deco movement. The wiki-bot.py script automates the collection of relevant Wikipedia articles, organizing them into a structured directory for ease of access.
Run the bot using:
python wiki-bot.py
When you run wiki-bot.py with an empty rag_files directory, it saves the contents of the scraped Wikipedia articles in a sub-folder named text under rag_files. The bot also creates various sub-folders to organize different types of data such as article URLs, references, etc. Since our current focus is only on the contents of the Wikipedia articles, to reduce clutter, we only transferred the contents from the text sub-folder to our HG dataset and removed all other sub-folders.
Thus, if you want to run the bot yourself which is optional since the scraped documents are already available in Hugging Face, you would need to either copy all files from the text sub-folder to the rag_files directory and then delete all sub-folders within rag_files, or simply change the rag_files_path in config.yaml to rag_files/text.
indexing.pyIndex the documents by running:
python indexing.py
This script processes the documents, generates embeddings, and stores them in PulseJet.
If you don't want to lose time for generating embeddings, you can download pre-calculated embeddings
from https://huggingface.co/JetEngine/rag_art_deco_embeddings
and set use_precalculated_embeddings: true in the configuration.
In our setup generation of embeddings takes around 15 minutes to complete and insertion of vectors to Pulsejet takes around 4 seconds.
The script outputs timing information for:
chat.pyEnsure your configuration is correct, then run:
python chat.py
This script queries different LLMs and the RAG system, outputting results in HTML, JSON, and CSV formats for comparison.
Pulsejet is used in this project for efficient vector storage and retrieval. Here's a detailed overview of how Pulsejet is integrated into our Art Deco ChatBot project:
Initializing the Pulsejet Client:
client = pj.PulsejetClient(location=config['pulsejet_location'])This creates a Pulsejet client. In our project, we're using a remote Pulsejet instance, so the location is set to "remote". This connects to a Pulsejet server running in a Docker container.
Creating a Collection:
client.create_collection(collection_name, vector_config)This creates a new collection in Pulsejet to store our document embeddings. The vector_config parameter specifies the configuration for the vector storage, such as the vector size and index type (e.g., HNSW for efficient similarity search).
Inserting Vectors: In our project, we use the following pattern for inserting vectors:
collection[0].insert_single(collection[1], embed, meta)This might look confusing at first, but here's what it means:
collection[0] is actually our Pulsejet client instance.collection[1] is the name of the collection we're inserting into.embed is the vector we're inserting.meta is additional metadata associated with the vector.This is equivalent to calling:
client.insert_single(collection_name, vector, meta)For bulk insertions, we use:
client.insert_multi(collection_name, embeds)This inserts multiple embeddings at once, which is more efficient for large datasets.
Searching Vectors:
results = client['db'].search_single(collection, query_embed, limit=5, filter=None)This performs a similarity search in the specified Pulsejet collection to find the most relevant documents for a given query vector. The limit parameter specifies the maximum number of results to return.
In our project, client['db'] is used to access the database methods of the Pulsejet client. This is equivalent to using the client directly:
results = client.search_single(collection_name, query_vector, limit=5, filter=None)Closing the Connection:
client.close()This closes the connection to the Pulsejet database when it's no longer needed.
The PulsejetRagClient class is defined in pulsejet_rag_client.py and provides a high-level interface for interacting with PulseJet in the context of our RAG system. Here's a breakdown of its key components:
Initialization:
class PulsejetRagClient:
def __init__(self, config):
self.config = config
self.collection_name = config['pulsejet_collection_name']
self.main_model = config['main_model']
self.embed_model = config['embed_model']
self.client = pj.PulsejetClient(location=config['pulsejet_location'])The client is initialized with configuration parameters, setting up the PulseJet client and storing relevant config values.
Creating a Collection:
def create_collection(self):
vector_size = get_vector_size(self.config['embed_model'])
vector_params = pj.VectorParams(size=vector_size, index_type=pj.IndexType.HNSW)
try:
self.client.create_collection(self.collection_name, vector_params)
logger.info(f"Created new collection: {self.collection_name}")
except Exception as e:
logger.info(f"Collection '{self.collection_name}' already exists or error occurred: {str(e)}")This method creates a new collection in PulseJet with the specified parameters. It uses the get_vector_size function to determine the appropriate vector size for the embeddings.
Inserting Vectors:
def insert_vector(self, vector, metadata=None):
try:
self.client.insert_single(self.collection_name, vector, metadata)
logger.debug(f"Inserted vector with metadata: {metadata}")
except Exception as e:
logger.error(f"Error inserting vector: {str(e)}")
def insert_vectors(self, vectors, metadatas=None):
try:
self.client.insert_multi(self.collection_name, vectors, metadatas)
logger.debug(f"Inserted {len(vectors)} vectors")
except Exception as e:
logger.error(f"Error inserting multiple vectors: {str(e)}")These methods handle the insertion of single and multiple vectors into the PulseJet collection, along with their associated metadata.
Searching Vectors:
def search_similar_vectors(self, query_vector, limit=5):
try:
results = self.client.search_single(self.collection_name, query_vector, limit=limit, filter=None)
return results
except Exception as e:
logger.error(f"Error searching for similar vectors: {str(e)}")
return []This method performs a similarity search in the PulseJet collection to find the most relevant documents for a given query vector.
Closing the Connection:
def close(self):
try:
self.client.close()
logger.info("Closed Pulsejet client connection")
except Exception as e:
logger.error(f"Error closing Pulsejet client connection: {str(e)}")This method closes the connection to the PulseJet database when it's no longer needed.
The PulsejetRagClient is used throughout the project to interact with PulseJet. Here's how it's typically instantiated and used:
Creation:
from pulsejet_rag_client import create_pulsejet_rag_client
config = get_config()
rag_client = create_pulsejet_rag_client(config)Indexing Documents:
In indexing.py, we use the client to create the collection and insert vectors:
rag_client.create_collection()
for file_name, file_embeddings in embeddings_data.items():
for chunk_id, content, embed in file_embeddings:
metadata = {"filename": file_name, "chunk_id": chunk_id, "content": content}
rag_client.insert_vector(embed, metadata)In rag.py, we use the client to search for similar vectors during the RAG process:
results = rag_client.search_similar_vectors(query_embed, limit=5)After operations are complete, we close the connection:
rag_client.close()This implementation provides a clean, encapsulated interface for all PulseJet operations in our RAG system.
LLama3.1 take longer than simple question answering due to the increased query length.The Art Deco ChatBot demonstrates how LLMs could be better utilized with RAG. Our project offers a comprehensive exploration of RAG implementation, covering every step from data scraping and document chunking to embedding creation and the integration of vector databases.
As the document base for a RAG system grows larger, the performance of insertion and search operations becomes increasingly critical. By learning how to integrate the Pulsejet vector database into a full-fledged RAG system, one can significantly benefit from its capabilities, particularly when dealing with RAG applications on large document bases.
Our RAG responses could have been more accurate. To enhance our Art Deco ChatBot's performance, we are considering several experimental approaches:
We plan to expand this project through the following initiatives:
We encourage you to experiment with the Art Deco ChatBot, modify its parameters, and adapt it to your own domains of interest.
Author: Güvenç USANMAZ