This project is a conversational agent built on LangChain, the OpenAI API, and Retrieval-Augmented Generation (RAG). The agent is designed to read lengthy PDF documents, extract components such as text, images, and tables, and store them in a vector database for efficient retrieval during conversations with users.
PDF Processing: The agent is capable of parsing and extracting information from long PDF documents.
Multi-Modal Extraction: Extracts text, images, and tables from PDFs for a comprehensive understanding.
Vector Database: Utilizes a vector database to store and retrieve information efficiently.
Conversational AI: Implements the RAG concept to enhance conversational interactions with users.
We will use Unstructured to parse images, text, and tables from documents (PDFs).
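As a rough sketch, partitioning with Unstructured might look like the following. Parameter names such as extract_images_in_pdf and image_output_dir_path have shifted between unstructured releases, so treat them as illustrative and check your installed version:

```python
from unstructured.partition.pdf import partition_pdf

# Partition the PDF into typed elements (text, tables, images).
# Parameter names may differ across `unstructured` versions.
raw_pdf_elements = partition_pdf(
    filename="docs/source.pdf",      # path to your PDF
    extract_images_in_pdf=True,      # write embedded images to disk
    infer_table_structure=True,      # keep tables as structured elements
    chunking_strategy="by_title",    # group text under section titles
    max_characters=4000,             # cap chunk size
    new_after_n_chars=3800,
    combine_text_under_n_chars=2000,
    image_output_dir_path="figures/",
)

# Separate text chunks from tables by element type.
texts, tables = [], []
for element in raw_pdf_elements:
    if "Table" in str(type(element)):
        tables.append(str(element))
    elif "CompositeElement" in str(type(element)):
        texts.append(str(element))
```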
We will use the multi-vector retriever with Chroma to store raw text and images along with their summaries for retrieval.
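A minimal sketch of wiring the multi-vector retriever to Chroma is below. Import paths move between LangChain releases, and text_summaries is assumed to come from the summarization step described under Retrieval further down:

```python
import uuid

from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

# The vector store indexes the summaries; the docstore holds the raw content.
vectorstore = Chroma(
    collection_name="mm_rag",
    embedding_function=OpenAIEmbeddings(),
)
store = InMemoryStore()
id_key = "doc_id"

retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    docstore=store,
    id_key=id_key,
)

# Embed the summaries for similarity search, keep the raw chunks in the
# docstore, and link the two through a shared ID so retrieval returns
# the full content rather than the summary.
doc_ids = [str(uuid.uuid4()) for _ in texts]
summary_docs = [
    Document(page_content=s, metadata={id_key: doc_ids[i]})
    for i, s in enumerate(text_summaries)
]
retriever.vectorstore.add_documents(summary_docs)
retriever.docstore.mset(list(zip(doc_ids, texts)))
```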
We will use GPT-4V both for image summarization (for retrieval) and for final answer synthesis from a joint review of images and text (or tables).
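Image summarization can be sketched as follows, assuming the images were written to disk by the partitioning step; the model name "gpt-4-vision-preview" reflects the API at the time of writing and may need updating:

```python
import base64

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

def encode_image(image_path: str) -> str:
    """Base64-encode an image file for inline transmission to the API."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def summarize_image(image_path: str) -> str:
    """Ask GPT-4V for a concise description to use as the retrieval key."""
    chat = ChatOpenAI(model="gpt-4-vision-preview", max_tokens=512)
    b64 = encode_image(image_path)
    msg = chat.invoke([
        HumanMessage(content=[
            {"type": "text", "text": "Describe this image concisely for retrieval."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
            },
        ])
    ])
    return msg.content
```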
LangChain <- Visit here for LangChain installation instructions.
OpenAI API <- Instructions for setting up and using OpenAI API.
Chroma DB <- Instructions for setting up and using the vector database.
Provide the path to the source PDF.
Change the prompt_text to suit your needs.
Replace the question in the query line with your own (see the configuration sketch below).
The agent will use the stored information for intelligent responses.
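Putting those three configuration points together, a hypothetical setup might look like this; the actual variable names in the code may differ:

```python
# Hypothetical configuration points; match these to the actual names in the code.
fpath = "docs/annual_report.pdf"  # 1. path to the source PDF

# 2. summarization prompt; tailor the wording to your documents
prompt_text = (
    "You are an assistant tasked with summarizing tables and text. "
    "Give a concise summary optimized for retrieval: {element}"
)

# 3. your question; `chain` stands in for the assembled RAG pipeline
query = "What were the key findings of the study?"
answer = chain.invoke(query)
```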
Retrieval
Retrieval is performed based on similarity to image summaries as well as text chunks. This requires careful consideration, because image retrieval can fail when there are competing text chunks. To mitigate this, I produce larger (4k-token) text chunks and summarize them for retrieval (a sketch follows).
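A sketch of that summarization step as a simple LCEL chain; the model choice and prompt wording are illustrative:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Each ~4k-token chunk is condensed into a short summary; the summary is what
# gets embedded, so text and image summaries compete on comparable footing.
prompt = ChatPromptTemplate.from_template(
    "Give a concise summary of the following text, optimized for retrieval:\n\n{element}"
)
summarize_chain = prompt | ChatOpenAI(model="gpt-4", max_tokens=256) | StrOutputParser()

text_summaries = summarize_chain.batch(
    [{"element": t} for t in texts], {"max_concurrency": 5}
)
```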
Image Size
The quality of answer synthesis appears to be sensitive to image size, as expected. I'll do evals soon to test this more carefully.
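In the meantime, one way to control for this is to downscale extracted images before sending them to GPT-4V. A Pillow-based sketch, where the 1024-pixel cap is an illustrative assumption rather than a tested value:

```python
from PIL import Image

def downscale(path: str, max_dim: int = 1024) -> None:
    """Resize an image in place so its longest side is at most max_dim pixels."""
    img = Image.open(path)
    if max(img.size) > max_dim:
        scale = max_dim / max(img.size)
        new_size = (int(img.width * scale), int(img.height * scale))
        img.resize(new_size, Image.LANCZOS).save(path)
```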
This project is licensed under the MIT License.