This project demonstrates how to implement a Retrieval-Augmented Generation (RAG) pipeline using Hugging Face embeddings and ChromaDB for efficient semantic search. The pipeline reads, processes, and embeds textual data, enabling fast and accurate semantic queries over it.
It uses a Hugging Face embedding model (`BAAI/bge-base-en-v1.5`) to convert text chunks into vector representations.

Installation:

Before running the notebook, ensure the necessary libraries are installed:
```bash
pip install chromadb
pip install llama-index
```

You also need to clone the required datasets from Hugging Face if you just want to check the project out and see it working:
```bash
git clone https://huggingface.co/datasets/NahedAbdelgaber/evaluating-student-writing
git clone https://huggingface.co/datasets/transformersbook/emotion-train-split
```

The notebook then works through the following steps.

Load Datasets:

The cloned datasets are read and split into text chunks, as shown in the sketch below.
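As a rough sketch of this step, the cloned dataset folders can be read and chunked with llama-index's `SimpleDirectoryReader` and `SentenceSplitter`; the directory path, chunk size, and overlap below are illustrative assumptions, not values fixed by the notebook:

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

# Read every file in one of the cloned dataset folders (path is an assumption).
documents = SimpleDirectoryReader("./evaluating-student-writing").load_data()

# Split the documents into chunks small enough to embed individually.
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents(documents)
chunks = [node.get_content() for node in nodes]
```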
Embedding Creation:
Using the `BAAI/bge-base-en-v1.5` model, text chunks are converted into vector embeddings. You can swap in any embedding model of your liking.

ChromaDB Integration:

The embeddings are stored in a ChromaDB collection so they can be searched efficiently. A sketch of both steps follows below.
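A minimal sketch of the embedding and storage steps, assuming the Hugging Face embedding integration for llama-index is installed (`pip install llama-index-embeddings-huggingface`); the collection name and storage path are placeholders:

```python
import chromadb
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Any Hugging Face embedding model name can be swapped in here.
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

# Persist the collection on disk so it can be reused across runs.
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("rag_chunks")

# `chunks` is the list of text chunks produced in the loading step above.
collection.add(
    ids=[str(i) for i in range(len(chunks))],
    documents=chunks,
    embeddings=[embed_model.get_text_embedding(chunk) for chunk in chunks],
)
```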
Semantic Search:

Queries are embedded with the same model and matched against the stored chunks to return the most relevant results.
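The notebook exposes this step through a `query_collection` helper. One possible shape for it, reusing the `embed_model` and `collection` objects from the sketch above (the notebook's actual implementation may differ), is:

```python
def query_collection(query_text: str, n_results: int = 1) -> list[str]:
    """Embed the query and return the closest stored text chunks."""
    query_embedding = embed_model.get_text_embedding(query_text)
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results,
    )
    # ChromaDB returns one list of matching documents per query embedding.
    return results["documents"][0]
```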
Usage:

To use the code, simply run the notebook after installing the dependencies and cloning the required datasets. The following call queries the stored embeddings:
query_collection("Your search query here", n_results=1)This will return the most relevant text chunk based on the provided query.
```python
query_collection(
    "Even though the planet is very similar to Earth, there are challenges to get accurate data because of the harsh conditions on the planet.",
    n_results=1,
)
```

There are two files here. The simple one creates a vector database from a single file, while the advanced one can work on multiple files with different extensions, create a vector database from them, and also lets you test the results with a text-generation model (see the sketch below).
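As a hedged sketch of that last step, retrieved chunks can be stitched into a prompt for a Hugging Face text-generation pipeline; the model choice, prompt format, and the extra `transformers` dependency are all assumptions, not requirements of the notebooks:

```python
from transformers import pipeline

# Retrieve supporting context for a question, then ask a generator to answer.
question = "Why is it hard to collect accurate data about the planet?"
context = "\n".join(query_collection(question, n_results=3))

generator = pipeline("text-generation", model="gpt2")  # illustrative model
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```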
This repository is licensed under the MIT License.