CUDAQuest Semantic Crawl to Answer Engine

CUDA Documentation QA System

This project implements a Question Answering (QA) system for CUDA documentation. It crawls the NVIDIA CUDA documentation, splits the text into semantically coherent chunks, stores their embeddings in a Milvus vector database, and answers user queries with a language model backed by query expansion and hybrid (BM25 plus BERT-based) retrieval.

Features

Web crawling of NVIDIA CUDA documentation
Advanced data chunking based on semantic similarity
Vector embedding creation and storage in Milvus database
Query expansion for improved retrieval
Hybrid retrieval combining BM25 and BERT-based methods
Question answering using a Language Model
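
As an illustration of chunking based on semantic similarity, here is a minimal sketch, not the project's actual chunking.py. It assumes one simple strategy: start a new chunk whenever the similarity between adjacent sentences drops below a threshold. The names toy_embed, semantic_chunks, and the threshold value are hypothetical, and the bag-of-words embedding is only a stand-in so the example runs without downloads; in the real pipeline a sentence-transformers model would presumably take its place.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_embed(sentence: str) -> Vector:
    """Stand-in embedding: bag-of-words counts (NOT a real language model)."""
    return dict(Counter(re.findall(r"[a-z0-9_]+", sentence.lower())))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(
    sentences: List[str],
    embed: Callable[[str], Vector] = toy_embed,
    threshold: float = 0.2,  # assumed value, tune for a real embedding model
) -> List[str]:
    """Group consecutive sentences; break where adjacent similarity < threshold."""
    groups: List[List[str]] = []
    for sentence in sentences:
        if groups and cosine(embed(groups[-1][-1]), embed(sentence)) >= threshold:
            groups[-1].append(sentence)  # same topic: extend the current chunk
        else:
            groups.append([sentence])  # topic shift: start a new chunk
    return [" ".join(group) for group in groups]


sentences = [
    "A kernel is launched on a grid of thread blocks.",
    "Each thread block in the grid runs on one multiprocessor.",
    "Pinned host memory speeds up transfers.",
    "Transfers from pinned host memory can overlap with compute.",
]
chunks = semantic_chunks(sentences)
for chunk in chunks:
    print(chunk)
```

Passing a different embed callable is enough to swap the toy embedding for a real model, which keeps the splitting logic independent of the embedding choice.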

Setup Instructions

Prerequisites

Python 3.7+
pip (Python package installer)

Installation

1. Clone the repository.
2. Create a virtual environment (optional but recommended).
3. Install the required dependencies listed in requirements.txt.

Dependencies

The main dependencies for this project are:

scrapy: For web crawling
sentence-transformers: For text embeddings
nltk: For natural language processing tasks
rank_bm25: For BM25 retrieval
torch and transformers: For working with transformer models
streamlit: For creating web applications
selenium and webdriver_manager: For web scraping
pymilvus: For interacting with the Milvus vector database

For a complete list of dependencies, refer to the requirements.txt file.

Running the System

1. Ensure that you have a Milvus server running. Refer to the Milvus documentation for installation and setup instructions.
2. Run the main script, main.py.
3. The system will start by crawling the CUDA documentation, processing the data, and storing it in the Milvus database. This initial setup may take some time.
4. Once the setup is complete, you can start asking questions about CUDA. The system will provide answers based on the retrieved information.
5. To exit the system, type 'quit' when prompted for a question.
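
The question prompt described above can be sketched as a simple loop. This is an assumption about how such a loop might look, not the code in main.py: run_qa_loop and answer_fn are hypothetical names, and the reader and writer are injected so the loop can be driven by a script instead of the keyboard.

```python
from typing import Callable


def run_qa_loop(
    answer_fn: Callable[[str], str],  # hypothetical: maps a question to an answer
    read_line: Callable[[str], str] = input,
    write: Callable[[str], None] = print,
) -> int:
    """Prompt for questions until the user types 'quit'; return how many were answered."""
    answered = 0
    while True:
        question = read_line("Question (type 'quit' to exit): ").strip()
        if question.lower() == "quit":
            break
        if not question:
            continue  # ignore empty input and prompt again
        write(answer_fn(question))
        answered += 1
    return answered


# Scripted demonstration: no keyboard input is needed.
scripted = iter(["What is a warp?", "", "quit"])
outputs = []
count = run_qa_loop(
    answer_fn=lambda q: f"(answer to: {q})",
    read_line=lambda prompt: next(scripted),
    write=outputs.append,
)
print(count, outputs)
```

Calling run_qa_loop with only answer_fn falls back to input and print for interactive use.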

Project Structure

main.py: The main script that orchestrates the entire process.
crawler/web_crawler.py: Contains the web crawling logic.
data_processing/chunking.py: Implements advanced data chunking techniques.
data_processing/embedding.py: Handles the creation of vector embeddings.
vector_db/milvus_db.py: Manages interactions with the Milvus database.
retrieval/query_expansion.py: Implements query expansion techniques.
retrieval/hybrid_retrieval.py: Contains the hybrid retrieval logic.
qa/llm_qa.py: Manages the question answering process using a language model.
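
To make the hybrid retrieval step more concrete, here is a minimal sketch of one common way to combine a BM25 score with a dense (embedding-based) score: normalize both to the range 0 to 1 and take a weighted sum. The fusion rule, the alpha weight, and the function names are assumptions; the source does not say how hybrid_retrieval.py combines the two signals. BM25 is written out by hand here to keep the example dependency-free (the project itself lists rank_bm25), and the dense scorer is a token-overlap stand-in rather than a BERT model.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_scores(query: str, docs: Sequence[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every document for the query."""
    tokenized = [tokenize(doc) for doc in docs]
    avg_len = sum(len(tokens) for tokens in tokenized) / len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            idf = math.log(1 + (len(docs) - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)
    return scores


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] so sparse and dense scores are comparable."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0] * len(values)
    return [(value - low) / (high - low) for value in values]


def overlap_score(query: str, doc: str) -> float:
    """Stand-in for a dense similarity: Jaccard overlap of token sets."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if q | d else 0.0


def hybrid_retrieve(
    query: str,
    docs: Sequence[str],
    dense_score: Callable[[str, str], float] = overlap_score,
    alpha: float = 0.5,  # assumed weight of the BM25 part
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k documents ranked by the weighted sum of both scores."""
    if not docs:
        return []
    sparse = min_max(bm25_scores(query, docs))
    dense = min_max([dense_score(query, doc) for doc in docs])
    fused = [alpha * s + (1 - alpha) * d for s, d in zip(sparse, dense)]
    ranked = sorted(zip(docs, fused), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


docs = [
    "cudaMalloc allocates device memory.",
    "cudaMemcpy copies data between host and device.",
    "A warp is a group of 32 threads.",
]
results = hybrid_retrieve("how do I allocate device memory", docs, top_k=2)
for doc, score in results:
    print(f"{score:.2f}  {doc}")
```

Setting alpha to 1.0 reduces the ranking to BM25 alone and 0.0 to the dense score alone, which is a convenient way to compare the two signals on the same queries.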

Customization

You can adjust the embedding model by modifying the SentenceTransformer model in main.py.
The depth of web crawling can be adjusted in the crawl_data function (currently set to 5 levels).
The number of retrieved chunks for answering can be modified by changing the top_k parameter in the retrieve method call.
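
To show what a crawl depth limit means in practice, here is a minimal sketch of a depth-limited breadth-first crawl over an in-memory link graph. It is not the project's crawl_data function: the crawl name, the get_links callback, and the convention that the start page sits at depth 0 are assumptions made for illustration, and no network access is involved.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple


def crawl(
    start: str,
    get_links: Callable[[str], Iterable[str]],  # hypothetical: returns links found on a page
    max_depth: int = 5,
) -> List[Tuple[str, int]]:
    """Visit pages breadth-first, never following links beyond max_depth."""
    seen = {start}
    visited: List[Tuple[str, int]] = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth == max_depth:
            continue  # do not follow links from pages at the depth limit
        for link in get_links(url):
            if link not in seen:  # skip pages already queued or visited
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Toy site: each key links to the pages in its list.
site = {"index": ["a", "b"], "a": ["c"], "b": [], "c": ["d"], "d": []}
pages = crawl("index", lambda url: site.get(url, []), max_depth=2)
print(pages)
```

With max_depth=2 the page "d" is never reached, because it is three links away from the start page. Lowering the limit shortens the initial setup at the cost of coverage.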

Troubleshooting

If you encounter any issues:

Ensure all dependencies are correctly installed.
Check that the Milvus server is running and accessible.
Verify that you have a stable internet connection for web crawling and model downloads.
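
A quick way to check that the Milvus server is reachable is to test whether its port accepts TCP connections. The sketch below uses only the standard library; is_reachable is a hypothetical helper, and the host and port are assumptions (19530 is the default Milvus port, but the project's configuration may differ). An open port only shows that something is listening there, not that the Milvus service is healthy.

```python
import socket


def is_reachable(host: str = "localhost", port: int = 19530, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


print("Milvus port reachable:", is_reachable())
```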

For any persistent problems, please open an issue in the GitHub repository.

Additional Information

Version: 1.0.0
Type: Other source code
Update Time: 2025-05-31
Size: 8.78 KB
Source: GitHub
