rag career portfolio querying Download - rag career portfolio querying Source code download

rag career portfolio querying

Other source code

1.0.0

Download

Career Portfolio RAG Assistant

A Retrieval-Augmented Generation (RAG) system that enables natural language querying of career portfolio data stored in Notion databases. The system uses LlamaIndex and OpenAI's embedding/LLM services to provide intelligent responses about work experience, projects, and skills.

Overview

This system consists of two main components:

ETL Pipeline: Extracts data from Notion databases, processes it into embeddings, and stores them in a vector database
Query Interface: A Streamlit web application that enables natural language interaction with the portfolio data

Key Features

Intelligent natural language querying of portfolio data
Dual-index architecture (text and keywords) for improved retrieval
Real-time response streaming
Debug panel for transparency into the RAG process
Sample queries to demonstrate capabilities
Configurable through environment variables

Project Structure

Core Files

notion_data_etl.ipynb: Jupyter notebook for extracting and processing Notion data
- Handles authentication and database connections
- Processes documents into text and keyword nodes
- Creates vector indices for efficient retrieval
streamlit_app_rag.py: Main web application interface
- Implements the RAG Assistant UI
- Manages chat history and debug output
- Handles real-time response streaming
prompts.py: Contains system prompts for:
- Context setting for the LLM
- Keyword extraction

Key Components

NotionProcessor Class

A comprehensive data processing class that:

Extracts data from Notion databases
Handles nested content structures
Processes text and metadata
Supports multiple extraction modes (header, whole, granular)

RAGApp Class

The main application class that:

Manages the Streamlit interface
Handles chat interactions
Provides debugging capabilities
Maintains session state

Setup Requirements

Environment Variables

NOTION_TOKEN=your_notion_api_token
NOTION_PROJECTS_DATABASE_ID=notion_database_id_for_projects
NOTION_EXPERIENCE_DATABASE_ID=notion_database_id_for_experiences
QDRANT_URL=your_qdrant_url
QDRANT_API_KEY=your_qdrant_api_key
OPENAI_API_KEY=your_openai_api_key