ClassGPT
ChatGPT for my lecture slides

Built with Streamlit, powered by LlamaIndex and LangChain.
Uses the latest ChatGPT API from OpenAI.
Inspired by AthensGPT
App Demo
demo.mp4
How this works
- Parses pdf with pypdf
- Index Construction with LlamaIndex's GPTSimpleVectorIndex
  - the text-embedding-ada-002 model is used to create embeddings
  - see vector store index page to learn more
  - here's a sample index
  - indexes and files are stored on s3
- Query the index
  - uses the latest ChatGPT model gpt-3.5-turbo
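To make the parse, index, query flow concrete, here is a minimal toy sketch in plain Python. It is not the LlamaIndex implementation: `toy_embed` is a hypothetical bag-of-words stand-in for text-embedding-ada-002, and `ToyVectorIndex` is an illustrative name, not a class from any library. It only shows the idea of embedding chunks once and ranking them by similarity to a question.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=50):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorIndex:
    """Illustrative in-memory index: stores (chunk, embedding) pairs."""

    def __init__(self, chunks):
        self.entries = [(chunk, toy_embed(chunk)) for chunk in chunks]

    def query(self, question, top_k=1):
        """Return the top_k chunks most similar to the question."""
        q = toy_embed(question)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine_similarity(q, entry[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:top_k]]


# Example usage with two made-up slide snippets.
slides = [
    "Gradient descent updates weights using the gradient of the loss.",
    "A binary search tree keeps keys in sorted order.",
]
index = ToyVectorIndex(slides)
print(index.query("How does gradient descent update weights?"))
```

In the real app the retrieved chunks would be passed to gpt-3.5-turbo as context for the answer; that step is left out here.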
Usage
Configuration and secrets
- configure AWS (quickstart)
- create an S3 bucket with a unique name
- change the bucket name in the codebase (look for bucket_name = "classgpt") to whatever you created
- rename .env.local.example to .env and add your OpenAI credentials
Locally
- create python env
conda create -n classgpt python=3.9
conda activate classgpt
- install dependencies
pip install -r requirements.txt
- run streamlit app
cd app/
streamlit run 01_❓_Ask.py
Docker
Alternatively, you can use Docker.
Then open up a new tab and navigate to http://localhost:8501/
TODO
FAQ
Tokens
Tokens can be thought of as pieces of words. Before the API processes the prompts, the input is broken down into tokens. These tokens are not cut up exactly where the words start or end - tokens can include trailing spaces and even sub-words. Here are some helpful rules of thumb for understanding tokens in terms of lengths:
- 1 token ~= 4 chars in English
- 1 token ~= ¾ words
- 100 tokens ~= 75 words
- 1-2 sentences ~= 30 tokens
- 1 paragraph ~= 100 tokens
- 1,500 words ~= 2048 tokens
Try the OpenAI Tokenizer tool
Source
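The rules of thumb above can be turned into a quick estimator. This is a rough sketch only: the function names are made up for illustration, and the results are approximations for English text. For exact counts you would need a real tokenizer (for example the tiktoken library or the OpenAI Tokenizer tool).

```python
def estimate_tokens_from_chars(text):
    """Rule of thumb: 1 token ~= 4 characters of English text."""
    if not text:
        return 0
    return max(1, round(len(text) / 4))


def estimate_tokens_from_words(word_count):
    """Rule of thumb: 100 tokens ~= 75 words, i.e. about 4/3 tokens per word."""
    return round(word_count * 100 / 75)


# Example usage: a 300-word slide deck summary is roughly 400 tokens.
print(estimate_tokens_from_words(300))
```

Note that the word-based rule gives about 2,000 tokens for 1,500 words, close to but not exactly the 2048 figure quoted above, which is a reminder that these are estimates.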
Embeddings
An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.
For text-embedding-ada-002, the cost is $0.0004 / 1K tokens, or roughly 3,000 pages per dollar.
- Embeddings - OpenAI API
- What Are Word and Sentence Embeddings?
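As a small illustration of "distance measures relatedness", here is a sketch that computes cosine distance between two vectors. Cosine distance is assumed here as the metric because it is a common choice for text embeddings; the three-dimensional vectors are made up for the example (real text-embedding-ada-002 vectors are much longer).

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Made-up vectors: "cat" and "kitten" point in similar directions, "invoice" does not.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_distance(cat, kitten) < cosine_distance(cat, invoice))
```

A smaller distance means the two texts are more closely related, which is what the index relies on when it picks chunks to answer a question.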
Models
For the gpt-3.5-turbo model (ChatGPT API), cost is $0.002 / 1K tokens
For the text-davinci-003 model, cost is $0.02 / 1K tokens
- Chat completion - OpenAI API
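To get a feel for what these prices mean in practice, here is a small sketch that estimates cost from a token count. The prices are the ones quoted in this README and may be out of date; `estimate_cost` is an illustrative helper, not part of any OpenAI library.

```python
# Prices in USD per 1K tokens, as quoted in this README (may be out of date).
PRICE_PER_1K_TOKENS = {
    "text-embedding-ada-002": 0.0004,
    "gpt-3.5-turbo": 0.002,
    "text-davinci-003": 0.02,
}


def estimate_cost(model, num_tokens):
    """Estimate the cost in USD of processing num_tokens with the given model."""
    return PRICE_PER_1K_TOKENS[model] * num_tokens / 1000


# Example usage: compare the two completion models on a 4,000-token request.
for model in ("gpt-3.5-turbo", "text-davinci-003"):
    print(model, estimate_cost(model, 4000))
```

At these prices gpt-3.5-turbo is about ten times cheaper per token than text-davinci-003.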
References
Streamlit
- Increase upload limit of st.file_uploader
- st.cache_resource - Streamlit Docs
- Session State
- hayabhay/whisper-ui: Streamlit UI for OpenAI's Whisper
Deployment
- Streamlit Deployment Guide (wiki) - Deployment - Streamlit
- How to Deploy a streamlit application to AWS? Part-3
LlamaIndex
- LlamaIndex Usage Pattern
- Saving index
Loading data
- PDF Loader
- llama-hub github repo
- document class
- PDFReader class
Multimodal
- llama_index/Multimodal.ipynb at main
ChatGPT
- gpt_index/SimpleIndexDemo-ChatGPT.ipynb
Langchain
- gpt_index/LangchainDemo.ipynb
- OpenAIChat
Boto3
- boto3 file_upload does it check if file exists
- Boto 3: Resource vs Client
- Writing json to file in s3 bucket
Docker stuff
- amazon web services - What is the best way to pass AWS credentials to a Docker container?
- docker-compose up failing due to: error: can't find Rust compiler · Issue #572 · acheong08/ChatGPT
- linux - When installing Rust toolchain in Docker, Bash source command doesn't work
- software installation - How to install a package with apt without the "Do you want to continue [Y/n]?" prompt? - Ask Ubuntu
- How to use sudo inside a docker container?