knowledgegpt
1.0.0

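knowledgegpt is designed to gather information from various sources, including the internet and local data, which can then be used to create prompts. OpenAI's GPT-3 model uses these prompts to generate answers, which are stored in a database for future reference.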
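To accomplish this, the text is first converted into fixed-size vectors using either open-source models or OpenAI models. When a query is submitted, its text is also converted into a vector and compared against the stored knowledge embeddings. The most relevant information is then selected and used to build the prompt context.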
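The retrieval step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not knowledgegpt's actual implementation: `toy_embed` is a hypothetical word-hashing stand-in for the Hugging Face or OpenAI embedding model, and `DIM`, `cosine`, `top_k`, and `build_prompt` are names invented for this example.

```python
import hashlib
import math
import re
from typing import List

DIM = 64  # fixed vector size (assumption; real embedding models use far more dimensions)


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Map text to a fixed-size, L2-normalised vector by hashing its words.

    Toy stand-in for a real embedding model, for illustration only.
    """
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; inputs are already normalised, so a dot product suffices."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k stored chunks whose vectors are closest to the query vector."""
    q = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: List[str], k: int = 2) -> str:
    """Assemble a prompt from the most relevant chunks followed by the question."""
    context = "\n".join(top_k(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical knowledge chunks, as they might look after extraction
chunks = [
    "A bombard is a type of large cannon used during the 14th to 15th centuries.",
    "PDF files can be split into pages before embedding.",
    "YouTube subtitles can serve as a text source.",
]

print(build_prompt("What is a bombard?", chunks))
```

In a real setup the chunk vectors would be computed once and stored rather than recomputed on every query, and the assembled prompt would be sent to the language model.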
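knowledgegpt supports various information sources, including websites, PDFs, PowerPoint files (PPTX), and documents (Docs). It can also extract text from YouTube subtitles and audio (using speech-to-text technology) and use it as a source of information. This allows a diverse range of information to be gathered and used for generating prompts and answers.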
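As a rough illustration of how these source types line up with the extractor classes used in the examples in this document, the sketch below picks an extractor name for a given source. The routing rules and the `pick_extractor` helper are hypothetical and are not part of the library; only the class names come from the usage examples.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Class names come from the usage examples; the extension mapping is an assumption.
EXTENSION_TO_EXTRACTOR = {
    ".pdf": "PDFExtractor",
    ".pptx": "PowerpointExtractor",
    ".docx": "DocsExtractor",
}


def pick_extractor(source: str, use_audio: bool = False) -> str:
    """Return the name of an extractor class suited to the source (hypothetical helper)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = parsed.netloc.lower()
        if host.endswith("youtube.com") or host.endswith("youtu.be"):
            # Audio goes through speech-to-text; otherwise use the subtitles.
            return "YoutubeAudioExtractor" if use_audio else "YTSubsExtractor"
        return "WebScrapeExtractor"
    suffix = PurePosixPath(source).suffix.lower()
    if suffix in EXTENSION_TO_EXTRACTOR:
        return EXTENSION_TO_EXTRACTOR[suffix]
    raise ValueError(f"Unsupported source: {source}")


print(pick_extractor("https://en.wikipedia.org/wiki/Bombard_(weapon)"))
print(pick_extractor("../example.docx"))
```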
To install from PyPI, run in a terminal:
pip install knowledgegpt
Alternatively, to use the latest version from the repository:
pip install -r requirements.txt
pip install .
Download the language model needed for parsing:
python3 -m spacy download en_core_web_sm
uvicorn server:app --reload
# Import the library
from knowledgegpt.extractors.web_scrape_extractor import WebScrapeExtractor
# Import OpenAI and set the API key
import openai
from example_config import SECRET_KEY
openai.api_key = SECRET_KEY
# Define the target website
url = "https://en.wikipedia.org/wiki/Bombard_(weapon)"
# Initialize the WebScrapeExtractor
scrape_website = WebScrapeExtractor(url=url, embedding_extractor="hf", model_lang="en")
# Prompt the OpenAI model
# Note: `db` is not defined in this snippet; it is assumed to be an existing MongoDB client, used because to_save=True
answer, prompt, messages = scrape_website.extract(query="What is a bombard?", max_tokens=300, to_save=True, mongo_client=db)
# See the answer
print(answer)
# Output: 'A bombard is a type of large cannon used during the 14th to 15th centuries.'

Other examples can be found in the examples folder. To give a better idea of how to use the library, here is a simple example:
# The extractor classes are assumed to be imported from the knowledgegpt.extractors package.
# df, pdf_file_path, ppt_file_path, query and url are placeholders that you define yourself.
# Basic Usage
basic_extractor = BaseExtractor(df)
answer, prompt, messages = basic_extractor.extract("What is the title of this PDF?", max_tokens=300)
# PDF Extraction
pdf_extractor = PDFExtractor(pdf_file_path, extraction_type="page", embedding_extractor="hf", model_lang="en")
answer, prompt, messages = pdf_extractor.extract(query, max_tokens=1500)
# PPTX Extraction
ppt_extractor = PowerpointExtractor(file_path=ppt_file_path, embedding_extractor="hf", model_lang="en")
answer, prompt, messages = ppt_extractor.extract(query, max_tokens=500)
# DOCX Extraction
docs_extractor = DocsExtractor(file_path="../example.docx", embedding_extractor="hf", model_lang="en", is_turbo=False)
answer, prompt, messages = docs_extractor.extract(query="What is an object detection system?", max_tokens=300)
# Extraction from YouTube video (audio)
scrape_yt_audio = YoutubeAudioExtractor(video_id=url, model_lang="tr", embedding_extractor="hf")
answer, prompt, messages = scrape_yt_audio.extract(query=query, max_tokens=1200)
# Extraction from YouTube video (transcript)
scrape_yt_subs = YTSubsExtractor(video_id=url, embedding_extractor="hf", model_lang="en")
answer, prompt, messages = scrape_yt_subs.extract(query=query, max_tokens=1200)

To build and run the Docker image:
docker build -t knowledgegptimage .
docker run -p 8888:8888 knowledgegptimage

(To be extended...)
(To be updated with a better image)