Medical_ChatBot
1.0.0
This project builds a medical chatbot that retrieves information from medical PDF books, using LangChain for processing and Pinecone for efficient information retrieval.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
def create_knowledge_base(pdf_path):
    # Load the PDF text (for a whole folder of PDFs, use DirectoryLoader with
    # glob="*.pdf" and loader_cls=PyPDFLoader instead)
    loader = PyPDFLoader(pdf_path)
    text_data = loader.load()

    # Text processing and chunking
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
    text_chunks = text_splitter.split_documents(text_data)
    # Load the sentence-transformers embedding model (or your chosen model)
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

    # Generate an embedding for each text chunk and store the chunks together with
    # their embeddings in a simple data structure (a list of dictionaries)
    knowledge_base = []
    for chunk in text_chunks:
        chunk_embedding = embeddings.embed_query(chunk.page_content)  # embedding for this chunk
        knowledge_base.append({
            "text": chunk,
            "embedding": chunk_embedding
        })
    return knowledge_base
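A quick usage sketch; the path below is only a placeholder for your own medical PDF book:

# Hypothetical path; point this at your own medical PDF
knowledge_base = create_knowledge_base("data/medical_book.pdf")
print(f"Created {len(knowledge_base)} embedded chunks")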
from pinecone import Pinecone
from dotenv import load_dotenv
import os
def store_knowledge_base_in_pinecone(knowledge_base):
    load_dotenv()
    PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
    PINECONE_ENV = os.getenv("PINECONE_ENV")  # only needed by older Pinecone client versions
    PINECONE_INDEX_NAME = os.getenv("PINECONE_INDEX_NAME")

    # Connect to Pinecone and open the target index
    pc = Pinecone(api_key=PINECONE_API_KEY)
    index = pc.Index(PINECONE_INDEX_NAME)
    # Extract the text chunks and precomputed embeddings from the knowledge base
    documents = [kb["text"] for kb in knowledge_base]
    vectors = [kb["embedding"] for kb in knowledge_base]

    # Upsert the precomputed embeddings through the Pinecone client, storing each
    # chunk's text as metadata (PineconeVectorStore.from_documents expects an
    # embedding model rather than precomputed vectors, so the raw client is used here)
    index.upsert(vectors=[
        {"id": str(i), "values": vec, "metadata": {"text": doc.page_content}}
        for i, (doc, vec) in enumerate(zip(documents, vectors))
    ])
    print(f"Knowledge base stored in Pinecone index: {PINECONE_INDEX_NAME}")

# This section is a placeholder, as full chatbot development requires additional libraries
# like Rasa or Dialogflow. Here's a basic outline to illustrate the concept.
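# NOTE: the loop below calls retrieve_from_pinecone, which is not defined in this
# outline. The helper sketched here is an illustrative assumption, not the project's
# final implementation: it reuses the Pinecone/dotenv imports above, queries the index
# populated earlier, and returns the chunk text stored in each match's metadata.
def retrieve_from_pinecone(query_embedding, top_k=3):
    load_dotenv()
    pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
    index = pc.Index(os.getenv("PINECONE_INDEX_NAME"))
    response = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)
    return [{"text": (match.metadata or {}).get("text", "")} for match in response.matches]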
def chatbot_loop():
    # Use the same embedding model that was used to build the knowledge base
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    while True:
        user_query = input("Ask me a medical question (or type 'quit' to exit): ")
        if user_query.lower() == "quit":
            break
        # Process the user query (similar to the text processing in knowledge base creation)
        processed_query = user_query.strip()
        # Generate an embedding for the user query
        query_embedding = embeddings.embed_query(processed_query)
        # Retrieve similar text snippets from Pinecone using the query embedding
        similar_results = retrieve_from_pinecone(query_embedding)
        # Extract and present the relevant information to the user
        if similar_results:
            for result in similar_results:
                print(f"Relevant Information: {result['text']}")
        else:
            print("Sorry, I couldn't find any information related to your question.")

This project demonstrates the potential of LangChain and Pinecone for building a medical chatbot that offers an accessible and efficient way to retrieve and understand medical information. Remember to adapt and extend this concept to fit the specific needs of your medical PDF books and the features you require.
To set up the LangChain Pinecone Vector Store project, follow these steps:
Clone the repository:
git clone https://github.com/<username>/<repository>.git
cd <repository>

Install dependencies:
pip install -r requirements.txt

Configure environment variables:
Create a .env file in the root directory and specify the following variables:
PINECONE_API_KEY=<your_pinecone_api_key>
PINECONE_ENV=<pinecone_environment>
PINECONE_INDEX_NAME=<pinecone_index_name>
To store the vectors in the Pinecone vector database, run the following command:
python store_vectors.py

Contributions to the LangChain Pinecone Vector Store project are encouraged and appreciated! If you have ideas for enhancements, bug fixes, or new features, please submit a pull request. Be sure to follow the contribution guidelines outlined in the repository.
This project is licensed under the MIT License, which allows unrestricted use, distribution, and modification, subject to the terms and conditions specified in the license agreement.