Stanford_LLM_Tutor

1.0.0

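Stanford LLM Tutor | AI Bot

Project Overview

This repository contains an implementation of an AI bot built on a transformer model (gpt2) from Hugging Face. The chatbot uses FAISS as a vector database to match user queries efficiently against the relevant data. The data used for training and response generation was scraped from the official Stanford LLM course.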

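Features

Language model: uses the gpt2 model from Hugging Face.
Vector database: uses FAISS to store and retrieve keys efficiently.
Data source: scraped from the lectures of the Stanford LLM course.
Content types: handles paragraphs, tables, equations, links, ordered lists, and unordered lists.
Query matching: uses FAISS to match the user query to the 2 closest keys and builds a prompt from the data retrieved for them.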

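How It Works

Data scraping: data is scraped from the lectures of the Stanford LLM course. The h2, h3, and <strong> tags serve as keys, and the content under each key is split into paragraphs, tables, links, equations, ordered lists, and unordered lists.
Vector database (FAISS): the keys are stored as vectors in a FAISS index that uses L2 distance for retrieval. When a user query arrives, FAISS returns the keys whose vectors are closest to the query vector.
Prompt generation: the chatbot builds a structured prompt from the data FAISS retrieves. The prompt includes the paragraphs, tables, equations, links, ordered lists, and unordered lists associated with the matched keys.
Response generation: the constructed prompt is fed to the GPT-2 model, which generates a response to the user query.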
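
The retrieval step above can be illustrated with a small sketch. It does not use FAISS itself: it is a pure-Python brute-force search that mimics what an exact L2 index with k=2 would return, so it runs without extra dependencies. The function names, the three-dimensional vectors, and the key names are all hypothetical; in the real project the vectors would come from whatever embedding step the repository uses, which this README does not specify.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_keys(query_vec, key_vecs, keys, k=2):
    """Return the k keys whose vectors are closest to query_vec, nearest first."""
    scored = ((squared_l2(query_vec, vec), key) for vec, key in zip(key_vecs, keys))
    return [key for _, key in heapq.nsmallest(k, scored)]


# Toy 3-dimensional vectors standing in for real embeddings (assumption).
keys = ["Benefits and harms", "Tokenization", "Scaling laws"]
key_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.2, 0.9]]
query_vec = [1.0, 0.0, 0.1]

matches = top_k_keys(query_vec, key_vecs, keys, k=2)
print(matches)
```

A brute-force scan like this is linear in the number of keys, which is fine for a few hundred headings; a dedicated index becomes worthwhile as the key set grows.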

Data Schema

The data scraped from the Stanford LLM course lectures has the following schema:

{
    'key1': {
        'paragraphs': [],
        'tables': [],
        'links': [],
        'equations': [],
        'ordered_lists': [],
        'unordered_lists': []
    },
    'key2': {
        'paragraphs': [],
        'tables': [],
        'links': [],
        'equations': [],
        'ordered_lists': [],
        'unordered_lists': []
    }
}

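Each key corresponds to an h2, h3, or <strong> tag on a lecture page. The data associated with each key includes the paragraphs, tables, links, equations, ordered lists, and unordered lists found under that tag, where present.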
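
As a rough illustration of how a page could be turned into this schema, here is a minimal sketch built on the standard-library HTMLParser. It is not the repository's scraper: the class name, the helper, and the sample HTML are invented for the example, and it only fills 'paragraphs' and 'links'. It also assumes key tags are not nested inside paragraphs.

```python
from html.parser import HTMLParser


def empty_record():
    """One schema entry with every content type present but empty."""
    return {'paragraphs': [], 'tables': [], 'links': [], 'equations': [],
            'ordered_lists': [], 'unordered_lists': []}


class LectureParser(HTMLParser):
    """Groups paragraph text and link targets under the preceding key tag."""
    KEY_TAGS = {'h2', 'h3', 'strong'}

    def __init__(self):
        super().__init__()
        self.data = {}            # key text -> record
        self.current_key = None   # most recent key seen
        self._tag = None          # tag whose text is being collected
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.KEY_TAGS or tag == 'p':
            self._tag, self._buf = tag, []
        elif tag == 'a' and self.current_key is not None:
            href = dict(attrs).get('href')
            if href:
                self.data[self.current_key]['links'].append(href)

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = ''.join(self._buf).strip()
        self._tag = None
        if not text:
            return
        if tag in self.KEY_TAGS:
            self.current_key = text
            self.data.setdefault(text, empty_record())
        elif self.current_key is not None:
            self.data[self.current_key]['paragraphs'].append(text)


# Invented sample markup, not real course content.
html = """
<h2>Benefits and harms</h2>
<p>Language models can help with <a href="https://example.com/qa">question answering</a>.</p>
<p>They can also produce toxic text.</p>
<h3>Summary</h3>
<p>Weigh both sides.</p>
"""
parser = LectureParser()
parser.feed(html)
data = parser.data
print(list(data))
```

Tables, equations, and lists would need their own tag handlers along the same lines.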

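Example Usage

User query: "What are the benefits and harms?"
FAISS retrieval: the query is matched to the closest keys in the vector database using L2 distance.

Prompt construction: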

# Assumption: generator is a Hugging Face text-generation pipeline for gpt2.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# query, result_key1/result_content1 and result_key2/result_content2
# come from the FAISS retrieval step.

# Create a structured prompt
prompt = f"**Question:** {query}\n\n"

# Add top 2 matched sections
prompt += f"**Sections:**\n- {result_key1}\n- {result_key2}\n\n"

# Add content to the prompt
for _, result_content in [(result_key1, result_content1), (result_key2, result_content2)]:
    if result_content.get('paragraphs'):
        prompt += "**Paragraphs:**\n" + "\n".join(result_content['paragraphs']) + "\n\n"
    if result_content.get('ordered_lists'):
        prompt += "**Ordered Lists:**\n" + "\n".join(["\n".join(ol) for ol in result_content['ordered_lists']]) + "\n\n"
    if result_content.get('unordered_lists'):
        prompt += "**Unordered Lists:**\n" + "\n".join(["\n".join(ul) for ul in result_content['unordered_lists']]) + "\n\n"
    if result_content.get('tables'):
        prompt += "**Tables:**\n" + "\n".join(["\n".join(table) for table in result_content['tables']]) + "\n\n"
    if result_content.get('links'):
        prompt += "**Links:**\n" + "\n".join(result_content['links']) + "\n\n"
    if result_content.get('equations'):
        prompt += "**Equations:**\n" + "\n".join(result_content['equations']) + "\n\n"

# Add a closing statement
prompt += "Answer is :"

# Define max_length. Note: max_length counts tokens while len(prompt) counts
# characters, so this is only a rough cap on the output length.
max_length = min(len(prompt) + 100, 750)

# Generate response (50256 is GPT-2's end-of-text token, reused for padding)
response = generator(prompt[:750], max_length=max_length, num_return_sequences=1, truncation=True, pad_token_id=50256)
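
The prompt-building steps above can also be wrapped in a function, which makes them easier to test without loading a model. The sketch below is one possible refactor, not code from the repository: build_prompt, SECTION_TITLES, NESTED_FIELDS, and the sample contents are hypothetical names and data. It aims to produce the same text layout as the snippet above for any number of matched sections.

```python
# Field name -> heading, in the order the prompt lists them.
SECTION_TITLES = [
    ('paragraphs', 'Paragraphs'),
    ('ordered_lists', 'Ordered Lists'),
    ('unordered_lists', 'Unordered Lists'),
    ('tables', 'Tables'),
    ('links', 'Links'),
    ('equations', 'Equations'),
]
# Fields whose items are themselves lists of strings.
NESTED_FIELDS = {'ordered_lists', 'unordered_lists', 'tables'}


def build_prompt(query, results):
    """Build the structured prompt from (key, content) pairs in match order."""
    parts = [f"**Question:** {query}\n\n"]
    parts.append("**Sections:**\n" + "\n".join(f"- {key}" for key, _ in results) + "\n\n")
    for _, content in results:
        for field, title in SECTION_TITLES:
            items = content.get(field)
            if not items:
                continue  # skip missing or empty content types
            if field in NESTED_FIELDS:
                items = ["\n".join(item) for item in items]
            parts.append(f"**{title}:**\n" + "\n".join(items) + "\n\n")
    parts.append("Answer is :")
    return "".join(parts)


# Placeholder contents for illustration only.
results = [
    ("Benefits and harms", {'paragraphs': ["P1", "P2"], 'links': []}),
    ("Summary", {'unordered_lists': [["a", "b"]]}),
]
prompt = build_prompt("What are the benefits and harms?", results)
print(prompt)
```

The returned string can then be passed to the generator in place of the inline prompt variable.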