
RAPTOR introduces a novel approach to retrieval-augmented language models by constructing a recursive tree structure from documents. This allows for more efficient and context-aware information retrieval across large texts, addressing common limitations of traditional language models.
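The core idea can be sketched in a few lines of plain Python: split the text into chunks, group neighboring chunks, summarize each group, and recurse until a single root remains. This is only an illustrative toy, the `toy_summarize` heuristic and fixed neighbor grouping are assumptions here, not RAPTOR's actual clustering or model-based summarization:

```python
def toy_summarize(texts):
    # Placeholder summarizer: keep the first sentence of each text.
    # RAPTOR would call a language model here instead.
    return " ".join(t.split(". ")[0] for t in texts)

def build_tree(chunks, group_size=2):
    """Recursively group and summarize chunks until one root node remains.

    Returns a list of levels: level 0 holds the leaf chunks, the last level
    holds the single root summary. Retrieval can then search across all levels.
    """
    levels = [chunks]
    while len(levels[-1]) > 1:
        current = levels[-1]
        # Group neighboring nodes (RAPTOR uses soft clustering instead).
        groups = [current[i:i + group_size]
                  for i in range(0, len(current), group_size)]
        levels.append([toy_summarize(g) for g in groups])
    return levels

levels = build_tree(["chunk one. details.", "chunk two. details.",
                     "chunk three. details.", "chunk four. details."])
print(len(levels))  # leaves -> intermediate summaries -> root
```

Because every level of the tree is retained, a query can be matched against fine-grained leaves and high-level summaries alike, which is what makes the retrieval context-aware.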
For detailed methodologies and implementation, please refer to the original paper:
Before using RAPTOR, ensure you have Python 3.8+ installed. Clone the RAPTOR repository and install the necessary dependencies:
```shell
git clone https://github.com/parthsarthi03/raptor.git
cd raptor
pip install -r requirements.txt
```

To get started with RAPTOR, follow these steps:
First, set your OpenAI API key and initialize the RAPTOR configuration:
```python
import os
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

from raptor import RetrievalAugmentation

# Initialize with default configuration. For advanced configurations, check the documentation. [WIP]
RA = RetrievalAugmentation()
```

Add your text documents to RAPTOR for indexing:
```python
with open('sample.txt', 'r') as file:
    text = file.read()

RA.add_documents(text)
```

You can now use RAPTOR to answer questions based on the indexed documents:
```python
question = "How did Cinderella reach her happy ending?"

answer = RA.answer_question(question=question)
print("Answer: ", answer)
```

Save the constructed tree to a specified path:
```python
SAVE_PATH = "demo/cinderella"
RA.save(SAVE_PATH)
```

Load the saved tree back into RAPTOR:
```python
RA = RetrievalAugmentation(tree=SAVE_PATH)
answer = RA.answer_question(question=question)
```

RAPTOR is designed to be flexible and allows you to integrate any model for summarization, question answering (QA), and embedding generation. Here is how you can extend RAPTOR with your own models:
If you wish to use a different language model for summarization, you can do so by extending the `BaseSummarizationModel` class. Implement the `summarize` method to integrate your custom summarization logic:
```python
from raptor import BaseSummarizationModel

class CustomSummarizationModel(BaseSummarizationModel):
    def __init__(self):
        # Initialize your model here
        pass

    def summarize(self, context, max_tokens=150):
        # Implement your summarization logic here
        # Return the summary as a string
        summary = "Your summary here"
        return summary
```

For a custom QA model, extend the `BaseQAModel` class and implement the `answer_question` method. This method should return the best answer your model can find, given a context and a question:
```python
from raptor import BaseQAModel

class CustomQAModel(BaseQAModel):
    def __init__(self):
        # Initialize your model here
        pass

    def answer_question(self, context, question):
        # Implement your QA logic here
        # Return the answer as a string
        answer = "Your answer here"
        return answer
```

To use a different embedding model, extend the `BaseEmbeddingModel` class. Implement the `create_embedding` method, which should return a vector representation of the input text:
```python
from raptor import BaseEmbeddingModel

class CustomEmbeddingModel(BaseEmbeddingModel):
    def __init__(self):
        # Initialize your model here
        self.embedding_dim = 768  # Example dimension; set this to match your model

    def create_embedding(self, text):
        # Implement your embedding logic here
        # Return the embedding as a numpy array or a list of floats
        embedding = [0.0] * self.embedding_dim  # Replace with actual embedding logic
        return embedding
```

Once you have implemented your custom models, integrate them with RAPTOR as follows:
```python
from raptor import RetrievalAugmentation, RetrievalAugmentationConfig

# Initialize your custom models
custom_summarizer = CustomSummarizationModel()
custom_qa = CustomQAModel()
custom_embedding = CustomEmbeddingModel()

# Create a config with your custom models
custom_config = RetrievalAugmentationConfig(
    summarization_model=custom_summarizer,
    qa_model=custom_qa,
    embedding_model=custom_embedding
)

# Initialize RAPTOR with your custom config
RA = RetrievalAugmentation(config=custom_config)
```

Check out `demo.ipynb` for examples of how to specify your own summarization/QA models (such as Llama/Mistral/Gemma) and embedding models (such as SBERT) for use with RAPTOR.
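To give a concrete sense of what the stub methods above might contain, here is a dependency-free sketch with deliberately naive placeholder logic: truncation for summarization, keyword overlap for QA, and a hashing trick for "embeddings". The heuristics and the `Toy…` class names are illustrative assumptions only; in a real setup each class would extend the corresponding raptor base class and delegate to an actual model:

```python
import hashlib
import re

def _words(text):
    # Lowercased word set, ignoring punctuation.
    return set(re.findall(r"\w+", text.lower()))

class ToySummarizationModel:  # would extend raptor's BaseSummarizationModel
    def summarize(self, context, max_tokens=150):
        # Placeholder: truncate to roughly max_tokens whitespace-split tokens.
        return " ".join(context.split()[:max_tokens])

class ToyQAModel:  # would extend raptor's BaseQAModel
    def answer_question(self, context, question):
        # Placeholder: return the context sentence sharing the most words
        # with the question.
        sentences = [s.strip() for s in context.split(".") if s.strip()]
        return max(sentences, key=lambda s: len(_words(s) & _words(question)))

class ToyEmbeddingModel:  # would extend raptor's BaseEmbeddingModel
    def __init__(self, dim=8):
        self.dim = dim

    def create_embedding(self, text):
        # Placeholder: bucket stable word hashes into a fixed-size vector.
        vec = [0.0] * self.dim
        for word in re.findall(r"\w+", text.lower()):
            h = int(hashlib.md5(word.encode()).hexdigest(), 16)
            vec[h % self.dim] += 1.0
        return vec

qa_answer = ToyQAModel().answer_question(
    "The sky is blue. Grass is green.", "What color is grass?"
)
print(qa_answer)  # "Grass is green"
```

Swapping these toys for real models only requires keeping the same method signatures, which is exactly the contract the `RetrievalAugmentationConfig` example above relies on.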
Note: More examples and ways to configure RAPTOR are forthcoming. Advanced usage and additional features will be provided in the documentation and repository updates.
RAPTOR is an open-source project, and contributions are welcome. Whether you are fixing a bug, adding a new feature, or improving the documentation, your help is appreciated.
RAPTOR is released under the MIT License. For details, see the LICENSE file in the repository.
If RAPTOR assists in your research, please cite it as follows:
```bibtex
@inproceedings{sarthi2024raptor,
  title={RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval},
  author={Sarthi, Parth and Abdullah, Salman and Tuli, Aditi and Khanna, Shubh and Goldie, Anna and Manning, Christopher D.},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2024}
}
```

Stay tuned for more examples, configuration guides, and updates.