byaldi

0.0.5

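Welcome to Byaldi

Did you know? In the movie Ratatouille, the dish Remy makes is not actually a ratatouille, but a refined version of the dish called "Confit Byaldi".

[Logo: a cheerful rat using a magnifying glass to look at a complex document. It says "byaldi" in the middle of a circle around the rat.]

Warning: this is a pre-release version of Byaldi. Please report any issues you encounter; there are likely to be quite a few quirks to iron out!

Byaldi is RAGatouille's mini sister project. It is a simple wrapper around the ColPali repository that makes it easy to use late-interaction multi-modal models such as ColPali with a familiar API.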

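Getting started

First, a warning: this is a pre-release library that uses uncompressed indexes and lacks other kinds of refinements.

Currently, we support all models supported by the underlying colpali-engine, including the newer and better ColQwen2 checkpoints, such as vidore/colqwen2-v1.0. Broadly, the aim is for Byaldi to support all ColVLM models.

Future updates will support additional backends. Since Byaldi exists to facilitate the adoption of multi-modal retrievers, we also intend to add support for models such as VisRAG.

Eventually, we'll add an HNSW indexing mechanism, pooling, and, who knows, maybe 2-bit quantization?

It will be updated as the multi-modal ecosystem develops further!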

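Prerequisites

Poppler

To convert PDFs to images under a friendly license, we use the pdf2image library. This library requires poppler to be installed on your system. Poppler is very easy to install by following the instructions on its website. The TL;DR is:

macOS with Homebrew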

brew install poppler

Debian/Ubuntu

sudo apt-get install -y poppler-utils
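
Before indexing PDFs, it can help to confirm that Poppler is actually visible from Python. The sketch below is not part of byaldi. It assumes that pdf2image finds Poppler through its command-line tools pdftoppm or pdftocairo on the system PATH (the default when no explicit Poppler path is given to pdf2image), and the helper name poppler_available is made up for illustration.

```python
import shutil


def poppler_available() -> bool:
    """Return True if a Poppler command-line tool is found on PATH."""
    # Assumption: pdf2image is used with its defaults and therefore looks
    # for these Poppler binaries on PATH.
    tools = ("pdftoppm", "pdftocairo")
    return any(shutil.which(tool) is not None for tool in tools)


if poppler_available():
    print("Poppler found on PATH.")
else:
    print("Poppler not found; install it before indexing PDFs.")
```

If the check fails right after installation, opening a new shell so that PATH is refreshed is usually worth trying first.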

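Flash-Attention

Gemma uses a recent version of flash attention. To make things run as smoothly as possible, we recommend that you install it after installing the library: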

pip install --upgrade byaldi
pip install flash-attn

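Hardware

ColPali uses multi-billion parameter models to encode documents. We recommend using a GPU for smooth operation, though weak or older GPUs are perfectly fine! Encoding your collection will suffer from poor performance on CPU or MPS.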
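
To see which of these situations applies to your machine, a small check with PyTorch can be enough. This is only a sketch and not something byaldi provides: the function name pick_device is hypothetical, and it assumes PyTorch is the framework in use (it falls back to "cpu" if PyTorch cannot be imported).

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can see."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no accelerator to report.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds may not ship the MPS backend at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
if device != "cuda":
    print(f"Detected device: {device}. Encoding a collection is likely to be slow.")
```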

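Using `byaldi`

Byaldi is largely modeled after RAGatouille, meaning that everything is designed to take the fewest lines of code possible, so you can build on top of it very quickly rather than spending time figuring out how to create a retrieval pipeline.

Loading a model

Loading a model with byaldi is extremely straightforward:

from byaldi import RAGMultiModalModel
# Optionally, you can specify an `index_root`, which is where it'll save the index. It defaults to ".byaldi/".
RAG = RAGMultiModalModel.from_pretrained("vidore/colqwen2-v1.0")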

If you already have an index and wish to load it along with the model needed to query it, you can do so just as easily:

from byaldi import RAGMultiModalModel
# Optionally, you can specify an `index_root`, which is where it'll look for the index. It defaults to ".byaldi/".
RAG = RAGMultiModalModel.from_index("your_index_name")
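
When a script may run both before and after an index has been built, it can be handy to check for the index on disk first. The sketch below uses only the standard library. It assumes the layout this document describes, namely a directory named after the index inside index_root (".byaldi/" by default); the helper name index_exists is hypothetical, and it does not verify that the directory holds a complete, loadable index.

```python
from pathlib import Path


def index_exists(index_name: str, index_root: str = ".byaldi") -> bool:
    """Check whether a directory for `index_name` exists under `index_root`."""
    # Assumption: an index is stored at `index_root/index_name/`.
    return (Path(index_root) / index_name).is_dir()


if index_exists("your_index_name"):
    print("Index directory found; it can be loaded with from_index.")
else:
    print("No index directory yet; load a model and build the index first.")
```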

Creating an index

Creating an index with byaldi is simple and flexible. You can index a single PDF file, a single image file, or a directory containing several of them. Here's how to create an index:

from byaldi import RAGMultiModalModel
# Optionally, you can specify an `index_root`, which is where it'll save the index. It defaults to ".byaldi/".
RAG = RAGMultiModalModel.from_pretrained("vidore/colqwen2-v1.0")
RAG.index(
    input_path="docs/",  # The path to your documents. This example assumes it holds three documents.
    index_name="your_index_name",  # The name you want to give to your index. It'll be saved at `index_root/index_name/`.
    store_collection_with_index=False,  # Whether the index should store the base64 encoded documents.
    doc_ids=[0, 1, 2],  # Optionally, you can specify a list of document IDs. They must be integers and match the number of documents you're passing. Otherwise, doc_ids will be automatically created.
    metadata=[{"author": "John Doe", "date": "2021-01-01"} for _ in range(3)],  # Optionally, you can specify metadata for each document: a list of dictionaries with the same length as the number of documents you're passing. The same placeholder entry is repeated here.
    overwrite=True,  # Whether to overwrite an index if it already exists. If False, it'll return None and do nothing if `index_root/index_name` exists.
)

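And that's it! The model will start spinning and create your index, exporting all the necessary information to disk when it's done. You can then use the RAGMultiModalModel.from_index("your_index_name") method presented above to load it whenever needed. You don't need to do this right after creating it: the index is already loaded in memory and ready to go.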
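
The length requirements on doc_ids and metadata are easy to get wrong when the lists are built by hand. As a hedged illustration (this is not byaldi's own validation, and check_index_args is a made-up name), the constraints described in the comments above can be checked before calling index():

```python
from typing import Optional, Sequence


def check_index_args(
    num_docs: int,
    doc_ids: Optional[Sequence[int]] = None,
    metadata: Optional[Sequence[dict]] = None,
) -> None:
    """Raise ValueError if doc_ids or metadata do not line up with num_docs."""
    if doc_ids is not None:
        if not all(isinstance(d, int) for d in doc_ids):
            raise ValueError("doc_ids must all be integers")
        if len(doc_ids) != num_docs:
            raise ValueError(f"expected {num_docs} doc_ids, got {len(doc_ids)}")
    if metadata is not None:
        if not all(isinstance(m, dict) for m in metadata):
            raise ValueError("metadata must be a list of dictionaries")
        if len(metadata) != num_docs:
            raise ValueError(
                f"expected {num_docs} metadata entries, got {len(metadata)}"
            )


# Three documents, three IDs, three metadata dictionaries: passes silently.
check_index_args(3, doc_ids=[0, 1, 2], metadata=[{}, {}, {}])
```

Passing a single metadata dictionary for three documents would raise a ValueError here, which is cheaper to discover than a failure partway through encoding.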

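The main decision you have to make here is whether to set store_collection_with_index to True. If set to True, it greatly simplifies your workflow: the base64-encoded version of the relevant documents is returned as part of the query results, so you can pipe it to your LLM immediately. However, it adds considerable memory and storage requirements to your index, so if you are short on those resources you may want to set it to False (the default) and create the base64-encoded versions yourself whenever needed.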
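
If you leave store_collection_with_index at False, producing the base64 text yourself takes only a few lines of standard-library code. The sketch below assumes the page you want already exists as an image file on disk; rendering a PDF page to an image first (for example with pdf2image) is not shown, and whether this matches the exact image format byaldi stores internally is not something this document specifies. The name encode_file_base64 is hypothetical.

```python
import base64
from pathlib import Path


def encode_file_base64(path: str) -> str:
    """Read a file (for example a page image) and return its base64 text."""
    raw = Path(path).read_bytes()
    # b64encode returns bytes; decode to str so it can be placed in a prompt
    # or JSON payload.
    return base64.b64encode(raw).decode("ascii")
```

The returned string can then be passed to whichever LLM client you use, in the image format that client expects.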

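Searching

Once you have created or loaded an index, you can start searching for relevant documents. Again, it's a single, very straightforward command:

# `query` is your search string; `k` is the number of results to return.
results = RAG.search(query, k=3)

The results will be a list of Result objects, which you can also treat as normal dictionaries. Each result has this format:

[
    {
        "doc_id": 0,
        "page_num": 10,
        "score": 12.875,
        "metadata": {},
        "base64": None
    },
    ...
]

page_num is 1-indexed, while doc_id is 0-indexed. This makes it simpler to work with other PDF manipulation tools, where the first page is generally page 1. page_num for images and single-page PDFs is always 1; it is only useful for longer PDFs.

If you passed metadata, or encoded with the flag to store the base64 versions, those fields will be populated. The results are sorted by score, so item 0 in the list is always the most relevant document, and so on.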
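
Because results behave like dictionaries, post-processing them needs no special tooling. The following sketch works on plain dictionaries in the format shown above and assumes that Result objects support the same key access. The helper names are hypothetical, and only the first sample entry comes from this document; the other two are invented for illustration.

```python
from typing import Dict, List


def best_page_per_doc(results: List[dict]) -> Dict[int, dict]:
    """Keep the highest-scoring result for each doc_id."""
    best: Dict[int, dict] = {}
    for result in results:
        current = best.get(result["doc_id"])
        if current is None or result["score"] > current["score"]:
            best[result["doc_id"]] = result
    return best


def page_list_index(result: dict) -> int:
    """Convert the 1-indexed page_num into a 0-based Python list index."""
    return result["page_num"] - 1


sample = [
    {"doc_id": 0, "page_num": 10, "score": 12.875, "metadata": {}, "base64": None},
    {"doc_id": 1, "page_num": 1, "score": 9.5, "metadata": {}, "base64": None},
    {"doc_id": 0, "page_num": 3, "score": 7.25, "metadata": {}, "base64": None},
]
top = best_page_per_doc(sample)
print(sorted(top))              # doc_ids that appear in the results
print(page_list_index(top[0]))  # 0-based index of the best page of document 0
```

Subtracting one from page_num is only needed when indexing into a Python list of pages; tools that number pages from 1 can use page_num directly.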

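Adding documents to an existing index

Since indexes are in-memory, they are addition-friendly! If you need to ingest some new PDFs, just load your index with from_index, then call add_to_index with parameters similar to those of the original index() method:

RAG.add_to_index(
    "path_to_new_docs",
    store_collection_with_index=False,
    # ... other parameters, similar to index()
)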

Additional information

Version: 0.0.5
Type: Other source code
Updated: 2025-04-18
Size: 5.2MB
Source: GitHub
