
fastT5


Version 1.0.0


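Installation
Usage
Details
Features
Benchmarks
- ONNX model
- Quantized ONNX model
Quantized model scores
Further improvements
License
Get help
Acknowledgements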

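The T5 model can be used for many NLP tasks, such as summarization, question answering (QA), question generation (QG), translation, and text generation. Sequential text generation is naturally slow, and for larger T5 models it is even slower. fastT5 speeds up T5 inference by running the model on onnxruntime. It also reduces the model size through quantization.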

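The fastT5 library lets you convert a pretrained T5 model to ONNX, quantize it, and get back a model that runs on onnxruntime, all in a single line of code. You can also customize the whole process.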

Installation

You can install fastT5 from PyPI:

```bash
pip install fastt5
```

If you want to build from source:

```bash
git clone https://github.com/Ki6an/fastT5
cd fastT5
pip3 install -e .
```

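Usage

The export_and_get_onnx_model() method exports the given T5 model to ONNX, quantizes it, and runs it on onnxruntime with default settings. The returned model supports Hugging Face's generate() method.

If you don't want to quantize the model, pass quantized=False to the method.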

```python
from fastT5 import export_and_get_onnx_model
from transformers import AutoTokenizer

model_name = 't5-small'
model = export_and_get_onnx_model(model_name)

tokenizer = AutoTokenizer.from_pretrained(model_name)
t_input = "translate English to French: The universe is a dark forest."
token = tokenizer(t_input, return_tensors='pt')

tokens = model.generate(input_ids=token['input_ids'],
                        attention_mask=token['attention_mask'],
                        num_beams=2)

output = tokenizer.decode(tokens.squeeze(), skip_special_tokens=True)
print(output)
```

To run an already exported model, use get_onnx_model().

You can customize the whole pipeline as shown in the following code example:

```python
from fastT5 import (OnnxT5, get_onnx_runtime_sessions,
                    generate_onnx_representation, quantize)
from transformers import AutoTokenizer

model_or_model_path = 't5-small'

# Step 1. Convert the Hugging Face T5 model to ONNX.
onnx_model_paths = generate_onnx_representation(model_or_model_path)

# Step 2. (Recommended) Quantize the converted model for faster inference and a smaller model size.
quant_model_paths = quantize(onnx_model_paths)

# Step 3. Set up the onnxruntime sessions.
model_sessions = get_onnx_runtime_sessions(quant_model_paths)

# Step 4. Get the ONNX model.
model = OnnxT5(model_or_model_path, model_sessions)

...
```

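Custom output paths

By default, fastT5 creates a models folder in the current directory and stores all the models there. You can provide a custom path for the folder in which the exported models are stored. To run already exported models that are stored in a custom folder path, use get_onnx_model(onnx_models_path="/path/to/custom/folder/").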

```python
from fastT5 import export_and_get_onnx_model, get_onnx_model

model_name = "t5-small"
custom_output_path = "/path/to/custom/folder/"

# 1. Export the models and store them in custom_output_path.
model = export_and_get_onnx_model(model_name, custom_output_path)

# 2. Run already exported models that are stored in the custom path.
# model = get_onnx_model(model_name, custom_output_path)
```

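Details

T5 is a seq2seq (encoder-decoder) model. Because it runs the decoder repeatedly during inference, the whole model cannot be exported to ONNX directly; the encoder and the decoder have to be exported separately.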

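past_key_values contain pre-computed hidden states (the keys and values in the self-attention and cross-attention blocks) that can be reused to speed up sequential decoding.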
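To see why caching the keys and values helps, here is a toy, pure-Python sketch of single-head attention during step-by-step decoding. It assumes made-up 2-dimensional vectors and a hypothetical `project` function in place of T5's learned projections, and it covers only self-attention (in T5 the cross-attention keys and values come from the encoder output and are computed once). It checks that attending over a growing cache gives exactly the same outputs as recomputing every key and value at each step, while projecting only the newest token.

```python
import math
from typing import List, Tuple

Vec = List[float]

def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))

def attend(query: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    # Single-head softmax attention: weights = softmax(q . k), output = sum(w * v).
    scores = [dot(query, k) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]

def project(token: Vec, scale: float) -> Vec:
    # Hypothetical stand-in for a learned key or value projection.
    return [scale * x for x in token]

def decode_without_cache(tokens: List[Vec]) -> Tuple[List[Vec], int]:
    outputs, projections = [], 0
    for step in range(1, len(tokens) + 1):
        prefix = tokens[:step]
        keys = [project(t, 0.5) for t in prefix]    # every key recomputed each step
        values = [project(t, 2.0) for t in prefix]  # every value recomputed each step
        projections += 2 * step
        outputs.append(attend(tokens[step - 1], keys, values))  # query = newest token
    return outputs, projections

def decode_with_cache(tokens: List[Vec]) -> Tuple[List[Vec], int]:
    past_keys: List[Vec] = []    # toy "past_key_values" for one layer
    past_values: List[Vec] = []
    outputs, projections = [], 0
    for tok in tokens:
        past_keys.append(project(tok, 0.5))  # only the newest token is projected
        past_values.append(project(tok, 2.0))
        projections += 2
        outputs.append(attend(tok, past_keys, past_values))
    return outputs, projections

tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, -0.3]]
slow, slow_n = decode_without_cache(tokens)
fast, fast_n = decode_with_cache(tokens)
print(slow == fast, slow_n, fast_n)
```

The number of projections grows quadratically with sequence length without the cache and linearly with it, which is the saving past_key_values provides.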

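A model can only be exported with a fixed number of inputs. However, the decoder does not take past_key_values at the first step, while it does at every later step. To get around this, we create two decoders: one for the first step, which does not take past_key_values, and another for the remaining steps, which does.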
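The control flow described above can be sketched in plain Python. The three functions below are hypothetical stand-ins for the exported encoder, init_decoder, and decoder graphs: they return a toy next token directly instead of logits, so this is not fastT5's code. The point is that each function has a fixed input signature, the encoder runs once, init_decoder handles only the first step, and decoder handles every later step with the cached past.

```python
from typing import Dict, List, Tuple

Past = List[int]  # toy stand-in for past_key_values

def encoder(input_ids: List[int]) -> List[int]:
    # Hypothetical encoder graph: runs once per input sequence.
    return [i * 2 for i in input_ids]

def init_decoder(decoder_input_ids: List[int], encoder_out: List[int]) -> Tuple[int, Past]:
    # Hypothetical first-step decoder: fixed signature with no past_key_values input.
    next_token = (sum(encoder_out) + decoder_input_ids[-1]) % 10
    return next_token, [next_token]

def decoder(decoder_input_ids: List[int], encoder_out: List[int], past: Past) -> Tuple[int, Past]:
    # Hypothetical later-step decoder: fixed signature that always takes past_key_values.
    next_token = (sum(encoder_out) + decoder_input_ids[-1] + len(past)) % 10
    return next_token, past + [next_token]

def generate_loop(input_ids: List[int], start_id: int = 0, eos_id: int = 9,
                  max_len: int = 8) -> Tuple[List[int], Dict[str, int]]:
    calls = {"encoder": 0, "init_decoder": 0, "decoder": 0}
    encoder_out = encoder(input_ids)  # reused at every decoding step
    calls["encoder"] += 1
    output = [start_id]
    token, past = init_decoder(output, encoder_out)  # step 1: no past yet
    calls["init_decoder"] += 1
    output.append(token)
    while token != eos_id and len(output) < max_len:
        # Steps 2..n: feed only the newest token; earlier context lives in `past`.
        token, past = decoder([token], encoder_out, past)
        calls["decoder"] += 1
        output.append(token)
    return output, calls

print(generate_loop([1, 2, 3]))
```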

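Next, we export all three models (encoder, decoder, init_decoder) and then quantize them. Quantizing from 32-bit to 8-bit should give a 4x memory reduction, but because there is an extra decoder, the overall model size is reduced by about 3x.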
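The 4x and roughly 3x figures follow from simple arithmetic. The sketch below assumes, purely for illustration, that the encoder and the decoder hold the same number of parameters and that init_decoder carries a full copy of the decoder's weights; the real ratio depends on the T5 variant. It also includes a toy symmetric per-tensor int8 quantizer, which is one common scheme and not necessarily the exact one fastT5 or onnxruntime applies.

```python
from typing import List, Tuple

FP32_BYTES = 4
INT8_BYTES = 1

def size_reduction(encoder_params: int, decoder_params: int) -> float:
    # Original: fp32 encoder + one fp32 decoder.
    original = (encoder_params + decoder_params) * FP32_BYTES
    # Exported: int8 encoder + int8 decoder + int8 init_decoder (assumed full copy).
    exported = (encoder_params + 2 * decoder_params) * INT8_BYTES
    return original / exported

def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    # Symmetric per-tensor quantization: map [-max|w|, max|w|] onto [-127, 127].
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    return [round(w / scale) for w in weights], scale

def dequantize(q: List[int], scale: float) -> List[float]:
    # Recover approximate float weights from the int8 values.
    return [v * scale for v in q]

print(size_reduction(1, 1))   # equal-sized encoder and decoder
print(size_reduction(10, 0))  # no decoder at all: the plain 4x case
```

With equal-sized halves the ratio is 8/3, about 2.7x, in the same ballpark as the roughly 3x quoted above; the single-graph case recovers the full 4x.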

Finally, we run the quantized model on onnxruntime.

Inference is simple because the model supports Hugging Face's generate() method.

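Features

- Easily export any pretrained T5 model to ONNX (with past_key_values).
- The exported model supports beam search, greedy search, and more through the generate() method.
- Reduce the model size by 3x using quantization.
- Up to 5x speedup over PyTorch execution for greedy search, and 3-4x for beam search.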
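Greedy search and beam search differ only in how many partial sequences are kept at each step. This toy sketch uses a hypothetical, hard-coded next-token probability table instead of a model, and shows a case where beam search with num_beams=2 finds a higher-probability sequence than greedy search. Hugging Face's generate() adds details such as length penalties, which this sketch omits.

```python
import math
from typing import Dict, List, Tuple

EOS = "</s>"

def next_probs(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Hypothetical next-token distributions standing in for a decoder.
    table = {
        (): {"A": 0.6, "B": 0.4},
        ("A",): {"A": 0.3, "B": 0.3, EOS: 0.4},
        ("B",): {EOS: 0.9, "A": 0.1},
    }
    return table.get(prefix, {EOS: 1.0})

def greedy_search(max_len: int = 5) -> Tuple[Tuple[str, ...], float]:
    seq: Tuple[str, ...] = ()
    logp = 0.0
    while len(seq) < max_len and (not seq or seq[-1] != EOS):
        # Keep only the single most likely next token.
        token, p = max(next_probs(seq).items(), key=lambda kv: kv[1])
        seq += (token,)
        logp += math.log(p)
    return seq, math.exp(logp)

def beam_search(num_beams: int = 2, max_len: int = 5) -> Tuple[Tuple[str, ...], float]:
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            if seq and seq[-1] == EOS:
                candidates.append((seq, logp))  # finished beams are carried over
                continue
            for token, p in next_probs(seq).items():
                candidates.append((seq + (token,), logp + math.log(p)))
        # Keep the num_beams highest-scoring partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
        if all(seq[-1] == EOS for seq, _ in beams):
            break
    best_seq, best_logp = beams[0]
    return best_seq, math.exp(best_logp)

print(greedy_search())
print(beam_search(num_beams=2))
```

Greedy search commits to "A" because it is the best first token, while beam search keeps "B" alive long enough to discover that "B </s>" is the more likely full sequence.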

Benchmarks

The benchmarks are results for the t5-base model, tested on English-to-French translation.

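ONNX model

The following graph shows the latency of the quantized ONNX model compared with the PyTorch model for beam numbers varying from 1 to 9. The latencies shown here are the mean over sequence lengths up to 130.

The following heat map shows the speedup (how many times faster), i.e. the ratio of PyTorch latency to ONNX model latency. The ONNX model outperforms PyTorch in most cases. However, the model's speed drops for longer sequence lengths.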
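The speedup in the heat map is the ratio of PyTorch latency to ONNX latency for each (beam count, sequence length) cell. A minimal sketch of that calculation follows; the latency numbers in it are placeholders chosen for easy arithmetic, not measurements from these benchmarks.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]  # (num_beams, sequence_length)

def speedup_grid(pytorch_ms: Dict[Cell, float], onnx_ms: Dict[Cell, float]) -> Dict[Cell, float]:
    # PyTorch latency / ONNX latency per cell; a value above 1 means ONNX is faster.
    return {cell: pytorch_ms[cell] / onnx_ms[cell] for cell in pytorch_ms if cell in onnx_ms}

def mean_speedup(grid: Dict[Cell, float]) -> float:
    # Plain average of the per-cell ratios.
    return sum(grid.values()) / len(grid)

# Placeholder latencies in milliseconds, for illustration only.
pytorch = {(1, 10): 50.0, (1, 130): 400.0, (3, 10): 90.0}
onnx = {(1, 10): 10.0, (1, 130): 200.0, (3, 10): 30.0}

grid = speedup_grid(pytorch, onnx)
print(grid, mean_speedup(grid))
```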

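Quantized ONNX model

Quantized models are the lightweight models mentioned earlier; their accuracy is almost the same as that of the original model (the quantized model scores are given in the next section). The quantized ONNX model has the lowest latency of the ONNX and PyTorch models.

On average, this model outperforms the PyTorch model by 5.7x for greedy search and by 3-4x for beam search.

Note: These results were generated on an AMD EPYC 7B12 and may vary from device to device. The ONNX models usually perform well on high-end CPUs with more cores.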

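Quantized model scores

The results were tested on English-to-French translation with num_beams = 3.

| Model | bleu_4 | meteor | rouge_l |
|---|---|---|---|
| t5-small (quant) | 0.240769 | 0.282342 | 0.468817 |
| t5-small (pytorch) | 0.254601 | 0.295172 | 0.492749 |
| t5-base (quant) | 0.267606 | 0.306019 | 0.499188 |
| t5-base (pytorch) | 0.268346 | 0.304969 | 0.503306 |
| t5-large (quant) | 0.286726 | 0.316845 | 0.503585 |
| t5-large (pytorch) | 0.294015 | 0.315774 | 0.508677 |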
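To read the scores table as a relative accuracy cost, divide each quantized score by the matching PyTorch score. The sketch below does that with the numbers copied from the table; a ratio below 1 means the quantized model scored lower, and a ratio above 1 means it scored higher.

```python
from typing import Dict, Tuple

Triple = Tuple[float, float, float]  # (bleu_4, meteor, rouge_l)

# Scores copied from the table above.
SCORES: Dict[str, Dict[str, Triple]] = {
    "t5-small": {"quant": (0.240769, 0.282342, 0.468817),
                 "pytorch": (0.254601, 0.295172, 0.492749)},
    "t5-base": {"quant": (0.267606, 0.306019, 0.499188),
                "pytorch": (0.268346, 0.304969, 0.503306)},
    "t5-large": {"quant": (0.286726, 0.316845, 0.503585),
                 "pytorch": (0.294015, 0.315774, 0.508677)},
}
METRICS = ("bleu_4", "meteor", "rouge_l")

def retention(model: str) -> Dict[str, float]:
    # Quantized score divided by the PyTorch score, per metric.
    quant, ref = SCORES[model]["quant"], SCORES[model]["pytorch"]
    return {m: q / r for m, q, r in zip(METRICS, quant, ref)}

for name in SCORES:
    print(name, {m: round(v, 3) for m, v in retention(name).items()})
```

By this measure the quantized models keep over 90% of every score in the table, and for t5-base and t5-large the meteor score is actually slightly higher after quantization.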

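Private HuggingFace Model Hub models

The HuggingFace model hub supports private models. To use a private, pretrained version of T5 with fastT5, you must first authenticate with the HuggingFace ecosystem using $ transformers-cli login. Then, when using fastT5, add one extra import and call: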

```python
from fastT5 import (
    OnnxT5,
    get_onnx_runtime_sessions,
    generate_onnx_representation,
    quantize,
    set_auth_token,
)
from transformers import AutoTokenizer

set_auth_token(True)
# the rest of the code is the same as using a public model
```

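If you are unable to call $ transformers-cli login, or prefer to use your API key (found at https://huggingface.co/settings/token, or https://huggingface.co/organizations/org_name/settings/token for organizations), you can pass that key as a string to set_auth_token. Avoid hard-coding your API key into your code by setting the environment variable HF_API_KEY=<redacted>, and then in code: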

```python
import os

from fastT5 import (
    OnnxT5,
    get_onnx_runtime_sessions,
    generate_onnx_representation,
    quantize,
    set_auth_token,
)
from transformers import AutoTokenizer

auth_token = os.environ.get("HF_API_KEY")
set_auth_token(auth_token)

# code proceeds as normal
```
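If HF_API_KEY is not set, os.environ.get returns None, and passing None where a token string was intended is easy to miss. A small guard like the hypothetical helper below (not part of fastT5) fails early with a clear message instead. It only reads the environment and does not call fastT5 itself; you would then pass its result to set_auth_token.

```python
import os
from typing import Mapping, Optional

def require_hf_token(env: Optional[Mapping[str, str]] = None, var: str = "HF_API_KEY") -> str:
    # Hypothetical helper: return the token stored in `var`, or raise if it is missing or blank.
    source = os.environ if env is None else env
    token = source.get(var, "").strip()
    if not token:
        raise RuntimeError(f"{var} is not set; run `transformers-cli login` or export {var}.")
    return token
```

Usage, under the same assumptions: `set_auth_token(require_hf_token())`.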

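Further improvements

- Currently, the fastT5 library supports only the CPU version of onnxruntime; the GPU implementation still needs to be done.
- Graph optimization of the ONNX models will further reduce latency.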

Get help

- Contact me at [email protected]
- If appropriate, open an issue on GitHub

Acknowledgements

- original T5 paper
- transformers by Hugging Face
- onnx
- onnxruntime by Microsoft
- onnxt5

```bibtex
@article{2019t5,
  author = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
  title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
  journal = {arXiv e-prints},
  year = {2019},
  archivePrefix = {arXiv},
  eprint = {1910.10683},
}
```

Additional information

Version: 1.0.0
Type: Other source code
Updated: 2025-04-17
Size: 203.38KB
Source: GitHub
