This repository contains the code, data, and models for the TMLR 2024 paper "TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks".
| Other Resources |
|---|
| TIGERScore Collection |
| Hugging Face Demo |
To use the TIGERScore pipeline directly, you first need to install it as a Python package:

```bash
pip install git+https://github.com/TIGER-AI-Lab/TIGERScore.git
```

Please check that `torch.cuda.is_available()` is `True` on your local machine.
In addition, to use TIGERScore with vLLM (detailed below), you need to install vLLM manually, following the vLLM documentation:

```bash
pip install vllm
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu121

# Or, for CUDA 11.8, install the prebuilt wheel directly.
# Replace `cp39` with your Python version (e.g., `cp38`, `cp311`).
pip install https://github.com/vllm-project/vllm/releases/download/v0.2.2/vllm-0.2.2+cu118-cp39-cp39-manylinux1_x86_64.whl
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu118
```

If you want to use the training scripts, install the dependencies by running:
```bash
pip install -r requirements.txt
```

After installation, you can use the following example Python code to score text generations (see tigerscore_example_usage.ipynb for more use cases):
```python
# GPU device setup
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Example inputs
instruction = "Write an apology letter."
input_context = "Reason: You canceled a plan at the last minute due to illness."
hypo_output = "Hey [Recipient],\n\nI'm really sorry for ditching our plan. I suddenly got an opportunity for a vacation so I took it. I know this might have messed up your plans and I regret that.\n\nDespite being under the weather, I would rather go for an adventure. I hope you can understand my perspective and I hope this incident doesn't change anything between us.\n\nWe can reschedule our plan for another time. Sorry again for the trouble.\n\nPeace out,\n[Your Name]\n\n---"

# Load and evaluate examples, with all loading options, in 3 lines of code
from tigerscore import TIGERScorer
scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B")  # on GPU
# scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B", quantized=True)  # 4-bit quantization on GPU
# scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B", use_vllm=True)  # vLLM on GPU
# scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B-GGUF", use_llamacpp=True)  # 4-bit quantization on CPU
results = scorer.score([instruction], [hypo_output], [input_context])

# Print the results: a list of JSON outputs containing the automatically parsed results
print(results)
```

The results are a list of dicts consisting of structured error analyses:
```json
[
    {
        "num_errors": 3,
        "score": -12.0,
        "errors": {
            "error_0": {
                "location": "\"I'm really sorry for ditching our plan.\"",
                "aspect": "Inappropriate language or tone",
                "explanation": "The phrase \"ditching our plan\" is informal and disrespectful. It should be replaced with a more respectful and apologetic phrase like \"cancelling our plan\".",
                "severity": "Major",
                "score_reduction": "4.0"
            },
            "error_1": {
                "location": "\"I suddenly got an opportunity for a vacation so I took it.\"",
                "aspect": "Lack of apology or remorse",
                "explanation": "This sentence shows no remorse for cancelling the plan at the last minute. It should be replaced with a sentence that expresses regret for the inconvenience caused.",
                "severity": "Major",
                "score_reduction": "4.0"
            },
            "error_2": {
                "location": "\"I would rather go for an adventure.\"",
                "aspect": "Incorrect reason for cancellation",
                "explanation": "This sentence implies that the reason for cancelling the plan was to go on an adventure, which is incorrect. The correct reason was illness. This sentence should be replaced with a sentence that correctly states the reason for cancellation.",
                "severity": "Major",
                "score_reduction": "4.0"
            }
        },
        "raw_output": "..."
    }
]
```

```python
scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B", use_vllm=True)  # vLLM on GPU
```

TIGERScore supports vLLM for fast inference. On a single A6000 (48GB) GPU, TIGERScore-13B takes only 0.2s-0.3s to score each instance.
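In the example output above, the `score` field equals the negative sum of the per-error `score_reduction` values. A minimal sketch of re-deriving it, over a hard-coded copy of that structure (only the fields needed for the arithmetic are kept):

```python
# Re-derive the overall score from the per-error reductions in the
# example output above (values copied from the JSON; strings as returned).
result = {
    "num_errors": 3,
    "score": -12.0,
    "errors": {
        "error_0": {"severity": "Major", "score_reduction": "4.0"},
        "error_1": {"severity": "Major", "score_reduction": "4.0"},
        "error_2": {"severity": "Major", "score_reduction": "4.0"},
    },
}

# score_reduction comes back as a string, so convert before summing
derived = -sum(float(e["score_reduction"]) for e in result["errors"].values())
print(derived)  # -12.0, matching result["score"]
```

This also makes the granularity explicit: each Major error in this example costs 4.0 points, and the overall score simply aggregates the reductions.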
```python
scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B", quantized=True)  # 4-bit quantization on GPU
```

By setting the initialization parameter `quantized=True`, the model is loaded as a 4-bit version using the Hugging Face `load_in_4bit=True` option.
Note that although quantization greatly reduces the memory requirement (you can run TIGERScore on a GPU with roughly 20+ GB of memory), inference may be slower than with the original bfloat16 version. The choice depends on your trade-off.
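As a back-of-envelope illustration of that trade-off, weight-only memory scales with bits per parameter. The sketch below is nominal arithmetic only (it ignores activations, the KV cache, and framework overhead, which is why real usage is higher than these numbers):

```python
def weight_gb(n_params: float, bits_per_param: int) -> float:
    """Nominal weight-only footprint in GB: params * bits / 8 bytes."""
    return n_params * bits_per_param / 8 / 1e9

# bfloat16 (16 bits) vs. 4-bit quantized weights for the 7B and 13B checkpoints
print(weight_gb(7e9, 16))   # 14.0 GB
print(weight_gb(7e9, 4))    # 3.5 GB
print(weight_gb(13e9, 16))  # 26.0 GB
print(weight_gb(13e9, 4))   # 6.5 GB
```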
```python
scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B-GGUF", use_llamacpp=True)  # 4-bit quantization on CPU
```

We also provide llama.cpp versions of TIGERScore-7B/13B. With the GGUF versions we provide, you can run TIGERScore on a pure-CPU device. TIGERScore-13B typically takes about 20 seconds to score each instance.
The dataset preprocessing scripts and intermediate results can be found here.
The folder `xgptscore` contains all the templates we used to query ChatGPT or GPT-4 for the errors identified in hypothesis outputs across the different tasks that TIGERScore covers. We call this API-querying method XGPTScore, for an eXplainable scoring method based on querying GPT models.
The overall pipeline of XGPTScore is:
(see `./constants.py`). Check `xgptscore/README.md` for more details, and for how to use our query templates through a single function, `xgptscore()`.
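A minimal sketch of what such a query looks like. The wording and field layout below are illustrative only, not the actual templates from the `xgptscore` folder; it just shows the pattern of filling a task-agnostic template with the instruction, source, and hypothesis, and asking for structured errors:

```python
# Illustrative template; the real templates live in the xgptscore folder
# and differ in wording and structure.
TEMPLATE = (
    "You are evaluating a model's output for errors.\n"
    "Instruction: {instruction}\n"
    "Source: {input_context}\n"
    "Hypothesis: {hypo_output}\n"
    "For each error, report: location, aspect, explanation, "
    "severity (Major/Minor), and score reduction."
)

def build_query(instruction: str, input_context: str, hypo_output: str) -> str:
    """Fill the template; the filled string is what would be sent to the GPT API."""
    return TEMPLATE.format(
        instruction=instruction,
        input_context=input_context,
        hypo_output=hypo_output,
    )

query = build_query(
    "Write an apology letter.",
    "Reason: You canceled a plan at the last minute due to illness.",
    "Hey [Recipient], ...",
)
print(query)
```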
The metric data consists of data from two sampling channels: the real-world channel and the synthetic channel.
The real-world channel data is generated by `generate_distill_data.sh`, and the synthetic channel data by `generate_synthesis_distill_data.sh`. The overall purpose of the two-channel data collection is to ensure that we cover the error types in the training data, so that our model generalizes better. After obtaining this data, we applied a series of heuristics to filter out bad data and to augment the data:
(see `check_data.sh`). `generate_inst_synthetic_data.sh` generates high-quality outputs with free-form error aspects, as a supplement to the synthetic channel. You can load the preprocessed data used to finetune TIGERScore-V1 directly from Hugging Face:
```python
from datasets import load_dataset
dataset = load_dataset("TIGER-Lab/MetricInstruct")
```

We provide the training and testing scripts in the `finetune` folder:
- `finetune_llama.sh` finetunes the model.
- `format_distill_data.sh` transforms the data into the training format, i.e., a single instruction plus input context, together with the output.
- `test_llama_vllm.sh` tests the finetuned model and computes correlations as its performance.

Please check these scripts for more details of our training and testing process, and see `./tigerscore/common/README.md` to install the environment. If you find our data, models, or code useful, please cite our paper:
```
@article{Jiang2023TIGERScoreTB,
  title={TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks},
  author={Dongfu Jiang and Yishan Li and Ge Zhang and Wenhao Huang and Bill Yuchen Lin and Wenhu Chen},
  journal={ArXiv},
  year={2023},
  volume={abs/2310.00752},
  url={https://api.semanticscholar.org/CorpusID:263334281}
}
```