TIGERScore

This repo contains the code, data, and models for the TMLR 2024 paper.

For more results and analysis, see the [project page]!

Tigerscore-yi-6b

Other Resources
TIGERScore Collection
Hugging Face Demo

Installation

To use the TIGERScore pipeline directly, first install it as a Python package.

pip install git+https://github.com/TIGER-AI-Lab/TIGERScore.git

Check that torch.cuda.is_available() returns True on your local machine.

To use TIGERScore with vLLM (see vLLM Support below), you also need to install vLLM manually, following the vLLM documentation.

If your CUDA version is 12.1:
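A quick way to run this check is sketched below. The helper name `cuda_ready` is ours and is not part of TIGERScore; it only wraps `torch.cuda.is_available()` and returns False when PyTorch cannot be imported.

```python
def cuda_ready() -> bool:
    """Return True if PyTorch is importable and can see a CUDA device."""
    try:
        import torch  # expected to be installed alongside TIGERScore
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("CUDA available:", cuda_ready())
```

If this prints False on a machine with a GPU, the installed PyTorch build and the local CUDA driver are worth checking before going further.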

pip install vllm
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu121

If your CUDA version is 11.8:

# Replace `cp39` with your Python version (e.g., `cp38`, `cp39`, `cp311`).
pip install https://github.com/vllm-project/vllm/releases/download/v0.2.2/vllm-0.2.2+cu118-cp39-cp39-manylinux1_x86_64.whl
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu118
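The `cp39` substitution for CUDA 11.8 can be derived from the running interpreter. The sketch below reuses the wheel URL from the command above and only varies the tag; the function names are illustrative, and a wheel for a given Python version is not guaranteed to exist in that release.

```python
import sys

# URL pattern copied from the CUDA 11.8 install command; only the cpXY tag varies.
WHEEL_TEMPLATE = (
    "https://github.com/vllm-project/vllm/releases/download/v0.2.2/"
    "vllm-0.2.2+cu118-{tag}-{tag}-manylinux1_x86_64.whl"
)


def python_tag(version_info=None) -> str:
    """Build the CPython tag, e.g. (3, 9) -> 'cp39'."""
    major, minor = (version_info or sys.version_info)[:2]
    return f"cp{major}{minor}"


def cu118_wheel_url(version_info=None) -> str:
    """Fill the tag for the given (or current) Python version into the URL."""
    return WHEEL_TEMPLATE.format(tag=python_tag(version_info))


if __name__ == "__main__":
    print("pip install", cu118_wheel_url())
```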

If you want to use the training scripts, install the dependencies by running:

pip install -r requirements.txt

Usage

Basic Usage

After installation, you can score text generations with the following example Python code (see tigerscore_example_usage.ipynb for more use cases):

# gpu device setup
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
# example
instruction = "Write an apology letter."
input_context = "Reason: You canceled a plan at the last minute due to illness."
hypo_output = "Hey [Recipient],\n\nI'm really sorry for ditching our plan. I suddenly got an opportunity for a vacation so I took it. I know this might have messed up your plans and I regret that.\n\nDespite being under the weather, I would rather go for an adventure. I hope you can understand my perspective and I hope this incident doesn't change anything between us.\n\nWe can reschedule our plan for another time. Sorry again for the trouble.\n\nPeace out,\n[Your Name]\n\n---"

# Load and evaluate examples in all options in 3 lines of code
from tigerscore import TIGERScorer
scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B") # on GPU
# scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B", quantized=True) # 4 bit quantization on GPU
# scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B", use_vllm=True) # VLLM on GPU
# scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B-GGUF", use_llamacpp=True) # 4 bit quantization on CPU
results = scorer.score([instruction], [hypo_output], [input_context])

# print the results, which is a list of json output containing the automatically parsed results!
print(results)

The results are a list of dicts, each containing a structured error analysis.

[
    {
        "num_errors": 3,
        "score": -12.0,
        "errors": {
            "error_0": {
                "location": "\"I'm really glad for ditching our plan.\"",
                "aspect": "Inappropriate language or tone",
                "explanation": "The phrase \"ditching our plan\" is informal and disrespectful. It should be replaced with a more respectful and apologetic phrase like \"cancelling our plan\".",
                "severity": "Major",
                "score_reduction": "4.0"
            },
            "error_1": {
                "location": "\"I suddenly got an opportunity for a vacation so I took it.\"",
                "aspect": "Lack of apology or remorse",
                "explanation": "This sentence shows no remorse for cancelling the plan at the last minute. It should be replaced with a sentence that expresses regret for the inconvenience caused.",
                "severity": "Major",
                "score_reduction": "4.0"
            },
            "error_2": {
                "location": "\"I would rather go for an adventure.\"",
                "aspect": "Incorrect reason for cancellation",
                "explanation": "This sentence implies that the reason for cancelling the plan was to go on an adventure, which is incorrect. The correct reason was illness. This sentence should be replaced with a sentence that correctly states the reason for cancellation.",
                "severity": "Major",
                "score_reduction": "4.0"
            }
        },
        "raw_output": "..."
    }
]
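Because each result is a plain dict, it can be post-processed with ordinary Python. The sketch below is a minimal illustration, assuming the result layout shown in the example above; `summarize` is a hypothetical helper and not part of the TIGERScore package.

```python
def summarize(result: dict) -> dict:
    """Collect per-severity counts and the total score reduction for one result."""
    by_severity = {}
    total_reduction = 0.0
    for error in result["errors"].values():
        severity = error["severity"].strip()
        by_severity[severity] = by_severity.get(severity, 0) + 1
        # score_reduction is a string in the example output, so convert it.
        total_reduction += float(error["score_reduction"])
    return {
        "num_errors": result["num_errors"],
        "score": result["score"],
        "by_severity": by_severity,
        "total_reduction": total_reduction,
    }


# Trimmed copy of the example result (locations and explanations omitted).
example = {
    "num_errors": 3,
    "score": -12.0,
    "errors": {
        "error_0": {"aspect": "Inappropriate language or tone",
                    "severity": "Major", "score_reduction": "4.0"},
        "error_1": {"aspect": "Lack of apology or remorse",
                    "severity": "Major", "score_reduction": "4.0"},
        "error_2": {"aspect": "Incorrect reason for cancellation",
                    "severity": "Major", "score_reduction": "4.0"},
    },
}

summary = summarize(example)
print(summary)
```

In this example the overall score is the negated sum of the three score reductions; the helper reports both values so the relationship can be checked on other outputs rather than assumed.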

vLLM Support (Recommended)

scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B", use_vllm=True) # VLLM on GPU

TIGERScore supports fast inference with vLLM. On a single A6000 (48GB) GPU, TIGERScore-13B needs only 0.2s to 0.3s to score each instance.
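For planning purposes, the per-instance figure can be turned into a rough wall-clock estimate. The sketch below simply multiplies the 0.2s to 0.3s range quoted above by the number of instances; it ignores model loading time and assumes comparable hardware and input lengths, and the function name is illustrative.

```python
def estimate_scoring_seconds(num_instances: int, low: float = 0.2, high: float = 0.3):
    """Return a (low, high) wall-clock estimate in seconds for scoring num_instances."""
    if num_instances < 0:
        raise ValueError("num_instances must be non-negative")
    return num_instances * low, num_instances * high


low_s, high_s = estimate_scoring_seconds(1000)
print(f"1000 instances: about {low_s / 60:.1f} to {high_s / 60:.1f} minutes")
```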

Quantization Support (GPU)

scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B", quantized=True) # 4 bit quantization on GPU

Setting the initialization parameter quantized=True loads the model in a 4-bit version using Hugging Face's load_in_4bit=True option.

Note that quantization reduces the memory requirement by a large margin: you can run TIGERScore on a GPU with about 20+GB of memory. However, inference may be slower than with the original bfloat16 version. Which trade-off to make is up to you.

llama.cpp Support (CPU)

scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B-GGUF", use_llamacpp=True)

We also provide llama.cpp versions of TIGERScore-7B/13B. With the provided GGUF versions, you can run TIGERScore on pure CPU devices. TIGERScore-13B typically takes about 20s to score each instance.
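The three backends differ only in the keyword arguments passed to `TIGERScorer`. The sketch below picks a set of arguments from a few facts about the machine; `choose_scorer_kwargs` and its decision order are our own illustration, not logic from the repository, and only the model names and flags come from the examples above.

```python
def choose_scorer_kwargs(has_cuda: bool, has_vllm: bool = False,
                         low_memory: bool = False) -> dict:
    """Pick TIGERScorer keyword arguments for the available hardware."""
    if not has_cuda:
        # No GPU: fall back to the GGUF model through llama.cpp.
        return {"model_name": "TIGER-Lab/TIGERScore-7B-GGUF", "use_llamacpp": True}
    if has_vllm:
        # vLLM is the recommended backend when it is installed.
        return {"model_name": "TIGER-Lab/TIGERScore-7B", "use_vllm": True}
    if low_memory:
        # 4-bit quantization trades speed for a smaller memory footprint.
        return {"model_name": "TIGER-Lab/TIGERScore-7B", "quantized": True}
    return {"model_name": "TIGER-Lab/TIGERScore-7B"}


kwargs = choose_scorer_kwargs(has_cuda=True, has_vllm=True)
print(kwargs)
# Usage, once tigerscore is installed:
#   from tigerscore import TIGERScorer
#   scorer = TIGERScorer(**kwargs)
```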

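Data Preparation

Dataset preprocessing scripts and intermediate results can be found here.

Prompt Templates

The folder xgptscore contains all the templates we used to query ChatGPT or GPT-4 for the identified errors in hypothesis outputs across the tasks that TIGERScore covers. We call these API query methods XGPTScore, an eXplainable scoring method that works by querying GPT models.

The overall XGPTScore pipeline is as follows:

We define a query template that asks GPT models to identify errors in the hypothesis output based on the task instruction, source text, and reference text.
We manually construct a variety of evaluation aspects to focus on for different tasks. (./constants.py)
Then, by applying the template and specifying the aspects to focus on, we require the GPT models to return the identified errors in a predefined format (such as JSON).

See xgptscore/README.md for more details, including how to use the query templates with the single function xgptscore().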
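The three pipeline steps can be sketched in a few lines. Everything in this block is hypothetical: the template wording, the aspect names, and the helpers `build_query` and `parse_errors` are illustrations only, and the repository's actual templates, aspects (in ./constants.py), and `xgptscore()` function may look quite different. No API call is made here.

```python
import json

# Step 2 (hypothetical aspects; the real ones are defined in ./constants.py).
ASPECTS = {"summarization": ["Relevance", "Fact consistency", "Fluency"]}

# Step 1 (hypothetical wording of a query template).
TEMPLATE = (
    "Task instruction: {instruction}\n"
    "Source: {source}\n"
    "Reference: {reference}\n"
    "Hypothesis output: {hypothesis}\n"
    "Identify the errors in the hypothesis output, focusing on: {aspects}.\n"
    'Return a JSON object with an "errors" list; each error has '
    '"location", "aspect", "explanation", and "severity".'
)


def build_query(task, instruction, source, reference, hypothesis) -> str:
    """Fill the template and name the aspects chosen for this task."""
    return TEMPLATE.format(
        instruction=instruction, source=source, reference=reference,
        hypothesis=hypothesis, aspects=", ".join(ASPECTS[task]),
    )


def parse_errors(response_text: str) -> list:
    """Step 3: parse the JSON reply; return [] if the format is not as expected."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        return []
    errors = payload.get("errors", []) if isinstance(payload, dict) else []
    return errors if isinstance(errors, list) else []


query = build_query("summarization", "Summarize the article.",
                    "source text", "reference summary", "model summary")
print(query)
print(parse_errors('{"errors": [{"aspect": "Fluency", "severity": "Minor"}]}'))
```

Returning an empty list for malformed replies keeps the caller simple; a real pipeline might instead retry the query or log the raw response.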

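Dataset Components

MetricInstruct consists of data from two sampling channels: a real-world channel and a synthetic channel.

The real-world channel data is generated by the script generate_distill_data.sh.
The synthetic channel data is generated by the script generate_synthesis_distill_data.sh. The overall purpose of collecting data from two channels is to cover as many error types as possible in the training data, so that the model generalizes better.

After obtaining these data, we apply a series of heuristics to filter out bad data and to augment the data:

Drop items that are too long, too short, badly formatted, and so on (pattern matching).
Prompt GPT-4 to drop items with unreasonable error analysis content (check_data.sh).
Our evaluation aspects may be limited because they are manually defined and fixed. We therefore propose generating high-quality outputs with free-form error aspects using generate_inst_synthetic_data.sh, as a supplement to the synthetic channel.

MetricInstruct

You can load the preprocessed data used to fine-tune TIGERScore-V1 directly from Hugging Face: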
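The first heuristic, pattern-based filtering, might look like the sketch below. The word-count thresholds and the required `Severity:` pattern are assumptions made for illustration; the actual rules used to build MetricInstruct are not given here.

```python
import re

# Hypothetical thresholds and format check; the repository's real rules may differ.
MIN_WORDS = 5
MAX_WORDS = 400
SEVERITY_PATTERN = re.compile(r"Severity:\s*(Major|Minor)", re.IGNORECASE)


def keep_item(analysis: str) -> bool:
    """Keep an error analysis only if its length and format look plausible."""
    n_words = len(analysis.split())
    if n_words < MIN_WORDS or n_words > MAX_WORDS:
        return False  # too short or too long
    return SEVERITY_PATTERN.search(analysis) is not None  # badly formatted otherwise


items = [
    "Error location: 'ditching our plan'. Aspect: tone. Severity: Major",
    "ok",
    "This analysis rambles on without naming any severity level at all.",
]
kept = [item for item in items if keep_item(item)]
print(kept)
```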

from datasets import load_dataset
dataset = load_dataset("TIGER-Lab/MetricInstruct")

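Training Scripts

We provide our training and testing scripts in the folder finetune:

finetune_llama.sh: fine-tunes the model.
format_distill_data.sh: converts the data into the fine-tuning format, that is, a single instruction and input context paired with an output.
test_llama_vllm.sh: tests the fine-tuned model and computes correlation as its performance measure. Check these scripts for more details on the training and testing process.
eval_baseline.sh: reproduces the results of the baseline experiments. See ./tigerscore/common/README.md to install the environment.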
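To make "correlation as performance" concrete, the sketch below computes a Spearman rank correlation between metric scores and human ratings in plain Python. Which coefficient test_llama_vllm.sh actually reports is not stated here, so Spearman is our choice for illustration, and the numbers are made up.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


metric_scores = [-12.0, -4.0, -1.0, 0.0]  # made-up metric scores
human_ratings = [1, 2, 4, 5]              # made-up human ratings
rho = spearman(metric_scores, human_ratings)
print(rho)
```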

Citation

If you find our data, models, or code useful, please cite our paper:

@article{Jiang2023TIGERScoreTB,
  title={TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks},
  author={Dongfu Jiang and Yishan Li and Ge Zhang and Wenhao Huang and Bill Yuchen Lin and Wenhu Chen},
  journal={ArXiv},
  year={2023},
  volume={abs/2310.00752},
  url={https://api.semanticscholar.org/CorpusID:263334281}
}