TIGERScore

This repo contains the code, data, and models for the TMLR 2024 paper "TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks".

For more results and analysis, see the project page.

TIGERScore-Yi-6B

Other Resources
TIGERScore Collections
Hugging Face Demo

Installation

To use the TIGERScore pipeline directly, first install it as a Python package:

pip install git+https://github.com/TIGER-AI-Lab/TIGERScore.git

Make sure that torch.cuda.is_available() returns True on your local machine.
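The check above can be scripted in a few lines. This is a minimal sketch: the helper name `cuda_ready` is hypothetical and not part of TIGERScore. It only wraps the `torch.cuda.is_available()` call and reports False when PyTorch cannot be imported.

```python
def cuda_ready() -> bool:
    """Return True only if PyTorch is importable and reports a usable CUDA device."""
    try:
        import torch  # PyTorch must already be installed for this check to pass
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


# Hypothetical helper, shown for illustration only.
print("CUDA available:", cuda_ready())
```

If this prints False, the GPU options described in this README will not work and only the CPU path remains.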

In addition, to use TIGERScore with vLLM (detailed here), you need to install vLLM manually by following the vLLM documentation.

If your CUDA version is 12.1:

pip install vllm
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu121

If your CUDA version is 11.8:

# Replace `cp39` with your Python version (e.g., `cp38`, `cp39`, `cp311`).
pip install https://github.com/vllm-project/vllm/releases/download/v0.2.2/vllm-0.2.2+cu118-cp39-cp39-manylinux1_x86_64.whl
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu118

To use the training scripts, install the dependencies by running:

pip install -r requirements.txt

Usage

Basic Usage

After installation, you can score text generations with the following example Python code (see tigerscore_example_usage.ipynb for more use cases):

# GPU device setup
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Example inputs
instruction = "Write an apology letter."
input_context = "Reason: You canceled a plan at the last minute due to illness."
hypo_output = "Hey [Recipient],\n\nI'm really sorry for ditching our plan. I suddenly got an opportunity for a vacation so I took it. I know this might have messed up your plans and I regret that.\n\nDespite being under the weather, I would rather go for an adventure. I hope you can understand my perspective and I hope this incident doesn't change anything between us.\n\nWe can reschedule our plan for another time. Sorry again for the trouble.\n\nPeace out,\n[Your Name]\n\n---"

# Load the scorer and evaluate the example in 3 lines of code
from tigerscore import TIGERScorer
scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B")  # on GPU
# scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B", quantized=True)  # 4-bit quantization on GPU
# scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B", use_vllm=True)  # vLLM on GPU
# scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B-GGUF", use_llamacpp=True)  # 4-bit quantization on CPU
results = scorer.score([instruction], [hypo_output], [input_context])

# Print the results: a list of JSON-style dicts containing the automatically parsed error analysis
print(results)

The result is a list of dicts, each containing a structured error analysis.

[
    {
        "num_errors": 3,
        "score": -12.0,
        "errors": {
            "error_0": {
                "location": "\"I'm really glad for ditching our plan.\"",
                "aspect": "Inappropriate language or tone",
                "explanation": "The phrase \"ditching our plan\" is informal and disrespectful. It should be replaced with a more respectful and apologetic phrase like \"cancelling our plan\".",
                "severity": "Major",
                "score_reduction": "4.0"
            },
            "error_1": {
                "location": "\"I suddenly got an opportunity for a vacation so I took it.\"",
                "aspect": "Lack of apology or remorse",
                "explanation": "This sentence shows no remorse for cancelling the plan at the last minute. It should be replaced with a sentence that expresses regret for the inconvenience caused.",
                "severity": "Major",
                "score_reduction": "4.0"
            },
            "error_2": {
                "location": "\"I would rather go for an adventure.\"",
                "aspect": "Incorrect reason for cancellation",
                "explanation": "This sentence implies that the reason for cancelling the plan was to go on an adventure, which is incorrect. The correct reason was illness. This sentence should be replaced with a sentence that correctly states the reason for cancellation.",
                "severity": "Major",
                "score_reduction": "4.0"
            }
        },
        "raw_output": "..."
    }
]
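To work with this structure programmatically, a small helper can flatten the nested `errors` dict. The sketch below is illustrative only: `summarize_errors` is a hypothetical name, and `sample` is a trimmed hand copy of the output above (explanations and locations omitted). In this sample the score equals the negative sum of the reductions; whether that holds for every output is not stated here.

```python
def summarize_errors(result: dict) -> list:
    """Flatten one result dict into (severity, aspect, score_reduction) tuples."""
    rows = []
    for err in result.get("errors", {}).values():
        # score_reduction is a string in the sample output, so convert it to float
        reduction = float(err["score_reduction"])
        rows.append((err["severity"].strip(), err["aspect"].strip(), reduction))
    return rows


# Trimmed copy of the sample output shown above.
sample = {
    "num_errors": 3,
    "score": -12.0,
    "errors": {
        "error_0": {"aspect": "Inappropriate language or tone",
                    "severity": "Major", "score_reduction": "4.0"},
        "error_1": {"aspect": "Lack of apology or remorse",
                    "severity": "Major", "score_reduction": "4.0"},
        "error_2": {"aspect": "Incorrect reason for cancellation",
                    "severity": "Major", "score_reduction": "4.0"},
    },
}

rows = summarize_errors(sample)
total_reduction = sum(reduction for _, _, reduction in rows)
for severity, aspect, reduction in rows:
    print(f"{severity:<6} -{reduction:.1f}  {aspect}")
print("total reduction:", total_reduction, "| reported score:", sample["score"])
```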

vLLM Support (Recommended)

scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B", use_vllm=True)  # vLLM on GPU

TIGERScore supports fast inference with vLLM. On a single A6000 (48GB) GPU, TIGERScore-13B takes only 0.2s to 0.3s to score each instance.

Quantization Support (GPU)

scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B", quantized=True)  # 4-bit quantization on GPU

Setting the initialization parameter quantized=True loads the model in its 4-bit version using the Hugging Face load_in_4bit=True option.

Quantization reduces the memory requirement by a large margin, so you can run TIGERScore on a GPU with about 20+GB of memory. However, inference may be slower than with the original bfloat16 version. Which trade-off to make is up to you.

llama.cpp Support (CPU)

scorer = TIGERScorer(model_name="TIGER-Lab/TIGERScore-7B-GGUF", use_llamacpp=True)

We also provide llama.cpp versions of TIGERScore-7B/13B. With the GGUF versions we provide, you can run TIGERScore on pure CPU devices. TIGERScore-13B generally takes about 20s to score each instance.
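The three backends above can be selected with a small decision helper. This is a sketch under stated assumptions: `pick_scorer_kwargs` and its flags are hypothetical names, and the ordering of the checks is one possible policy, not a recommendation from the authors. The returned keyword arguments are exactly the ones shown in the snippets above.

```python
def pick_scorer_kwargs(has_cuda: bool, has_vllm: bool = False,
                       low_gpu_memory: bool = False) -> dict:
    """Choose TIGERScorer keyword arguments from a coarse hardware description."""
    if not has_cuda:
        # No GPU: fall back to the GGUF model through llama.cpp on CPU
        return {"model_name": "TIGER-Lab/TIGERScore-7B-GGUF", "use_llamacpp": True}
    if low_gpu_memory:
        # Limited GPU memory: 4-bit quantization, at the cost of slower inference
        return {"model_name": "TIGER-Lab/TIGERScore-7B", "quantized": True}
    if has_vllm:
        # vLLM installed: the recommended fast-inference path
        return {"model_name": "TIGER-Lab/TIGERScore-7B", "use_vllm": True}
    # Default: plain GPU loading
    return {"model_name": "TIGER-Lab/TIGERScore-7B"}


kwargs = pick_scorer_kwargs(has_cuda=True, has_vllm=True)
print(kwargs)
# Then, with tigerscore installed: scorer = TIGERScorer(**kwargs)
```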

Data Preparation

Dataset preprocessing scripts and intermediate results can be found here.

Prompt Templates

The folder xgptscore contains all the templates we used to query ChatGPT or GPT-4. We call these API query methods XGPTScore, an eXplainable scoring method that queries GPT models.

The overall pipeline of XGPTScore is:

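We define a query template that asks GPT models to identify errors in the hypothesis output based on the task instruction, source text, and reference text.
We construct various evaluation aspects to focus on for different tasks. ( ./constants.py )
Then, by applying the templates and specifying the aspects to focus on in the template, GPT models are required to return the identified errors in a predefined format (e.g., JSON).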

See xgptscore/README.md for details on how to use the query templates with the single function xgptscore().
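The three pipeline steps can be sketched as follows. Everything here is hypothetical and for illustration: the template wording, the aspect names, and the helper names `build_query` and `parse_errors` are not taken from the xgptscore folder, and `fake_reply` is a hand-written stand-in rather than real GPT output. No API call is made.

```python
import json

# Hypothetical aspect names; the real ones are defined in ./constants.py.
EXAMPLE_ASPECTS = ["Accuracy", "Fluency"]

# Hypothetical template wording; the actual templates live in the xgptscore folder.
QUERY_TEMPLATE = """Task instruction: {instruction}
Source text: {source}
Reference text: {reference}
Hypothesis output: {hypothesis}

Identify the errors in the hypothesis output, focusing on these aspects: {aspects}.
Return a JSON list of objects with keys "location", "aspect", "explanation", "severity"."""


def build_query(instruction, source, reference, hypothesis, aspects):
    """Steps 1 and 2: fill the template and name the aspects to focus on."""
    return QUERY_TEMPLATE.format(
        instruction=instruction, source=source, reference=reference,
        hypothesis=hypothesis, aspects=", ".join(aspects),
    )


def parse_errors(response_text):
    """Step 3: parse the reply; return [] if it is not the expected JSON list."""
    try:
        parsed = json.loads(response_text)
    except json.JSONDecodeError:
        return []
    return parsed if isinstance(parsed, list) else []


query = build_query("Write an apology letter.", "Reason: illness.",
                    "", "Sorry for ditching our plan.", EXAMPLE_ASPECTS)
fake_reply = ('[{"location": "ditching our plan", "aspect": "Fluency", '
              '"explanation": "Too informal.", "severity": "Minor"}]')
errors = parse_errors(fake_reply)
print(query)
print(errors)
```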

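Dataset Components

MetricInstruct consists of data from two sampling channels: a real-world channel and a synthetic channel.

The real-world channel data is generated by the script generate_distill_data.sh.
The synthetic channel data is generated by the script generate_synthesis_distill_data.sh.

The overall purpose of two-channel data collection is to cover as many error types as possible in the training data so that the model generalizes better.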

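After obtaining this data, we apply a series of heuristics to filter out bad data and to augment the data:

Drop items that are too long, too short, badly formatted, etc. (pattern matching)
Prompt GPT-4 to drop items whose error analysis contents are unreasonable ( check_data.sh )
Our evaluation aspects may be limited because they are manually defined and fixed. Therefore, we propose using generate_inst_synthetic_data.sh to generate high-quality outputs with free-form error aspects, as a supplement to the synthetic channel.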
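The first, pattern-matching heuristic might look like the sketch below. The thresholds, the helper name `keep_item`, and the choice of "parses as JSON" as the format check are all assumptions made for illustration; the actual rules used by the repo's scripts are not given here.

```python
import json

# Hypothetical length limits in characters; the repo's real limits are not stated here.
MIN_CHARS = 20
MAX_CHARS = 2000


def keep_item(analysis_text: str) -> bool:
    """Keep an item only if it has a sane length and parses as JSON."""
    text = analysis_text.strip()
    if not (MIN_CHARS <= len(text) <= MAX_CHARS):
        return False  # too short or too long
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False  # badly formatted
    return True


candidates = [
    '{"num_errors": 0, "errors": {}}',                # well-formed
    "{}",                                             # too short
    "this is plain text, not a structured analysis",  # bad format
    "x" * 3000,                                       # too long
]
kept = [c for c in candidates if keep_item(c)]
print(len(kept), "of", len(candidates), "items kept")
```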

MetricInstruct

You can load the preprocessed data we used to fine-tune TIGERScore-V1 directly from Hugging Face:

from datasets import load_dataset
dataset = load_dataset("TIGER-Lab/MetricInstruct")

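Training Scripts

We provide our training and testing scripts in the folder finetune:

finetune_llama.sh fine-tunes the model.
format_distill_data.sh converts the data into the format used for fine-tuning.
test_llama_vllm.sh tests the fine-tuned model and computes the correlation as its performance. See these scripts for details of the training and testing process.
Eval_Baseline.sh reproduces the baseline experiment results. To set up the environment, see ./tigerscore/common/README.md.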
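As a rough picture of what "computing the correlation" means for a metric, the sketch below computes a Pearson coefficient between metric scores and human ratings. Which coefficient test_llama_vllm.sh actually reports is not stated here, so Pearson is an assumption, and the numbers are made up for illustration and are not experimental results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up values: metric scores (0 is best, more negative is worse) and human ratings.
metric_scores = [-12.0, -4.0, -8.0, 0.0]
human_ratings = [1.0, 4.0, 2.0, 5.0]
r = pearson(metric_scores, human_ratings)
print(f"Pearson r = {r:.3f}")
```

A coefficient close to 1 would indicate that the metric ranks outputs much as the human raters do.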

Citation

If you find our data, models, or code useful, please cite our paper:

 @article{Jiang2023TIGERScoreTB,
  title={TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks},
  author={Dongfu Jiang and Yishan Li and Ge Zhang and Wenhao Huang and Bill Yuchen Lin and Wenhu Chen},
  journal={ArXiv},
  year={2023},
  volume={abs/2310.00752},
  url={https://api.semanticscholar.org/CorpusID:263334281}
}
