ReAlign

Reformatted Alignment

This is the official repository for Reformatted Alignment.

Run-Ze Fan, Xuefeng Li, Haoyang Zou, Junlong Li, Shwai HE, Ethan Chern, Jiewen Hu, Pengfei Liu

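News

September 2024: Our paper was accepted to EMNLP 2024 Findings!
February 2024: We release the preprint on arXiv, the ReAlign data, and other useful resources created while developing them (task descriptions, hand-written formats, the task classifier, its training data, and the NQ dataset for factuality evaluation).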

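Introduction

We explore raising the quality of existing instruction data so that it aligns better with human values. To that end we introduce a simple and effective approach named ReAlign (Reformatted Alignment), which reformats the responses of instruction data into a format that better matches pre-established criteria and the collated evidence. This approach minimizes human annotation, hallucination, and the difficulty of scaling, and it remains orthogonal to existing alignment techniques. Experimentally, ReAlign significantly improves the general alignment ability, math reasoning, factuality, and readability of LLMs.

Encouragingly, without introducing any additional data or advanced training techniques, and merely by reformatting the responses, the mathematical reasoning accuracy of LLaMA-2-13B improves from 46.77% to 56.63%. In addition, a mere 5% of ReAlign data yields a 67% boost in general alignment ability as measured on the Alpaca dataset. This work highlights the need for further research into the science and mechanistic interpretability of LLMs.

The underlying philosophy of ReAlign is to re-coordinate the roles of humans and LLMs in the alignment process so that their complementary strengths are used: humans articulate their preferences, and LLMs in turn restructure the instruction data based on their generative power (e.g., instruction-following ability), without directly using distilled LLM knowledge. Through this collaborative synergy, we expect the generated instruction data to be not only more precise in context but also more closely aligned with human preferences.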

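Accuracy on the GSM8K test set for LLaMA-2-13B and Mistral-7B models fine-tuned on the GSM8K and MATH training sets with and without ReAlign. (a): Training and testing on GSM8K. (b): Training on MATH and testing on GSM8K (out-of-distribution setting).

An overview of ReAlign, including its three steps. KILT denotes Knowledge Intensive Language Tasks.

The ReAlign process unfolds in three main steps.

The first step is criteria definition, where humans define their preferences (e.g., the preferred format of responses) for various scenarios in natural language. In this paper, we carefully define criteria for 46 distinct scenarios.

The second step, retrieval augmentation, broadens the knowledge base for knowledge-intensive tasks such as open-domain QA and fact verification. It does so by incorporating additional information, which improves the factuality and informativeness of responses.

The final step, reformatting, re-aligns the responses with the pre-established criteria and the collated evidence, so that the outputs are both structured and substantiated.

ReAlign realigns the original response with the pre-defined criteria to produce a better format.

An example of a response from the original model and a response from the ReAlign model.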

Quick Start

Setup

We use python 3.10 in this project. We recommend creating a virtual environment with conda.

Then install all the libraries listed in requirements.txt. Note that you may choose a version of torch that matches your CUDA version (this file specifies torch>=2.0.1+cu118).

pip install -r requirements.txt

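Pipeline

Get an OpenAI API key. It is used for reformatting.
Get a Serper API key. It is only used for retrieval with Google Search.

Step 1: Task Classification

Download the task classifier from the Hugging Face Hub:

Model Name	HF Checkpoint	Size	License
Task Classifier	GAIR/ReAlign-Task-Classifier	13B	Llama 2

Then, using the following prompt, the task classifier can identify which task a query belongs to: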

PROMPT_INPUT_FOR_TASK_CLS: str = '''
You will receive a user's query. Additionally, you are given some pre-defined tasks below: 

[Existing tasks start]
question_generation
story_generation
poem_generation
email_generation
data_generation
advice_giving
recommendations
how_to_generation
planning
instructional_rewriting
language_polishing
paraphrasing
text_correction
code_correction
code_simplification
information_extraction
keywords_extraction
table_extraction
title_generation
text_summarization
note_summarization
explain_code
explain_answer
text_to_text_translation
text_to_code_translation
code_to_code_translation
code_to_text_translation
open_qa
closed_qa
fill_in_the_blank
fact_verification
math_puzzles
language_learning_questions
natural_language_learning_tutor
exam_problem_solving_tutor
ml_ai_language_model_tutor
general_classification
ordering
sentiment_analysis
code_language_classification
language_classification
topic_classification
value_judgement
rejecting
roleplay
default
[Existing tasks end]

You objective is to choose the most appropriate task that can reflect the high-level intention of this query. You should first clearly give out your choice. Your choice should exactly match one of the task names provided above, without any modification. Do not include the task description in your choice.

Your output should be just the task name.

User's query is below:
[User's query start]
{input}
[User's query end]

Task name:

'''

Here is an example:

from vllm import LLM, SamplingParams
import torch

# Shard the model across every visible GPU.
num_gpus = torch.cuda.device_count()
model_name_or_dir = "GAIR/ReAlign-Task-Classifier"  # or the local directory to store the downloaded model
llm = LLM(model=model_name_or_dir, tensor_parallel_size=num_gpus)

query = "Give three tips for staying healthy."
input_ = PROMPT_INPUT_FOR_TASK_CLS.format(input=query)

# Greedy decoding; a task name needs only a few tokens.
sampling_params = SamplingParams(temperature=0.0, top_p=1.0, max_tokens=50)
outputs = llm.generate(input_, sampling_params)
task = outputs[0].outputs[0].text

print(task)  # should be `advice_giving`.
# If the classified result is not in task list, set it as `default`.
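The fallback rule in the last comment can be sketched as a small helper. This is only an illustration, not code from the repository: `KNOWN_TASKS` and `normalize_task` are hypothetical names, the task names are copied from the prompt above, and taking the first line of the raw model output is an assumption about how the classifier text should be cleaned.

```python
# Task names copied from the [Existing tasks] list in the prompt above.
KNOWN_TASKS = frozenset("""
question_generation story_generation poem_generation email_generation
data_generation advice_giving recommendations how_to_generation planning
instructional_rewriting language_polishing paraphrasing text_correction
code_correction code_simplification information_extraction
keywords_extraction table_extraction title_generation text_summarization
note_summarization explain_code explain_answer text_to_text_translation
text_to_code_translation code_to_code_translation code_to_text_translation
open_qa closed_qa fill_in_the_blank fact_verification math_puzzles
language_learning_questions natural_language_learning_tutor
exam_problem_solving_tutor ml_ai_language_model_tutor general_classification
ordering sentiment_analysis code_language_classification
language_classification topic_classification value_judgement rejecting
roleplay default
""".split())


def normalize_task(raw_output: str) -> str:
    """Return a known task name, or `default` if the output is not one.

    Assumption: only the first non-empty line of the model output matters.
    """
    lines = raw_output.strip().splitlines()
    candidate = lines[0].strip() if lines else ""
    return candidate if candidate in KNOWN_TASKS else "default"
```

With the example above, `normalize_task(task)` would be stored as the `category` of the query in the next step.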

Step 2: Prepare your dataset

Convert your dataset into the following format, which is the same JSON structure as the ReAlign datasets.

Here is an example:

[
    {
        "id": 0,
        "items": [
            {
                # question
                "from": "human",
                "value": "Give three tips for staying healthy.",
                "category": "advice_giving"
            },
            {
                # response
                "from": "gpt",
                "value": "1. Eat a balanced diet and make sure to include plenty of fruits and vegetables.\n2. Exercise regularly to keep your body active and strong.\n3. Get enough sleep and maintain a consistent sleep schedule."
            }
        ]
    }
]
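As a rough sketch of the conversion, the helper below turns Alpaca-style records into the structure shown above. It is not part of the repository: `to_realign_format` is a hypothetical name, the `instruction` / `input` / `output` keys are an assumption about the source dataset, and joining the instruction and input with a newline is one possible choice.

```python
import json


def to_realign_format(records, categories=None):
    """Convert Alpaca-style dicts into the conversation format above.

    `categories`, if given, holds one task name per record (for example
    the output of the task classifier); otherwise `default` is used.
    """
    converted = []
    for idx, rec in enumerate(records):
        question = rec["instruction"]
        if rec.get("input"):
            # Assumption: append the optional input on a new line.
            question = f"{question}\n{rec['input']}"
        category = categories[idx] if categories else "default"
        converted.append({
            "id": idx,
            "items": [
                {"from": "human", "value": question, "category": category},
                {"from": "gpt", "value": rec["output"]},
            ],
        })
    return converted


def save_dataset(converted, path):
    """Write the converted list as a JSON file such as dataset.json."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(converted, f, ensure_ascii=False, indent=4)
```

Calling `save_dataset(to_realign_format(records, categories), "dataset.json")` would produce a file in the shape that the later steps take as input.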

Step 3: Retrieval with Google Search

Set your Serper API key:

export SERPER_API_KEY=...

Run the following script:

python retrieval.py \
    --input_data_path dataset.json \
    --output_path dataset_retrieval.json \
    --batch_size 10

The output files:

dataset_retrieval.json adds the original retrieval results.

dataset_retrieval_clean_evidence.json adds the cleaned retrieval results. This file is used for ReAlign.

Step 4: Reformat

Set your OpenAI API key:

export OPENAI_API_KEY=...

Run the following script:

# --tokenizer_path: a Hugging Face ID, or the local directory storing the downloaded tokenizer
# --dataset_batch_id: index of the file to process (0-9 when there are ten files); 0 is the first
# --dataset_batch_num: the total number of files
# --top_k: number of reformatted responses to output for each response
python reformat.py \
    --input_data_path dataset_retrieval_clean_evidence.json \
    --output_directory reformat_results \
    --tokenizer_path meta-llama/Llama-2-7b-chat-hf \
    --dataset_batch_id 0 \
    --dataset_batch_num 10 \
    --openai_key <OPENAI_API_KEY> \
    --top_k 2 \
    --model gpt-3.5-turbo-1106 \
    --temperature 0.3 \
    --top_p 1 \
    --target_length 4096

Note that we use process-level parallelism to speed this up: dataset_batch_num processes run at the same time for reformatting, and each process must be given its dataset_batch_id manually.

For example:

If you set dataset_batch_num to 10, the dataset is split into 10 sub-datasets (a 10x speedup). You should run the script 10 times concurrently, each time with a different dataset_batch_id from 0 to 9.

Afterwards, the directory output_directory contains dataset_batch_num files.
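Starting the ten runs by hand is tedious, so a small launcher can help. The sketch below is an assumption-laden illustration rather than a script shipped with the repository: `build_reformat_commands` and `run_all` are hypothetical helpers, only the flags shown in the command above are used, and the remaining flags (tokenizer, key, model and so on) are expected to be passed through `extra_args`.

```python
import subprocess
import sys


def build_reformat_commands(num_shards, input_path, output_dir, extra_args=()):
    """Return one reformat.py command (as an argument list) per shard."""
    return [
        [
            sys.executable, "reformat.py",
            "--input_data_path", input_path,
            "--output_directory", output_dir,
            "--dataset_batch_id", str(shard_id),
            "--dataset_batch_num", str(num_shards),
            *extra_args,
        ]
        for shard_id in range(num_shards)
    ]


def run_all(commands):
    """Start every command at once, then wait and collect exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


# Example usage (run from the directory that contains reformat.py):
# commands = build_reformat_commands(
#     10, "dataset_retrieval_clean_evidence.json", "reformat_results",
#     extra_args=["--tokenizer_path", "meta-llama/Llama-2-7b-chat-hf",
#                 "--top_k", "2", "--model", "gpt-3.5-turbo-1106"],
# )
# exit_codes = run_all(commands)
```

A non-zero entry in `exit_codes` would indicate that the shard with that index needs to be rerun before merging.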

Run the following script to merge these files into one final dataset:

# --input_data_path: the <input_data_path> used in the reformat script
# --output_directory: the <output_directory> used in the reformat script
python parallel_data_merge.py \
    --input_data_path dataset_retrieval_clean_evidence.json \
    --output_directory reformat_results \
    --final_output_path dataset_reformat.json

You now have the final reformatted dataset.

Step 5: Post Filtering

You can combine the filtering rules in rewrite_data_selection.py or customize your own.

Run the following script to filter the reformatted dataset:

# --input_original_data_path: the dataset path before reformatting
# --input_rewrite_data_path: the reformatted dataset path
# --output_path: the final dataset path after filtering
python rewrite_data_selection.py \
    --input_original_data_path dataset_retrieval_clean_evidence.json \
    --input_rewrite_data_path dataset_reformat.json \
    --output_path realign_dataset.json

You now have the final ReAlign dataset, realign_dataset.json.

ReAlign Dataset

We reformat five datasets, based on Open-Platypus, Alpaca, No Robots, GSM8K, and MATH:

ReAlign Open-Platypus: datasets/realign_OpenPlatypus.json

ReAlign Alpaca: datasets/realign_alpaca.json

ReAlign No Robots: datasets/realign_no_robots.json

ReAlign GSM8K: datasets/realign_gsm8k.json

ReAlign MATH: datasets/realign_math.json

The datasets can also be loaded from Hugging Face:

Dataset Name	Hugging Face Link	Size
ReAlign Open-Platypus	GAIR/ReAlign-Open-Platypus	25k
ReAlign Alpaca	GAIR/ReAlign-Alpaca	52k
ReAlign No Robots	GAIR/ReAlign-No-Robots	10k
ReAlign GSM8K	GAIR/ReAlign-GSM8K	7.4k
ReAlign MATH	GAIR/ReAlign-MATH	6.5k

Other Resources

Task Descriptions and Formats

The task descriptions and pre-defined formats can be found in code/constant.py.

The Data for the Task Classifier

The training data for the task classifier is in datasets/classification/task_classifier_train_dataset.json.

The test data is in datasets/classification/task_classifier_test_dataset.json.

The format is as follows:

{
        "instruction": "Create a story about a dog that finds a magical portal.",
        "category": "story_generation"
}
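Given records in this format, classifier predictions can be compared against the labeled test set. The snippet below is a minimal sketch under assumptions: `classification_accuracy` and `category_counts` are hypothetical helpers, and exact string match against `category` is assumed as the scoring rule; the repository may evaluate the classifier differently.

```python
from collections import Counter


def classification_accuracy(examples, predictions):
    """Fraction of examples whose `category` equals the predicted task."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    if not examples:
        return 0.0
    correct = sum(ex["category"] == pred
                  for ex, pred in zip(examples, predictions))
    return correct / len(examples)


def category_counts(examples):
    """Count how many examples carry each task label."""
    return Counter(ex["category"] for ex in examples)
```

`category_counts` is handy for checking how balanced the labels are before reading too much into a single accuracy number.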

Factuality Evaluation

For factuality evaluation, we randomly sample 100 cases from the NQ dataset; they can be found in datasets/nq.

The ground truth is in datasets/nq/nq_factuality_100.json.

The format is as follows:

{
        "items": [
            {
                "from": "human",
                "value": "when did the democratic party change its name?"
            },
            {
                "from": "gpt",
                "value": "the 1830s"
            }
        ],
        "id": 0
}
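To work with these records, it helps to pull out the question and its ground-truth answer. The following is an illustrative sketch only: `question_answer_pairs` is a hypothetical helper, and it assumes each `human` turn is followed by the `gpt` turn that answers it, as in the record above.

```python
def question_answer_pairs(record):
    """Return a list of (question, reference answer) tuples for one record."""
    pairs = []
    pending_question = None
    for turn in record["items"]:
        if turn["from"] == "human":
            # Remember the question until its answer turn arrives.
            pending_question = turn["value"]
        elif turn["from"] == "gpt" and pending_question is not None:
            pairs.append((pending_question, turn["value"]))
            pending_question = None
    return pairs
```

Because the prepared datasets in Step 2 use the same `items` layout, the same helper should also apply to them.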

Citation

Please cite the paper if the resources in this repository or the paper are helpful to you.

 @article{fan2024reformatted,
      title={Reformatted Alignment}, 
      author={Fan, Run-Ze and Li, Xuefeng and Zou, Haoyang and Li, Junlong and He, Shwai and Chern, Ethan and Hu, Jiewen and Liu, Pengfei},
      year={2024},
      journal={arXiv preprint arXiv:2402.12219},
      url={https://arxiv.org/abs/2402.12219}
}

Acknowledgements

We thank the GAIR members for reviewing our paper and giving valuable feedback. We thank the authors of OpenChat for providing the training codebase and for their help.
