
dataless model merging


1.0.0


Dataless Knowledge Fusion by Merging Weights of Language Models

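This repository contains the experiment code for reproducing the results of the paper Dataless Knowledge Fusion by Merging Weights of Language Models, to be published at the Eleventh International Conference on Learning Representations (ICLR 2023), held in Kigali, Rwanda, from May 1 to 5, 2023.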

 @inproceedings{
    jin2023dataless,
    title={Dataless Knowledge Fusion by Merging Weights of Language Models},
    author={Xisen Jin and Xiang Ren and Daniel Preotiuc-Pietro and Pengxiang Cheng},
    booktitle={The Eleventh International Conference on Learning Representations},
    year={2023},
    url={https://openreview.net/forum?id=FCnohuR6AnM}
}

Requirements

We used PyTorch 1.13.1. See requirements.txt for the other requirements.

Quick demo

If you are interested in the Regression Mean (RegMean) algorithm, check out regmean_demo.ipynb.

This is a standalone Jupyter notebook that merges two Hugging Face transformer models fine-tuned on GLUE. It does not import any files from src/.
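The notebook's code is not reproduced here. As a rough illustration of the idea only, the sketch below applies the RegMean closed-form solution to a single linear layer. Given each model's weights W_i and the Gram matrix G_i = X_i^T X_i of that layer's inputs on the model's own training data, the merged weights solve (sum_i G_i) W = sum_i G_i W_i, which minimizes sum_i ||X_i W - X_i W_i||^2. Only the Gram matrices are needed, not the training examples themselves. The weight layout (y = x @ W), the function name `regmean_merge`, and the `alpha` parameter (which scales down off-diagonal Gram entries) are assumptions made for this sketch, not the notebook's API.

```python
import numpy as np

def regmean_merge(weights, grams, alpha=1.0):
    """Merge one linear layer across models with the RegMean closed form.

    weights: list of arrays of shape (d_in, d_out), assuming y = x @ W
    grams:   list of Gram matrices X_i^T X_i, each of shape (d_in, d_in)
    alpha:   scale for off-diagonal Gram entries (1.0 leaves them unchanged)
    """
    d_in = weights[0].shape[0]
    gram_sum = np.zeros((d_in, d_in))
    weighted_sum = np.zeros_like(weights[0], dtype=float)
    for w, g in zip(weights, grams):
        # Shrink off-diagonal entries toward the diagonal (alpha < 1)
        g = alpha * g + (1.0 - alpha) * np.diag(np.diag(g))
        gram_sum += g
        weighted_sum += g @ w
    # Solve (sum G_i) W = sum G_i W_i rather than forming an explicit inverse
    return np.linalg.solve(gram_sum, weighted_sum)
```

With identity Gram matrices this reduces to simple averaging. Parameters that are not linear-layer weights (for example layer norms) would still need another rule, such as simple averaging.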

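Reproducing the results

Preparing the emotion classification datasets

Download the unified emotion dataset from this repo. The files should be placed under PROJECT_ROOT/resources/emotion_splits with the following structure: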

 .
├── crowdflower
│   ├── dev.jsonl
│   ├── full.jsonl
│   ├── test.jsonl
│   └── train.jsonl
├── dailydialog
│   ├── dev.jsonl
│   ├── full.jsonl
│   ├── test.jsonl
│   └── train.jsonl
├── electoraltweets
│   ├── dev.jsonl
│   ├── full.jsonl
│   ├── test.jsonl
│   └── train.jsonl
├── emobank
│   ├── dev.jsonl
│   ├── full.jsonl
│   ├── test.jsonl
│   └── train.jsonl
...
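A small check like the one below can confirm the layout before training. It is a sketch, assuming every subdirectory of emotion_splits is one dataset that must contain the four split files shown in the tree. The function name `missing_emotion_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

REQUIRED_SPLITS = ("dev.jsonl", "full.jsonl", "test.jsonl", "train.jsonl")

def missing_emotion_files(root):
    """Return the expected split files that are absent under root.

    root: e.g. PROJECT_ROOT/resources/emotion_splits
    Every subdirectory of root is treated as one dataset (domain).
    """
    root = Path(root)
    missing = []
    for domain in sorted(p for p in root.iterdir() if p.is_dir()):
        for split in REQUIRED_SPLITS:
            path = domain / split
            if not path.is_file():
                missing.append(path)
    return missing
```

An empty list means every dataset directory has all four splits.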

Preparing the NER datasets

Prepare the CoNLL-2003, OntoNotes, and Twitter NER datasets and place them under PROJECT_ROOT/resources/ner.

 .
├── conll2003
│   ├── dev.conll
│   ├── test.conll
│   └── train.conll
├── ontonotes
│   ├── onto.development.bc.ner
│   ├── onto.development.bn.ner
│   ├── onto.development.mz.ner
│   ├── onto.development.nw.ner
│   ├── onto.development.tc.ner
│   ├── onto.development.wb.ner
│   ├── onto.test.bc.ner
│   ├── onto.test.bn.ner
│   ├── onto.test.mz.ner
│   ├── onto.test.nw.ner
│   ├── onto.test.tc.ner
│   ├── onto.test.wb.ner
│   ├── onto.train.bc.ner
│   ├── onto.train.bn.ner
│   ├── onto.train.mz.ner
│   ├── onto.train.nw.ner
│   ├── onto.train.tc.ner
│   └── onto.train.wb.ner
└── twitter
    ├── annotated.twitter-ner-20-21-tweet-dev-withcleaned.json
    ├── annotated.twitter-ner-20-21-tweet-test-withcleaned.json
    └── annotated.twitter-ner-20-21-tweet-train-withcleaned.json

The CoNLL and OntoNotes datasets contain entries in CoNLL format:

 CRICKET	O	Conll
-	O	Conll
LEICESTERSHIRE	B-ORG	Conll
TAKE	O	Conll
OVER	O	Conll
AT	O	Conll
TOP	O	Conll
AFTER	O	Conll
INNINGS	O	Conll
VICTORY	O	Conll
.	O	Conll

LONDON	B-LOC	Conll
1996-08-30	O	Conll
...
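Reading this format amounts to grouping lines into sentences at blank lines. The sketch below assumes tab-separated columns with the token first and the BIO tag second, and it ignores the remaining column (the source label). `read_conll` is a hypothetical helper, not the repository's loader.

```python
def read_conll(lines):
    """Group tab-separated CoNLL-style lines into (tokens, tags) sentences.

    Assumes column 0 is the token, column 1 the BIO tag, and that a blank
    line ends a sentence. Extra columns (e.g. the source label) are ignored.
    """
    sentences, tokens, tags = [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # Blank line: close the current sentence, if any
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        cols = line.split("\t")
        tokens.append(cols[0])
        tags.append(cols[1])
    if tokens:  # file may not end with a blank line
        sentences.append((tokens, tags))
    return sentences
```

Typical use would be `with open(path, encoding="utf-8") as f: sentences = read_conll(f)`.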

Twitter NER contains one JSON dict per line:

 {"text": "Spectacular skies over #Clonmel tonight http://t.co/OxclQkuyTp /via @niallodonovan #lastdayofautumn", "id": "539106999980797952", "entities": [{"startCharOffset": 24, "endOffset": 31, "endCharOffset": 31, "surface": "Clonmel", "startOffset": 24, "type": "LOC"}, {"startCharOffset": 69, "endOffset": 82, "endCharOffset": 82, "surface": "niallodonovan", "startOffset": 69, "type": "PER"}], "labels": ["O", "O", "O", "O", "B-LOC", "O", "O", "O", "O", "B-PER", "O", "O"], "tokens": ["Spectacular", "skies", "over", "#", "Clonmel", "tonight", "http://t.co/OxclQkuyTp", "/", "via", "@niallodonovan", "#", "lastdayofautumn"], "domain": "TWT"}
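Because each line is a standalone JSON object with aligned "tokens" and "labels" lists, loading it can be sketched as follows. Only those two fields and "id" are used. The function name `read_twitter_ner` and the choice to raise on length mismatches are assumptions of this sketch.

```python
import json

def read_twitter_ner(lines):
    """Parse one JSON object per line into (tokens, labels) pairs."""
    examples = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        tokens, labels = record["tokens"], record["labels"]
        # Each token should carry exactly one BIO label
        if len(tokens) != len(labels):
            raise ValueError(f"length mismatch in example {record.get('id')}")
        examples.append((tokens, labels))
    return examples
```

The character offsets in "entities" are not needed when training on token-level BIO labels, so this sketch ignores them.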

Preparing the GLUE datasets

The GLUE datasets are downloaded and loaded by Hugging Face's datasets library.

Preparing pre-trained LMs

Download pre-trained models (e.g. roberta-base) from the Hugging Face model hub and place them under PROJECT_ROOT/resources (e.g. PROJECT_ROOT/resources/roberta-base).

Usage

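--config_files: see under src/configs. The training module (src.run_experiments) requires three config files, which define the default arguments (src/defaults.yaml), the data config (under src/configs/datasets), and the experiment config (under src/configs/exps).
--filter_model: useful when merging only a subset of the individual models specified in the data config. For example, --filter_model model0 model1 performs a pairwise merge of model0 and model1 (see the data configs for how aliases such as model0 and model1 are defined).
--templates: config files may contain templates such as {seed}. The values for these templates must be given on the command line (e.g. --templates seed=1).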
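The template mechanism can be pictured as a plain substitution of key=value pairs into the config text. The sketch below is an assumption about how such filling could work, not the repository's implementation. `parse_templates` and `fill_templates` are hypothetical names, and only placeholders of the form {name} are replaced.

```python
import re

def parse_templates(pairs):
    """Turn ["seed=1", ...] into {"seed": "1", ...}."""
    values = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        values[key] = value
    return values

def fill_templates(text, values):
    """Replace {name} placeholders, failing loudly on unspecified ones."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value given for template {{{name}}}")
        return values[name]
    return re.sub(r"\{(\w+)\}", substitute, text)
```

Failing on a missing value mirrors the requirement above that every template's value be given on the command line.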

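Individual models (before merging) are trained and saved under the local_zoo_dir specified in the config. If none of the individual models in the zoo match the model type and the zoo_filter arguments given in the config, the program automatically trains new individual models and saves them under local_zoo_dir. Individual models found in local_zoo_dir are loaded without retraining.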
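The load-or-train behavior is essentially a cache lookup. In the sketch below, the zoo is an in-memory list of metadata dicts standing in for models saved under local_zoo_dir, and zoo_filter is a dict of required metadata values. The function name `find_or_train` and the metadata keys are hypothetical.

```python
def find_or_train(zoo, model_type, zoo_filter, train_fn):
    """Return a matching zoo entry, training a new one only if none match.

    zoo:        list of metadata dicts, e.g. {"model_type": ..., "dataset": ...}
    zoo_filter: dict of metadata key -> required value
    train_fn:   callable that trains a model and returns its metadata dict
    Returns (entry, trained), where trained is True if train_fn was called.
    """
    for entry in zoo:
        if entry.get("model_type") == model_type and all(
            entry.get(k) == v for k, v in zoo_filter.items()
        ):
            return entry, False  # found: reuse without retraining
    new_entry = train_fn()
    zoo.append(new_entry)  # stands in for saving under local_zoo_dir
    return new_entry, True
```

Running the same experiment twice therefore trains each individual model only once.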

Example: RegMean, emotion, same head init, merging model0 (DailyDialog) and model1 (CrowdFlower)

 HF_DATASETS_OFFLINE=1 CUDA_VISIBLE_DEVICES=0 python -m src.run_experiments --config src/configs/defaults.yaml src/configs/datasets/emotion.yaml src/configs/exps/roberta-base/roberta-base-emotion.yaml --templates seed=1 --filter_model model0 model1
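To run several pairwise merges with the flags documented above, one option is to generate one command per pair of aliases. This is a sketch. The alias list is hypothetical (the available aliases depend on the data config), and the provided pairwise scripts may organize these runs differently.

```python
from itertools import combinations

# Same prefix as the example command above
BASE_CMD = (
    "HF_DATASETS_OFFLINE=1 CUDA_VISIBLE_DEVICES=0 python -m src.run_experiments "
    "--config src/configs/defaults.yaml src/configs/datasets/emotion.yaml "
    "src/configs/exps/roberta-base/roberta-base-emotion.yaml"
)

def pairwise_commands(aliases, seed=1):
    """Build one command per unordered pair of model aliases."""
    return [
        f"{BASE_CMD} --templates seed={seed} --filter_model {a} {b}"
        for a, b in combinations(aliases, 2)
    ]
```

Each string can then be printed or passed to a shell. With three aliases this yields three commands, and the first one matches the example above.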

스크립트

Pairwise merging

Merging two emotion classification models trained on different datasets (domains).

Emotion, RoBERTa-base: scripts/roberta/pairwise_emotion.py
Emotion, T5-base: scripts/t5/pairwise_emotion.py
Emotion, RoBERTa-base: scripts/t5/pairwise_emotion.py

Merging two models trained on different GLUE tasks. Task-specific classification heads are not merged.

GLUE, DistilBERT-base: scripts/distilbert/pairwise_glue_difftask.py
GLUE, RoBERTa-base: scripts/roberta/pairwise_glue_difftask.py
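Leaving task-specific heads unmerged means averaging only the shared encoder parameters, while each model keeps its own head. The sketch uses simple averaging over numpy state dicts. The "classifier." prefix for head parameters and the name `average_shared_params` are assumptions, since actual parameter names depend on the model class.

```python
import numpy as np

def average_shared_params(state_dicts, head_prefixes=("classifier.",)):
    """Simple-average parameters shared by all models, skipping task heads.

    state_dicts:   list of {param_name: np.ndarray}, one per model
    head_prefixes: name prefixes treated as task-specific heads (assumption)
    Returns only the merged shared parameters; heads stay per-model.
    """
    merged = {}
    for name in state_dicts[0]:
        if name.startswith(head_prefixes):
            continue  # task-specific head: not merged
        merged[name] = np.mean([sd[name] for sd in state_dicts], axis=0)
    return merged
```

A per-task model can then be rebuilt by combining the merged body with that task's own head parameters, for example `{**merged, **head_params}`.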

Merging two models trained on two non-IID partitions of the same GLUE task

GLUE, DistilBERT-base: scripts/distilbert/pairwise_glue_subset.py
GLUE, RoBERTa-base: scripts/roberta/pairwise_glue_subset.py

Greedy merging

Greedily merges multiple models (from 2 up to all of them) in the order of the individual models' OOD performance.

Emotion, RoBERTa-base: scripts/roberta/incremental_emotion.py
Emotion, T5-base: scripts/t5/incremental_emotion.py
Emotion, DeBERTa-large: scripts/deberta/incrementale_emotion.py
NER, RoBERTa-base: scripts/roberta/incremental_ner.py
NER, DeBERTa-large: scripts/deberta/incremental_ner.py
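The greedy order can be sketched as ranking the models by OOD score and merging the top k for k = 2 up to all of them. This sketch assumes a dict of per-model OOD scores where higher is better, plus a caller-supplied merge function. `greedy_merges` and its arguments are hypothetical names, not functions from the scripts.

```python
def greedy_merge_order(ood_scores):
    """Sort model aliases by OOD score, best first."""
    return sorted(ood_scores, key=ood_scores.get, reverse=True)

def greedy_merges(ood_scores, merge_fn):
    """Merge the top-k models for k = 2 .. N, following the OOD ranking.

    ood_scores: {alias: OOD score of the individual model}
    merge_fn:   callable taking a list of aliases and returning a merged model
    Returns {k: merged model of the k best-ranked models}.
    """
    order = greedy_merge_order(ood_scores)
    return {k: merge_fn(order[:k]) for k in range(2, len(order) + 1)}
```

In practice, merge_fn would be one of the merging methods (simple, Fisher, or RegMean) applied to the listed models.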

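These scripts run inference on both in-domain and out-of-domain test sets.

Each of the scripts above runs simple, Fisher, and RegMean averaging. It also runs multi-task learning (MTL), model ensembling, and the individual models (before merging) as comparators. You can run only part of a script by commenting out lines inside it.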
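For comparison with RegMean, the two simpler baselines can be written per parameter tensor. Simple averaging takes the element-wise mean. Fisher averaging weights each model's value by its diagonal Fisher information, so coordinates a model is more sensitive to count more. This is a sketch, not the scripts' code. Adding `eps` to every Fisher weight is an assumption that makes coordinates where all Fisher values are zero fall back to simple averaging.

```python
import numpy as np

def simple_average(params):
    """Element-wise mean of the same parameter across models."""
    return np.mean(params, axis=0)

def fisher_average(params, fishers, eps=1e-8):
    """Weight each model's parameter by its diagonal Fisher information.

    params, fishers: lists of same-shaped arrays, one entry per model.
    eps is added to every Fisher weight so the denominator stays positive;
    where all Fisher values are zero this reduces to simple averaging.
    """
    weights = [f + eps for f in fishers]
    numerator = sum(w * p for w, p in zip(weights, params))
    denominator = sum(weights)
    return numerator / denominator
```

With equal Fisher values, Fisher averaging coincides with simple averaging.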