Causal Distillation for Language Models (DIITO)

Zhengxuan Wu*, Atticus Geiger*, Josh Rozner, Elisa Kreiss, Hanson Lu, Thomas Icard, Christopher Potts, Noah D. Goodman

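This is an implementation of our preprint Causal Distillation for Language Models. The standard approach to distillation trains a student model against two objectives: a task-specific objective (e.g., language modeling) and an imitation objective that encourages the hidden states of the student model to be similar to those of the larger teacher model. In this paper, we show that it is beneficial to augment distillation with a third objective that encourages the student to imitate the causal computation process of the teacher through interchange intervention training (IIT). We name our method the distillation interchange intervention training objective (DIITO).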
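
To make the idea of an interchange intervention concrete, here is a minimal toy sketch. It is not the repository's implementation: `ToyModel`, `interchange`, the weights, and the aligned positions are all hypothetical and chosen only for illustration, and a squared error stands in for the actual causal loss. The intervention runs a model on a base input, overwrites selected hidden units with the values computed from a source input, and then finishes the forward pass. The causal objective compares the student's output under such an intervention with the teacher's output under the aligned intervention.

```python
from typing import List, Sequence


class ToyModel:
    """Hypothetical two-stage model: hidden = W1 @ x, output = W2 @ hidden."""

    def __init__(self, w1: List[List[float]], w2: List[List[float]]):
        self.w1 = w1
        self.w2 = w2

    @staticmethod
    def _matvec(w: List[List[float]], x: Sequence[float]) -> List[float]:
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

    def hidden(self, x: Sequence[float]) -> List[float]:
        return self._matvec(self.w1, x)

    def readout(self, h: Sequence[float]) -> List[float]:
        return self._matvec(self.w2, h)


def interchange(model: ToyModel, base: Sequence[float],
                source: Sequence[float], positions: Sequence[int]) -> List[float]:
    """Run `model` on `base`, but take the hidden units at `positions` from `source`."""
    h_base = model.hidden(base)
    h_source = model.hidden(source)
    for p in positions:
        h_base[p] = h_source[p]
    return model.readout(h_base)


# Hypothetical teacher (4 hidden units) and smaller student (2 hidden units).
teacher = ToyModel(w1=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]],
                   w2=[[1.0, 1.0, 0.0, 0.0]])
student = ToyModel(w1=[[1.0, 1.0], [1.0, -1.0]], w2=[[1.0, 0.0]])

base, source = [1.0, 2.0], [3.0, 5.0]

# Assumed alignment: teacher hidden units 0 and 1 correspond to student hidden unit 0.
teacher_cf = interchange(teacher, base, source, positions=[0, 1])
student_cf = interchange(student, base, source, positions=[0])

# Squared error used here as a simple stand-in for the causal loss.
causal_loss = sum((s - t) ** 2 for s, t in zip(student_cf, teacher_cf))
print(teacher_cf, student_cf, causal_loss)
```

In this toy setup the student already reproduces the teacher's counterfactual output, so the stand-in loss is zero; a mismatched student would receive a positive loss and a gradient pushing it toward the teacher's causal behavior.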

We find that DIITO helps in low-resource settings. DIITO performs on par (97%) with standard distillation while training on 97% less data.

We fork our main codebase from the Huggingface Distillation Interface.

Release Notes

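✅ 12/02/2021 Our paper on Interchange Intervention Training (IIT) is released! Read it for a more formal definition of the method.
✅ 12/06/2021 Released the causal distillation codebase with the preprint.
✅ 12/06/2021 Released evaluation results on distilled tiny-BERT (3 layers) with the Wiki-Text 103M dataset.
✅ 01/14/2022 Released a newer version of DIITO and its evaluation results. You can view our privately shared updated preprint for more details.
✅ 02/21/2022 Released the codebase for DIITO-XXS, which applies DIITO to distill task-specific models in NLP, with a focus on supporting model distillation in low-resource settings. Check out the repo for more info!
⬜️ Release the DIITO (6 layers) model trained on English Wikipedia + BookCorpus.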

If you experience any issues or have suggestions, please contact me through the issues page or at [email protected].

Benchmark Results

Here are the results on the dev sets of GLUE:

Model	Training tokens	Average score	CoLA	MNLI	MRPC	QNLI	QQP	RTE	SST-2	STS-B
DistilBERT (6 layers) Devlin et al., 2019	3.3B	79.59	51.30	82.10	87.50	89.20	88.50	59.90	91.30	86.90
DistilBERT (6 layers)	0.1B	75.80	40.43	78.95	87.45	84.76	84.96	60.10	89.38	80.40
DIITO (6 layers)	0.1B	77.14	45.17	79.68	88.18	85.83	85.31	60.94	90.32	81.69
DIITO (6 layers)	3.3B	(-)	(-)	(-)	(-)	(-)	(-)	(-)	(-)	(-)
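
The average-score column appears to be the unweighted mean of the eight task scores. The short check below retypes the numbers from the table and recomputes each average; the variable names are illustrative, and the row labels are shorthand for the three rows with reported results.

```python
# Task scores copied from the table above, in column order:
# CoLA, MNLI, MRPC, QNLI, QQP, RTE, SST-2, STS-B
rows = {
    "DistilBERT (6 layers), 3.3B tokens": (79.59, [51.30, 82.10, 87.50, 89.20, 88.50, 59.90, 91.30, 86.90]),
    "DistilBERT (6 layers), 0.1B tokens": (75.80, [40.43, 78.95, 87.45, 84.76, 84.96, 60.10, 89.38, 80.40]),
    "DIITO (6 layers), 0.1B tokens": (77.14, [45.17, 79.68, 88.18, 85.83, 85.31, 60.94, 90.32, 81.69]),
}

# Unweighted mean over the eight tasks for each row.
recomputed = {name: sum(scores) / len(scores) for name, (_, scores) in rows.items()}

for name, (reported, _) in rows.items():
    print(f"{name}: reported {reported:.2f}, recomputed {recomputed[name]:.4f}")
```

Each recomputed mean agrees with the reported value up to rounding to two decimals.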

Main Contents

Citation
Requirements
Dataset
Distillation
Evaluation

Citation

If you use this repository, please cite the following two papers: the paper for interchange intervention training and the paper for our distillation method.

  @article{geiger-etal-2021-iit,
        title={Inducing Causal Structure for Interpretable Neural Networks}, 
        author={Geiger, Atticus and Wu, Zhengxuan and Lu, Hanson and Rozner, Josh and Kreiss, Elisa and Icard, Thomas and Goodman, Noah D. and Potts, Christopher},
        year={2021},
        eprint={2112.00826},
        archivePrefix={arXiv},
        primaryClass={cs.LG}
  }

  @article{wu-etal-2021-distill,
        title={Causal Distillation for Language Models}, 
        author={Wu, Zhengxuan and Geiger, Atticus and Rozner, Josh and Kreiss, Elisa and Lu, Hanson and Icard, Thomas and Potts, Christopher and Goodman, Noah D.},
        year={2021},
        eprint={2112.02505},
        archivePrefix={arXiv},
        primaryClass={cs.CL}
  }

Requirements

Python 3.6 or 3.7 is supported.
PyTorch version: 1.9.0
Transformers version: 4.11.3
Datasets version: 1.8.0
Since we build our codebase off the Huggingface Distillation Interface, please review their doc for requirements.

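Dataset

Following the Huggingface Distillation Interface, we need to pre-process the datasets before we do distillation. You can refer to their repo for details. We adapt their pre-processing scripts and update them with a few improvements. For example, we can now binarize datasets directly from the Huggingface Datasets Hub.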

# preprocessing from disk
python scripts/binarized_data.py \
    --file_path ../../bert-mid-tuning/data-files/wikitext-15M \
    --split train \
    --field_name text \
    --max_parsing_example 1000 \
    --tokenizer_type bert \
    --tokenizer_name bert-base-uncased \
    --dump_file ./data/binarized_text

# preprocessing from huggingface.
python scripts/binarized_data.py \
    --dataset_name bookcorpus \
    --split train \
    --field_name text \
    --tokenizer_type bert \
    --tokenizer_name bert-base-uncased \
    --dump_file bookcorpus-dataset/binarized_text \
    --cache_dir ./distill_cache/

python scripts/binarized_data.py \
    --dataset_name wikitext \
    --split train \
    --field_name text \
    --tokenizer_type bert \
    --tokenizer_name bert-base-uncased \
    --dump_file wikitext-dataset/binarized_text \
    --cache_dir ./distill_cache/

python scripts/binarized_data.py \
    --dataset_name wikitext+bookcorpus \
    --split train \
    --field_name text \
    --tokenizer_type bert \
    --tokenizer_name bert-base-uncased \
    --dump_file wikitext+bookcorpus-dataset/binarized_text \
    --cache_dir ./distill_cache/

# helper script to combine two binarized data files
python scripts/data_combinator.py \
    --file_path_left ./bookcorpus-dataset/binarized_text.train.bert-base-uncased.pickle \
    --file_path_right ./wikitext-dataset/binarized_text.train.bert-base-uncased.pickle \
    --split train \
    --tokenizer_name bert-base-uncased \
    --dump_file wikitext+bookcorpus-dataset/binarized_text

# multiprocessing preprocessor.
python scripts/binarized_data.py \
    --dataset_name bookcorpus \
    --split train \
    --field_name text \
    --tokenizer_type bert \
    --tokenizer_name bert-base-uncased \
    --dump_file bookcorpus-dataset/binarized_text \
    --cache_dir ./distill_cache/ \
    --fast_process \
    --preprocessing_num_workers 48

After you get the datasets ready, you need to generate token counts as well.

python scripts/token_counts.py \
    --data_file data/binarized_text.train.bert-base-uncased.pickle \
    --token_counts_dump data/binarized_text.train.token_counts.bert-base-uncased.pickle \
    --vocab_size 30522
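
For intuition, the token-count step can be pictured as tallying how often each vocabulary id occurs in the binarized data. The sketch below illustrates that general idea under our own assumptions and is not a copy of `scripts/token_counts.py`; the function name `count_tokens` and the toy data are hypothetical. In the upstream Huggingface distillation example, such counts feed the masking probabilities for MLM, so that rarer tokens are masked more often.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def count_tokens(sequences: Iterable[Sequence[int]], vocab_size: int) -> List[int]:
    """Return a list where index i holds the number of occurrences of token id i."""
    counter: Counter = Counter()
    for token_ids in sequences:
        counter.update(token_ids)
    counts = [0] * vocab_size
    for token_id, n in counter.items():
        counts[token_id] = n
    return counts


# Toy stand-in for binarized data: each inner list is one tokenized sequence.
toy_data = [[101, 7, 7, 102], [101, 9, 102]]
counts = count_tokens(toy_data, vocab_size=30522)
print(counts[101], counts[7], counts[9])
```

The resulting list has one entry per vocabulary id, which is why the script needs `--vocab_size` to match the tokenizer (30522 for `bert-base-uncased`).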

Distillation

Before training, we recommend initializing your student model with weights extracted from the teacher model.

python scripts/extract_distilbert.py \
    --model_type bert \
    --model_name bert-base-uncased \
    --dump_checkpoint ./distillation_checkpoints/bert-base-uncased_num_layer_3.pth \
    --num_layers 3
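
Conceptually, this kind of initialization copies a subset of the teacher's transformer layers into the shallower student. The sketch below shows one way to do that on a plain dictionary of parameter names. It rests on several assumptions: the evenly spaced layer choice, the helper names `evenly_spaced_layers` and `extract_layers`, and the toy state dict are all hypothetical, and the layers that `scripts/extract_distilbert.py` actually selects may differ. Embeddings and other non-layer parameters are ignored here for brevity.

```python
from typing import Any, Dict, List


def evenly_spaced_layers(num_teacher_layers: int, num_student_layers: int) -> List[int]:
    """Pick teacher layer indices at a fixed stride (an assumed selection rule)."""
    step = num_teacher_layers // num_student_layers
    return [i * step for i in range(num_student_layers)]


def extract_layers(teacher_state: Dict[str, Any], layer_ids: List[int],
                   prefix: str = "bert.encoder.layer.") -> Dict[str, Any]:
    """Copy the chosen teacher layers and renumber them 0..len(layer_ids)-1."""
    student_state: Dict[str, Any] = {}
    for new_idx, old_idx in enumerate(layer_ids):
        # The trailing dot keeps layer 1 from also matching layers 10 and 11.
        old_prefix = f"{prefix}{old_idx}."
        for name, value in teacher_state.items():
            if name.startswith(old_prefix):
                student_state[f"{prefix}{new_idx}." + name[len(old_prefix):]] = value
    return student_state


# Toy 12-layer "teacher" with placeholder strings instead of real tensors.
toy_teacher = {f"bert.encoder.layer.{i}.output.dense.weight": f"w{i}" for i in range(12)}

layer_ids = evenly_spaced_layers(num_teacher_layers=12, num_student_layers=3)
toy_student = extract_layers(toy_teacher, layer_ids)
print(layer_ids, toy_student)
```

With real checkpoints the values would be tensors, and the keys would usually also need renaming to match the student architecture's parameter names.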

Here is an example of distilling with our causal distillation objective:

CUDA_VISIBLE_DEVICES=0,1,2,3 python causal_train.py \
    --force \
    --n_gpu 4 \
    --log_interval 10 \
    --student_type distilbert \
    --student_config ./training_configs/distilbert-base-uncased-large.json \
    --student_pretrained_weights ./distillation_checkpoints/bert-base-uncased_num_layer_6.pth \
    --teacher_type bert \
    --teacher_name bert-base-uncased \
    --neuron_mapping ./training_configs/single_middle_layer_6.nm \
    --mlm --alpha_ce 0.25 --alpha_mlm 0.25 --alpha_cos 0.25 --alpha_clm 0.0 --alpha_causal_ce 0.25 --alpha_causal_cos 0.0 \
    --interchange_prop 0.3 --interchange_max_token -1 --interchange_consecutive_only \
    --freeze_pos_embs \
    --dump_path ./results/ \
    --data_file ./wikitext-dataset/binarized_text.train.bert-base-uncased.pickle \
    --token_counts ./wikitext-dataset/binarized_text.train.token_counts.bert-base-uncased.pickle \
    --seed 42 \
    --n_epoch 3 \
    --gradient_accumulation_steps 6 \
    --batch_size 40

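You can turn the causal distillation objective on or off simply by setting these arguments. For instance, we recently added the argument --alpha_causal_cos to support a causal loss on the cosine loss term. Note that the effective batch size in our setting is 240.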
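
As a rough illustration of how the `--alpha_*` flags act as switches, the sketch below combines per-term losses as a weighted sum, which is how the upstream Huggingface distillation example treats its loss weights. We assume the causal terms are combined in the same way. The weights are copied from the example command, while `combine_losses` and the per-term loss values are hypothetical. The last lines show that the stated effective batch size of 240 is consistent with the command's `--batch_size 40` and `--gradient_accumulation_steps 6`.

```python
from typing import Dict

# Loss weights copied from the example command above.
alphas: Dict[str, float] = {
    "ce": 0.25,         # --alpha_ce
    "mlm": 0.25,        # --alpha_mlm
    "cos": 0.25,        # --alpha_cos
    "clm": 0.0,         # --alpha_clm
    "causal_ce": 0.25,  # --alpha_causal_ce
    "causal_cos": 0.0,  # --alpha_causal_cos
}


def combine_losses(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of loss terms; a weight of 0.0 switches a term off."""
    return sum(weights[name] * losses[name] for name in weights if weights[name] > 0.0)


# Hypothetical per-term loss values for a single training step.
toy_losses = {"ce": 2.0, "mlm": 4.0, "cos": 1.0, "clm": 0.0, "causal_ce": 3.0, "causal_cos": 0.5}
total = combine_losses(toy_losses, alphas)

# Effective batch size implied by the example command.
batch_size = 40
gradient_accumulation_steps = 6
effective_batch_size = batch_size * gradient_accumulation_steps
print(total, effective_batch_size)
```

Setting `--alpha_causal_ce 0.0` in this picture recovers standard distillation with the remaining three terms, and raising `--alpha_causal_cos` above zero adds the cosine-based causal term.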

Evaluation

After you get your distilled models, you need to fine-tune them and evaluate them on downstream tasks. We provide all the scripts you need to run.

MLM Evaluation

CUDA_VISIBLE_DEVICES=0 python run_mlm.py \
    --model_name_or_path ./path_to_your_model/ \
    --dataset_dir ../path_to_your_data/ \
    --tokenizer_name bert-base-uncased \
    --do_eval \
    --output_dir /tmp/test-mlm \
    --cache_dir ./distill_cache/

GLUE Evaluation

CUDA_VISIBLE_DEVICES=0,1,2,3 python run_glue.py \
    --model_name_or_path ./path_to_your_model/ \
    --tokenizer_name bert-base-uncased \
    --task_name sst2 \
    --do_train \
    --do_eval \
    --max_seq_length 128 \
    --per_device_train_batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 3 \
    --output_dir ./results/ \
    --save_total_limit 1 \
    --cache_dir ./distill_cache/

CoNLL Evaluation

CUDA_VISIBLE_DEVICES=0,1,2,3 python run_ner.py \
    --model_name_or_path ./path_to_your_model/ \
    --tokenizer_name bert-base-uncased \
    --dataset_name conll2003 \
    --do_train \
    --do_eval \
    --output_dir ./ner_results/ \
    --save_total_limit 1 \
    --cache_dir ./distill_cache/

SQuAD Evaluation

CUDA_VISIBLE_DEVICES=0,1,2,3 python run_qa.py \
    --model_name_or_path ./path_to_your_model/ \
    --tokenizer_name bert-base-uncased \
    --dataset_name squad \
    --do_train \
    --do_eval \
    --per_device_train_batch_size 12 \
    --learning_rate 3e-5 \
    --num_train_epochs 2 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --save_total_limit 1 \
    --output_dir ./qa_results/
