BERT of Theseus


Code for the paper "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing".

BERT-of-Theseus is a new compressed BERT obtained by progressively replacing the components of the original BERT.
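The idea of progressive replacement can be illustrated with a toy sketch. This is not the repository's implementation: it assumes that each original ("predecessor") module has one compact ("successor") counterpart and that, during training, each pair is swapped independently with probability `replacing_rate`. The names `make_module` and `theseus_forward` are hypothetical, and the "modules" only record their names instead of transforming hidden states.

```python
import random


def make_module(name):
    """Return a toy 'module' that appends its own name to a trace list."""
    def module(trace):
        return trace + [name]
    return module


def theseus_forward(trace, predecessors, successors, replacing_rate, rng=random):
    """Pass `trace` through the stack, picking the successor module at each
    position with probability `replacing_rate` (assumed Bernoulli sampling)."""
    for pred, succ in zip(predecessors, successors):
        module = succ if rng.random() < replacing_rate else pred
        trace = module(trace)
    return trace


# Six module pairs, purely for illustration.
predecessors = [make_module(f"pred{i}") for i in range(6)]
successors = [make_module(f"succ{i}") for i in range(6)]

# A mixed model: some positions use the predecessor, others the successor.
mixed = theseus_forward([], predecessors, successors, 0.5, rng=random.Random(0))

# With a replacing rate of 1.0 only the successor modules remain.
all_successor = theseus_forward([], predecessors, successors, 1.0)
```

At a rate of 1.0 every position uses its successor, which is the state in which the compressed model stands on its own.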


Citation

If you use this code in your research, please cite our paper:

@inproceedings{xu-etal-2020-bert,
    title = "{BERT}-of-Theseus: Compressing {BERT} by Progressive Module Replacing",
    author = "Xu, Canwen  and
      Zhou, Wangchunshu  and
      Ge, Tao  and
      Wei, Furu  and
      Zhou, Ming",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.633",
    pages = "7859--7869"
}

NEW: We have uploaded a script for making predictions on GLUE tasks and preparing leaderboard submissions. Check it out here!

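How to run BERT-of-Theseus

Requirement

Our code is built on huggingface/transformers. To use our code, you must clone and install huggingface/transformers.

Compress a BERT

If you have not done so already, fine-tune a predecessor model following the instructions from huggingface and save it to a directory.
Then run compression following the examples below: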

# For compression with a replacement scheduler
export GLUE_DIR=/path/to/glue_data
export TASK_NAME=MRPC

python ./run_glue.py \
  --model_name_or_path /path/to/saved_predecessor \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir "$GLUE_DIR/$TASK_NAME" \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 32 \
  --per_gpu_eval_batch_size 32 \
  --learning_rate 2e-5 \
  --save_steps 50 \
  --num_train_epochs 15 \
  --output_dir /path/to/save_successor/ \
  --evaluate_during_training \
  --replacing_rate 0.3 \
  --scheduler_type linear \
  --scheduler_linear_k 0.0006

# For compression with a constant replacing rate
export GLUE_DIR=/path/to/glue_data
export TASK_NAME=MRPC

python ./run_glue.py \
  --model_name_or_path /path/to/saved_predecessor \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir "$GLUE_DIR/$TASK_NAME" \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 32 \
  --per_gpu_eval_batch_size 32 \
  --learning_rate 2e-5 \
  --save_steps 50 \
  --num_train_epochs 15 \
  --output_dir /path/to/save_successor/ \
  --evaluate_during_training \
  --replacing_rate 0.5 \
  --steps_for_replacing 2500

For a detailed description of the arguments, please refer to the source code.
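To give a feel for how the two examples differ, the sketch below models the replacing rate as a function of the training step. It rests on assumptions that should be checked against the source code: that the linear scheduler computes min(1, k * step + base rate), with `--replacing_rate` as the base rate and `--scheduler_linear_k` as k, and that the constant scheme holds `--replacing_rate` fixed until `--steps_for_replacing` steps have passed and then switches to 1.0. The function names are hypothetical.

```python
def linear_replacing_rate(step, base_rate, k):
    """Assumed linear schedule: grow from base_rate by k per step, capped at 1.0."""
    return min(1.0, k * step + base_rate)


def constant_replacing_rate(step, rate, steps_for_replacing):
    """Assumed constant schedule: a fixed rate, then 1.0 once the
    replacing phase of `steps_for_replacing` steps is over."""
    return rate if step < steps_for_replacing else 1.0


# Values taken from the two example commands.
linear_samples = [linear_replacing_rate(t, base_rate=0.3, k=0.0006)
                  for t in (0, 500, 1000, 1500)]
constant_samples = [constant_replacing_rate(t, rate=0.5, steps_for_replacing=2500)
                    for t in (0, 2499, 2500)]

# First step at which the assumed linear schedule saturates at 1.0.
first_full_step = next(t for t in range(10_000)
                       if linear_replacing_rate(t, 0.3, 0.0006) >= 1.0)
```

Under these assumptions, the linear example would reach a rate of 1.0 after roughly 0.7 / 0.0006 ≈ 1,167 steps, after which only the successor modules are trained.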

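Load Pretrained Model on MNLI

We provide a 6-layer model pretrained on MNLI as a general-purpose model. It can be transferred to other sentence classification tasks and outperforms DistilBERT (which has the same 6-layer structure) on six GLUE tasks (dev set).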

Method	MNLI	MRPC	QNLI	QQP	RTE	SST-2	STS-B
BERT-base	83.5	89.5	91.2	89.8	71.1	91.5	88.9
DistilBERT	79.0	87.5	85.3	84.9	59.9	90.7	81.2
BERT-of-Theseus	82.1	87.5	88.8	88.8	70.1	91.8	87.8
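The comparison in the table can be checked with a few lines of arithmetic. The snippet below copies the dev-set scores from the table, counts the tasks on which BERT-of-Theseus scores strictly higher than DistilBERT, and computes an unweighted average per model. The average is only a rough summary, since the tasks use different metrics.

```python
TASKS = ["MNLI", "MRPC", "QNLI", "QQP", "RTE", "SST-2", "STS-B"]

# Dev-set scores copied from the table above.
SCORES = {
    "BERT-base": [83.5, 89.5, 91.2, 89.8, 71.1, 91.5, 88.9],
    "DistilBERT": [79.0, 87.5, 85.3, 84.9, 59.9, 90.7, 81.2],
    "BERT-of-Theseus": [82.1, 87.5, 88.8, 88.8, 70.1, 91.8, 87.8],
}

pairs = list(zip(TASKS, SCORES["BERT-of-Theseus"], SCORES["DistilBERT"]))

# Tasks where BERT-of-Theseus is strictly better than, or tied with, DistilBERT.
wins = [task for task, theseus, distil in pairs if theseus > distil]
ties = [task for task, theseus, distil in pairs if theseus == distil]

# Unweighted mean over the seven tasks for each model.
averages = {name: sum(values) / len(values) for name, values in SCORES.items()}

# Share of the BERT-base average that the compressed model keeps.
retained = averages["BERT-of-Theseus"] / averages["BERT-base"]

print(f"wins: {len(wins)}, ties: {ties}")
for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")
print(f"retained: {retained:.1%}")
```

The single tie is MRPC, where both 6-layer models score 87.5, which is consistent with the claim of six wins out of seven tasks.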

You can easily load our general-purpose model using huggingface/transformers.

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("canwenxu/BERT-of-Theseus-MNLI")

model = AutoModel.from_pretrained("canwenxu/BERT-of-Theseus-MNLI")
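Once loaded, the model can be used like any other encoder from huggingface/transformers. The following is a minimal sketch, assuming PyTorch is installed and the checkpoint can be downloaded. The helper name `encode_pair` is hypothetical, and using the hidden state at the first token position as a sentence-pair representation is one common choice, not something this README prescribes.

```python
def encode_pair(sentence_a, sentence_b, model_name="canwenxu/BERT-of-Theseus-MNLI"):
    """Encode a sentence pair and return the hidden state at the first token position."""
    # Imported here so that defining the helper does not require the libraries.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # disable dropout for inference

    # Tokenize both sentences as a single pair input, truncated to 128 tokens.
    inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt",
                       truncation=True, max_length=128)

    with torch.no_grad():
        outputs = model(**inputs)

    # Shape: (batch size, hidden size); position 0 is the [CLS] token.
    return outputs.last_hidden_state[:, 0]
```

Calling `encode_pair("A man is playing a guitar.", "Someone is making music.")` returns one vector for the pair. For a downstream sentence classification task, a classification head would still need to be fine-tuned on top of the encoder.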

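Bug Report and Contribution

If you would like to contribute and add more tasks (only GLUE is available at this moment), please submit a pull request and contact me. If you find any problem or bug, please report it by opening an issue. Thanks!

Third-Party Implementations

We list some third-party implementations from the community here. Please kindly add your implementation to this list: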

TensorFlow Implementation (tested on NER) : https://github.com/qiufengyuyi/bert-of-theseus-tf
Keras Implementation (tested on text classification) : https://github.com/bojone/bert-of-theseus
