Efficient Attention

This repository contains the official implementation of the experiments conducted in

EVA: Efficient Attention via Control Variates (ICLR 2023)
LARA: Linear Complexity Randomized Self-attention Mechanism (ICML 2022)

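Repo structure:

efficient-attention: a small self-contained codebase that implements various efficient attention mechanisms. See the usage section below for details.
vit: the codebase for image classification experiments, adapted from
- DeiT for the file structure, and
- PVT for the PVTv2 model classes.
fairseq: a modified fork of fairseq for language tasks, including machine translation and autoregressive language modeling.
main.sh: a bash script for launching all experiments.
- See the script for the list of arguments.
- Arguments after -e True are passed directly to the training command, so any custom training arguments can be appended there.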

Dependencies

To set up the environment, run the following commands to install the required dependencies (a virtual environment is recommended):

# install packages
pip install -r requirements.txt
# install efficient-attention library
pip install -e efficient-attention

# OPTIONAL: install fairseq library for running language tasks
cd fairseq
python3 setup.py build develop
cd ..

The environment is tested with Python 3.8.10, PyTorch 1.12.0 and CUDA 11.3. Note that our fork of fairseq modifies several files in the original codebase, so using a more recent version of fairseq may lead to unexpected dependency conflicts.
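
As a quick sanity check, the installed versions can be compared against the tested ones listed above. The sketch below is not part of the repository; the helper names `installed_versions` and `report` are made up for illustration, and it only assumes that PyTorch, if present, exposes `torch.__version__` and `torch.version.cuda`.

```python
import importlib
import platform

# Versions the README reports as tested.
TESTED = {"python": "3.8.10", "torch": "1.12.0", "cuda": "11.3"}


def installed_versions():
    """Return the locally installed versions, or None where unavailable."""
    found = {"python": platform.python_version(), "torch": None, "cuda": None}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return found
    # torch.__version__ may carry a build suffix such as "+cu113".
    found["torch"] = str(torch.__version__).split("+")[0]
    found["cuda"] = torch.version.cuda  # None for CPU-only builds
    return found


def report(found, tested=TESTED):
    """Build one human-readable line per component."""
    lines = []
    for name, want in tested.items():
        have = found.get(name)
        status = "matches" if have == want else "differs from"
        lines.append(f"{name}: installed {have}, {status} tested {want}")
    return lines


if __name__ == "__main__":
    print("\n".join(report(installed_versions())))
```

A mismatch does not necessarily mean the code will fail; it only flags a setup that differs from the tested one.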

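Basic Usage of the Efficient Attention Library

efficient-attention is a small self-contained codebase that collects several efficient attention mechanisms.

Passing Attention-specific Arguments to Argparse

For the arguments specific to each attention mechanism, check the add_attn_specific_args() class method in the corresponding Python file.
To pass these arguments to an argparse parser, follow the code snippet below: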

import argparse
from efficient_attention import AttentionFactory, NestedNamespace
# ...
parser = argparse.ArgumentParser()
parser.add_argument('--attn-name', default='softmax', type=str, metavar='ATTN',
                    help='Name of attention model to use')
# ...
temp_args, _ = parser.parse_known_args()
# add attention-specific arguments to the parser
# struct_name: name of the inner namespace to store all attention-specific arguments
# prefix: prefix to prepend to all argument names
#         for example, if prefix = encoder-attn, then for the argument --window-size
#         we need to pass --encoder-attn-window-size
#         this is useful to avoid argument name conflicts.
AttentionFactory.add_attn_specific_args(parser, temp_args.attn_name, struct_name="attn_args", prefix="")
# parse arguments to a namespace that supports nested attributes
args = parser.parse_args(namespace=NestedNamespace())
# now we can access the attention-specific arguments via args.attn_args
# (window_size is only present for attention types that define --window-size)
print(args.attn_args.window_size)
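
To make the idea of a prefixed flag stored in a nested namespace concrete, here is a minimal standard-library sketch. It is not the library's implementation: `DottedNamespace` and `add_prefixed_args` are hypothetical stand-ins, the single `--window-size` option and its default of 7 are chosen only for illustration, and the way the real library maps prefixed flags to attribute names may differ.

```python
import argparse


class DottedNamespace(argparse.Namespace):
    """Hypothetical stand-in: routes 'group.name' destinations into a nested namespace."""

    def __setattr__(self, name, value):
        if "." in name:
            group, rest = name.split(".", 1)
            child = self.__dict__.setdefault(group, DottedNamespace())
            setattr(child, rest, value)
        else:
            self.__dict__[name] = value


def add_prefixed_args(parser, struct_name, prefix=""):
    """Register an illustrative --window-size option under an optional flag prefix."""
    flag_prefix = f"--{prefix}-" if prefix else "--"
    parser.add_argument(
        f"{flag_prefix}window-size",
        dest=f"{struct_name}.window_size",  # assumption: the prefix is dropped from the attribute name
        type=int,
        default=7,
    )


parser = argparse.ArgumentParser()
add_prefixed_args(parser, struct_name="attn_args", prefix="encoder-attn")
args = parser.parse_args(["--encoder-attn-window-size", "16"], namespace=DottedNamespace())
print(args.attn_args.window_size)  # 16
```

Because argparse assigns values with `setattr(namespace, dest, value)`, overriding `__setattr__` is enough to redirect dotted destinations into an inner namespace.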

Creating an Efficient Attention Module

Inside a torch.nn.Module class, you can create an efficient attention module as follows:

# we might want to pass attention-specific arguments to the attention module
# along with other related arguments
attn_args = {
    **vars(args.attn_args),
    **{
        'dim': args.embed_dim,
        'num_heads': args.num_heads,
        'qkv_bias': args.qkv_bias,
        'attn_drop': args.attn_drop_rate,
        'proj_drop': args.drop_rate,
    }
}
# attn_name is the name of the attention mechanism, e.g. the value of --attn-name
self.attn = AttentionFactory.build_attention(attn_name=attn_name, attn_args=attn_args)

# the module can then be called like a normal function
x = self.attn(x)
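
The dictionary merge above can be tried without PyTorch or the library installed. In this sketch `build_attn_kwargs` is a hypothetical helper and the numeric values are placeholders, not recommended settings; the point it illustrates is that keys from the second dictionary override same-named attention-specific keys.

```python
from types import SimpleNamespace


def build_attn_kwargs(args):
    """Merge attention-specific options with model-level ones; later keys win."""
    model_level = {
        "dim": args.embed_dim,
        "num_heads": args.num_heads,
        "qkv_bias": args.qkv_bias,
        "attn_drop": args.attn_drop_rate,
        "proj_drop": args.drop_rate,
    }
    return {**vars(args.attn_args), **model_level}


# Placeholder values standing in for a parsed command line.
args = SimpleNamespace(
    embed_dim=192,
    num_heads=3,
    qkv_bias=True,
    attn_drop_rate=0.0,
    drop_rate=0.0,
    attn_args=SimpleNamespace(window_size=7, num_landmarks=49),
)
kwargs = build_attn_kwargs(args)
print(sorted(kwargs))
```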

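Image Classification on ImageNet1k

Data Preparation

We follow a setup similar to DeiT to pre-process the ImageNet dataset. Download the ImageNet train and val images and place them in the following directory structure so that they are compatible with the torchvision datasets.ImageFolder:

/path/to/imagenet/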
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
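
Before launching a long training run, it can help to confirm that the layout above is in place. The following sketch uses only the standard library; `summarize_imagefolder` and `check_same_classes` are hypothetical helpers, and the demo builds a throwaway tree that mirrors the example rather than touching real data. It counts every file in a class folder, not just images.

```python
import tempfile
from pathlib import Path


def summarize_imagefolder(root):
    """Return {split: {class_name: file_count}} for an ImageFolder-style tree."""
    summary = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        summary[split] = {
            class_dir.name: sum(1 for f in class_dir.iterdir() if f.is_file())
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return summary


def check_same_classes(summary):
    """train/ and val/ are expected to expose the same class folders."""
    return set(summary["train"]) == set(summary["val"])


# Demo on a temporary tree shaped like the example layout.
with tempfile.TemporaryDirectory() as tmp:
    layout = [
        ("train", "class1", "img1.jpeg"),
        ("train", "class2", "img2.jpeg"),
        ("val", "class1", "img3.jpeg"),
        ("val", "class2", "img4.jpeg"),
    ]
    for split, cls, img in layout:
        class_dir = Path(tmp) / split / cls
        class_dir.mkdir(parents=True)
        (class_dir / img).touch()
    demo_summary = summarize_imagefolder(tmp)
    print(demo_summary, check_same_classes(demo_summary))
```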

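Training and Evaluation

The following commands train and evaluate various vision transformers with LARA/EVA. Training is assumed to be conducted with 8 GPUs.

ImageNet Classification on DeiT (sequence length 784 (suffix: *_p8) / 196 (suffix: *_p16))

To use LARA/EVA in different DeiT architectures: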

# LARA: DeiT-tiny-p8
bash main.sh -m evit_tiny_p8 -p <dir-of-imagenet-data> -g 8 -d imagenet -e TRUE --dist-eval --num-workers 16 --clip-grad 5.0 --warmup-epochs 10 --seed 1 --attn-name lara --mis-type mis-opt --proposal-gen pool-mixed --alpha-coeff 2.0 --num-landmarks 49

# LARA: DeiT-tiny-p16
bash main.sh -m evit_tiny_p16 -p <dir-of-imagenet-data> -g 8 -d imagenet -e TRUE --dist-eval --num-workers 16 --clip-grad 5.0 --warmup-epochs 10 --seed 1 --attn-name lara --mis-type mis-opt --proposal-gen pool-mixed --alpha-coeff 2.0 --num-landmarks 49

# LARA: DeiT-small-p16
bash main.sh -m evit_small_p16 -p <dir-of-imagenet-data> -g 8 -d imagenet -e TRUE --dist-eval --num-workers 16 --clip-grad 5.0 --warmup-epochs 10 --seed 1 --attn-name lara --mis-type mis-opt --proposal-gen pool-mixed --alpha-coeff 2.0 --num-landmarks 49

# EVA: DeiT-tiny-p8
bash main.sh -m evit_tiny_p8 -p <dir-of-imagenet-data> -g 8 -d imagenet -e TRUE --dist-eval --num-workers 16 --clip-grad 5.0 --warmup-epochs 10 --seed 1 --attn-name eva --num-landmarks 49 --adaptive-proj default --window-size 7 --attn-2d --use-rpe

# EVA: DeiT-tiny-p16
bash main.sh -m evit_tiny_p16 -p <dir-of-imagenet-data> -g 8 -d imagenet -e TRUE --dist-eval --num-workers 16 --clip-grad 5.0 --warmup-epochs 10 --seed 1 --attn-name eva --num-landmarks 49 --adaptive-proj default --window-size 7 --attn-2d --use-rpe

# EVA: DeiT-small-p16
bash main.sh -m evit_small_p16 -p <dir-of-imagenet-data> -g 8 -d imagenet -e TRUE --dist-eval --num-workers 16 --clip-grad 5.0 --warmup-epochs 10 --seed 1 --attn-name eva --num-landmarks 49 --adaptive-proj default --window-size 7 --attn-2d --use-rpe

ImageNet Classification on PVTv2-B3 (sequence lengths: 3136 -> 784 -> 196 -> 49)

To adapt LARA/EVA to the PVTv2 architecture:

# LARA Attention
bash main.sh -m pvt_medium2 -p <dir-of-imagenet-data> -g 8 -d imagenet -e TRUE --dist-eval --num-workers 16 --clip-grad 1.0 --drop-path-rate 0.3 --warmup-epochs 10 --seed 1 --attn-name lara --pool-module-type dense --mis-type mis-opt --proposal-gen pool-mixed --num-landmarks 49 --alpha-coeff 2.0 --repeated-aug

# EVA Attention
bash main.sh -m pvt_medium2 -p <dir-of-imagenet-data> -g 8 -d imagenet -e TRUE --dist-eval --num-workers 16 --clip-grad 5.0 --drop-path-rate 0.3 --warmup-epochs 10 --seed 1 --attn-name eva --num-landmarks 49 --adaptive-proj default --window-size 7 --attn-2d --use-rpe --repeated-aug
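
The sequence lengths quoted in the DeiT and PVTv2 headings can be reproduced with a short calculation. This sketch rests on two assumptions that the README does not state: a 224x224 input resolution, and overall PVTv2 stage strides of 4, 8, 16 and 32. It counts patch tokens only and ignores any class token.

```python
def num_tokens(image_size, stride):
    """Number of patch tokens for a square image split with the given stride."""
    if image_size % stride != 0:
        raise ValueError("image size must be divisible by the stride")
    side = image_size // stride
    return side * side


IMAGE_SIZE = 224  # assumption: standard ImageNet training resolution

# DeiT variants: patch size 8 (suffix _p8) and 16 (suffix _p16).
deit = {f"p{p}": num_tokens(IMAGE_SIZE, p) for p in (8, 16)}

# Assumption: the four PVTv2 stages downsample by 4, 8, 16 and 32 overall.
pvt = [num_tokens(IMAGE_SIZE, s) for s in (4, 8, 16, 32)]

print(deit)  # {'p8': 784, 'p16': 196}
print(pvt)   # [3136, 784, 196, 49]
```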

Usage of Other Attention Mechanisms:

Alternatively, you can try out other attention mechanisms:

# Softmax Attention
bash main.sh -m evit_tiny_p8 -d imagenet -e TRUE --dist-eval --num-workers 16 --clip-grad 5.0 --warmup-epochs 10 --seed 1 --attn-name softmax
# RFA/Performer
bash main.sh -m evit_tiny_p8 -d imagenet -e TRUE --dist-eval --num-workers 16 --clip-grad 5.0 --warmup-epochs 10 --seed 1 --attn-name performer --proj-method favorp --approx-attn-dim 64
# Local Attention
bash main.sh -m evit_tiny_p8 -d imagenet -e TRUE --dist-eval --num-workers 16 --clip-grad 5.0 --warmup-epochs 10 --seed 1 --attn-name local --window-size 7 --attn-2d --use-rpe

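Language Tasks

Data Preparation

We use the standard fairseq pre-processing to prepare the data for language tasks.

For machine translation, follow the instructions here to prepare the binarized WMT'14 EN-DE data.
For autoregressive language modeling, follow the instructions here to process the Wikitext-103 dataset.

Training

Use -r <resume-ckpt-DIR> to specify the directory in which checkpoints are stored during training; it can also be used to resume training.
All attention-specific arguments must be prefixed with --encoder-attn- (for the encoder side) or --decoder-attn- (for the decoder side). See the examples below.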

Machine Translation

## LARA
CUDA_VISIBLE_DEVICES=0,1,2,3 bash main.sh -p <dir-of-your-bin-data> -d wmt -s lara_8 -g 4 -e TRUE --attn-name-encoder lara --encoder-attn-num-landmarks 8 --encoder-attn-proposal-gen adaptive-1d --encoder-attn-mis-type mis-opt

CUDA_VISIBLE_DEVICES=0,1,2,3 bash main.sh -p <dir-of-your-bin-data> -d wmt -s lara_16 -g 4 -e TRUE --attn-name-encoder lara --encoder-attn-num-landmarks 16 --encoder-attn-proposal-gen adaptive-1d --encoder-attn-mis-type mis-opt

CUDA_VISIBLE_DEVICES=0,1,2,3 bash main.sh -p <dir-of-your-bin-data> -d wmt -s lara_32 -g 4 -e TRUE --attn-name-encoder lara --encoder-attn-num-landmarks 32 --encoder-attn-proposal-gen adaptive-1d --encoder-attn-mis-type mis-opt

## EVA
CUDA_VISIBLE_DEVICES=0,1,2,3 bash main.sh -p <dir-of-your-bin-data> -d wmt -s eva_8_8 -g 4 -e TRUE --attn-name-encoder eva --encoder-attn-window-size 8 --encoder-attn-num-landmarks 8 --encoder-attn-adaptive-proj no-ln --encoder-attn-use-t5-rpe --encoder-attn-overlap-window

CUDA_VISIBLE_DEVICES=0,1,2,3 bash main.sh -p <dir-of-your-bin-data> -d wmt -s eva_16_8 -g 4 -e TRUE --attn-name-encoder eva --encoder-attn-window-size 16 --encoder-attn-num-landmarks 8 --encoder-attn-adaptive-proj no-ln --encoder-attn-use-t5-rpe --encoder-attn-overlap-window

CUDA_VISIBLE_DEVICES=0,1,2,3 bash main.sh -p <dir-of-your-bin-data> -d wmt -s eva_32_8 -g 4 -e TRUE --attn-name-encoder eva --encoder-attn-window-size 32 --encoder-attn-num-landmarks 8 --encoder-attn-adaptive-proj no-ln --encoder-attn-use-t5-rpe --encoder-attn-overlap-window
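
The three LARA runs above differ only in the landmark count and the matching save name, so the command lines can be generated rather than typed by hand. The sketch below only assembles and prints the strings; `lara_mt_command` is a hypothetical helper, and `/path/to/bin-data` is a placeholder for the binarized data directory. All flags are copied from the commands above.

```python
import shlex


def lara_mt_command(num_landmarks, data_dir, gpus=4):
    """Assemble the main.sh call for LARA on WMT with the given landmark count."""
    return [
        "bash", "main.sh",
        "-p", data_dir,
        "-d", "wmt",
        "-s", f"lara_{num_landmarks}",
        "-g", str(gpus),
        "-e", "TRUE",
        "--attn-name-encoder", "lara",
        "--encoder-attn-num-landmarks", str(num_landmarks),
        "--encoder-attn-proposal-gen", "adaptive-1d",
        "--encoder-attn-mis-type", "mis-opt",
    ]


# Print one command per landmark count used above.
for n in (8, 16, 32):
    print(shlex.join(lara_mt_command(n, "/path/to/bin-data")))
```

Keeping each command as a list of tokens and joining with `shlex.join` means a data path containing spaces would be quoted correctly.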

Autoregressive Language Modeling

# Currently, LARA does not support causal masking yet.

# EVA on a 16-layer Transformer LM
CUDA_VISIBLE_DEVICES=0,1,2,3 bash main.sh -p <dir-of-your-bin-data> -m 16layers -d wikitext103 -s eva_128_8_16layers -g 4 -e TRUE --attn-name-decoder causal_eva --decoder-attn-window-size 128 --decoder-attn-causal --decoder-attn-adaptive-proj qk --decoder-attn-chunk-size 8 --decoder-attn-use-t5-rpe

# EVA on a 32-layer Transformer LM
CUDA_VISIBLE_DEVICES=0,1,2,3 bash main.sh -p <dir-of-your-bin-data> -m 32layers -d wikitext103 -s eva_128_8_32layers -g 4 -e TRUE --attn-name-decoder causal_eva --decoder-attn-window-size 128 --decoder-attn-causal --decoder-attn-adaptive-proj qk --decoder-attn-chunk-size 8 --decoder-attn-use-t5-rpe

Generation and Evaluation

For generation and evaluation, simply pass the argument -i true when calling main.sh to run only the inference procedure. The checkpoint path can be specified with -c <your-ckpt-path>. For example,

# Machine Translation
CUDA_VISIBLE_DEVICES=0 bash main.sh -i true -c <your-possibly-avg-checkpoint.pt> -p <dir-of-your-bin-data> -d wmt -g 1

# Autoregressive Language Modeling
CUDA_VISIBLE_DEVICES=0 bash main.sh -i true -c <your-checkpoint_last.pt> -p <dir-of-your-bin-data> -d wikitext103 -g 1

Pre-trained Models

We also provide trained EVA model checkpoints on OneDrive for the machine translation and language modeling tasks:

Wikitext103-EVA-16Layers-LM
Wikitext103-EVA-32Layers-LM
WMT14ENDE-EVA-E32_C8-MT
WMT14ENDE-EVA-E8_C8-MT

Citation

@inproceedings{zheng2023efficient,
  title={Efficient Attention via Control Variates},
  author={Lin Zheng and Jianbo Yuan and Chong Wang and Lingpeng Kong},
  booktitle={International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=G-uNfHKrj46}
}

@inproceedings{zheng2022linear,
  title={Linear complexity randomized self-attention mechanism},
  author={Lin Zheng and Chong Wang and Lingpeng Kong},
  booktitle={International Conference on Machine Learning},
  pages={27011--27041},
  year={2022},
  organization={PMLR}
}
