AmoebaLLMのダウンロードAmoebaLLMソースコードのダウンロード

AmoebaLLM

AI ソースコード

1.0.0

ダウンロード

アメーバム：効率的かつインスタント展開のために、あらゆる形の大きな言語モデルを構築する

ヨンガン・フー、Zhongzhi Yu、Junwei Li、Jiayi Qian、Yongan Zhang、Xiangchi Yuan、Dachuan Shi、Roman Yakunin、およびYingyan（Celine）Lin

Neurips 2024で受け入れられました[Paper |スライド]。

アメーバム：概要

1回トレーニングして多くの効率的なLLMを導き出す方法は？ AmoeBallmを紹介します。AmoeBallmは、任意の形状のLLMサブネットを即座に導き出すように設計された新しいフレームワークであり、精度効率のフロンティアを実現し、1回限りの微調整の後に抽出できます。このようにして、AmoeBallmは、さまざまなプラットフォームとアプリケーション駆動型の仕様に合わせた迅速な展開を促進します。具体的には、Amoeballmは、高性能のサブネットを戦略的に抽出し、競合を回避するために共同でトレーニングすることにより、この目標を達成します。

実験結果： AmoeBallmは、LLM適応性に新しい標準を設定するだけでなく、精度と効率の間でSOTAトレードオフを達成するサブネットを成功裏に提供します。

コードの使用

環境のセットアップ

Condaを使用して、提供されたenv.ymlに基づいて環境をセットアップします。

 conda env create -f env.yml

ステージ1：知識を提供するサブセットの選択

ステップ1 ：動的プログラミングを使用してレイヤー選択戦略を導き出します。

 CUDA_VISIBLE_DEVICES=0 python main.py --model_name_or_path meta-llama/Llama-2-7b-hf --fp16 --output_dir ./output/calib_dp --do_train False --do_eval False --no_eval_orig --layer_calib_dp --calib_dataset mmlu --enable_shrinking --num_calib_sample 40 --calib_metric acc --min_num_layer 20 --dp_keep_last_layer 1

ステップ2 ：フラップの重要性メトリックを使用して、ニューロン（幅）選択戦略を導き出します。

 CUDA_VISIBLE_DEVICES=0 python main.py --model_name_or_path meta-llama/Llama-2-7b-hf --fp16 --output_dir ./output/width_calib --do_train False --do_eval False --use_auth_token --no_eval_orig --width_calib --num_calib_sample 512 --prune_width_method flap

ステップ3 ：レイヤーとニューロンの選択戦略を同じファイルdp_selection_strategy.npyにマージします（リポジトリのllama2-7bのこのファイルも提供しました）：

 python utils/merge_depth_width.py

ステージ2：すべての微調整

--do_train Trueおよび--enable_shrinkingを使用して1回の微調整を有効にし、ステージ1で提供される--shrinking_file dp_selection_strategy.npyで提供されるサブセット選択戦略を指定します。

 CUDA_VISIBLE_DEVICES=0 python main.py --model_name_or_path meta-llama/Llama-2-7b-hf --output_dir ./output/ft --dataset alpaca-gpt4 --use_auth_token --do_train True --do_eval True --do_mmlu_eval True --do_eval_wikitext2 True --lora_modules all --fp16 --source_max_len 384 --target_max_len 128 --gradient_accumulation_steps 4 --logging_steps 10 --max_steps 10000 --save_strategy steps --data_seed 42 --save_steps 1000 --save_total_limit 1 --evaluation_strategy steps --eval_dataset_size 1024  --max_eval_samples 1000 --eval_steps 1000 --optim paged_adamw_32bit --ddp_find_unused_parameters --enable_shrinking --kd_weight 1 --min_num_layer 20 --random_sample_num_layer 2 --distill_method sp --shrinking_method calib_dp --shrinking_file dp_selection_strategy.npy --shrinkable_width --width_choice [1,7/8,3/4,5/8] --prune_width_method flap --use_moe_lora --moe_num_expert 5 --moe_topk 2

評価

上記の2段階のプロセスを使用して作成された微調整モデルに加えて、Amoeballm微調整されたLlama2-7Bモデル、 amoeba_llama2ここで提供しました。次のコマンドを使用してダウンロードして解凍できます。

 pip install gdown
gdown 1lwOiQa-UOYOXn72wo5gvzUvFat_PTg6b
unzip amoeba_llama2.zip

微調整されたモデルへのパスとして--output_dirを指定し、それぞれ--eval_num_layerと--eval_num_widthを使用してターゲットの深さと幅比を指定します。

 CUDA_VISIBLE_DEVICES=0 python main.py --model_name_or_path meta-llama/Llama-2-7b-hf --output_dir amoeba_llama2 --do_train False --do_eval True --do_mmlu_eval True --bits 8 --bf16 --enable_shrinking --min_num_layer 20 --shrinking_method calib_dp --shrinking_file dp_selection_strategy.npy --shrinkable_width --width_choice [1,7/8,3/4,5/8] --prune_width_method flap --use_moe_lora --moe_num_expert 5 --moe_topk 2  --eval_num_layer 24 --eval_num_width 0.875 --do_lm_eval True --do_lm_eval_task arc_easy,piqa,hellaswag

了承

Qloraの実装を参照します。

引用

 @inproceedings{fuamoeballm,
  title={AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment},
  author={Fu, Yonggan and Yu, Zhongzhi and Li, Junwei and Qian, Jiayi and Zhang, Yongan and Yuan, Xiangchi and Shi, Dachuan and Yakunin, Roman and Lin, Yingyan Celine},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems}
}

拡大する

追加情報