AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment

Yonggan Fu, Zhongzhi Yu, Junwei Li, Jiayi Qian, Yongan Zhang, Xiangchi Yuan, Dachuan Shi, Roman Yakunin, and Yingyan (Celine) Lin

Accepted at NeurIPS 2024 [Paper | Slides].

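AmoebaLLM: Overview

How can we train once and derive many efficient LLMs? We introduce AmoebaLLM, a novel framework designed to instantly derive LLM subnets of arbitrary shapes. These subnets reach the accuracy-efficiency frontier and can be extracted immediately after a single one-time fine-tuning run. In this way, AmoebaLLM facilitates rapid deployment tailored to different platforms and application-driven specifications. Specifically, AmoebaLLM achieves this by strategically selecting high-performing subnets and training them jointly in a way that avoids conflicts among them.

Experimental results: AmoebaLLM not only sets new standards in LLM adaptability but also delivers subnets that achieve state-of-the-art trade-offs between accuracy and efficiency.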

Code Usage

Environment Setup

Use conda to create the environment from the provided env.yml:

 conda env create -f env.yml

Stage 1: Knowledge-Preserving Subnet Selection

Step 1: Derive the layer selection strategy using dynamic programming:

 CUDA_VISIBLE_DEVICES=0 python main.py --model_name_or_path meta-llama/Llama-2-7b-hf --fp16 --output_dir ./output/calib_dp --do_train False --do_eval False --no_eval_orig --layer_calib_dp --calib_dataset mmlu --enable_shrinking --num_calib_sample 40 --calib_metric acc --min_num_layer 20 --dp_keep_last_layer 1
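To give a feel for how dynamic programming can choose which layers to keep, here is a toy sketch. It is not the repository's implementation: `select_layers`, `toy_score`, and `toy_importance` are hypothetical names, and the recurrence (for each prefix of n layers and each budget m, keep the better of "skip layer n" and "keep layer n") is only an assumption about the general shape of such a search. In the real pipeline the score would presumably come from evaluating the shrunken model on the calibration samples rather than from a fixed table.

```python
from typing import Callable, Dict, List, Tuple

Layers = Tuple[int, ...]


def select_layers(num_layers: int, budget: int,
                  score: Callable[[Layers], float]) -> Layers:
    """Pick `budget` of `num_layers` layers with a prefix-wise DP.

    best[(n, m)] is the best-scoring choice of m layers among the first n.
    Each entry is built from two candidates:
      * the best m layers among the first n-1 (layer n-1 is skipped), or
      * the best m-1 layers among the first n-1, plus layer n-1 (kept).
    """
    best: Dict[Tuple[int, int], Layers] = {
        (n, 0): () for n in range(num_layers + 1)
    }
    for n in range(1, num_layers + 1):
        for m in range(1, min(n, budget) + 1):
            candidates: List[Layers] = []
            if (n - 1, m) in best:  # enough earlier layers to skip this one
                candidates.append(best[(n - 1, m)])
            candidates.append(best[(n - 1, m - 1)] + (n - 1,))
            best[(n, m)] = max(candidates, key=score)
    return best[(num_layers, budget)]


# Hypothetical stand-in for a calibration metric: a fixed per-layer score.
toy_importance = [9, 1, 5, 2, 8, 7]


def toy_score(layers: Layers) -> float:
    return sum(toy_importance[i] for i in layers)


kept = select_layers(len(toy_importance), 4, toy_score)
print(kept)  # (0, 2, 4, 5)
```

With an additive toy score like this one the recurrence finds the exact optimum; with a metric measured on a real model, where layers interact, it would only be a heuristic.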

Step 2: Derive the neuron (width) selection strategy using the importance metric from FLAP:

 CUDA_VISIBLE_DEVICES=0 python main.py --model_name_or_path meta-llama/Llama-2-7b-hf --fp16 --output_dir ./output/width_calib --do_train False --do_eval False --use_auth_token --no_eval_orig --width_calib --num_calib_sample 512 --prune_width_method flap

Step 3: Merge the layer and neuron selection strategies into a single file, dp_selection_strategy.npy (we also provide this file for Llama2-7B in the repo):

 python utils/merge_depth_width.py

Stage 2: One-for-All Fine-tuning

Enable one-for-all fine-tuning with --do_train True and --enable_shrinking, and pass the subnet selection strategy produced in Stage 1 with --shrinking_file dp_selection_strategy.npy:

 CUDA_VISIBLE_DEVICES=0 python main.py --model_name_or_path meta-llama/Llama-2-7b-hf --output_dir ./output/ft --dataset alpaca-gpt4 --use_auth_token --do_train True --do_eval True --do_mmlu_eval True --do_eval_wikitext2 True --lora_modules all --fp16 --source_max_len 384 --target_max_len 128 --gradient_accumulation_steps 4 --logging_steps 10 --max_steps 10000 --save_strategy steps --data_seed 42 --save_steps 1000 --save_total_limit 1 --evaluation_strategy steps --eval_dataset_size 1024  --max_eval_samples 1000 --eval_steps 1000 --optim paged_adamw_32bit --ddp_find_unused_parameters --enable_shrinking --kd_weight 1 --min_num_layer 20 --random_sample_num_layer 2 --distill_method sp --shrinking_method calib_dp --shrinking_file dp_selection_strategy.npy --shrinkable_width --width_choice [1,7/8,3/4,5/8] --prune_width_method flap --use_moe_lora --moe_num_expert 5 --moe_topk 2
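The `--width_choice [1,7/8,3/4,5/8]` argument lists width ratios as fractions, while `--eval_num_width` in the evaluation step takes a decimal. The small sketch below converts one form into the other. How `main.py` parses the flag internally is not shown in this README; `parse_width_choice` is a hypothetical helper written only for illustration.

```python
from fractions import Fraction
from typing import List


def parse_width_choice(text: str) -> List[Fraction]:
    """Parse a string such as "[1,7/8,3/4,5/8]" into exact fractions."""
    items = text.strip().strip("[]").split(",")
    return [Fraction(item.strip()) for item in items if item.strip()]


ratios = parse_width_choice("[1,7/8,3/4,5/8]")
print([float(r) for r in ratios])  # [1.0, 0.875, 0.75, 0.625]
```

So a width of 7/8 in the training command corresponds to `--eval_num_width 0.875` at evaluation time.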

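Evaluation

In addition to the fine-tuned model you create with the two-stage process above, we also provide our AmoebaLLM fine-tuned Llama2-7B model, amoeba_llama2. You can download and unzip it with the following commands: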

 pip install gdown
 gdown 1lwOiQa-UOYOXn72wo5gvzUvFat_PTg6b
 unzip amoeba_llama2.zip

Set --output_dir to the path of the fine-tuned model, and use --eval_num_layer and --eval_num_width to specify the target depth and width ratio, respectively:

 CUDA_VISIBLE_DEVICES=0 python main.py --model_name_or_path meta-llama/Llama-2-7b-hf --output_dir amoeba_llama2 --do_train False --do_eval True --do_mmlu_eval True --bits 8 --bf16 --enable_shrinking --min_num_layer 20 --shrinking_method calib_dp --shrinking_file dp_selection_strategy.npy --shrinkable_width --width_choice [1,7/8,3/4,5/8] --prune_width_method flap --use_moe_lora --moe_num_expert 5 --moe_topk 2  --eval_num_layer 24 --eval_num_width 0.875 --do_lm_eval True --do_lm_eval_task arc_easy,piqa,hellaswag
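Because one fine-tuned checkpoint serves many subnet shapes, it can be convenient to evaluate several depth/width combinations in a row. The sketch below only assembles command lines from the flags shown above. `build_eval_command` and the particular grid are illustrative assumptions, and the snippet does not check whether a given depth/width pair is supported by the checkpoint.

```python
import shlex
from itertools import product
from typing import List

# Flags copied from the evaluation command above; only depth and width vary.
BASE_ARGS = [
    "python", "main.py",
    "--model_name_or_path", "meta-llama/Llama-2-7b-hf",
    "--output_dir", "amoeba_llama2",
    "--do_train", "False", "--do_eval", "True", "--do_mmlu_eval", "True",
    "--bits", "8", "--bf16", "--enable_shrinking", "--min_num_layer", "20",
    "--shrinking_method", "calib_dp",
    "--shrinking_file", "dp_selection_strategy.npy",
    "--shrinkable_width", "--width_choice", "[1,7/8,3/4,5/8]",
    "--prune_width_method", "flap",
    "--use_moe_lora", "--moe_num_expert", "5", "--moe_topk", "2",
    "--do_lm_eval", "True",
    "--do_lm_eval_task", "arc_easy,piqa,hellaswag",
]


def build_eval_command(num_layer: int, width: float) -> List[str]:
    """Return the argument list for one (depth, width) evaluation run."""
    return BASE_ARGS + [
        "--eval_num_layer", str(num_layer),
        "--eval_num_width", str(width),
    ]


# Hypothetical grid of subnet shapes; adjust to the shapes you care about.
for num_layer, width in product([20, 24], [0.875, 0.75]):
    print(shlex.join(build_eval_command(num_layer, width)))
```

Each printed line can be pasted into a shell, or the argument list can be passed to `subprocess.run(cmd, check=True)` with `CUDA_VISIBLE_DEVICES` set in the environment. `shlex.join` quotes the bracketed `--width_choice` value so the shell does not treat it as a glob pattern.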

Acknowledgement

Our code refers to the implementation in QLoRA.

Citation

 @inproceedings{fuamoeballm,
  title={AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment},
  author={Fu, Yonggan and Yu, Zhongzhi and Li, Junwei and Qian, Jiayi and Zhang, Yongan and Yuan, Xiangchi and Shi, Dachuan and Yakunin, Roman and Lin, Yingyan Celine},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems}
}
