
llama2 lora fine tuning


Version: 1.0.0


Fine-tune llama2-chat with LoRA and DeepSpeed

Fine-tune the llama-2-7b-chat model on two P100 (16 GB) GPUs.
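LoRA is what makes fine-tuning a 7B model on 16 GB cards practical: it freezes each pretrained weight matrix W and trains only a low-rank update B·A. The sketch below counts the trainable parameters this adds. The Llama-2-7B shape (hidden size 4096, 32 layers) is real; the rank of 8 and the choice of two adapted projections per layer (for example q_proj and v_proj) are illustrative assumptions, not necessarily what this repository's script uses.

```python
"""Rough count of LoRA trainable parameters (illustrative assumptions)."""


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # LoRA keeps W (d_out x d_in) frozen and learns B (d_out x r) and A (r x d_in),
    # so the trainable update B @ A costs r * (d_in + d_out) parameters.
    return rank * (d_in + d_out)


def total_lora_params(hidden: int, layers: int, targets: int, rank: int) -> int:
    # Assumes every adapted projection is a square hidden x hidden matrix.
    return layers * targets * lora_params(hidden, hidden, rank)


if __name__ == "__main__":
    # Assumed setup: Llama-2-7B shape, two adapted projections per layer, rank 8.
    n = total_lora_params(hidden=4096, layers=32, targets=2, rank=8)
    print(n)  # 4194304
    print(f"{n / 6.74e9:.4%} of the ~6.74B base parameters")
```

Under these assumptions only about 4.2M parameters receive gradients and optimizer state, while the billions of base weights stay frozen.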

The data uses the Alpaca format and is split into two datasets, one for training and one for validation.
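An Alpaca-format record has three fields: `instruction`, `input` (possibly empty) and `output`. The sketch below validates a record and renders it with the standard Stanford Alpaca prompt templates. Whether this repository's training script applies exactly these templates is an assumption; check its data-loading code before relying on them.

```python
import json

# Standard Stanford Alpaca prompt templates (assumed; the repo may differ).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n"
    "### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

REQUIRED_KEYS = {"instruction", "input", "output"}


def validate(record: dict) -> None:
    # Every Alpaca record needs all three keys, even if "input" is empty.
    missing = REQUIRED_KEYS - set(record)
    if missing:
        raise ValueError(f"record is missing keys: {sorted(missing)}")


def to_prompt(record: dict) -> str:
    # Choose the template by whether the optional "input" field is non-empty.
    validate(record)
    template = PROMPT_WITH_INPUT if record["input"] else PROMPT_NO_INPUT
    return template.format(instruction=record["instruction"], input=record["input"])


if __name__ == "__main__":
    # A dataset file is typically a JSON list of such records.
    train = [{"instruction": "Translate to English.", "input": "你好", "output": "Hello"}]
    print(json.dumps(train, ensure_ascii=False, indent=2))
    print(to_prompt(train[0]) + train[0]["output"])
```

During training the model sees the rendered prompt followed by the `output` text; at inference it is given only the prompt.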

1. GPU requirements

At least 16 GB of GPU memory per card (P100, T4, or better); one or more cards.
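A quick estimate shows why 16 GB is workable only with quantized weights. The sketch computes the memory taken by the weights alone; activations, the CUDA context, and the LoRA gradients and optimizer states come on top. The parameter count of about 6.74B for Llama-2-7B is an approximation.

```python
"""Back-of-the-envelope GPU memory needed for model weights alone."""


def weight_memory_gib(n_params: float, bits: int) -> float:
    # bytes = parameters * bits / 8; divide by 2**30 to convert to GiB.
    return n_params * bits / 8 / 2**30


if __name__ == "__main__":
    n_params = 6.74e9  # approximate parameter count of Llama-2-7B (assumption)
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_gib(n_params, bits):5.2f} GiB")
```

This gives roughly 12.6 GiB in fp16, 6.3 GiB in 8-bit and 3.1 GiB in 4-bit, so fp16 weights alone nearly fill a 16 GB card, while 8-bit or 4-bit loading leaves room for activations.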

2. Clone the source code

git clone https://github.com/git-cloner/llama2-lora-fine-tuning
cd llama2-lora-fine-tuning

3. Install dependencies

# Create a virtual environment
conda create -n llama2 python=3.9 -y
conda activate llama2
# Install the dependencies hosted on github.com separately (these downloads may need several retries to succeed)
# Verbose git logging, to help diagnose failed clones
export GIT_TRACE=1
export GIT_CURL_VERBOSE=1
pip install git+https://github.com/PanQiWei/AutoGPTQ.git -i https://pypi.mirrors.ustc.edu.cn/simple --trusted-host=pypi.mirrors.ustc.edu.cn
pip install git+https://github.com/huggingface/peft -i https://pypi.mirrors.ustc.edu.cn/simple
pip install git+https://github.com/huggingface/transformers -i https://pypi.mirrors.ustc.edu.cn/simple
# Install the remaining dependencies
pip install -r requirements.txt -i https://pypi.mirrors.ustc.edu.cn/simple
# Verify bitsandbytes
python -m bitsandbytes
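Besides `python -m bitsandbytes`, it can help to confirm that every key package is importable in the `llama2` environment. The sketch below uses only the standard library; the module list is an assumption based on the packages installed above plus `torch` and `deepspeed`, which the fine-tuning presumably relies on.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version

# Import names of the packages this setup relies on (assumed list).
MODULES = ["torch", "transformers", "peft", "auto_gptq", "bitsandbytes", "deepspeed"]


def missing_modules(names):
    # find_spec locates a top-level module without importing it.
    return [n for n in names if importlib.util.find_spec(n) is None]


def installed_version(dist):
    # Distribution names can differ from import names, so this may return None.
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    missing = missing_modules(MODULES)
    for name in MODULES:
        status = "MISSING" if name in missing else (installed_version(name) or "installed")
        print(f"{name:14} {status}")
```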

4. Download the base model

python model_download.py --repo_id daryl149/llama-2-7b-chat-hf
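The contents of `model_download.py` are not shown here. A hypothetical equivalent using `huggingface_hub.snapshot_download` is sketched below; it saves the repository under `./models/<repo_id>`, which matches the path used in the later steps. The helper names `local_model_dir` and `download` are illustrative only.

```python
from pathlib import Path


def local_model_dir(repo_id: str, root: str = "./models") -> Path:
    # Layout used later in this guide: ./models/<owner>/<name>
    return Path(root) / repo_id


def download(repo_id: str, root: str = "./models") -> Path:
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_model_dir(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Usage: `download("daryl149/llama-2-7b-chat-hf")` fetches the model files into `models/daryl149/llama-2-7b-chat-hf`.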

5. Extend the Chinese vocabulary

# Extend the vocabulary with Chinese tokens, following the method from https://github.com/ymcui/Chinese-LLaMA-Alpaca.git
# The extended vocabulary is written to merged_tokenizer_sp (full precision) and merged_tokenizer_hf (half precision)
# During fine-tuning, pass --tokenizer_name ./merged_tokenizer_hf
python merge_tokenizers.py \
  --llama_tokenizer_dir ./models/daryl149/llama-2-7b-chat-hf \
  --chinese_sp_model_file ./chinese_sp.model
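The core idea of the vocabulary merge is to append every Chinese token that the LLaMA vocabulary lacks, without changing the ids of existing tokens. The sketch shows that idea on plain lists of token strings; the actual script operates on SentencePiece model files, so treat `merge_vocab` as a hypothetical illustration.

```python
def merge_vocab(base_pieces, extra_pieces):
    """Append pieces from extra_pieces that base_pieces lacks, preserving order.

    Existing token ids stay unchanged, so the pretrained embeddings still line
    up; new ids start at len(base_pieces).
    """
    merged = list(base_pieces)
    seen = set(base_pieces)
    for piece in extra_pieces:
        if piece not in seen:  # skip tokens LLaMA already knows (and duplicates)
            merged.append(piece)
            seen.add(piece)
    return merged


if __name__ == "__main__":
    llama = ["<unk>", "<s>", "</s>", "▁the", "你"]
    chinese = ["你", "好", "你好", "▁the"]
    print(merge_vocab(llama, chinese))
```

Because the merged vocabulary is larger, the model's embedding and output layers need matching new rows (in `transformers`, via `resize_token_embeddings`), which is why fine-tuning must use the merged tokenizer.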

6. Fine-tuning parameters

The following parameters can be adjusted:

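Parameter	Description	Recommended value
load_in_bits	Precision (in bits) the model is loaded in	4 or 8. If GPU memory allows, prefer the higher precision, 8.
block_size	Maximum token length	Start with 2048; if memory overflows, try 1024, 512, and so on.
per_device_train_batch_size	Batch size per GPU during training	As large as possible without running out of memory.
per_device_eval_batch_size	Batch size per GPU during evaluation	As large as possible without running out of memory.
include	GPUs to use	For example, two cards: localhost:1,2 (note that this ordering does not necessarily match the one shown by nvidia-smi).
num_train_epochs	Number of training epochs	At least 3.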
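Two of these parameters interact with memory in ways worth making concrete. A common way to apply `block_size` (used in Hugging Face's `run_clm.py` example) is to concatenate all tokenized examples and cut them into fixed-length blocks; whether this repository's script does exactly that is an assumption. The number of samples per optimizer step is the per-device batch size times the number of GPUs times the gradient-accumulation steps; `grad_accum` is an assumed extra knob not listed above, defaulting to 1.

```python
from itertools import chain


def group_texts(token_lists, block_size):
    """Concatenate tokenized examples and cut them into block_size chunks.

    Simplified version of the run_clm.py grouping: the tail shorter than
    block_size is dropped.
    """
    flat = list(chain.from_iterable(token_lists))
    usable = (len(flat) // block_size) * block_size
    return [flat[i:i + block_size] for i in range(0, usable, block_size)]


def global_batch_size(per_device, num_gpus, grad_accum=1):
    # Samples contributing to each optimizer step across all GPUs.
    return per_device * num_gpus * grad_accum


if __name__ == "__main__":
    print(group_texts([[1, 2, 3], [4, 5], [6, 7, 8, 9]], 4))
    print(global_batch_size(per_device=4, num_gpus=2))
```

Halving `block_size` roughly halves the activation memory of each sequence, which is why it is the first thing to reduce when memory overflows.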

7. Run fine-tuning

chmod +x finetune-lora.sh
# Fine-tune in the foreground
./finetune-lora.sh
# Or fine-tune in the background: stop any earlier run, then log to train.log
pkill -9 -f finetune-lora
nohup ./finetune-lora.sh > train.log 2>&1 &
tail -f train.log

8. Test

CUDA_VISIBLE_DEVICES=0 python generate.py \
    --base_model './models/daryl149/llama-2-7b-chat-hf' \
    --lora_weights 'output/checkpoint-2000' \
    --load_8bit  # omit --load_8bit to load the model in 4-bit
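The `--lora_weights` path points at one checkpoint directory under `output/`. When training saves several `checkpoint-<step>` directories, a small helper can pick the latest one. The sketch assumes that naming scheme (as in `output/checkpoint-2000`); the function names are hypothetical.

```python
import re
from pathlib import Path

CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_step_name(names):
    # Compare steps numerically: as strings, "checkpoint-800" > "checkpoint-2000".
    best_name, best_step = None, -1
    for name in names:
        m = CHECKPOINT_RE.match(name)
        if m and int(m.group(1)) > best_step:
            best_name, best_step = name, int(m.group(1))
    return best_name


def latest_checkpoint(output_dir="output"):
    # Return the checkpoint directory with the highest step, or None.
    root = Path(output_dir)
    if not root.is_dir():
        return None
    name = latest_step_name(p.name for p in root.iterdir() if p.is_dir())
    return root / name if name else None


if __name__ == "__main__":
    print(latest_checkpoint())
```

The printed path can be passed to `--lora_weights` in the command above.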

