The LLaMA2 license has been updated to allow commercial use. Alongside the base model, LLaMA2-Chat was also released. I previously fine-tuned Llama-2-7b-chat on a 16 GB GPU (https://zhuanlan.zhihu.com/p/645152512; code at https://github.com/git-cloner/llama2-lora-fine-tuning). However, even after expanding the Chinese vocabulary, inference quality was still poor and the answers were mostly in English.
When LLaMA2 was released, Meta also open-sourced an official fine-tuning framework, llama-recipes (https://github.com/facebookresearch/llama-recipes), which supports full-parameter fine-tuning, LoRA, and other methods, and is more compatible with the model than third-party programs.
This article builds on llama-recipes, adapts it to the available GPU resources, and fine-tunes the original LLaMA2-7b model with LoRA; the resulting model gives reasonable inference. The project also provides a testing procedure and a streaming API.
GPUs with 16 GB of VRAM or more are required, preferably two or more cards. Fine-tuning one epoch over a corpus of more than 100 MB on two P100s (16 GB) takes about 120 hours, so faster cards such as the V100 or 4090 are recommended.
git clone https://github.com/git-cloner/Llama2-chinese
cd Llama2-chinese
conda create -n llama-recipes python=3.9 -y
conda activate llama-recipes
# Some dependencies in requirements.txt are installed from GitHub; if the network is poor, enable these two variables to watch the progress
export GIT_TRACE=1
export GIT_CURL_VERBOSE=1
pip install -r requirements.txt -i https://pypi.mirrors.ustc.edu.cn/simple --trusted-host=pypi.mirrors.ustc.edu.cn
# bitsandbytes causes the most problems; after pip install, verify it with the following command
python -m bitsandbytes
# Download the model with the downloader developed in this project, which supports resuming and reconnecting
python model_download.py --repo_id NousResearch/Llama-2-7b-hf
# The downloaded model is placed under ./models/NousResearch/Llama-2-7b-hf
The corpus is in the Alpaca format (the Alpaca corpora on huggingface.co are large and can be organized by yourself). After personalization, name it ft_datasets/alpaca_data.json.
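For reference, the Alpaca format is a JSON array of instruction/input/output records. A minimal sketch that writes a two-sample ft_datasets/alpaca_data.json (the sample content below is illustrative, not taken from the original corpus):

import json
from pathlib import Path

# Two illustrative records in the Alpaca format; "input" may be an empty string
samples = [
    {
        "instruction": "将下面的句子翻译成英文。",
        "input": "今天天气很好。",
        "output": "The weather is very nice today.",
    },
    {
        "instruction": "用一句话介绍一下北京。",
        "input": "",
        "output": "北京是中华人民共和国的首都，也是政治和文化中心。",
    },
]

Path("ft_datasets").mkdir(exist_ok=True)
with open("ft_datasets/alpaca_data.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)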
# force-kill any running fine-tuning process
pkill -9 -f llama_finetuning
# train; batch_size_training can be tuned by trial and error against your VRAM, filling it as much as possible
# This example uses two P100s, GPUs 1 and 2
# NOTE: even with two cards, nproc_per_node is 1, not 2
CUDA_VISIBLE_DEVICES=1,2 nohup torchrun --nnodes 1 --nproc_per_node 1 \
    llama_finetuning.py \
    --use_peft \
    --peft_method lora \
    --model_name ./models/NousResearch/Llama-2-7b-hf \
    --use_fp16 \
    --output_dir output/model \
    --dataset alpaca_dataset \
    --batch_size_training 40 \
    --num_epochs 3 \
    --quantization > train.log 2>&1 &
# check log
tail -f train.log
After one round of fine-tuning, a PEFT incremental (LoRA) model is generated under output/model. Use the following command to test it interactively on the command line. Since streaming is not used, the result is only shown after the whole answer has been generated, so it feels slow.
CUDA_VISIBLE_DEVICES=0 python generate.py \
    --base_model './models/NousResearch/Llama-2-7b-hf' \
    --lora_weights './output/model' \
    --load_8bit
# The model can be loaded for testing with 4-bit or 8-bit quantization, or in half precision
# --load_4bit needs about 6 GB of VRAM
# --load_8bit needs about 9 GB of VRAM
# half precision needs about 13 GB of VRAM
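The LoRA adapter can also be loaded directly in Python instead of going through generate.py. A minimal sketch, assuming the transformers, peft, accelerate, and torch packages from requirements.txt and the paths used in the commands above:

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_model_path = "./models/NousResearch/Llama-2-7b-hf"
lora_weights_path = "./output/model"

# Load the base model in half precision, then attach the LoRA adapter
tokenizer = LlamaTokenizer.from_pretrained(base_model_path)
model = LlamaForCausalLM.from_pretrained(
    base_model_path, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, lora_weights_path)
model.eval()

prompt = "请用中文介绍一下你自己。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))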
To start the streaming API service:
CUDA_VISIBLE_DEVICES=0 nohup python -u api_stream.py \
    --load_4bit > api_stream.log 2>&1 &
tail -f api_stream.log
# Send POST requests repeatedly, and stop calling once the returned response contains [stop]
curl -X POST "http://127.0.0.1:8000/stream" \
    -H 'Content-Type: application/json' \
    -d '{"prompt": "你好", "history": []}'
To merge the LoRA weights back into the base model, run:
python inference/hf-text-generation-inference/merge_lora_weights.py \
    --base_model ./models/NousResearch/Llama-2-7b-hf \
    --peft_model output/model \
    --output_dir output/merged_model_output
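The merged checkpoint under output/merged_model_output is a standalone HF-format model: it can be loaded with transformers alone (no peft needed) or served with text-generation-inference, as the script's path suggests.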