
The instructions in this guide are based on Chinese-LLaMA-Alpaca V3.1. Chinese-LLaMA-Alpaca pioneered the Chinese-oriented expansion and improvement of LLaMA: on top of the original LLaMA it extends the vocabulary with Chinese tokens and performs secondary pre-training on Chinese data, which further improves the model's basic semantic understanding of Chinese.
Project layout:
```
.
├── README.md                             # usage instructions
├── SHA256.md                             # SHA-256 checksums of the LLaMA model files
├── notebooks
│   ├── convert_and_quantize_chinese_alpaca_plus.ipynb
│   └── convert_and_quantize_chinese_llama.ipynb
├── requirements.txt                      # dependencies
└── scripts
    ├── chinese_sp.model                  # Chinese vocabulary (sentencepiece model)
    ├── crawl_prompt.py                   # 1. generate fine-tuning data with OpenAI models (e.g. ChatGPT, GPT-4)
    ├── inference_hf.py                   # 5. run inference with the fine-tuned LoRA model and the original LLaMA model
    ├── merge_llama_with_chinese_lora.py  # 4. merge model weights
    ├── merge_tokenizers.py               # 2. vocabulary expansion
    └── run_clm_pt_with_peft.py           # 3. pre-train or fine-tune the model
```
Whether you want to pre-train or fine-tune, you first need to prepare data. There are two ways to prepare it:
One option is to use scripts/crawl_prompt.py to generate the data: the basic idea is to prompt ChatGPT or another capable OpenAI model to produce training samples, as sketched below.
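The exact prompts and command-line flags of crawl_prompt.py are not reproduced here; the following is only a minimal sketch of the underlying idea, assuming the pre-1.0 `openai` Python SDK, a placeholder model name, and a placeholder output file:

```python
# Sketch of the idea behind crawl_prompt.py: ask an OpenAI chat model to emit
# instruction/input/output triples that can later be used for fine-tuning.
# Assumes the pre-1.0 `openai` SDK and OPENAI_API_KEY in the environment.
import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

seed_prompt = (
    "请生成20条多样化的中文指令数据，每条包含 instruction、input、output 三个字段，"
    "并以JSON数组的形式返回。"
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=[{"role": "user", "content": seed_prompt}],
    temperature=1.0,
)

text = response["choices"][0]["message"]["content"]
with open("generated_prompts.json", "w", encoding="utf-8") as f:
    f.write(text)  # the returned JSON still needs validation and filtering
```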
The other is to download the LLaMA model weights themselves, which are available in several parameter sizes. The larger the model, the larger the weights and the better the accuracy, but fine-tuning and training also take correspondingly longer. For most people, the 7B or 13B model is the usual choice.

```bash
# tokenizer
wget https://agi.gpt4.org/llama/LLaMA/tokenizer.model -O ./tokenizer.model
wget https://agi.gpt4.org/llama/LLaMA/tokenizer_checklist.chk -O ./tokenizer_checklist.chk
# 7B
wget https://agi.gpt4.org/llama/LLaMA/7B/consolidated.00.pth -O ./7B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/7B/params.json -O ./7B/params.json
wget https://agi.gpt4.org/llama/LLaMA/7B/checklist.chk -O ./7B/checklist.chk
# 13B
wget https://agi.gpt4.org/llama/LLaMA/13B/consolidated.00.pth -O ./13B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/13B/consolidated.01.pth -O ./13B/consolidated.01.pth
wget https://agi.gpt4.org/llama/LLaMA/13B/params.json -O ./13B/params.json
wget https://agi.gpt4.org/llama/LLaMA/13B/checklist.chk -O ./13B/checklist.chk
# 30B
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.00.pth -O ./30B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.01.pth -O ./30B/consolidated.01.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.02.pth -O ./30B/consolidated.02.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.03.pth -O ./30B/consolidated.03.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/params.json -O ./30B/params.json
wget https://agi.gpt4.org/llama/LLaMA/30B/checklist.chk -O ./30B/checklist.chk
# 65B
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.00.pth -O ./65B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.01.pth -O ./65B/consolidated.01.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.02.pth -O ./65B/consolidated.02.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.03.pth -O ./65B/consolidated.03.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.04.pth -O ./65B/consolidated.04.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.05.pth -O ./65B/consolidated.05.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.06.pth -O ./65B/consolidated.06.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.07.pth -O ./65B/consolidated.07.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/params.json -O ./65B/params.json
wget https://agi.gpt4.org/llama/LLaMA/65B/checklist.chk -O ./65B/checklist.chk
```
Be sure to confirm the integrity of the LLaMA base model by checking that the checksums match the values shown in SHA256.md; otherwise the later merge step cannot be performed.
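For example, a checksum can be computed in Python and compared against the corresponding entry in SHA256.md (the shard path below is illustrative):

```python
# Compute the SHA-256 of a downloaded shard and compare it with SHA256.md.
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


print(sha256_of("./7B/consolidated.00.pth"))  # should match the value listed in SHA256.md
```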
```bash
# install dependencies
pip install git+https://github.com/huggingface/transformers

# convert the original weights to HF format
python -m transformers.models.llama.convert_llama_weights_to_hf \
    --input_dir llama-weights \
    --model_size 7B \
    --output_dir llama-hf-weights
```
For example:
```bash
python -m transformers.models.llama.convert_llama_weights_to_hf --input_dir ./ --model_size 7B --output_dir ./output/7B-hf
```
If you do not want to do the conversion yourself, you can also use a LLaMA-HF model that someone else has already converted. pinkmanlove hosts converted LLaMA-HF weights on HuggingFace; if that is unavailable, you can search HuggingFace Models for other converted copies.
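To sanity-check the conversion (or a downloaded pre-converted copy), the weights should load with the standard transformers LLaMA classes. A minimal sketch, assuming the output directory from the example command above:

```python
# Quick sanity check that the converted HF-format weights load correctly.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_dir = "./output/7B-hf"  # directory produced by convert_llama_weights_to_hf

tokenizer = LlamaTokenizer.from_pretrained(model_dir)
model = LlamaForCausalLM.from_pretrained(model_dir, torch_dtype=torch.float16)

print(model.config.vocab_size)  # 32000 for the original LLaMA vocabulary
print(len(tokenizer))           # should match the config's vocab size
```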
The entire training and fine-tuning process consists of three steps:
```bash
python scripts/merge_tokenizers.py \
    --llama_tokenizer_dir llama_tokenizer_dir \
    --chinese_sp_model_file chinese_sp_model_file
```
For example:
```bash
python scripts/merge_tokenizers.py --llama_tokenizer_dir output/7B-hf --chinese_sp_model_file scripts/chinese_sp.model
```
Parameter description:
- `--llama_tokenizer_dir`: points to the directory containing the original LLaMA tokenizer;
- `--chinese_sp_model_file`: points to the Chinese vocabulary file (chinese_sp.model) trained with sentencepiece.

Note:
There are two main approaches to expanding a vocabulary: (1) merge the two vocabularies into an expanded one; (2) take an existing large vocabulary and delete unused tokens to obtain the final vocabulary.
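The merge approach boils down to appending the pieces of the Chinese sentencepiece model that are missing from LLaMA's tokenizer. A simplified sketch of that idea (not the full merge_tokenizers.py script; the tokenizer directory is carried over from the earlier example):

```python
# Simplified sketch of vocabulary merging: append Chinese sentencepiece pieces
# that the original LLaMA tokenizer does not already contain.
import sentencepiece as spm
from sentencepiece import sentencepiece_model_pb2 as sp_pb2
from transformers import LlamaTokenizer

llama_tokenizer = LlamaTokenizer.from_pretrained("output/7B-hf")
chinese_sp = spm.SentencePieceProcessor()
chinese_sp.Load("scripts/chinese_sp.model")

llama_proto = sp_pb2.ModelProto()
llama_proto.ParseFromString(llama_tokenizer.sp_model.serialized_model_proto())
chinese_proto = sp_pb2.ModelProto()
chinese_proto.ParseFromString(chinese_sp.serialized_model_proto())

existing = {p.piece for p in llama_proto.pieces}
for p in chinese_proto.pieces:
    if p.piece not in existing:
        new_piece = sp_pb2.ModelProto().SentencePiece()
        new_piece.piece = p.piece
        new_piece.score = 0
        llama_proto.pieces.append(new_piece)

with open("chinese_llama.model", "wb") as f:
    f.write(llama_proto.SerializeToString())  # merged sentencepiece model
print(f"merged vocabulary size: {len(llama_proto.pieces)}")
```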
During the pre-training stage, general Chinese corpora are used to further pre-train the model on top of the original LLaMA weights. The process is divided into two stages: in the first stage the transformer parameters are frozen and only the embeddings are trained, adapting the newly added Chinese tokens while disturbing the original model as little as possible; in the second stage LoRA weights are added and the embeddings are updated together with the LoRA parameters.
The model converges slowly in the first stage. If you do not have particularly abundant time and computing resources, it is recommended to skip it. The second-stage pre-training is launched as follows (single machine, single GPU):
```bash
######## parameter settings ########
lr=2e-4
lora_rank=8
lora_alpha=32
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05
pretrained_model=path/to/hf/llama/dir
chinese_tokenizer_path=path/to/chinese/llama/tokenizer/dir
dataset_dir=path/to/pt/data/dir
data_cache=temp_data_cache_dir
per_device_train_batch_size=1
per_device_eval_batch_size=1
training_steps=100
gradient_accumulation_steps=1
output_dir=output_dir
deepspeed_config_file=ds_zero2_no_offload.json

######## launch command ########
torchrun --nnodes 1 --nproc_per_node 1 run_clm_pt_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${chinese_tokenizer_path} \
    --dataset_dir ${dataset_dir} \
    --data_cache_dir ${data_cache} \
    --validation_split_percentage 0.001 \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --per_device_eval_batch_size ${per_device_eval_batch_size} \
    --do_train \
    --seed $RANDOM \
    --fp16 \
    --max_steps ${training_steps} \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.05 \
    --weight_decay 0.01 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --save_steps 500 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 8 \
    --block_size 512 \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --modules_to_save ${modules_to_save} \
    --lora_dropout ${lora_dropout} \
    --torch_dtype float16 \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False
```
Parameter description:
- `--model_name_or_path`: directory containing the original LLaMA model in HF format;
- `--tokenizer_name_or_path`: directory containing the Chinese-LLaMA tokenizer (the output of merge_tokenizers.py);
- `--dataset_dir`: directory with the pre-training data, which may contain multiple plain-text files ending in .txt;
- `--data_cache_dir`: directory in which data cache files are stored.
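The launch command also references a DeepSpeed ZeRO stage-2 config without CPU offloading (ds_zero2_no_offload.json). The repository ships its own file; the snippet below is only a minimal sketch of what such a config looks like, written from Python so it can be adapted, and may differ from the repository's exact contents:

```python
# Minimal sketch of a ZeRO-2, no-offload DeepSpeed config for use with the
# HF Trainer ("auto" values are resolved from the training arguments).
import json

ds_config = {
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "train_batch_size": "auto",
}

with open("ds_zero2_no_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```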
Multi-machine, multi-GPU:
```bash
torchrun \
    --nnodes ${num_nodes} \
    --nproc_per_node ${num_gpu_per_node} \
    --node_rank ${node_rank} \
    --master_addr ${master_addr} \
    --master_port ${master_port} \
    run_clm_pt_with_peft.py \
    ...
```
The Chinese-LLaMA model extends the original LLaMA vocabulary with Chinese tokens and performs secondary pre-training on general Chinese plain-text data. The author provides two ways to download these pre-trained weights, so there is no need to spend resources training them yourself:
| Model name | Training data | Required base model | Size | LoRA download |
|---|---|---|---|---|
| Chinese-LLaMA-7B | General corpus, 20 GB | Original LLaMA-7B | 770M | [Baidu Netdisk] [Google Drive] |
| Chinese-LLaMA-Plus-7B | General corpus, 120 GB | Original LLaMA-7B | 790M | [Baidu Netdisk] [Google Drive] |
| Chinese-LLaMA-13B | General corpus, 20 GB | Original LLaMA-13B | 1G | [Baidu Netdisk] [Google Drive] |
| Chinese-LLaMA-Plus-13B | General corpus, 120 GB | Original LLaMA-13B | 1G | [Baidu Netdisk] [Google Drive] |
The LoRA weights can also be pulled from the HuggingFace Model Hub and loaded with `.from_pretrained()`:

| Model name | Model call name | Link |
|---|---|---|
| Chinese-LLaMA-7B | ziqingyang/chinese-llama-lora-7b | Model Hub Link |
| Chinese-LLaMA-Plus-7B | ziqingyang/chinese-llama-plus-lora-7b | Model Hub Link |
| Chinese-LLaMA-13B | ziqingyang/chinese-llama-lora-13b | Model Hub Link |
| Chinese-LLaMA-Plus-13B | ziqingyang/chinese-llama-plus-lora-13b | Model Hub Link |
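When the weights are pulled from the Model Hub, the LoRA can be attached to a base model directly with peft. A minimal sketch, assuming an HF-format LLaMA-7B at a placeholder local path and that the LoRA repository also ships the expanded Chinese tokenizer:

```python
# Sketch: attach the Chinese-LLaMA LoRA from the Model Hub to an HF-format
# LLaMA base model using peft. The base-model path is a placeholder.
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

base_model = LlamaForCausalLM.from_pretrained(
    "path/to/llama-7b-hf", torch_dtype=torch.float16
)
tokenizer = LlamaTokenizer.from_pretrained("ziqingyang/chinese-llama-lora-7b")

# Account for the expanded Chinese vocabulary before applying the LoRA,
# since the LoRA also carries the resized embed_tokens/lm_head.
base_model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(base_model, "ziqingyang/chinese-llama-lora-7b")
```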
The instruction fine-tuning stage likewise uses LoRA for parameter-efficient fine-tuning, and further increases the number of trainable parameters.
Single machine, single GPU:
```bash
######## parameter settings ########
lr=1e-4
lora_rank=8
lora_alpha=32
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05
pretrained_model=path/to/hf/llama/or/merged/llama/dir/or/model_id
chinese_tokenizer_path=path/to/chinese/llama/tokenizer/dir
dataset_dir=path/to/sft/data/dir
per_device_train_batch_size=1
per_device_eval_batch_size=1
training_steps=100
gradient_accumulation_steps=1
output_dir=output_dir
peft_model=path/to/peft/model/dir
validation_file=validation_file_name
deepspeed_config_file=ds_zero2_no_offload.json

######## launch command ########
torchrun --nnodes 1 --nproc_per_node 1 run_clm_sft_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${chinese_tokenizer_path} \
    --dataset_dir ${dataset_dir} \
    --validation_split_percentage 0.001 \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --per_device_eval_batch_size ${per_device_eval_batch_size} \
    --do_train \
    --do_eval \
    --seed $RANDOM \
    --fp16 \
    --max_steps ${training_steps} \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.03 \
    --weight_decay 0 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --evaluation_strategy steps \
    --eval_steps 250 \
    --save_steps 500 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 8 \
    --max_seq_length 512 \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --modules_to_save ${modules_to_save} \
    --lora_dropout ${lora_dropout} \
    --torch_dtype float16 \
    --validation_file ${validation_file} \
    --peft_path ${peft_model} \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False
```
Parameter description:
- `--tokenizer_name_or_path`: directory containing the Chinese-Alpaca tokenizer (the output of merge_tokenizers.py);
- `--dataset_dir`: directory with the instruction fine-tuning data, containing one or more instruction files in Stanford Alpaca format ending in .json;
- `--validation_file`: a single instruction file used as the validation set, also in Stanford Alpaca format and ending in .json.

The so-called Stanford Alpaca format is:
```json
[
  {
    "instruction": ...,
    "input": ...,
    "output": ...
  },
  ...
]
```
This data can also be produced with the data-generation method described in the data-preparation section above.
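For concreteness, a tiny Alpaca-format file can be written as follows; the example records, the sft_data directory, and the file name are all illustrative:

```python
# Write a minimal Stanford-Alpaca-format instruction file for --dataset_dir.
import json
import os

examples = [
    {
        "instruction": "把下面的句子翻译成英文。",
        "input": "今天天气很好。",
        "output": "The weather is nice today.",
    },
    {
        "instruction": "用一句话解释什么是LoRA。",
        "input": "",
        "output": "LoRA是一种通过低秩矩阵高效微调大模型的参数高效训练方法。",
    },
]

os.makedirs("sft_data", exist_ok=True)
with open("sft_data/example.json", "w", encoding="utf-8") as f:
    json.dump(examples, f, ensure_ascii=False, indent=2)
```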
Configuration instructions:
If you want to continue training the LoRA weights of the Chinese-Alpaca model:
- `--model_name_or_path`: the original LLaMA model in HF format (if you continue training a non-Plus Alpaca model), or the model obtained by merging Chinese-LLaMA-Plus-LoRA (if you continue training a Plus model);
- `--peft_path`: the directory containing Chinese-Alpaca's LoRA weights;
- there is no need to specify `--lora_rank`, `--lora_alpha`, `--lora_dropout`, `--trainable` or `--modules_to_save`.
If you want to train new LoRA weights on top of Chinese-LLaMA:
- `--model_name_or_path`: the HF-format Chinese-LLaMA model obtained by merging Chinese-LLaMA-LoRA (whether a Plus model or not);
- `--peft_path`: do not provide this parameter, and delete `--peft_path` from the script;
- `--lora_rank`, `--lora_alpha`, `--lora_dropout`, `--trainable` and `--modules_to_save` must be specified.
Multi-machine, multi-GPU:
```bash
torchrun \
    --nnodes ${num_nodes} \
    --nproc_per_node ${num_gpu_per_node} \
    --node_rank ${node_rank} \
    --master_addr ${master_addr} \
    --master_port ${master_port} \
    run_clm_sft_with_peft.py \
    ...
```
The following merge step is suitable for Chinese-LLaMA, Chinese-LLaMA-Plus and Chinese-Alpaca (a single LoRA weight):
```bash
python scripts/merge_llama_with_chinese_lora.py \
    --base_model path_to_original_llama_hf_dir \
    --lora_model path_to_chinese_llama_or_alpaca_lora \
    --output_type [pth|huggingface] \
    --output_dir path_to_output_dir
```
Parameter description:
- `--base_model`: directory containing the LLaMA model weights and configuration files in HF format;
- `--lora_model`: directory where the Chinese LLaMA/Alpaca LoRA archive was unpacked; a Model Hub model call name can also be used;
- `--output_type`: output format, either `pth` or `huggingface`; defaults to `pth` if not specified;
- `--output_dir`: directory in which the full model weights are saved; defaults to `./`;
- `--offload_dir`: for low-memory users, an offload cache path must be specified.

Further explanation of `--output_type`:
- `.pth` files can be used for quantization and deployment with the llama.cpp tool;
- `.bin` files can be used with Transformers for inference and with text-generation-webui for building a web interface.

A single-LoRA merge can also be performed online with simultaneous quantization (see notebooks/convert_and_quantize_chinese_llama.ipynb).
Merging Chinese-Alpaca-Plus requires two LoRA weights, namely Chinese-LLaMA-Plus-LoRA and Chinese-Alpaca-Plus-LoRA.
```bash
python scripts/merge_llama_with_chinese_lora.py \
    --base_model path_to_original_llama_hf_dir \
    --lora_model path_to_chinese_llama_plus_lora,path_to_chinese_alpaca_plus_lora \
    --output_type [pth|huggingface] \
    --output_dir path_to_output_dir
```
A multi-LoRA merge can likewise be performed online with simultaneous quantization (see notebooks/convert_and_quantize_chinese_alpaca_plus.ipynb).
To run inference with the original LLaMA model plus a LoRA interactively, use scripts/inference_hf.py:
```bash
CUDA_VISIBLE_DEVICES=0 python scripts/inference_hf.py \
    --base_model path_to_original_llama_hf_dir \
    --lora_model path_to_chinese_llama_or_alpaca_lora \
    --with_prompt \
    --interactive
```
If the merge_llama_with_chinese_lora_to_hf.py script has already been run to merge the LoRA weights, there is no need to specify `--lora_model`, and the command is simpler:
```bash
CUDA_VISIBLE_DEVICES=0 python scripts/inference_hf.py \
    --base_model path_to_merged_llama_or_alpaca_hf_dir \
    --with_prompt \
    --interactive
```
If CUDA_VISIBLE_DEVICES=0 is removed, inference runs on the CPU. The model can of course also be run and deployed behind a WebUI.
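Besides inference_hf.py, a merged HF-format model can also be queried directly through the transformers API. A minimal sketch, with the merged-model directory as a placeholder; note that Alpaca models expect the instruction template that inference_hf.py adds via `--with_prompt`, which is omitted here:

```python
# Minimal generation example with a merged Chinese-LLaMA/Alpaca model in HF format.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_dir = "path_to_merged_llama_or_alpaca_hf_dir"  # placeholder

tokenizer = LlamaTokenizer.from_pretrained(model_dir)
model = LlamaForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.float16, device_map="auto"  # device_map needs accelerate
)

prompt = "请介绍一下中文大语言模型的特点。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```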
The models in this project mainly support the following quantization, inference, and deployment methods.
| Inference / deployment method | Features | Platform | CPU | GPU | Quantized loading | GUI | Tutorial |
|---|---|---|---|---|---|---|---|
| llama.cpp | Rich quantization options and efficient local inference | General | ✅ | ✅ | ✅ | ❌ | Link |
| 🤗Transformers | Native transformers inference interface | General | ✅ | ✅ | ✅ | ✅ | Link |
| text-generation-webui | Deployment with a front-end web UI | General | ✅ | ✅ | ✅ | ✅ | Link |
| LlamaChat | Graphical chat interface on macOS (used with llama.cpp models) | macOS | ✅ | ❌ | ✅ | ✅ | Link |