
The instructions in this guide are based on Chinese-LLaMA-Alpaca V3.1. Chinese-LLaMA-Alpaca pioneered the Chinese-oriented expansion and improvement of LLaMA: on top of the original LLaMA it extends the vocabulary with Chinese tokens and performs secondary pre-training on Chinese data, which further improves the model's basic semantic understanding of Chinese.
Project layout:
```
.
├── README.md                             # usage instructions
├── SHA256.md                             # SHA-256 checksums of the LLaMA model files
├── notebooks
│   ├── convert_and_quantize_chinese_alpaca_plus.ipynb
│   └── convert_and_quantize_chinese_llama.ipynb
├── requirements.txt                      # dependencies
└── scripts
    ├── chinese_sp.model                  # Chinese vocabulary (sentencepiece model)
    ├── crawl_prompt.py                   # 1. generate fine-tuning data with OpenAI models (e.g. ChatGPT, GPT-4)
    ├── inference_hf.py                   # 5. run inference with the fine-tuned LoRA model and the original LLaMA model
    ├── merge_llama_with_chinese_lora.py  # 4. merge model weights
    ├── merge_tokenizers.py               # 2. vocabulary expansion
    └── run_clm_pt_with_peft.py           # 3. pre-train or fine-tune the model
```
Whether you want to pre-train or fine-tune, you first need to prepare data. There are two ways to prepare it:
One option is to use scripts/crawl_prompt.py to generate the data: the basic idea is to prompt ChatGPT or another capable OpenAI model to produce training samples, as sketched below.
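The exact prompts and command-line flags of crawl_prompt.py are not reproduced here; the following is only a minimal sketch of the underlying idea, assuming the pre-1.0 `openai` Python SDK, a placeholder model name, and a placeholder output file:

```python
# Sketch of the idea behind crawl_prompt.py: ask an OpenAI chat model to emit
# instruction/input/output triples that can later be used for fine-tuning.
# Assumes the pre-1.0 `openai` SDK and OPENAI_API_KEY in the environment.
import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

seed_prompt = (
    "请生成20条多样化的中文指令数据，每条包含 instruction、input、output 三个字段，"
    "并以JSON数组的形式返回。"
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=[{"role": "user", "content": seed_prompt}],
    temperature=1.0,
)

text = response["choices"][0]["message"]["content"]
with open("generated_prompts.json", "w", encoding="utf-8") as f:
    f.write(text)  # the returned JSON still needs validation and filtering
```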
The other is to download the LLaMA model weights themselves, which are available in several parameter sizes. The larger the model, the larger the weights and the better the accuracy, but fine-tuning and training also take correspondingly longer. For most people, the 7B or 13B model is the usual choice.

```bash
# tokenizer
wget https://agi.gpt4.org/llama/LLaMA/tokenizer.model -O ./tokenizer.model
wget https://agi.gpt4.org/llama/LLaMA/tokenizer_checklist.chk -O ./tokenizer_checklist.chk
# 7B
wget https://agi.gpt4.org/llama/LLaMA/7B/consolidated.00.pth -O ./7B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/7B/params.json -O ./7B/params.json
wget https://agi.gpt4.org/llama/LLaMA/7B/checklist.chk -O ./7B/checklist.chk
# 13B
wget https://agi.gpt4.org/llama/LLaMA/13B/consolidated.00.pth -O ./13B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/13B/consolidated.01.pth -O ./13B/consolidated.01.pth
wget https://agi.gpt4.org/llama/LLaMA/13B/params.json -O ./13B/params.json
wget https://agi.gpt4.org/llama/LLaMA/13B/checklist.chk -O ./13B/checklist.chk
# 30B
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.00.pth -O ./30B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.01.pth -O ./30B/consolidated.01.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.02.pth -O ./30B/consolidated.02.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/consolidated.03.pth -O ./30B/consolidated.03.pth
wget https://agi.gpt4.org/llama/LLaMA/30B/params.json -O ./30B/params.json
wget https://agi.gpt4.org/llama/LLaMA/30B/checklist.chk -O ./30B/checklist.chk
# 65B
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.00.pth -O ./65B/consolidated.00.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.01.pth -O ./65B/consolidated.01.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.02.pth -O ./65B/consolidated.02.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.03.pth -O ./65B/consolidated.03.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.04.pth -O ./65B/consolidated.04.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.05.pth -O ./65B/consolidated.05.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.06.pth -O ./65B/consolidated.06.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/consolidated.07.pth -O ./65B/consolidated.07.pth
wget https://agi.gpt4.org/llama/LLaMA/65B/params.json -O ./65B/params.json
wget https://agi.gpt4.org/llama/LLaMA/65B/checklist.chk -O ./65B/checklist.chk
```
Be sure to confirm the integrity of the LLaMA base model by checking that the checksums match the values shown in SHA256.md; otherwise the later merge step cannot be performed.
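For example, a checksum can be computed in Python and compared against the corresponding entry in SHA256.md (the shard path below is illustrative):

```python
# Compute the SHA-256 of a downloaded shard and compare it with SHA256.md.
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


print(sha256_of("./7B/consolidated.00.pth"))  # should match the value listed in SHA256.md
```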
```bash
# install dependencies
pip install git+https://github.com/huggingface/transformers

# convert the original weights to HF format
python -m transformers.models.llama.convert_llama_weights_to_hf \
    --input_dir llama-weights \
    --model_size 7B \
    --output_dir llama-hf-weights
```
For example:
```bash
python -m transformers.models.llama.convert_llama_weights_to_hf --input_dir ./ --model_size 7B --output_dir ./output/7B-hf
```
If you do not want to do the conversion yourself, you can also use a LLaMA-HF model that someone else has already converted. pinkmanlove hosts converted LLaMA-HF weights on HuggingFace; if that is unavailable, you can search HuggingFace Models for other converted copies.
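To sanity-check the conversion (or a downloaded pre-converted copy), the weights should load with the standard transformers LLaMA classes. A minimal sketch, assuming the output directory from the example command above:

```python
# Quick sanity check that the converted HF-format weights load correctly.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_dir = "./output/7B-hf"  # directory produced by convert_llama_weights_to_hf

tokenizer = LlamaTokenizer.from_pretrained(model_dir)
model = LlamaForCausalLM.from_pretrained(model_dir, torch_dtype=torch.float16)

print(model.config.vocab_size)  # 32000 for the original LLaMA vocabulary
print(len(tokenizer))           # should match the config's vocab size
```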
The entire training and fine-tuning process consists of three steps:
```bash
python scripts/merge_tokenizers.py \
    --llama_tokenizer_dir llama_tokenizer_dir \
    --chinese_sp_model_file chinese_sp_model_file
```
For example:
```bash
python scripts/merge_tokenizers.py --llama_tokenizer_dir output/7B-hf --chinese_sp_model_file scripts/chinese_sp.model
```
Parameter description:
- `--llama_tokenizer_dir`: points to the directory containing the original LLaMA tokenizer;
- `--chinese_sp_model_file`: points to the Chinese vocabulary file (chinese_sp.model) trained with sentencepiece.

Note:
There are two main approaches to expanding a vocabulary: (1) merge the two vocabularies into an expanded one; (2) take an existing large vocabulary and delete unused tokens to obtain the final vocabulary.
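The merge approach boils down to appending the pieces of the Chinese sentencepiece model that are missing from LLaMA's tokenizer. A simplified sketch of that idea (not the full merge_tokenizers.py script; the tokenizer directory is carried over from the earlier example):

```python
# Simplified sketch of vocabulary merging: append Chinese sentencepiece pieces
# that the original LLaMA tokenizer does not already contain.
import sentencepiece as spm
from sentencepiece import sentencepiece_model_pb2 as sp_pb2
from transformers import LlamaTokenizer

llama_tokenizer = LlamaTokenizer.from_pretrained("output/7B-hf")
chinese_sp = spm.SentencePieceProcessor()
chinese_sp.Load("scripts/chinese_sp.model")

llama_proto = sp_pb2.ModelProto()
llama_proto.ParseFromString(llama_tokenizer.sp_model.serialized_model_proto())
chinese_proto = sp_pb2.ModelProto()
chinese_proto.ParseFromString(chinese_sp.serialized_model_proto())

existing = {p.piece for p in llama_proto.pieces}
for p in chinese_proto.pieces:
    if p.piece not in existing:
        new_piece = sp_pb2.ModelProto().SentencePiece()
        new_piece.piece = p.piece
        new_piece.score = 0
        llama_proto.pieces.append(new_piece)

with open("chinese_llama.model", "wb") as f:
    f.write(llama_proto.SerializeToString())  # merged sentencepiece model
print(f"merged vocabulary size: {len(llama_proto.pieces)}")
```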
During the pre-training stage, general Chinese corpora are used to further pre-train the model on top of the original LLaMA weights. The process is divided into two stages: in the first stage the transformer parameters are frozen and only the embeddings are trained, adapting the newly added Chinese tokens while disturbing the original model as little as possible; in the second stage LoRA weights are added and the embeddings are updated together with the LoRA parameters.
The model converges slowly in the first stage. If you do not have particularly abundant time and computing resources, it is recommended to skip it. The second-stage pre-training is launched as follows (single machine, single GPU):
```bash
######## parameter settings ########
lr=2e-4
lora_rank=8
lora_alpha=32
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05
pretrained_model=path/to/hf/llama/dir
chinese_tokenizer_path=path/to/chinese/llama/tokenizer/dir
dataset_dir=path/to/pt/data/dir
data_cache=temp_data_cache_dir
per_device_train_batch_size=1
per_device_eval_batch_size=1
training_steps=100
gradient_accumulation_steps=1
output_dir=output_dir
deepspeed_config_file=ds_zero2_no_offload.json

######## launch command ########
torchrun --nnodes 1 --nproc_per_node 1 run_clm_pt_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${chinese_tokenizer_path} \
    --dataset_dir ${dataset_dir} \
    --data_cache_dir ${data_cache} \
    --validation_split_percentage 0.001 \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --per_device_eval_batch_size ${per_device_eval_batch_size} \
    --do_train \
    --seed $RANDOM \
    --fp16 \
    --max_steps ${training_steps} \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.05 \
    --weight_decay 0.01 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --save_steps 500 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 8 \
    --block_size 512 \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --modules_to_save ${modules_to_save} \
    --lora_dropout ${lora_dropout} \
    --torch_dtype float16 \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False
```
Parameter description:
- `--model_name_or_path`: directory containing the original LLaMA model in HF format;
- `--tokenizer_name_or_path`: directory containing the Chinese-LLaMA tokenizer (the output of merge_tokenizers.py);
- `--dataset_dir`: directory with the pre-training data, which may contain multiple plain-text files ending in .txt;
- `--data_cache_dir`: directory in which data cache files are stored.
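The launch command also references a DeepSpeed ZeRO stage-2 config without CPU offloading (ds_zero2_no_offload.json). The repository ships its own file; the snippet below is only a minimal sketch of what such a config looks like, written from Python so it can be adapted, and may differ from the repository's exact contents:

```python
# Minimal sketch of a ZeRO-2, no-offload DeepSpeed config for use with the
# HF Trainer ("auto" values are resolved from the training arguments).
import json

ds_config = {
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "train_batch_size": "auto",
}

with open("ds_zero2_no_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```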
Multi-machine, multi-GPU:
```bash
torchrun \
    --nnodes ${num_nodes} \
    --nproc_per_node ${num_gpu_per_node} \
    --node_rank ${node_rank} \
    --master_addr ${master_addr} \
    --master_port ${master_port} \
    run_clm_pt_with_peft.py \
    ...
```
The Chinese-LLaMA model extends the original LLaMA vocabulary with Chinese tokens and performs secondary pre-training on general Chinese plain-text data. The author provides two ways to download these pre-trained weights, so there is no need to spend resources training them yourself:
| Model name | Training data | Required base model | Size | LoRA download |
|---|---|---|---|---|
| Chinese-LLaMA-7B | General corpus, 20 GB | Original LLaMA-7B | 770M | [Baidu Netdisk] [Google Drive] |
| Chinese-LLaMA-Plus-7B | General corpus, 120 GB | Original LLaMA-7B | 790M | [Baidu Netdisk] [Google Drive] |
| Chinese-LLaMA-13B | General corpus, 20 GB | Original LLaMA-13B | 1G | [Baidu Netdisk] [Google Drive] |
| Chinese-LLaMA-Plus-13B | General corpus, 120 GB | Original LLaMA-13B | 1G | [Baidu Netdisk] [Google Drive] |
The LoRA weights can also be pulled from the HuggingFace Model Hub and loaded with `.from_pretrained()`:

| Model name | Model call name | Link |
|---|---|---|
| Chinese-LLaMA-7B | ziqingyang/chinese-llama-lora-7b | Model Hub Link |
| Chinese-LLaMA-Plus-7B | ziqingyang/chinese-llama-plus-lora-7b | Model Hub Link |
| Chinese-LLaMA-13B | ziqingyang/chinese-llama-lora-13b | Model Hub Link |
| Chinese-LLaMA-Plus-13B | ziqingyang/chinese-llama-plus-lora-13b | Model Hub Link |
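When the weights are pulled from the Model Hub, the LoRA can be attached to a base model directly with peft. A minimal sketch, assuming an HF-format LLaMA-7B at a placeholder local path and that the LoRA repository also ships the expanded Chinese tokenizer:

```python
# Sketch: attach the Chinese-LLaMA LoRA from the Model Hub to an HF-format
# LLaMA base model using peft. The base-model path is a placeholder.
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

base_model = LlamaForCausalLM.from_pretrained(
    "path/to/llama-7b-hf", torch_dtype=torch.float16
)
tokenizer = LlamaTokenizer.from_pretrained("ziqingyang/chinese-llama-lora-7b")

# Account for the expanded Chinese vocabulary before applying the LoRA,
# since the LoRA also carries the resized embed_tokens/lm_head.
base_model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(base_model, "ziqingyang/chinese-llama-lora-7b")
```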
The instruction fine-tuning stage likewise uses LoRA for parameter-efficient fine-tuning, and further increases the number of trainable parameters.
Single machine, single GPU:
```bash
######## parameter settings ########
lr=1e-4
lora_rank=8
lora_alpha=32
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05
pretrained_model=path/to/hf/llama/or/merged/llama/dir/or/model_id
chinese_tokenizer_path=path/to/chinese/llama/tokenizer/dir
dataset_dir=path/to/sft/data/dir
per_device_train_batch_size=1
per_device_eval_batch_size=1
training_steps=100
gradient_accumulation_steps=1
output_dir=output_dir
peft_model=path/to/peft/model/dir
validation_file=validation_file_name
deepspeed_config_file=ds_zero2_no_offload.json

######## launch command ########
torchrun --nnodes 1 --nproc_per_node 1 run_clm_sft_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${chinese_tokenizer_path} \
    --dataset_dir ${dataset_dir} \
    --validation_split_percentage 0.001 \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --per_device_eval_batch_size ${per_device_eval_batch_size} \
    --do_train \
    --do_eval \
    --seed $RANDOM \
    --fp16 \
    --max_steps ${training_steps} \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.03 \
    --weight_decay 0 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --evaluation_strategy steps \
    --eval_steps 250 \
    --save_steps 500 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 8 \
    --max_seq_length 512 \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --modules_to_save ${modules_to_save} \
    --lora_dropout ${lora_dropout} \
    --torch_dtype float16 \
    --validation_file ${validation_file} \
    --peft_path ${peft_model} \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False
```
Parameter description:
- `--tokenizer_name_or_path`: directory containing the Chinese-Alpaca tokenizer (the output of merge_tokenizers.py);
- `--dataset_dir`: directory with the instruction fine-tuning data, containing one or more instruction files in Stanford Alpaca format ending in .json;
- `--validation_file`: a single instruction file used as the validation set, also in Stanford Alpaca format and ending in .json.

The so-called Stanford Alpaca format is:
```json
[
  {
    "instruction": ...,
    "input": ...,
    "output": ...
  },
  ...
]
```
This data can also be produced with the data-generation method described in the data-preparation section above.
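For concreteness, a tiny Alpaca-format file can be written as follows; the example records, the sft_data directory, and the file name are all illustrative:

```python
# Write a minimal Stanford-Alpaca-format instruction file for --dataset_dir.
import json
import os

examples = [
    {
        "instruction": "把下面的句子翻译成英文。",
        "input": "今天天气很好。",
        "output": "The weather is nice today.",
    },
    {
        "instruction": "用一句话解释什么是LoRA。",
        "input": "",
        "output": "LoRA是一种通过低秩矩阵高效微调大模型的参数高效训练方法。",
    },
]

os.makedirs("sft_data", exist_ok=True)
with open("sft_data/example.json", "w", encoding="utf-8") as f:
    json.dump(examples, f, ensure_ascii=False, indent=2)
```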
Configuration instructions:
If you want to continue training the LoRA weights of the Chinese-Alpaca model:
- `--model_name_or_path`: the original LLaMA model in HF format (if you continue training a non-Plus Alpaca model), or the model obtained by merging Chinese-LLaMA-Plus-LoRA (if you continue training a Plus model);
- `--peft_path`: the directory containing Chinese-Alpaca's LoRA weights;
- there is no need to specify `--lora_rank`, `--lora_alpha`, `--lora_dropout`, `--trainable` or `--modules_to_save`.
If you want to train new LoRA weights on top of Chinese-LLaMA:
- `--model_name_or_path`: the HF-format Chinese-LLaMA model obtained by merging Chinese-LLaMA-LoRA (whether a Plus model or not);
- `--peft_path`: do not provide this parameter, and delete `--peft_path` from the script;
- `--lora_rank`, `--lora_alpha`, `--lora_dropout`, `--trainable` and `--modules_to_save` must be specified.
Multi-machine, multi-GPU:
```bash
torchrun \
    --nnodes ${num_nodes} \
    --nproc_per_node ${num_gpu_per_node} \
    --node_rank ${node_rank} \
    --master_addr ${master_addr} \
    --master_port ${master_port} \
    run_clm_sft_with_peft.py \
    ...
```
The following merge step is suitable for Chinese-LLaMA, Chinese-LLaMA-Plus and Chinese-Alpaca (a single LoRA weight):
```bash
python scripts/merge_llama_with_chinese_lora.py \
    --base_model path_to_original_llama_hf_dir \
    --lora_model path_to_chinese_llama_or_alpaca_lora \
    --output_type [pth|huggingface] \
    --output_dir path_to_output_dir
```
Parameter description:
- `--base_model`: directory containing the LLaMA model weights and configuration files in HF format;
- `--lora_model`: directory where the Chinese LLaMA/Alpaca LoRA archive was unpacked; a Model Hub model call name can also be used;
- `--output_type`: output format, either `pth` or `huggingface`; defaults to `pth` if not specified;
- `--output_dir`: directory in which the full model weights are saved; defaults to `./`;
- `--offload_dir`: for low-memory users, an offload cache path must be specified.

Further explanation of `--output_type`:
- `.pth` files can be used for quantization and deployment with the llama.cpp tool;
- `.bin` files can be used with Transformers for inference and with text-generation-webui for building a web interface.

A single-LoRA merge can also be performed online with simultaneous quantization (see notebooks/convert_and_quantize_chinese_llama.ipynb).
Merging Chinese-Alpaca-Plus requires two LoRA weights, namely Chinese-LLaMA-Plus-LoRA and Chinese-Alpaca-Plus-LoRA.
```bash
python scripts/merge_llama_with_chinese_lora.py \
    --base_model path_to_original_llama_hf_dir \
    --lora_model path_to_chinese_llama_plus_lora,path_to_chinese_alpaca_plus_lora \
    --output_type [pth|huggingface] \
    --output_dir path_to_output_dir
```
A multi-LoRA merge can likewise be performed online with simultaneous quantization (see notebooks/convert_and_quantize_chinese_alpaca_plus.ipynb).
To run inference with the original LLaMA model plus a LoRA interactively, use scripts/inference_hf.py:
```bash
CUDA_VISIBLE_DEVICES=0 python scripts/inference_hf.py \
    --base_model path_to_original_llama_hf_dir \
    --lora_model path_to_chinese_llama_or_alpaca_lora \
    --with_prompt \
    --interactive
```
If the merge_llama_with_chinese_lora_to_hf.py script has already been run to merge the LoRA weights, there is no need to specify `--lora_model`, and the command is simpler:
```bash
CUDA_VISIBLE_DEVICES=0 python scripts/inference_hf.py \
    --base_model path_to_merged_llama_or_alpaca_hf_dir \
    --with_prompt \
    --interactive
```
If CUDA_VISIBLE_DEVICES=0 is removed, inference runs on the CPU. The model can of course also be run and deployed behind a WebUI.
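Besides inference_hf.py, a merged HF-format model can also be queried directly through the transformers API. A minimal sketch, with the merged-model directory as a placeholder; note that Alpaca models expect the instruction template that inference_hf.py adds via `--with_prompt`, which is omitted here:

```python
# Minimal generation example with a merged Chinese-LLaMA/Alpaca model in HF format.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_dir = "path_to_merged_llama_or_alpaca_hf_dir"  # placeholder

tokenizer = LlamaTokenizer.from_pretrained(model_dir)
model = LlamaForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.float16, device_map="auto"  # device_map needs accelerate
)

prompt = "请介绍一下中文大语言模型的特点。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```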
The models in this project mainly support the following quantization, inference, and deployment methods.
| Inference / deployment method | Features | Platform | CPU | GPU | Quantized loading | GUI | Tutorial |
|---|---|---|---|---|---|---|---|
| llama.cpp | Rich quantization options and efficient local inference | General | ✅ | ✅ | ✅ | ❌ | Link |
| 🤗Transformers | Native transformers inference interface | General | ✅ | ✅ | ✅ | ✅ | Link |
| text-generation-webui | Deployment with a front-end web UI | General | ✅ | ✅ | ✅ | ✅ | Link |
| LlamaChat | Graphical chat interface on macOS (used with llama.cpp models) | macOS | ✅ | ❌ | ✅ | ✅ | Link |