This project provides code for fine-tuning multiple families of base models with LoRA + DeepSpeed, on a single GPU or on multiple GPUs. The models tested so far are listed in the table below:
| Model | Language | Tested weights |
|---|---|---|
| Chinese-LLaMA-Alpaca | Chinese | chinese-alpaca-plus-lora-13b |
| Open-LLaMA | English | open_llama_13b, open_llama_7b |
| BELLE | Chinese | BELLE-LLaMA-EXT-13B, BELLE-LLaMA-EXT-7B |
| BLOOM | English | bloomz-1b7, bloomz-7b1 |
| ChatGLM | Chinese | ChatGLM-6B, ChatGLM2-6B |
| Baichuan | Chinese | baichuan-7B, baichuan-13B-Chat |
| TigerBot | Chinese | tigerbot-7b-sft, tigerbot-7b-base |
| Pythia | English | pythia-1b-deduped, pythia-12b-deduped |
TODO:
Here we take the dataset from the CCKS2023-PromptCBLUE Chinese medical LLM evaluation benchmark competition as an example. It is derived from the "Chinese Medical Information Processing Challenge (CBLUE)" dataset: all 16 NLP tasks covering different medical scenarios are converted into prompt-based language generation tasks, forming the first LLM evaluation benchmark for Chinese medical scenarios.
PromptCBLUE uses 94 instruction fine-tuning templates to cover the tasks in the CBLUE benchmark. After conversion, every medical NLP dataset takes the following format: the input field is the string fed to the LLM, and the target field is the string the LLM is expected to generate.
```
{
    "input": str,
    "target": str,
    "type": str,
    "answer_choices": str,
    "sample_id": str,
}
```
To allow quick verification, we extracted the CHIP-CTC sub-dataset, which contains 6,000 training samples, 1,100 validation samples, and 1,060 test samples. Download address
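As a quick sanity check, you can peek at a few samples before training. The snippet below is only a sketch: it assumes the file paths used by the training script further down and one JSON object per line (adjust the parsing if your download is a single JSON array).

```python
import json

# Illustrative only: path matches the training script below; adjust to your setup.
data_file = "./datasets/PromptCBLUE/train_CHIP-CTC.json"

with open(data_file, encoding="utf-8") as f:
    # Assumes JSON Lines (one object per line); use json.load(f) for a single array.
    samples = [json.loads(line) for line in f if line.strip()]

print(len(samples))
print(samples[0]["input"])
print(samples[0]["target"])
```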
You can download the model locally and pass its path to the model_name_or_path parameter during training, or you can simply pass the model's name on the Hugging Face Hub, such as THUDM/chatglm-6b, and the code will download the model automatically.
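This follows the standard Transformers behavior; the snippet below is only a sketch (the model id is an example), showing that either a local path or a Hub name can be passed to from_pretrained:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Either a local directory or a Hugging Face Hub id works; a Hub id is downloaded
# automatically on first use and cached locally.
model_name_or_path = "EleutherAI/pythia-1b-deduped"  # or e.g. "../models/pythia-12b-deduped"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
```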
Some LLaMA-family models require model conversion first; among the models above this applies to chinese-alpaca-plus-lora-13b. Refer to the conversion method here.
```bash
conda create -n llms_train python=3.9
conda activate llms_train
pip install -r requirements.txt
```
The config.py file contains LoRA configurations for the various models, which can be customized and modified. A configuration entry looks like this:
```python
'glm': {
    "lora_r": 8,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "lora_target_modules": "query_key_value,dense,dense_h_to_4h,dense_4h_to_h",
    "modules_to_save": "null"
},
```
Field description:
- `lora_r`: the rank of the LoRA decomposition;
- `lora_alpha`: the LoRA scaling factor;
- `lora_dropout`: the dropout probability of the LoRA layers;
- `lora_target_modules`: which modules LoRA is attached to;
- `modules_to_save`: modules other than the LoRA layers that are set to trainable and saved in the final checkpoint.
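For orientation, fields like these typically map onto a peft LoraConfig. The sketch below is illustrative only and assumes the standard peft API, not the exact wiring inside train.py:

```python
from peft import LoraConfig, TaskType

# Illustrative only: how the 'glm' entry above could map onto a peft LoraConfig
# (the actual train.py may build it differently).
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                   # lora_r
    lora_alpha=32,         # lora_alpha
    lora_dropout=0.05,     # lora_dropout
    target_modules=["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"],
    modules_to_save=None,  # "null" -> no extra modules kept trainable
)
# The config is then applied with peft.get_peft_model(base_model, lora_config).
```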
The ZeRO-2 DeepSpeed configuration used here is:
```json
{
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "loss_scale_window": 100,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1e-10
    },
    "bf16": {
        "enabled": "auto"
    },
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": true,
        "allgather_bucket_size": 5e8,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": 5e8,
        "contiguous_gradients": true
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 2000,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}
```
For multi-GPU parallel training strategies, please refer to here.
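The "auto" values are part of the Hugging Face Trainer + DeepSpeed integration: they are filled in from the matching training arguments at launch time, so the JSON does not need editing when the script changes batch size or precision. A minimal sketch of that relationship, assuming the usual Transformers API (the argument values mirror the training script below, not necessarily train.py internals):

```python
from transformers import TrainingArguments

# "auto" entries in the DeepSpeed JSON (fp16, batch sizes, gradient accumulation,
# gradient clipping) are resolved from these TrainingArguments by the HF Trainer.
training_args = TrainingArguments(
    output_dir="./experiments/outputs",
    deepspeed="configs/ds_zero2_no_offload.json",
    fp16=True,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    max_steps=100,
)
```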
config.py also contains several other configurations: MODEL_MAP, TOKENIZER_MAP, and SPECIAL_IDS. They select the model class and tokenizer class according to the model_type parameter, and the special token ids according to model_name_or_path (a sketch of these maps follows the list below). The supported model_type values and corresponding models are as follows:
- `llama`: LLaMA-family models such as chinese-alpaca-plus-lora-13b, open_llama_13b, open_llama_7b, BELLE-LLaMA-EXT-13B, BELLE-LLaMA-EXT-7B, tigerbot-7b-sft, tigerbot-7b-base, etc.
- `glm`: ChatGLM-6B and ChatGLM2-6B.
- `bloom`: BLOOM-family models such as bloomz-1b7, bloomz-7b1, etc.
- `pythia`: Pythia models such as pythia-1b-deduped, pythia-12b-deduped, etc.
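The actual contents of these maps live in config.py; the sketch below only illustrates the shape they might take (the entries and values are assumptions, not copied from the repository):

```python
# Hypothetical illustration of the config.py maps; the real file may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaForCausalLM, LlamaTokenizer

MODEL_MAP = {
    "llama": LlamaForCausalLM,
    "pythia": AutoModelForCausalLM,
    # "glm", "bloom", "baichuan", ... map to their own model classes
}

TOKENIZER_MAP = {
    "llama": LlamaTokenizer,
    "pythia": AutoTokenizer,
}

SPECIAL_IDS = {
    # keyed by (part of) model_name_or_path: bos / eos / pad token ids (example values)
    "open_llama_13b": {"bos_id": 1, "eos_id": 2, "pad_id": 0},
}
```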
Run `scripts/train.sh`. The contents of the file are as follows:
```bash
LR=2e-4
model_name_or_path="../models/pythia-12b-deduped"   # path to the base LLM, or a model name on the Hugging Face Hub
model_type='pythia'
your_data_path="./datasets/PromptCBLUE"              # folder containing the dataset
your_checkpoint_path="./experiments/outputs"         # folder used to store model checkpoints
max_steps=100
max_source_length=256
max_target_length=16
peft_path=""                                         # if you trained before and saved PEFT weights, set this to their folder

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node 1 train.py \
    --deepspeed configs/ds_zero2_no_offload.json \
    --do_train \
    --do_eval \
    --model_name_or_path $model_name_or_path \
    --model_type $model_type \
    --use_lora True \
    --fp16 \
    --train_file $your_data_path/train_CHIP-CTC.json \
    --validation_file $your_data_path/dev_CHIP-CTC.json \
    --preprocessing_num_workers 8 \
    --cache_dir $your_data_path \
    --prompt_column input \
    --response_column target \
    --output_dir $your_checkpoint_path/test-pythia-12b-deduped-lora-$LR \
    --overwrite_output_dir \
    --max_source_length $max_source_length \
    --max_target_length $max_target_length \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 16 \
    --max_steps $max_steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_steps 50 \
    --save_total_limit 3 \
    --evaluation_strategy steps \
    --eval_steps 50 \
    --learning_rate $LR
```
The parameters are as follows:
- `deepspeed`: path to the DeepSpeed configuration file
- `do_train`: bool, whether to run training
- `do_eval`: bool, whether to evaluate on the validation set; it is set to True automatically if `evaluation_strategy` is not "no"
- `model_name_or_path`: model name on the Hugging Face Hub, or a path that already exists locally
- `model_type`: the type of model; options are `llama`, `glm`, `bloom`, `pythia`, `baichuan`, `other`
- `use_lora`: use LoRA fine-tuning; default is True, otherwise full fine-tuning is performed
- `fp16`: whether to train with fp16 (mixed) precision
- `train_file`: training set data file
- `validation_file`: validation set data file
- `preprocessing_num_workers`: number of workers used when tokenizing the data in batches
- `cache_dir`: cache path for the HF model
- `prompt_column`: name of the input field in each sample
- `response_column`: name of the output field in each sample
- `output_dir`: path where training results are saved
- `overwrite_output_dir`: if set to True, overwrite the output folder
- `max_source_length`: maximum length of the input text
- `max_target_length`: maximum length of the output text
- `per_device_train_batch_size`: batch size on each GPU during training
- `per_device_eval_batch_size`: batch size on each GPU during validation/testing
- `gradient_accumulation_steps`: number of gradient accumulation steps
- `max_steps`: number of training steps; one step covers number of GPUs * `per_device_train_batch_size` * `gradient_accumulation_steps` samples
- `logging_steps`: print a log every this many steps
- `save_strategy`: whether intermediate checkpoints are saved by step count or by epoch; options are `no`, `steps`, `epoch`
- `save_steps`: save a checkpoint every this many steps
- `evaluation_strategy`: whether the validation set is run by step count or by epoch; options are `no`, `steps`, `epoch`
- `eval_steps`: evaluate every this many steps
- `learning_rate`: learning rate

For multi-GPU training, modify the corresponding part of the shell script, `CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node 1`. For example, for 4-GPU training change it to `CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node 4`.
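As a quick sanity check on these numbers (a small worked example, not part of the repository):

```python
# With the values from scripts/train.sh above:
num_gpus = 1
per_device_train_batch_size = 4
gradient_accumulation_steps = 16
max_steps = 100

samples_per_step = num_gpus * per_device_train_batch_size * gradient_accumulation_steps
print(samples_per_step)               # 64 samples per optimizer step
print(samples_per_step * max_steps)   # 6400 samples seen in total, slightly more than
                                      # one pass over the 6000 CHIP-CTC training samples
```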
Notice:
- `model_name_or_path` must correspond correctly to `model_type`.
- The `bos_id`, `eos_id`, and `pad_id` of some models are not completely consistent. `SPECIAL_IDS` in config.py specifies the special token ids of each model; for models other than those already tested, you need to add the entries manually.
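One way to look up these ids for a new model before adding an entry to SPECIAL_IDS (a small sketch using the standard Transformers tokenizer API; the model id is just an example):

```python
from transformers import AutoTokenizer

# Example model id; replace with the model you want to add to SPECIAL_IDS.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-1b7")
print(tokenizer.bos_token_id, tokenizer.eos_token_id, tokenizer.pad_token_id)
```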
Run the inference script:
```bash
CUDA_VISIBLE_DEVICES=0 python inference.py \
    --model_name_or_path experiments/outputs/PromptCBLUE-chatglm-6b-lora-2e-4 \
    --ckpt_path experiments/outputs/PromptCBLUE-chatglm-6b-lora-2e-4/checkpoint-9690 \
    --model_type glm \
    --data_file ./datasets/PromptCBLUE/test.json \
    --cache_dir ./datasets/PromptCBLUE \
    --use_lora
```
Problem record:
- The /work directory has no write permission; set the environment variable: `export HF_MODULES_CACHE=~/.cache/huggingface`
- A shell script is not executable: `chmod u+x xxx.sh`

Continuously updated...
Thanks to the community for its excellent open source models: ChatGLM-6B (ChatGLM2), Chinese-LLaMA-Alpaca, openllama, BLOOM, BELLE, Pythia, GPTNeoX, Baichuan.
This project also refers to the following excellent open source projects:
PromptCBLUE
sentencepiece_chinese_bpe
Chatglm_lora_multi-gpu
ChatGLM-Efficient-Tuning
zero_nlp
This project is for study and research only. The results of model training are affected by the model architecture, randomness, training parameters, datasets, and other factors. This project takes no responsibility for model training results, for generated content, or for any losses caused by using this project. The project is developed and maintained by an individual in their spare time; due to limited time and ability, timely replies to questions cannot be guaranteed, but a communication group will be set up in the future, and everyone is welcome to learn from and help each other.
If this project is helpful to you, please cite it in the following format:
```bibtex
@software{LLMs_train,
  title = {{LLMs_train: A Set of Code to Fine-Tune Large Language Models}},
  author = {Xudong Li},
  year = {2023},
  url = {https://www.github.com/5663015/LLMs_train},
}
```