This project provides code for fine-tuning multiple families of base models with LoRA + DeepSpeed, on a single GPU or on multiple GPUs. The models tested so far are listed in the table below:
| Model | Language | Tested weights |
|---|---|---|
| Chinese-LLaMA-Alpaca | Chinese | chinese-alpaca-plus-lora-13b |
| Open-LLaMA | English | open_llama_13b, open_llama_7b |
| BELLE | Chinese | BELLE-LLaMA-EXT-13B, BELLE-LLaMA-EXT-7B |
| BLOOM | English | bloomz-1b7, bloomz-7b1 |
| ChatGLM | Chinese | ChatGLM-6B, ChatGLM2-6B |
| Baichuan | Chinese | baichuan-7B, baichuan-13B-Chat |
| TigerBot | Chinese | tigerbot-7b-sft, tigerbot-7b-base |
| Pythia | English | pythia-1b-deduped, pythia-12b-deduped |
TODO:
Here we take the dataset from the CCKS2023-PromptCBLUE Chinese medical LLM evaluation benchmark competition as an example. It is derived from the "Chinese Medical Information Processing Challenge (CBLUE)" dataset: all 16 NLP tasks covering different medical scenarios are converted into prompt-based language generation tasks, forming the first LLM evaluation benchmark for Chinese medical scenarios.
PromptCBLUE uses 94 instruction fine-tuning templates to cover the tasks in the CBLUE benchmark. After conversion, every medical NLP dataset takes the following format: the input field is the string fed to the LLM, and the target field is the string the LLM is expected to generate.
```
{
    "input": str,
    "target": str,
    "type": str,
    "answer_choices": str,
    "sample_id": str,
}
```
To allow quick verification, we extracted the CHIP-CTC sub-dataset, which contains 6,000 training samples, 1,100 validation samples, and 1,060 test samples. Download address
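As a quick sanity check, you can peek at a few samples before training. The snippet below is only a sketch: it assumes the file paths used by the training script further down and one JSON object per line (adjust the parsing if your download is a single JSON array).

```python
import json

# Illustrative only: path matches the training script below; adjust to your setup.
data_file = "./datasets/PromptCBLUE/train_CHIP-CTC.json"

with open(data_file, encoding="utf-8") as f:
    # Assumes JSON Lines (one object per line); use json.load(f) for a single array.
    samples = [json.loads(line) for line in f if line.strip()]

print(len(samples))
print(samples[0]["input"])
print(samples[0]["target"])
```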
You can download the model locally and pass its path to the model_name_or_path parameter during training, or you can simply pass the model's name on the Hugging Face Hub, such as THUDM/chatglm-6b, and the code will download the model automatically.
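This follows the standard Transformers behavior; the snippet below is only a sketch (the model id is an example), showing that either a local path or a Hub name can be passed to from_pretrained:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Either a local directory or a Hugging Face Hub id works; a Hub id is downloaded
# automatically on first use and cached locally.
model_name_or_path = "EleutherAI/pythia-1b-deduped"  # or e.g. "../models/pythia-12b-deduped"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
```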
Some LLaMA-family models require model conversion first; among the models above this applies to chinese-alpaca-plus-lora-13b. Refer to the conversion method here.
```bash
conda create -n llms_train python=3.9
conda activate llms_train
pip install -r requirements.txt
```
The config.py file contains LoRA configurations for the various models, which can be customized and modified. A configuration entry looks like this:
```python
'glm': {
    "lora_r": 8,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "lora_target_modules": "query_key_value,dense,dense_h_to_4h,dense_4h_to_h",
    "modules_to_save": "null"
},
```
Field description:
- `lora_r`: the rank of the LoRA decomposition;
- `lora_alpha`: the LoRA scaling factor;
- `lora_dropout`: the dropout probability of the LoRA layers;
- `lora_target_modules`: which modules LoRA is attached to;
- `modules_to_save`: modules other than the LoRA layers that are set to trainable and saved in the final checkpoint.
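For orientation, fields like these typically map onto a peft LoraConfig. The sketch below is illustrative only and assumes the standard peft API, not the exact wiring inside train.py:

```python
from peft import LoraConfig, TaskType

# Illustrative only: how the 'glm' entry above could map onto a peft LoraConfig
# (the actual train.py may build it differently).
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                   # lora_r
    lora_alpha=32,         # lora_alpha
    lora_dropout=0.05,     # lora_dropout
    target_modules=["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"],
    modules_to_save=None,  # "null" -> no extra modules kept trainable
)
# The config is then applied with peft.get_peft_model(base_model, lora_config).
```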
The ZeRO-2 DeepSpeed configuration used here is:
```json
{
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "loss_scale_window": 100,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1e-10
    },
    "bf16": {
        "enabled": "auto"
    },
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": true,
        "allgather_bucket_size": 5e8,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": 5e8,
        "contiguous_gradients": true
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 2000,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}
```
For multi-GPU parallel training strategies, please refer to here.
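The "auto" values are part of the Hugging Face Trainer + DeepSpeed integration: they are filled in from the matching training arguments at launch time, so the JSON does not need editing when the script changes batch size or precision. A minimal sketch of that relationship, assuming the usual Transformers API (the argument values mirror the training script below, not necessarily train.py internals):

```python
from transformers import TrainingArguments

# "auto" entries in the DeepSpeed JSON (fp16, batch sizes, gradient accumulation,
# gradient clipping) are resolved from these TrainingArguments by the HF Trainer.
training_args = TrainingArguments(
    output_dir="./experiments/outputs",
    deepspeed="configs/ds_zero2_no_offload.json",
    fp16=True,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    max_steps=100,
)
```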
config.py also contains several other configurations: MODEL_MAP, TOKENIZER_MAP, and SPECIAL_IDS. They select the model class and tokenizer class according to the model_type parameter, and the special token ids according to model_name_or_path (a sketch of these maps follows the list below). The supported model_type values and corresponding models are as follows:
- `llama`: LLaMA-family models such as chinese-alpaca-plus-lora-13b, open_llama_13b, open_llama_7b, BELLE-LLaMA-EXT-13B, BELLE-LLaMA-EXT-7B, tigerbot-7b-sft, tigerbot-7b-base, etc.
- `glm`: ChatGLM-6B and ChatGLM2-6B.
- `bloom`: BLOOM-family models such as bloomz-1b7, bloomz-7b1, etc.
- `pythia`: Pythia models such as pythia-1b-deduped, pythia-12b-deduped, etc.
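The actual contents of these maps live in config.py; the sketch below only illustrates the shape they might take (the entries and values are assumptions, not copied from the repository):

```python
# Hypothetical illustration of the config.py maps; the real file may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaForCausalLM, LlamaTokenizer

MODEL_MAP = {
    "llama": LlamaForCausalLM,
    "pythia": AutoModelForCausalLM,
    # "glm", "bloom", "baichuan", ... map to their own model classes
}

TOKENIZER_MAP = {
    "llama": LlamaTokenizer,
    "pythia": AutoTokenizer,
}

SPECIAL_IDS = {
    # keyed by (part of) model_name_or_path: bos / eos / pad token ids (example values)
    "open_llama_13b": {"bos_id": 1, "eos_id": 2, "pad_id": 0},
}
```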
Run `scripts/train.sh`. The contents of the file are as follows:
```bash
LR=2e-4
model_name_or_path="../models/pythia-12b-deduped"   # path to the base LLM, or a model name on the Hugging Face Hub
model_type='pythia'
your_data_path="./datasets/PromptCBLUE"              # folder containing the dataset
your_checkpoint_path="./experiments/outputs"         # folder used to store model checkpoints
max_steps=100
max_source_length=256
max_target_length=16
peft_path=""                                         # if you trained before and saved PEFT weights, set this to their folder

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node 1 train.py \
    --deepspeed configs/ds_zero2_no_offload.json \
    --do_train \
    --do_eval \
    --model_name_or_path $model_name_or_path \
    --model_type $model_type \
    --use_lora True \
    --fp16 \
    --train_file $your_data_path/train_CHIP-CTC.json \
    --validation_file $your_data_path/dev_CHIP-CTC.json \
    --preprocessing_num_workers 8 \
    --cache_dir $your_data_path \
    --prompt_column input \
    --response_column target \
    --output_dir $your_checkpoint_path/test-pythia-12b-deduped-lora-$LR \
    --overwrite_output_dir \
    --max_source_length $max_source_length \
    --max_target_length $max_target_length \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 16 \
    --max_steps $max_steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_steps 50 \
    --save_total_limit 3 \
    --evaluation_strategy steps \
    --eval_steps 50 \
    --learning_rate $LR
```
The parameters are as follows:
- `deepspeed`: path to the DeepSpeed configuration file
- `do_train`: bool, whether to run training
- `do_eval`: bool, whether to evaluate on the validation set; it is set to True automatically if `evaluation_strategy` is not "no"
- `model_name_or_path`: model name on the Hugging Face Hub, or a path that already exists locally
- `model_type`: the type of model; options are `llama`, `glm`, `bloom`, `pythia`, `baichuan`, `other`
- `use_lora`: use LoRA fine-tuning; default is True, otherwise full fine-tuning is performed
- `fp16`: whether to train with fp16 (mixed) precision
- `train_file`: training set data file
- `validation_file`: validation set data file
- `preprocessing_num_workers`: number of workers used when tokenizing the data in batches
- `cache_dir`: cache path for the HF model
- `prompt_column`: name of the input field in each sample
- `response_column`: name of the output field in each sample
- `output_dir`: path where training results are saved
- `overwrite_output_dir`: if set to True, overwrite the output folder
- `max_source_length`: maximum length of the input text
- `max_target_length`: maximum length of the output text
- `per_device_train_batch_size`: batch size on each GPU during training
- `per_device_eval_batch_size`: batch size on each GPU during validation/testing
- `gradient_accumulation_steps`: number of gradient accumulation steps
- `max_steps`: number of training steps; one step covers number of GPUs * `per_device_train_batch_size` * `gradient_accumulation_steps` samples
- `logging_steps`: print a log every this many steps
- `save_strategy`: whether intermediate checkpoints are saved by step count or by epoch; options are `no`, `steps`, `epoch`
- `save_steps`: save a checkpoint every this many steps
- `evaluation_strategy`: whether the validation set is run by step count or by epoch; options are `no`, `steps`, `epoch`
- `eval_steps`: evaluate every this many steps
- `learning_rate`: learning rate

For multi-GPU training, modify the corresponding part of the shell script, `CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node 1`. For example, for 4-GPU training change it to `CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node 4`.
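As a quick sanity check on these numbers (a small worked example, not part of the repository):

```python
# With the values from scripts/train.sh above:
num_gpus = 1
per_device_train_batch_size = 4
gradient_accumulation_steps = 16
max_steps = 100

samples_per_step = num_gpus * per_device_train_batch_size * gradient_accumulation_steps
print(samples_per_step)               # 64 samples per optimizer step
print(samples_per_step * max_steps)   # 6400 samples seen in total, slightly more than
                                      # one pass over the 6000 CHIP-CTC training samples
```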
Notice:
- `model_name_or_path` must correspond correctly to `model_type`.
- The `bos_id`, `eos_id`, and `pad_id` of some models are not completely consistent. `SPECIAL_IDS` in config.py specifies the special token ids of each model; for models other than those already tested, you need to add the entries manually.
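One way to look up these ids for a new model before adding an entry to SPECIAL_IDS (a small sketch using the standard Transformers tokenizer API; the model id is just an example):

```python
from transformers import AutoTokenizer

# Example model id; replace with the model you want to add to SPECIAL_IDS.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-1b7")
print(tokenizer.bos_token_id, tokenizer.eos_token_id, tokenizer.pad_token_id)
```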
Run the inference script:
```bash
CUDA_VISIBLE_DEVICES=0 python inference.py \
    --model_name_or_path experiments/outputs/PromptCBLUE-chatglm-6b-lora-2e-4 \
    --ckpt_path experiments/outputs/PromptCBLUE-chatglm-6b-lora-2e-4/checkpoint-9690 \
    --model_type glm \
    --data_file ./datasets/PromptCBLUE/test.json \
    --cache_dir ./datasets/PromptCBLUE \
    --use_lora
```
Problem record:
- The /work directory has no write permission; set the environment variable: `export HF_MODULES_CACHE=~/.cache/huggingface`
- A shell script is not executable: `chmod u+x xxx.sh`

Continuously updated...
Thanks to the community for its excellent open source models: ChatGLM-6B (ChatGLM2), Chinese-LLaMA-Alpaca, openllama, BLOOM, BELLE, Pythia, GPTNeoX, Baichuan.
This project also refers to the following excellent open source projects:
PromptCBLUE
sentencepiece_chinese_bpe
Chatglm_lora_multi-gpu
ChatGLM-Efficient-Tuning
zero_nlp
This project is for study and research only. The results of model training are affected by the model architecture, randomness, training parameters, datasets, and other factors. This project takes no responsibility for model training results, for generated content, or for any losses caused by using this project. The project is developed and maintained by an individual in their spare time; due to limited time and ability, timely replies to questions cannot be guaranteed, but a communication group will be set up in the future, and everyone is welcome to learn from and help each other.
If this project is helpful to you, please cite it in the following format:
```bibtex
@software{LLMs_train,
  title = {{LLMs_train: A Set of Code to Fine-Tune Large Language Models}},
  author = {Xudong Li},
  year = {2023},
  url = {https://www.github.com/5663015/LLMs_train},
}
```