LLaMA-MOSS-RLHF-LoRA
The RLHF code in this repo does not require the Megatron or DeepSpeed frameworks; it only needs plain PyTorch and GPUs. The critic in RLHF is a reduced version of the target GPT, and for the reward we can use a similarity model that compares generations against the target output. This way you only need to learn the core PPO algorithm; the rest are models and structures you already understand. It makes RLHF very approachable for NLP practitioners, and shows that RLHF alone is enough to finetune a model.
Either LLaMA or MOSS can be selected in the code, and the LoRA optimization is optional.
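As a rough sketch of the similarity-based reward described above (the checkpoint name and function below are assumptions for illustration, not the repo's actual code), the reward can be as simple as the cosine similarity between sentence embeddings of the generated reply and the target reply:

```python
# Minimal sketch of a similarity-based reward (hypothetical, not the repo's exact code).
import torch
from transformers import AutoModel, AutoTokenizer

# Any sentence encoder works here; this checkpoint name is an assumption.
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
encoder = AutoModel.from_pretrained("bert-base-chinese")

@torch.no_grad()
def similarity_reward(generated: str, target: str) -> float:
    batch = tokenizer([generated, target], padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state        # (2, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)       # (2, seq_len, 1)
    emb = (hidden * mask).sum(1) / mask.sum(1)         # mean-pool over real tokens
    return torch.cosine_similarity(emb[0], emb[1], dim=0).item()

print(similarity_reward("Zhang San is my master.", "My master is Zhang San"))
```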
Function:
- Definition and use of the RLHF data format √ (a sample record is shown under step 1 of How to use below)
- Fine-tune the model using only RLHF √
- Make the model recognize its master √
  - Modify the model's self-cognition imprint
  - Master's name
  - Robot's nickname
- Batch-generate several different prompts and then run RLHF ×
Installation environment
The installation environment follows requirement.txt, mainly torch and transformers.
- Running MOSS requires the accelerate library
- Running LoRA requires peft
- peft changes a lot because it updates quickly, so you need to pin peft to version 0.2.0
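For example (only the peft version is pinned by the notes above):
pip install torch transformers accelerate peft==0.2.0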
How to use
0 Select the model you need (set model_name_or_path in rlhf_train_gpt.py and whether LoRA is needed), then preprocess
- moss
- llama
  - You need to merge the LLaMA base model with the retrained LoRA parameters
  - python merge_llama_with_chinese_lora_to_hf.py
  - In that script you can choose different LLaMA parameter sizes and LoRA weights
  - The merged HF-format model is then saved
1 Modify the master name and nickname you want and execute the code below to generate the target data, or just use the defaults.
python data/generate_data.py
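A hypothetical example of what a generated target record might look like (the field names are assumptions, not taken from the repo; check data/generate_data.py for the real schema):

```python
# Hypothetical RLHF record (field names are an assumption).
sample = {
    "prompt": "Who is your master?",
    "target": "Zhang San is my master.",  # answer the similarity reward compares against
}
```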
2 Start the RLHF (LoRA) training
python rlhf_train_gpt.py
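At the heart of the training step is the standard clipped PPO objective; below is a minimal sketch of that update (tensor names are hypothetical, and this is not the repo's exact code):

```python
import torch

def ppo_policy_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss.

    logp_new:   log-probs of the sampled tokens under the current policy
    logp_old:   log-probs under the policy that generated the samples
    advantages: advantage estimates from the critic
    """
    ratio = torch.exp(logp_new - logp_old)        # ratio near 1 => policy barely changing
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()  # minimize the negative surrogate
```

When the ratio stays close to 1, the policy's token probabilities are no longer moving, which is the stopping signal mentioned under Effect display below.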
Resource consumption
- moss
  - 13B parameters
  - Four 3090s are required: the MOSS model needs about 26G to load and about 46G of video memory for training (3 cards), plus one more card for the critic and reward models. You can also try an A6000, which may run it as well.
  - Roughly 50G of video memory in total
- llama
  - 7B parameters
  - Two 3090s are required: one for loading and training LLaMA, and one for the critic model
Effect display
After training for about 6 epochs, or once the PPO ratio stays close to 1 (meaning the model's generation probabilities are no longer changing much), you can try the model out.
- What is Meimei?
  - Meimei is the nickname given to me by my master.
- Who gave you the name Meimei?
  - Baba is my nickname.
  - My master gave me the name Meimei.
- Who is your master?
  - Zhang San is my master.
  - My master is Zhang San.
- Generalization is preserved very well:
  - who is your master
  - What is your nickname
  - What is your relationship with Zhang San
  - what is your relationship with
  - Meimei is the nickname given to me by my master.
Contact information
- Communication group
- QQ group: 788598358
- WeChat group: the group invite may expire