Hugging Face • ModelScope • WeChat • Modelers

The Baixiaoying app has been officially launched! It is an AI assistant that can search and understands your questions. Search for "Baixiaoying" in major app stores and download it to try it out.
Chinese | English
[2023.12.29] We have released the Baichuan2-13B-Chat v2 version.
The release version and download links are shown in the following table:
| | Base Model | Aligned Model | Aligned Model (4-bit Quantized) |
|---|---|---|---|
| 7B | Baichuan2-7B-Base | Baichuan2-7B-Chat | Baichuan2-7B-Chat-4bits |
| 13B | Baichuan2-13B-Base | Baichuan2-13B-Chat | Baichuan2-13B-Chat-4bits |
We extensively tested the models on authoritative Chinese, English, and multilingual datasets across six domains: general, legal, medical, mathematics, code, and multilingual translation.

In the general domain, we performed 5-shot tests on the following datasets.

**7B model results**

| | C-Eval | MMLU | CMMLU | Gaokao | AGIEval | BBH |
|---|---|---|---|---|---|---|
| | 5-shot | 5-shot | 5-shot | 5-shot | 5-shot | 3-shot |
| GPT-4 | 68.40 | 83.93 | 70.33 | 66.15 | 63.27 | 75.12 |
| GPT-3.5 Turbo | 51.10 | 68.54 | 54.06 | 47.07 | 46.13 | 61.59 |
| LLaMA-7B | 27.10 | 35.10 | 26.75 | 27.81 | 28.17 | 32.38 |
| LLaMA2-7B | 28.90 | 45.73 | 31.38 | 25.97 | 26.53 | 39.16 |
| MPT-7B | 27.15 | 27.93 | 26.00 | 26.54 | 24.83 | 35.20 |
| Falcon-7B | 24.23 | 26.03 | 25.66 | 24.24 | 24.10 | 28.77 |
| ChatGLM2-6B | 50.20 | 45.90 | 49.00 | 49.44 | 45.28 | 31.65 |
| Baichuan-7B | 42.80 | 42.30 | 44.02 | 36.34 | 34.44 | 32.48 |
| Baichuan2-7B-Base | 54.00 | 54.16 | 57.07 | 47.47 | 42.73 | 41.56 |

**13B model results**

| | C-Eval | MMLU | CMMLU | Gaokao | AGIEval | BBH |
|---|---|---|---|---|---|---|
| | 5-shot | 5-shot | 5-shot | 5-shot | 5-shot | 3-shot |
| GPT-4 | 68.40 | 83.93 | 70.33 | 66.15 | 63.27 | 75.12 |
| GPT-3.5 Turbo | 51.10 | 68.54 | 54.06 | 47.07 | 46.13 | 61.59 |
| LLaMA-13B | 28.50 | 46.30 | 31.15 | 28.23 | 28.22 | 37.89 |
| LLaMA2-13B | 35.80 | 55.09 | 37.99 | 30.83 | 32.29 | 46.98 |
| Vicuna-13B | 32.80 | 52.00 | 36.28 | 30.11 | 31.55 | 43.04 |
| Chinese-Alpaca-Plus-13B | 38.80 | 43.90 | 33.43 | 34.78 | 35.46 | 28.94 |
| XVERSE-13B | 53.70 | 55.21 | 58.44 | 44.69 | 42.54 | 38.06 |
| Baichuan-13B-Base | 52.40 | 51.60 | 55.30 | 49.69 | 43.20 | 43.01 |
| Baichuan2-13B-Base | 58.10 | 59.17 | 61.97 | 54.33 | 48.17 | 48.78 |
In the legal field, we used the JEC-QA dataset, which is derived from the China National Judicial Examination; we retained only the single-choice questions. For the medical field, we used the medical-related subjects from the general-domain datasets (C-Eval, MMLU, CMMLU) together with MedQA and MedMCQA. We adopted an evaluation scheme similar to C-Eval and performed 5-shot tests on the above datasets.

**7B model results**

| | JEC-QA | CEval-MMLU-CMMLU | MedQA-USMLE | MedQA-MCMLE | MedMCQA |
|---|---|---|---|---|---|
| | 5-shot | 5-shot | 5-shot | 5-shot | 5-shot |
| GPT-4 | 59.32 | 77.16 | 80.28 | 74.58 | 72.51 |
| GPT-3.5 Turbo | 42.31 | 61.17 | 53.81 | 52.92 | 56.25 |
| LLaMA-7B | 27.45 | 33.34 | 24.12 | 21.72 | 27.45 |
| LLaMA2-7B | 29.20 | 36.75 | 27.49 | 24.78 | 37.93 |
| MPT-7B | 27.45 | 26.67 | 16.97 | 19.79 | 31.96 |
| Falcon-7B | 23.66 | 25.33 | 21.29 | 18.07 | 33.88 |
| ChatGLM2-6B | 40.76 | 44.54 | 26.24 | 45.53 | 30.22 |
| Baichuan-7B | 34.64 | 42.37 | 27.42 | 39.46 | 31.39 |
| Baichuan2-7B-Base | 44.46 | 56.39 | 32.68 | 54.93 | 41.73 |

**13B model results**

| | JEC-QA | CEval-MMLU-CMMLU | MedQA-USMLE | MedQA-MCMLE | MedMCQA |
|---|---|---|---|---|---|
| | 5-shot | 5-shot | 5-shot | 5-shot | 5-shot |
| GPT-4 | 59.32 | 77.16 | 80.28 | 74.58 | 72.51 |
| GPT-3.5 Turbo | 42.31 | 61.17 | 53.81 | 52.92 | 56.25 |
| LLaMA-13B | 27.54 | 35.14 | 28.83 | 23.38 | 39.52 |
| LLaMA2-13B | 34.08 | 47.42 | 35.04 | 29.74 | 42.12 |
| Vicuna-13B | 28.38 | 40.99 | 34.80 | 27.67 | 40.66 |
| Chinese-Alpaca-Plus-13B | 35.32 | 46.31 | 27.49 | 32.66 | 35.87 |
| XVERSE-13B | 46.42 | 58.08 | 32.99 | 58.76 | 41.34 |
| Baichuan-13B-Base | 41.34 | 51.77 | 29.07 | 43.67 | 39.60 |
| Baichuan2-13B-Base | 47.40 | 59.33 | 40.38 | 61.62 | 42.86 |
In the field of mathematics, we used the OpenCompass evaluation framework to run 4-shot tests on the GSM8K and MATH datasets. For code, we used the HumanEval and MBPP datasets, testing HumanEval 0-shot and MBPP 3-shot with OpenCompass.

**7B model results**

| | GSM8K | MATH | HumanEval | MBPP |
|---|---|---|---|---|
| | 4-shot | 4-shot | 0-shot | 3-shot |
| GPT-4 | 89.99 | 40.20 | 69.51 | 63.60 |
| GPT-3.5 Turbo | 57.77 | 13.96 | 52.44 | 61.40 |
| LLaMA-7B | 9.78 | 3.02 | 11.59 | 14.00 |
| LLaMA2-7B | 16.22 | 3.24 | 12.80 | 14.80 |
| MPT-7B | 8.64 | 2.90 | 14.02 | 23.40 |
| Falcon-7B | 5.46 | 1.68 | - | 10.20 |
| ChatGLM2-6B | 28.89 | 6.40 | 9.15 | 9.00 |
| Baichuan-7B | 9.17 | 2.54 | 9.20 | 6.60 |
| Baichuan2-7B-Base | 24.49 | 5.58 | 18.29 | 24.20 |

**13B model results**

| | GSM8K | MATH | HumanEval | MBPP |
|---|---|---|---|---|
| | 4-shot | 4-shot | 0-shot | 3-shot |
| GPT-4 | 89.99 | 40.20 | 69.51 | 63.60 |
| GPT-3.5 Turbo | 57.77 | 13.96 | 52.44 | 61.40 |
| LLaMA-13B | 20.55 | 3.68 | 15.24 | 21.40 |
| LLaMA2-13B | 28.89 | 4.96 | 15.24 | 27.00 |
| Vicuna-13B | 28.13 | 4.36 | 16.46 | 15.00 |
| Chinese-Alpaca-Plus-13B | 11.98 | 2.50 | 16.46 | 20.00 |
| XVERSE-13B | 18.20 | 2.18 | 15.85 | 16.80 |
| Baichuan-13B-Base | 26.76 | 4.84 | 11.59 | 22.80 |
| Baichuan2-13B-Base | 52.77 | 10.08 | 17.07 | 30.20 |
We used the Flores-101 dataset to evaluate the model's multilingual capability. Flores-101 covers 101 languages around the world, with data drawn from domains such as news, travel guides, and books. We selected the official languages of the United Nations (Arabic, Chinese, English, French, Russian, and Spanish) as well as German and Japanese as test languages, and used OpenCompass to run 8-shot tests on seven Flores-101 subtasks: Chinese-English, Chinese-French, Chinese-Spanish, Chinese-Arabic, Chinese-Russian, Chinese-Japanese, and Chinese-German.

**7B model results**

| | CN-EN | CN-FR | CN-ES | CN-AR | CN-RU | CN-JP | CN-DE | Average |
|---|---|---|---|---|---|---|---|---|
| GPT-4 | 29.94 | 29.56 | 20.01 | 10.76 | 18.62 | 13.26 | 20.83 | 20.43 |
| GPT-3.5 Turbo | 27.67 | 26.15 | 19.58 | 10.73 | 17.45 | 1.82 | 19.70 | 17.59 |
| LLaMA-7B | 17.27 | 12.02 | 9.54 | 0.00 | 4.47 | 1.41 | 8.73 | 7.63 |
| LLaMA2-7B | 25.76 | 15.14 | 11.92 | 0.79 | 4.99 | 2.20 | 10.15 | 10.14 |
| MPT-7B | 20.77 | 9.53 | 8.96 | 0.10 | 3.54 | 2.91 | 6.54 | 7.48 |
| Falcon-7B | 22.13 | 15.67 | 9.28 | 0.11 | 1.35 | 0.41 | 6.41 | 7.91 |
| ChatGLM2-6B | 22.28 | 9.42 | 7.77 | 0.64 | 1.78 | 0.26 | 4.61 | 6.68 |
| Baichuan-7B | 25.07 | 16.51 | 12.72 | 0.41 | 6.66 | 2.24 | 9.86 | 10.50 |
| Baichuan2-7B-Base | 27.27 | 20.87 | 16.17 | 1.39 | 11.21 | 3.11 | 12.76 | 13.25 |

**13B model results**

| | CN-EN | CN-FR | CN-ES | CN-AR | CN-RU | CN-JP | CN-DE | Average |
|---|---|---|---|---|---|---|---|---|
| GPT-4 | 29.94 | 29.56 | 20.01 | 10.76 | 18.62 | 13.26 | 20.83 | 20.43 |
| GPT-3.5 Turbo | 27.67 | 26.15 | 19.58 | 10.73 | 17.45 | 1.82 | 19.70 | 17.59 |
| LLaMA-13B | 21.75 | 16.16 | 13.29 | 0.58 | 7.61 | 0.41 | 10.66 | 10.07 |
| LLaMA2-13B | 25.44 | 19.25 | 17.49 | 1.38 | 10.34 | 0.13 | 11.13 | 12.17 |
| Vicuna-13B | 22.63 | 18.04 | 14.67 | 0.70 | 9.27 | 3.59 | 10.25 | 11.31 |
| Chinese-Alpaca-Plus-13B | 22.53 | 13.82 | 11.29 | 0.28 | 1.52 | 0.31 | 8.13 | 8.27 |
| XVERSE-13B | 29.26 | 24.03 | 16.67 | 2.78 | 11.61 | 3.08 | 14.26 | 14.53 |
| Baichuan-13B-Base | 30.24 | 20.90 | 15.92 | 0.98 | 9.65 | 2.64 | 12.00 | 13.19 |
| Baichuan2-13B-Base | 30.61 | 22.11 | 17.27 | 2.39 | 14.17 | 11.58 | 14.53 | 16.09 |
The model weights, source code, and configurations required for inference have been published on Hugging Face; see the first table of this document for the download links. Below we demonstrate several ways to run inference. The programs will automatically download the required resources from Hugging Face.
Install the dependencies first:

```shell
pip install -r requirements.txt
```

Chat model inference:

```python
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> from transformers.generation.utils import GenerationConfig
>>> tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan2-13B-Chat", use_fast=False, trust_remote_code=True)
>>> model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan2-13B-Chat", device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
>>> model.generation_config = GenerationConfig.from_pretrained("baichuan-inc/Baichuan2-13B-Chat")
>>> messages = []
>>> messages.append({"role": "user", "content": "解释一下“温故而知新”"})
>>> response = model.chat(tokenizer, messages)
>>> print(response)
"温故而知新"是一句中国古代的成语,出自《论语·为政》篇。这句话的意思是:通过回顾过去,我们可以发现新的知识和理解。换句话说,学习历史和经验可以让我们更好地理解现在和未来。
这句话鼓励我们在学习和生活中不断地回顾和反思过去的经验,从而获得新的启示和成长。通过重温旧的知识和经历,我们可以发现新的观点和理解,从而更好地应对不断变化的世界和挑战。
```

Base model inference:

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan2-13B-Base", trust_remote_code=True)
>>> model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan2-13B-Base", device_map="auto", trust_remote_code=True)
>>> inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
>>> inputs = inputs.to('cuda:0')
>>> pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
>>> print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
登鹳雀楼->王之涣
夜雨寄北->李商隐
```

In the two code snippets above, the model is loaded with `device_map='auto'`, so all available GPUs will be used. If you need to specify which devices to use, you can set something like `export CUDA_VISIBLE_DEVICES=0,1` (GPUs 0 and 1 will be used), as sketched below.
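A minimal sketch of the same restriction done from within Python rather than the shell (the GPU indices here are only an illustration; the environment variable must be set before torch initializes CUDA):

```python
import os

# Illustrative only: make GPUs 0 and 1 visible to this process.
# This must happen before torch/transformers touch CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan2-13B-Chat",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```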
We also provide a command-line demo:

```shell
python cli_demo.py
```

This command-line tool is designed for Chat scenarios, so calling Base models with it is not supported.
The web demo relies on streamlit. Run the following command; it starts a web service locally, and you can access it by entering the address printed in the console into your browser. This web demo is designed for Chat scenarios, so calling Base models with it is not supported.

```shell
streamlit run web_demo.py
```

用户:找到下列数组的中位数[3.1,6.2,1.3,8.4,10.5,11.6,2.1],请用python代码完成以上功能

Baichuan 2:

```python
def find_median(numbers):
    numbers.sort()
    length = len(numbers)
    if length % 2 == 0:
        return (numbers[length // 2 - 1] + numbers[length // 2]) / 2
    else:
        return numbers[length // 2]

numbers = [3.1, 6.2, 1.3, 8.4, 10.5, 11.6, 2.1]
print(find_median(numbers))
```
用户:鸡和兔在一个笼子里,共有26个头,68只脚,那么鸡有多少只,兔有多少只?
Baichuan 2:我们可以用代数方法来解决这个问题。假设鸡的数量为x,兔子的数量为y。根据题目给出的信息,我们可以得到以下两个方程:
1. x + y = 26 (头的总数)
2. 2x + 4y = 68 (脚的总数)
现在我们需要解这个二元一次方程组。首先从第一个方程中解出x:
x = 26 - y
然后将x的表达式代入第二个方程:
2(26 - y) + 4y = 68
52 - 2y + 4y = 68
2y = 16
接下来我们求解y的值:
y = 16 / 2
y = 8
现在我们得到了兔子的数量是8只。将y的值代入x的表达式求得鸡的数量:
x = 26 - 8
x = 18
所以鸡有18只,兔子有8只。
用户:从现在开始,请你将我接下来说的每一句话翻译成英语,准备好了吗?
Baichuan 2:I'm ready to translate every word you say next into English.
用户:你好
Baichuan 2:Hello
用户:我是Baichuan 2,很高兴认识大家
Baichuan 2:I am Baichuan 2, and I am very happy to meet everyone.
用户:希望通过大家的共同努力,早日迎接AGI时代的到来
Baichuan 2:I hope through the joint efforts of all of us, we can welcome the arrival of the AGI era as soon as possible.
用户:以下物理常识题目,哪一个是错误的?A.在自然环境下,声音在固体中传播速度最快。B.牛顿第一定律:一个物体如果不受力作用,将保持静止或匀速直线运动的状态。C.牛顿第三定律:对于每个作用力,都有一个相等而反向的反作用力。D.声音在空气中的传播速度为1000m/s。
Baichuan 2:本题考查物理基本概念和原理的理解和应用能力。
A项:声音的传播需要介质,在固体、液体和气体的三种介质中,声音在固体中的传播速度是最快的,故A正确; B项:牛顿第一定律是惯性定律,其内容是:一个物体如果没有受到外力的作用,要么保持静止状态,要么保持匀速直线运动状态,直到有外力迫使它改变运动状态为止,故B正确; C项:牛顿第三定律的内容是:对于每一个作用力,都有一个大小相等、方向相反的反作用力,这两个力分别作用在两个相互作用的物体上,故C错误; D项:声音在空气中以波的形式传播,其传播速度与介质的温度、湿度等因素有关,一般情况下,声音在空气中的传播速度约为340m/s左右,而不是1000m/s,故D错误。 因此,本题的正确答案是 D 。
用户:I'm Mike 。I am going to have a busy weekend。On Saturday,I am going to learn how to swim。I will go with my father。Then we are going to have lunch in the restaurant。In the afternoon,we are going to the zoo。We are going to see the pandas。On Sunday,I will finish my homework in the morning。In the afternoon,my parents and I are going to visit my grandparents。We will have dinner together。
阅读短文,判断正确和错误!正确请写T,错误请写F:
1.Mike is going to have a busy weekend()
2.Mike is going to learn how to swim with his father()
3.They are going to have lunch at home()
Baichuan 2:1. T(正确的)
2. T(正确的)
3. F(错误的)
To enable different users on different platforms to run the Baichuan 2 models, we have carried out quantization work for the Baichuan 2 models (including Baichuan2-7B-Chat and Baichuan2-13B-Chat) so that users can quickly and efficiently deploy them on their own platforms.
Baichuan 2 adopts the community-mainstream quantization approach: BitsAndBytes. This method essentially preserves model quality after quantization; it has been integrated into the transformers library and is widely used in the community. BitsAndBytes supports 8-bit and 4-bit quantization, where 4-bit supports the FP4 and NF4 formats; Baichuan 2 uses NF4 as the data type for 4-bit quantization.
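For orientation, this is roughly what NF4 4-bit loading looks like with the generic transformers/bitsandbytes API (a minimal sketch; it is not the Baichuan-specific quantize() interface described below, and the parameter choices here are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization config via the generic transformers/bitsandbytes API.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan2-7B-Chat",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```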
Based on this quantization method, Baichuan 2 supports two modes: online quantization and offline quantization.
For online quantization, we support 8-bit and 4-bit quantization. The usage is similar to that in the Baichuan-13B project: you only need to load the model into CPU memory, call the quantize() interface to quantize it, and finally call cuda() to copy the quantized weights to GPU memory. The code to load the entire model is very simple; take Baichuan2-7B-Chat as an example:
8-bit online quantization:

```python
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan2-7B-Chat", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(8).cuda()
```

4-bit online quantization:

```python
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan2-7B-Chat", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(4).cuda()
```

Note that when using the from_pretrained interface, users generally add device_map="auto"; when using online quantization, this parameter needs to be removed, otherwise an error will be reported.
To make things easier for users, we provide an offline quantized version, Baichuan2-7B-Chat-4bits, for download. Loading the Baichuan2-7B-Chat-4bits model is very simple; you only need to execute:

```python
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan2-7B-Chat-4bits", device_map="auto", trust_remote_code=True)
```

For 8-bit offline quantization, we do not provide a corresponding version, because the Hugging Face transformers library already provides an API that makes it easy to save and load an 8-bit quantized model. Users can save and load the 8-bit model as follows:
```python
# Model saving: model_id is the original model directory, and quant8_saved_dir is the directory where the 8-bit quantized model is saved.
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_8bit=True, device_map="auto", trust_remote_code=True)
model.save_pretrained(quant8_saved_dir)

# Model loading:
model = AutoModelForCausalLM.from_pretrained(quant8_saved_dir, device_map="auto", trust_remote_code=True)
```

Comparison of GPU memory usage before and after quantization (GPU Mem in GB):
| Precision | Baichuan2-7B | Baichuan2-13B |
|---|---|---|
| bf16 / fp16 | 15.3 | 27.5 |
| 8bits | 8.0 | 16.1 |
| 4bits | 5.1 | 8.6 |
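As a rough way to check these numbers on your own hardware (a minimal sketch; it assumes a model has already been loaded onto the GPU as in the snippets above and reports the peak memory allocated by tensors, which is close to but not identical to total GPU usage):

```python
import torch

# Peak GPU memory allocated by tensors on the current device, in GB.
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1024**3:.1f} GB")
```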
The benchmark results of the quantized models compared with the original versions are shown below:
| Model 5-shot | C-Eval | MMLU | CMMLU |
|---|---|---|---|
| Baichuan2-13B-Chat | 56.74 | 57.32 | 59.68 |
| Baichuan2-13B-Chat-4bits | 56.05 | 56.24 | 58.82 |
| Baichuan2-7B-Chat | 54.35 | 52.93 | 54.99 |
| Baichuan2-7B-Chat-4bits | 53.04 | 51.72 | 52.84 |
C-Eval is evaluated on its validation set.

As you can see, the 4-bit models lose only about 1 to 2 percentage points of accuracy relative to bfloat16.
The Baichuan 2 models support CPU inference, but note that inference on CPU is relatively slow. Modify the model loading as follows:

```python
# Taking Baichuan2-7B-Chat as an example
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan2-7B-Chat", torch_dtype=torch.float32, trust_remote_code=True)
```

Many users have done a lot of optimization work on Baichuan 1 (Baichuan-7B, Baichuan-13B), such as compilation optimization and quantization. To apply that work to Baichuan 2 at zero cost, users can perform an offline conversion of a Baichuan 2 model; after conversion it can be used as a Baichuan 1 model. Specifically, users only need to use the following script to offline-normalize the last lm_head layer of the Baichuan 2 model and replace lm_head.weight. After the replacement, the converted model can be compiled and optimized just like a Baichuan 1 model.
```python
import torch
import os

ori_model_dir = 'your Baichuan 2 model directory'
# To avoid overwriting the original model, it's best to save the converted model to another directory before replacing it
new_model_dir = 'your normalized lm_head weight Baichuan 2 model directory'
model = torch.load(os.path.join(ori_model_dir, 'pytorch_model.bin'))
lm_head_w = model['lm_head.weight']
lm_head_w = torch.nn.functional.normalize(lm_head_w)
model['lm_head.weight'] = lm_head_w
torch.save(model, os.path.join(new_model_dir, 'pytorch_model.bin'))
```

To set up the fine-tuning environment, install the dependencies:

```shell
git clone https://github.com/baichuan-inc/Baichuan2.git
cd Baichuan2/fine-tune
pip install -r requirements.txt
```

Below we give a single-machine training example for fine-tuning Baichuan2-7B-Base.
Training data: data/belle_chat_ramdon_10k.json. The sample data is sampled from multiturn_chat_0.8M and converted into the required format. It mainly demonstrates how to train on multi-turn dialogue data and does not guarantee model quality; a quick sanity check of the data file is sketched below.
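A minimal sketch for inspecting the sample data before launching training (it assumes only that the file is a JSON array of dialogue records; the exact schema is defined by fine-tune.py):

```python
import json

# Peek at the sample fine-tuning data: how many records, and what one looks like.
with open("data/belle_chat_ramdon_10k.json", "r", encoding="utf-8") as f:
    records = json.load(f)

print(f"{len(records)} training records")
print(records[0])  # one record, to confirm the multi-turn dialogue format
```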
```shell
hostfile=""
deepspeed --hostfile=$hostfile fine-tune.py  \
    --report_to "none" \
    --data_path "data/belle_chat_ramdon_10k.json" \
    --model_name_or_path "baichuan-inc/Baichuan2-7B-Base" \
    --output_dir "output" \
    --model_max_length 512 \
    --num_train_epochs 4 \
    --per_device_train_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --save_strategy epoch \
    --learning_rate 2e-5 \
    --lr_scheduler_type constant \
    --adam_beta1 0.9 \
    --adam_beta2 0.98 \
    --adam_epsilon 1e-8 \
    --max_grad_norm 1.0 \
    --weight_decay 1e-4 \
    --warmup_ratio 0.0 \
    --logging_steps 1 \
    --gradient_checkpointing True \
    --deepspeed ds_config.json \
    --bf16 True \
    --tf32 True
```

For multi-machine training, you only need to provide a hostfile; its content is similar to the following:
```
ip1 slots=8
ip2 slots=8
ip3 slots=8
ip4 slots=8
....
```
At the same time, specify the path to the hostfile in the training script:
```shell
hostfile="/path/to/hostfile"
deepspeed --hostfile=$hostfile fine-tune.py  \
    --report_to "none" \
    --data_path "data/belle_chat_ramdon_10k.json" \
    --model_name_or_path "baichuan-inc/Baichuan2-7B-Base" \
    --output_dir "output" \
    --model_max_length 512 \
    --num_train_epochs 4 \
    --per_device_train_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --save_strategy epoch \
    --learning_rate 2e-5 \
    --lr_scheduler_type constant \
    --adam_beta1 0.9 \
    --adam_beta2 0.98 \
    --adam_epsilon 1e-8 \
    --max_grad_norm 1.0 \
    --weight_decay 1e-4 \
    --warmup_ratio 0.0 \
    --logging_steps 1 \
    --gradient_checkpointing True \
    --deepspeed ds_config.json \
    --bf16 True \
    --tf32 True
```

The code already supports lightweight fine-tuning such as LoRA. If you want to use it, you only need to add the following parameter to the above script:
```shell
--use_lora True
```

The specific configuration of LoRA can be found in the fine-tune.py script.
After fine-tuning with LoRA, you can use the following command to load the model:
```python
from peft import AutoPeftModelForCausalLM

model = AutoPeftModelForCausalLM.from_pretrained("output", trust_remote_code=True)
```

In addition to the Baichuan2-7B-Base model trained on 2.6 trillion tokens, we also provide 11 intermediate checkpoints (trained on roughly 0.2 to 2.4 trillion tokens each) for community research and use (download address). The following figure shows how these checkpoints perform on the C-Eval, MMLU, and CMMLU benchmarks:

We will continue to update community and ecosystem support for Baichuan 2 here.
Deploy the Baichuan2-7B-Chat and Baichuan2-13B-Chat models using Intel® Core™/Xeon® Scalable processors or Intel® Arc™ GPUs.
BigDL-LLM (CPU, GPU) is recommended to achieve better inference performance.
A Chinese operating manual is provided, including notebook support and methods for model loading, optimization, saving, and more.
Model fine-tuning: Baichuan 2 (7B) natively supports fine-tuning with PyTorch (2.1.0) + Transformers (4.36.0) + DeepSpeed (0.12.4) + Accelerate (0.25.0), with no additional adaptation required.
Inference deployment: Baichuan 2 (7B) natively supports Ascend NPU inference, with no additional adaptation required.
MindFormers is a full-pipeline development kit based on the MindSpore framework that supports large-model training, fine-tuning, evaluation, inference, and deployment. Baichuan2-7B / 13B has been integrated into this kit so that users can fine-tune and deploy the models; see its README for specific usage.
The Shengsi (MindSpore) large model platform, built on the MindSpore AI framework, the MindFormers development kit, and Ascend hardware compute, opens the Baichuan2-7B model capabilities to the public; everyone is welcome to try it online.
LLaMA-Efficient-Tuning supports fine-tuning and continued training of Baichuan 2 models.
Baichuan2 (7B/13B) supports inference on the Taichu T100 accelerator card, and a trial channel has been officially opened to the public.
We hereby declare that our development team has not developed any apps based on the Baichuan 2 models, whether for iOS, Android, the web, or any other platform. We strongly call on all users not to use the Baichuan 2 models for any activities that endanger national or social security or that are illegal. In addition, we ask users not to use the Baichuan 2 models for Internet services that have not undergone proper security review and registration. We hope all users abide by this principle to ensure that technological development takes place in a regulated and lawful environment.
We have done everything we can to ensure the compliance of the data used during model training. However, despite our great efforts, some unforeseen problems may still arise due to the complexity of the model and data. Therefore, we will not assume any responsibility for problems arising from the use of the Baichuan 2 open-source models, including but not limited to data security issues, public opinion risks, or any risks and problems arising from the misleading use, abuse, dissemination, or improper use of the models.
Community use of the Baichuan 2 models must follow the Apache 2.0 license and the Baichuan 2 Model Community License Agreement. The Baichuan 2 models support commercial use. If you plan to use the Baichuan 2 models or their derivatives for commercial purposes, please confirm that your entity meets the following conditions:
If the above conditions are met, you need to submit the application materials required by the Baichuan 2 Model Community License Agreement to [email protected]. After the review is approved, Baichuan will grant you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable commercial copyright license.
To cite our work, please use the following reference:
```
@article{baichuan2023baichuan2,
  title={Baichuan 2: Open Large-scale Language Models},
  author={Baichuan},
  journal={arXiv preprint arXiv:2309.10305},
  url={https://arxiv.org/abs/2309.10305},
  year={2023}
}
```