Baichuan2下載Baichuan2源代碼下載

Baichuan 2

? Hugging Face • ? ModelScope • WeChat• ? Modelers

百小應APP已正式上線！懂搜索、會提問的AI助手，各大應用商店搜索百小應下載，歡迎下載體驗?

中文| English

更新

[2023.12.29] ??? 我們發布了Baichuan2-13B-Chat v2版本。其中：

大幅提升了模型的綜合能力，特別是數學和邏輯推理、複雜指令跟隨能力。

模型介紹

Baichuan 2 是百川智能推出的新一代開源大語言模型，採用2.6 萬億Tokens 的高質量語料訓練。
Baichuan 2 在多個權威的中文、英文和多語言的通用、領域benchmark 上取得同尺寸最佳的效果。
本次發布包含有7B 、 13B的Base和Chat版本，並提供了Chat 版本的4bits 量化。
所有版本對學術研究完全開放。同時，開發者通過郵件申請並獲得官方商用許可後，即可免費商用，請參考協議章節。
歡迎閱讀我們的技術報告Baichuan 2: Open Large-scale Language Models 獲取更多信息。

本次發布版本和下載鏈接見下表：

	基座模型	對齊模型	對齊模型4bits 量化
7B	? Baichuan2-7B-Base	? Baichuan2-7B-Chat	? Baichuan2-7B-Chat-4bits
13B	? Baichuan2-13B-Base	? Baichuan2-13B-Chat	? Baichuan2-13B-Chat-4bits

Benchmark 結果

我們在通用、法律、醫療、數學、代碼和多語言翻譯六個領域的中英文和多語言權威數據集上對模型進行了廣泛測試。

通用領域

在通用領域我們在以下數據集上進行了5-shot 測試。

C-Eval 是一個全面的中文基礎模型評測數據集，涵蓋了52 個學科和四個難度的級別。我們使用該數據集的dev 集作為few-shot 的來源，在test 集上進行測試。我們採用了Baichuan-7B 的評測方案。
MMLU 是包含57 個任務的英文評測數據集，涵蓋了初等數學、美國歷史、計算機科學、法律等，難度覆蓋高中水平到專家水平，是目前主流的LLM 評測數據集。我們採用了開源的評測方案。
CMMLU 是一個包含67 個主題的綜合性性中文評估基準，專門用於評估語言模型在中文語境下的知識和推理能力。我們採用了其官方的評測方案。
Gaokao 是一個以中國高考題作為評測大語言模型能力的數據集，用以評估模型的語言能力和邏輯推理能力。我們只保留了其中的單項選擇題，並進行了隨機劃分。我們採用了與C-Eval 類似的評測方案。
AGIEval 旨在評估模型的認知和解決問題相關的任務中的一般能力。我們只保留了其中的四選一單項選擇題，並進行了隨機劃分。我們採用了與C-Eval 類似的評測方案。
BBH 是一個挑戰性任務Big-Bench 的子集。 Big-Bench 目前包括204 項任務。任務主題涉及語言學、兒童發展、數學、常識推理、生物學、物理學、社會偏見、軟件開發等方面。 BBH 是從204 項Big-Bench 評測基準任務中大模型表現不好的任務單獨拿出來形成的評測基準。

7B 模型結果

	C-Eval	MMLU	CMMLU	Gaokao	AGIEval	BBH
	5-shot	5-shot	5-shot	5-shot	5-shot	3-shot
GPT-4	68.40	83.93	70.33	66.15	63.27	75.12
GPT-3.5 Turbo	51.10	68.54	54.06	47.07	46.13	61.59
LLaMA-7B	27.10	35.10	26.75	27.81	28.17	32.38
LLaMA2-7B	28.90	45.73	31.38	25.97	26.53	39.16
MPT-7B	27.15	27.93	26.00	26.54	24.83	35.20
Falcon-7B	24.23	26.03	25.66	24.24	24.10	28.77
ChatGLM2-6B	50.20	45.90	49.00	49.44	45.28	31.65
Baichuan-7B	42.80	42.30	44.02	36.34	34.44	32.48
Baichuan2-7B-Base	54.00	54.16	57.07	47.47	42.73	41.56

13B 模型結果

	C-Eval	MMLU	CMMLU	Gaokao	AGIEval	BBH
	5-shot	5-shot	5-shot	5-shot	5-shot	3-shot
GPT-4	68.40	83.93	70.33	66.15	63.27	75.12
GPT-3.5 Turbo	51.10	68.54	54.06	47.07	46.13	61.59
LLaMA-13B	28.50	46.30	31.15	28.23	28.22	37.89
LLaMA2-13B	35.80	55.09	37.99	30.83	32.29	46.98
Vicuna-13B	32.80	52.00	36.28	30.11	31.55	43.04
Chinese-Alpaca-Plus-13B	38.80	43.90	33.43	34.78	35.46	28.94
XVERSE-13B	53.70	55.21	58.44	44.69	42.54	38.06
Baichuan-13B-Base	52.40	51.60	55.30	49.69	43.20	43.01
Baichuan2-13B-Base	58.10	59.17	61.97	54.33	48.17	48.78

法律、醫療

法律領域我們使用了JEC-QA 數據集。 JEC-QA 數據集來源於中國國家司法考試。我們只保留了其中的單選題。我們採用了與C-Eval 類似的評測方案。

醫療領域則使用通用領域數據集（C-Eval、MMLU、CMMLU）中的醫學相關學科、MedQA 和MedMCQA。我們採用了與C-Eval 類似的評測方案。

為了測試方便，我們使用了C-Eval 的val 集進行測試。
MedQA 數據集來源於美國、中國的醫學考試。我們測試了MedQA數據集中的USMLE 和MCMLE 兩個子集，並採用了五個候選的版本。
MedMCQA 數據集來源於印度醫學院的入學考試。我們只保留了其中的單選題。由於test 集沒有答案，我們使用dev 集進行測試。
通用領域數據集包含的醫學相關學科如下：
- C-Eval: clinical_medicine, basic_medicine
- MMLU: clinical_knowledge, anatomy, college_medicine, college_biology, nutrition, virology, medical_genetics, professional_medicine
- CMMLU: anatomy, clinical_knowledge, college_medicine, genetics, nutrition, traditional_chinese_medicine, virology

我們對以上數據集進行了5-shot 測試。

7B 模型結果

	JEC-QA	CEval-MMLU-CMMLU	MedQA-USMLE	MedQA-MCMLE	MedMCQA
	5-shot	5-shot	5-shot	5-shot	5-shot
GPT-4	59.32	77.16	80.28	74.58	72.51
GPT-3.5 Turbo	42.31	61.17	53.81	52.92	56.25
LLaMA-7B	27.45	33.34	24.12	21.72	27.45
LLaMA2-7B	29.20	36.75	27.49	24.78	37.93
MPT-7B	27.45	26.67	16.97	19.79	31.96
Falcon-7B	23.66	25.33	21.29	18.07	33.88
ChatGLM2-6B	40.76	44.54	26.24	45.53	30.22
Baichuan-7B	34.64	42.37	27.42	39.46	31.39
Baichuan2-7B-Base	44.46	56.39	32.68	54.93	41.73

13B 模型結果

	JEC-QA	CEval-MMLU-CMMLU	MedQA-USMLE	MedQA-MCMLE	MedMCQA
	5-shot	5-shot	5-shot	5-shot	5-shot
GPT-4	59.32	77.16	80.28	74.58	72.51
GPT-3.5 Turbo	42.31	61.17	53.81	52.92	56.25
LLaMA-13B	27.54	35.14	28.83	23.38	39.52
LLaMA2-13B	34.08	47.42	35.04	29.74	42.12
Vicuna-13B	28.38	40.99	34.80	27.67	40.66
Chinese-Alpaca-Plus-13B	35.32	46.31	27.49	32.66	35.87
XVERSE-13B	46.42	58.08	32.99	58.76	41.34
Baichuan-13B-Base	41.34	51.77	29.07	43.67	39.60
Baichuan2-13B-Base	47.40	59.33	40.38	61.62	42.86

數學、代碼

數學領域我們使用OpenCompass 評估框架，對GSM8K 和MATH 數據集進行了4-shot 測試。

GSM8K 是由OpenAI 發布的一個由8.5K 高質量的語言多樣化的小學數學應用題組成的數據集，要求根據給定的場景和兩個可能的解決方案，選擇最合理的方案。
MATH 數據集包含12,500 個數學問題（其中7500 個屬於訓練集，5000 個屬於測試集），這些問題收集自AMC 10、AMC 12、AIME 等數學競賽。

代碼領域則採用了HumanEval 和MBPP 數據集。我們使用OpenCompass，對HumanEval 進行了0-shot 測試，MBPP 數據集進行了3-shot 測試。

HumanEval 中的編程任務包括模型語言理解、推理、算法和簡單數學，以評估模型功能正確性，並衡量模型的問題解決能力。
MBPP 包括974 個Python 短函數、程序的文字描述以及用於檢查功能正確性的測試用例的數據集。

7B 模型結果

	GSM8K	MATH	HumanEval	MBPP
	4-shot	4-shot	0-shot	3-shot
GPT-4	89.99	40.20	69.51	63.60
GPT-3.5 Turbo	57.77	13.96	52.44	61.40
LLaMA-7B	9.78	3.02	11.59	14.00
LLaMA2-7B	16.22	3.24	12.80	14.80
MPT-7B	8.64	2.90	14.02	23.40
Falcon-7B	5.46	1.68	-	10.20
ChatGLM2-6B	28.89	6.40	9.15	9.00
Baichuan-7B	9.17	2.54	9.20	6.60
Baichuan2-7B-Base	24.49	5.58	18.29	24.20

13B 模型結果

	GSM8K	MATH	HumanEval	MBPP
	4-shot	4-shot	0-shot	3-shot
GPT-4	89.99	40.20	69.51	63.60
GPT-3.5 Turbo	57.77	13.96	52.44	61.40
LLaMA-13B	20.55	3.68	15.24	21.40
LLaMA2-13B	28.89	4.96	15.24	27.00
Vicuna-13B	28.13	4.36	16.46	15.00
Chinese-Alpaca-Plus-13B	11.98	2.50	16.46	20.00
XVERSE-13B	18.20	2.18	15.85	16.80
Baichuan-13B-Base	26.76	4.84	11.59	22.80
Baichuan2-13B-Base	52.77	10.08	17.07	30.20

多語言翻譯

我們採用了Flores-101 數據集來評估模型的多語言能力。 Flores-101 涵蓋了世界各地的101 種語言。它的數據來源於新聞、旅遊指南和書籍等多個不同領域。我們選擇了聯合國官方語言（阿拉伯文、中文、英文、法文、俄文和西班牙文）以及德文和日文作為測試語種。我們使用OpenCompass 對Flores-101 中的中-英、中-法、中-西班牙、中-阿拉伯、中-俄、中-日、中-德等七個子任務分別進行了8-shot 測試。

7B 模型結果

	CN-EN	CN-FR	CN-ES	CN-AR	CN-RU	CN-JP	CN-DE	Average
GPT-4	29.94	29.56	20.01	10.76	18.62	13.26	20.83	20.43
GPT-3.5 Turbo	27.67	26.15	19.58	10.73	17.45	1.82	19.70	17.59
LLaMA-7B	17.27	12.02	9.54	0.00	4.47	1.41	8.73	7.63
LLaMA2-7B	25.76	15.14	11.92	0.79	4.99	2.20	10.15	10.14
MPT-7B	20.77	9.53	8.96	0.10	3.54	2.91	6.54	7.48
Falcon-7B	22.13	15.67	9.28	0.11	1.35	0.41	6.41	7.91
ChatGLM2-6B	22.28	9.42	7.77	0.64	1.78	0.26	4.61	6.68
Baichuan-7B	25.07	16.51	12.72	0.41	6.66	2.24	9.86	10.50
Baichuan2-7B-Base	27.27	20.87	16.17	1.39	11.21	3.11	12.76	13.25

13B 模型結果

	CN-EN	CN-FR	CN-ES	CN-AR	CN-RU	CN-JP	CN-DE	Average
GPT-4	29.94	29.56	20.01	10.76	18.62	13.26	20.83	20.43
GPT-3.5 Turbo	27.67	26.15	19.58	10.73	17.45	1.82	19.70	17.59
LLaMA-13B	21.75	16.16	13.29	0.58	7.61	0.41	10.66	10.07
LLaMA2-13B	25.44	19.25	17.49	1.38	10.34	0.13	11.13	12.17
Vicuna-13B	22.63	18.04	14.67	0.70	9.27	3.59	10.25	11.31
Chinese-Alpaca-Plus-13B	22.53	13.82	11.29	0.28	1.52	0.31	8.13	8.27
XVERSE-13B	29.26	24.03	16.67	2.78	11.61	3.08	14.26	14.53
Baichuan-13B-Base	30.24	20.90	15.92	0.98	9.65	2.64	12.00	13.19
Baichuan2-13B-Base	30.61	22.11	17.27	2.39	14.17	11.58	14.53	16.09

推理和部署

推理所需的模型權重、源碼、配置已發佈在Hugging Face，下載鏈接見本文檔最開始的表格。我們在此示範多種推理方式。程序會自動從Hugging Face 下載所需資源。

安裝依賴

pip install -r requirements.txt

Python 代碼方式

Chat 模型推理方法示範

 >> > import torch
>> > from transformers import AutoModelForCausalLM , AutoTokenizer
>> > from transformers . generation . utils import GenerationConfig
>> > tokenizer = AutoTokenizer . from_pretrained ( "baichuan-inc/Baichuan2-13B-Chat" , use_fast = False , trust_remote_code = True )
>> > model = AutoModelForCausalLM . from_pretrained ( "baichuan-inc/Baichuan2-13B-Chat" , device_map = "auto" , torch_dtype = torch . bfloat16 , trust_remote_code = True )
>> > model . generation_config = GenerationConfig . from_pretrained ( "baichuan-inc/Baichuan2-13B-Chat" )
>> > messages = []
>> > messages . append ({ "role" : "user" , "content" : "解释一下“温故而知新”" })
>> > response = model . chat ( tokenizer , messages )
>> > print ( response )
"温故而知新"是一句中国古代的成语，出自《论语·为政》篇。这句话的意思是：通过回顾过去，我们可以发现新的知识和理解。换句话说，学习历史和经验可以让我们更好地理解现在和未来。

这句话鼓励我们在学习和生活中不断地回顾和反思过去的经验，从而获得新的启示和成长。通过重温旧的知识和经历，我们可以发现新的观点和理解，从而更好地应对不断变化的世界和挑战。

Base 模型推理方法示範

 >> > from transformers import AutoModelForCausalLM , AutoTokenizer
>> > tokenizer = AutoTokenizer . from_pretrained ( "baichuan-inc/Baichuan2-13B-Base" , trust_remote_code = True )
>> > model = AutoModelForCausalLM . from_pretrained ( "baichuan-inc/Baichuan2-13B-Base" , device_map = "auto" , trust_remote_code = True )
>> > inputs = tokenizer ( '登鹳雀楼->王之涣n夜雨寄北->' , return_tensors = 'pt' )
>> > inputs = inputs . to ( 'cuda:0' )
>> > pred = model . generate ( ** inputs , max_new_tokens = 64 , repetition_penalty = 1.1 )
>> > print ( tokenizer . decode ( pred . cpu ()[ 0 ], skip_special_tokens = True ))
登鹳雀楼- >王之涣
夜雨寄北- >李商隐

在上述兩段代碼中，模型加載指定device_map='auto' ，會使用所有可用顯卡。如需指定使用的設備，可以使用類似export CUDA_VISIBLE_DEVICES=0,1 （使用了0、1號顯卡）的方式控制。

命令行工具方式

python cli_demo.py

本命令行工具是為Chat 場景設計，因此我們不支持使用該工具調用Base 模型。

網頁demo 方式

依靠streamlit 運行以下命令，會在本地啟動一個web 服務，把控制台給出的地址放入瀏覽器即可訪問。本網頁demo 工具是為Chat 場景設計，因此我們不支持使用該工具調用Base 模型。

streamlit run web_demo.py

Baichuan2-13B-Chat 示例輸出

代碼

用户：找到下列数组的中位数[3.1,6.2,1.3,8.4,10.5,11.6,2.1]，请用python代码完成以上功能

Baichuan 2：def find_median(numbers):
    numbers.sort()
    length = len(numbers)
    if length % 2 == 0:
        return (numbers[length // 2 - 1] + numbers[length // 2]) / 2
    else:
        return numbers[length // 2]
numbers = [3.1, 6.2, 1.3, 8.4, 10.5, 11.6, 2.1]
print(find_median(numbers))

數學

用户：鸡和兔在一个笼子里，共有26个头，68只脚，那么鸡有多少只，兔有多少只？

Baichuan 2：我们可以用代数方法来解决这个问题。假设鸡的数量为x，兔子的数量为y。根据题目给出的信息，我们可以得到以下两个方程：

1. x + y = 26 (头的总数)
2. 2x + 4y = 68 (脚的总数)

现在我们需要解这个二元一次方程组。首先从第一个方程中解出x：

x = 26 - y

然后将x的表达式代入第二个方程：

2(26 - y) + 4y = 68
52 - 2y + 4y = 68
2y = 16

接下来我们求解y的值：

y = 16 / 2
y = 8

现在我们得到了兔子的数量是8只。将y的值代入x的表达式求得鸡的数量：

x = 26 - 8
x = 18

所以鸡有18只，兔子有8只。

指令跟隨

用户：从现在开始，请你将我接下来说的每一句话翻译成英语，准备好了吗？

Baichuan 2：I'm ready to translate every word you say next into English.

用户：你好

Baichuan 2：Hello

用户：我是Baichuan 2，很高兴认识大家

Baichuan 2：I am Baichuan 2, and I am very happy to meet everyone.

用户：希望通过大家的共同努力，早日迎接AGI时代的到来

Baichuan 2：I hope through the joint efforts of all of us, we can welcome the arrival of the AGI era as soon as possible.

解題

用户：以下物理常识题目，哪一个是错误的?A.在自然环境下，声音在固体中传播速度最快。B.牛顿第一定律:一个物体如果不受力作用，将保持静止或匀速直线运动的状态。C.牛顿第三定律:对于每个作用力，都有一个相等而反向的反作用力。D.声音在空气中的传播速度为1000m/s。

Baichuan 2：本题考查物理基本概念和原理的理解和应用能力。
A项：声音的传播需要介质，在固体、液体和气体的三种介质中，声音在固体中的传播速度是最快的，故A正确； B项：牛顿第一定律是惯性定律，其内容是：一个物体如果没有受到外力的作用，要么保持静止状态，要么保持匀速直线运动状态，直到有外力迫使它改变运动状态为止，故B正确； C项：牛顿第三定律的内容是：对于每一个作用力，都有一个大小相等、方向相反的反作用力，这两个力分别作用在两个相互作用的物体上，故C错误； D项：声音在空气中以波的形式传播，其传播速度与介质的温度、湿度等因素有关，一般情况下，声音在空气中的传播速度约为340m/s左右，而不是1000m/s，故D错误。 因此，本题的正确答案是 D 。

用户：I'm Mike 。I am going to have a busy weekend。On Saturday，I am going to learn how to swim。I will go with my father。Then we are going to have lunch in the restaurant。In the afternoon，we are going to the zoo。We are going to see the pandas。On Sunday，I will finish my homework in the morning。In the afternoon，my parents and I are going to visit my grandparents。We will have dinner together。
阅读短文，判断正确和错误！正确请写T，错误请写F:
1.Mike is going to have a busy weekend()
2.Mike is going to learn how to swim with his father()
3.They are going to have lunch at home()

Baichuan 2：1. T（正确的）
2. T（正确的）
3. F（错误的）

量化部署

為了讓不同的用戶以及不同的平台都能運行Baichuan 2 模型，我們針對Baichuan 2 模型做了相應地量化工作（包括Baichuan2-7B-Chat 和Baichuan2-13B-Chat），方便用戶快速高效地在自己的平台部署Baichuan 2 模型。

量化方法

Baichuan 2 的採用社區主流的量化方法：BitsAndBytes。該方法可以保證量化後的效果基本不掉點，目前已經集成到transformers 庫裡，並在社區得到了廣泛應用。 BitsAndBytes 支持8bits 和4bits 兩種量化，其中4bits 支持FP4 和NF4 兩種格式，Baichuan 2 選用NF4 作為4bits 量化的數據類型。

基於該量化方法，Baichuan 2 支持在線量化和離線量化兩種模式。

在線量化

對於在線量化，我們支持8bits 和4bits 量化，使用方式和Baichuan-13B 項目中的方式類似，只需要先加載模型到CPU 的內存裡，再調用quantize()接口量化，最後調用cuda()函數，將量化後的權重拷貝到GPU 顯存中。實現整個模型加載的代碼非常簡單，我們以Baichuan2-7B-Chat 為例：

8bits 在線量化:

 model = AutoModelForCausalLM . from_pretrained ( "baichuan-inc/Baichuan2-7B-Chat" , torch_dtype = torch . float16 , trust_remote_code = True )
model = model . quantize ( 8 ). cuda ()

4bits 在線量化:

 model = AutoModelForCausalLM . from_pretrained ( "baichuan-inc/Baichuan2-7B-Chat" , torch_dtype = torch . float16 , trust_remote_code = True )
model = model . quantize ( 4 ). cuda ()

需要注意的是，在用from_pretrained接口的時候，用戶一般會加上device_map="auto" ，在使用在線量化時，需要去掉這個參數，否則會報錯。

離線量化

為了方便用戶的使用，我們提供了離線量化好的4bits 的版本Baichuan2-7B-Chat-4bits，供用戶下載。用戶加載Baichuan2-7B-Chat-4bits 模型很簡單，只需要執行:

 model = AutoModelForCausalLM . from_pretrained ( "baichuan-inc/Baichuan2-7B-Chat-4bits" , device_map = "auto" , trust_remote_code = True )

對於8bits 離線量化，我們沒有提供相應的版本，因為Hugging Face transformers 庫提供了相應的API 接口，可以很方便的實現8bits 量化模型的保存和加載。用戶可以自行按照如下方式實現8bits 的模型保存和加載：

 # Model saving: model_id is the original model directory, and quant8_saved_dir is the directory where the 8bits quantized model is saved.
model = AutoModelForCausalLM . from_pretrained ( model_id , load_in_8bit = True , device_map = "auto" , trust_remote_code = True )
model . save_pretrained ( quant8_saved_dir )
model = AutoModelForCausalLM . from_pretrained ( quant8_saved_dir , device_map = "auto" , trust_remote_code = True )

量化效果

量化前後顯存佔用對比(GPU Mem in GB)：

Precision	Baichuan2-7B	Baichuan2-13B
bf16 / fp16	15.3	27.5
8bits	8.0	16.1
4bits	5.1	8.6

量化後在各個benchmark 上的結果和原始版本對比如下：

Model 5-shot	C-Eval	MMLU	CMMLU
Baichuan2-13B-Chat	56.74	57.32	59.68
Baichuan2-13B-Chat-4bits	56.05	56.24	58.82
Baichuan2-7B-Chat	54.35	52.93	54.99
Baichuan2-7B-Chat-4bits	53.04	51.72	52.84

C-Eval 是在其val set 上進行的評測

可以看到，4bits 相對bfloat16 精度損失在1 - 2 個百分點左右。

CPU 部署

Baichuan 2 模型支持CPU 推理，但需要強調的是，CPU 的推理速度相對較慢。需按如下方式修改模型加載的方式：

 # Taking Baichuan2-7B-Chat as an example
model = AutoModelForCausalLM . from_pretrained ( "baichuan-inc/Baichuan2-7B-Chat" , torch_dtype = torch . float32 , trust_remote_code = True )

對Baichuan 1 的推理優化遷移到Baichuan 2

由於很多用戶在Baichuan 1 (Baichuan-7B, Baichuan-13B)上做了很多優化的工作，例如編譯優化、量化等，為了將這些工作零成本地應用於Baichuan 2，用戶可以對Baichuan 2 模型做一個離線轉換，轉換後就可以當做Baichuan 1 模型來使用。具體來說，用戶只需要利用以下腳本離線對Baichuan 2 模型的最後一層lm_head 做歸一化，並替換掉lm_head.weight即可。替換完後，就可以像對Baichuan 1 模型一樣對轉換後的模型做編譯優化等工作了。

 import torch
import os
ori_model_dir = 'your Baichuan 2 model directory'
# To avoid overwriting the original model, it's best to save the converted model to another directory before replacing it
new_model_dir = 'your normalized lm_head weight Baichuan 2 model directory'
model = torch . load ( os . path . join ( ori_model_dir , 'pytorch_model.bin' ))
lm_head_w = model [ 'lm_head.weight' ]
lm_head_w = torch . nn . functional . normalize ( lm_head_w )
model [ 'lm_head.weight' ] = lm_head_w
torch . save ( model , os . path . join ( new_model_dir , 'pytorch_model.bin' ))

模型微調

依賴安裝

git clone https://github.com/baichuan-inc/Baichuan2.git
cd Baichuan2/fine-tune
pip install -r requirements.txt

如需使用LoRA 等輕量級微調方法需額外安裝peft
如需使用xFormers 進行訓練加速需額外安裝xFormers

單機訓練

下面我們給一個微調Baichuan2-7B-Base 的單機訓練例子。

訓練數據： data/belle_chat_ramdon_10k.json ，該樣例數據是從multiturn_chat_0.8M 採樣出1 萬條，並且做了格式轉換。主要是展示多輪數據怎麼訓練，不保證效果。

hostfile= " "
deepspeed --hostfile= $hostfile fine-tune.py  
    --report_to " none " 
    --data_path " data/belle_chat_ramdon_10k.json " 
    --model_name_or_path " baichuan-inc/Baichuan2-7B-Base " 
    --output_dir " output " 
    --model_max_length 512 
    --num_train_epochs 4 
    --per_device_train_batch_size 16 
    --gradient_accumulation_steps 1 
    --save_strategy epoch 
    --learning_rate 2e-5 
    --lr_scheduler_type constant 
    --adam_beta1 0.9 
    --adam_beta2 0.98 
    --adam_epsilon 1e-8 
    --max_grad_norm 1.0 
    --weight_decay 1e-4 
    --warmup_ratio 0.0 
    --logging_steps 1 
    --gradient_checkpointing True 
    --deepspeed ds_config.json 
    --bf16 True 
    --tf32 True

多機訓練

多機訓練只需要給一下hostfile ，內容類似如下：

 ip1 slots=8
ip2 slots=8
ip3 slots=8
ip4 slots=8
....

同時在訓練腳本里面指定hosftfile 的路徑：

hostfile= " /path/to/hostfile "
deepspeed --hostfile= $hostfile fine-tune.py  
    --report_to " none " 
    --data_path " data/belle_chat_ramdon_10k.json " 
    --model_name_or_path " baichuan-inc/Baichuan2-7B-Base " 
    --output_dir " output " 
    --model_max_length 512 
    --num_train_epochs 4 
    --per_device_train_batch_size 16 
    --gradient_accumulation_steps 1 
    --save_strategy epoch 
    --learning_rate 2e-5 
    --lr_scheduler_type constant 
    --adam_beta1 0.9 
    --adam_beta2 0.98 
    --adam_epsilon 1e-8 
    --max_grad_norm 1.0 
    --weight_decay 1e-4 
    --warmup_ratio 0.0 
    --logging_steps 1 
    --gradient_checkpointing True 
    --deepspeed ds_config.json 
    --bf16 True 
    --tf32 True

輕量化微調

代碼已經支持輕量化微調如LoRA，如需使用僅需在上面的腳本中加入以下參數：

--use_lora True

LoRA 具體的配置可見fine-tune.py腳本。

使用LoRA 微調後可以使用下面的命令加載模型：

 from peft import AutoPeftModelForCausalLM
model = AutoPeftModelForCausalLM . from_pretrained ( "output" , trust_remote_code = True )

中間Checkpoints

除了訓練了2.6 萬億Tokens 的Baichuan2-7B-Base 模型，我們還提供了在此之前的另外11 個中間checkpoints（分別對應訓練了約0.2 ~ 2.4 萬億Tokens）供社區研究使用（下載地址）。下圖給出了這些checkpoints 在C-Eval、MMLU、CMMLU 三個benchmark 上的效果變化：

社區與生態

??? 我們會在此持續更新社區和生態對Baichuan 2 的支持???

Intel 酷睿Ultra 平台運行百川大模型

使用方式

使用酷睿™/至強® 可擴展處理器或配合銳炫™ GPU等進行部署BaiChuan2 - 7B/Chat，BaiChuan2 - 13B/Chat模型

推薦使用BigDL-LLM（CPU, GPU）以發揮更好推理性能。

幫助文檔

中文操作手冊，包括用notebook支持

加載，優化，保存方法等

華為昇騰

Pytorch 框架

模型微調：Baichuan 2 （7B）已原生支持基於昇騰NPU 的PyTorch（2.1.0）+ Transformers（4.36.0）+ DeepSpeed（0.12.4）+ Accelerate（0.25.0）模型微調，無需額外適配即可使用。

推理部署：Baichuan 2 （7B）已原生支持昇騰NPU 推理，無需額外適配即可使用。

MindSpore 框架

MindFormers 是一個基於昇思框架（MindSpore）並支持大模型訓練、微調、評估、推理、部署的全流程開發套件，Baichuan2-7B / 13B 已集成於此套件，支持用戶進行模型微調、部署，具體使用方式可見README。

大模型體驗平台

昇思大模型平台基於昇思MindSpore AI 框架、MindFormers 大模型開發套件與昇騰硬件算力，將Baichuan2-7B 大模型能力開放給公眾，歡迎大家在線體驗。

LLaMA-Efficient-Tuning

LLaMA-Efficient-Tuning 已支持對Baichuan 2 模型的微調和繼續訓練。

太初元碁

Baichuan2（7B/13B）支持太初T100 加速卡推理，現試用通道已正式對外開啟。

聲明、協議、引用

聲明

我們在此聲明，我們的開發團隊並未基於Baichuan 2 模型開發任何應用，無論是在iOS、Android、網頁或任何其他平台。我們強烈呼籲所有使用者，不要利用Baichuan 2 模型進行任何危害國家社會安全或違法的活動。另外，我們也要求使用者不要將Baichuan 2 模型用於未經適當安全審查和備案的互聯網服務。我們希望所有的使用者都能遵守這個原則，確保科技的發展能在規範和合法的環境下進行。

我們已經盡我們所能，來確保模型訓練過程中使用的數據的合規性。然而，儘管我們已經做出了巨大的努力，但由於模型和數據的複雜性，仍有可能存在一些無法預見的問題。因此，如果由於使用Baichuan 2 開源模型而導致的任何問題，包括但不限於數據安全問題、公共輿論風險，或模型被誤導、濫用、傳播或不當利用所帶來的任何風險和問題，我們將不承擔任何責任。

協定

社區使用Baichuan 2 模型需要遵循Apache 2.0 和《Baichuan 2 模型社區許可協議》。 Baichuan 2 模型支持商業用途，如果您計劃將Baichuan 2 模型或其衍生品用於商業目的，請您確認您的主體符合以下情況：

您或您的關聯方的服務或產品的日均用戶活躍量（DAU）低於100萬。
您或您的關聯方不是軟件服務提供商、雲服務提供商。
您或您的關聯方不存在將授予您的商用許可，未經百川許可二次授權給其他第三方的可能。

在符合以上條件的前提下，您需要通過以下聯繫郵箱[email protected] ，提交《Baichuan 2 模型社區許可協議》要求的申請材料。審核通過後，百川將特此授予您一個非排他性、全球性、不可轉讓、不可再許可、可撤銷的商用版權許可。

引用

如需引用我們的工作，請使用如下reference:

 @article{baichuan2023baichuan2,
  title={Baichuan 2: Open Large-scale Language Models},
  author={Baichuan},
  journal={arXiv preprint arXiv:2309.10305},
  url={https://arxiv.org/abs/2309.10305},
  year={2023}
}

展開

Baichuan2

Baichuan 2

目錄

更新

模型介紹

Benchmark 結果

通用領域

7B 模型結果

13B 模型結果

法律、醫療

7B 模型結果

13B 模型結果

數學、代碼

7B 模型結果

13B 模型結果

多語言翻譯

7B 模型結果

13B 模型結果

推理和部署

安裝依賴

Python 代碼方式

Chat 模型推理方法示範

Base 模型推理方法示範

命令行工具方式

網頁demo 方式

Baichuan2-13B-Chat 示例輸出

量化部署

量化方法

在線量化

離線量化

量化效果

CPU 部署

對Baichuan 1 的推理優化遷移到Baichuan 2

模型微調

依賴安裝

單機訓練

多機訓練

輕量化微調

中間Checkpoints

社區與生態

Intel 酷睿Ultra 平台運行百川大模型

使用方式

幫助文檔

華為昇騰

Pytorch 框架

MindSpore 框架

大模型體驗平台

LLaMA-Efficient-Tuning

太初元碁

聲明、協議、引用

聲明

協定

引用