中文 | English | Docs | ❓ Issues | Discussions | ⚔️ Arena

Hugging Face • ModelScope • Machine Heart (机器之心) SOTA! Model • wisemodel • Online Demo
This project is built on Meta's new-generation open-source model Llama-3 and is the third phase of the Chinese-LLaMA-Alpaca series of open-source model projects (Phase I, Phase II). It open-sources the Chinese Llama-3 base model and the Chinese Llama-3-Instruct instruction-tuned model. These models perform incremental pre-training on large-scale Chinese data on top of the original Llama-3 and are further fine-tuned on curated instruction data, improving basic Chinese language understanding and instruction-following ability and delivering significant performance gains over the second-generation models.
Chinese Mixtral Model | Chinese LLaMA-2 & Alpaca-2 Models | Chinese LLaMA & Alpaca Models | Multimodal Chinese LLaMA & Alpaca Models | Multimodal VLE | Chinese MiniRBT | Chinese LERT | Chinese-English PERT | Chinese MacBERT | Chinese ELECTRA | Chinese XLNet | Chinese BERT | Knowledge distillation tool TextBrewer | Model pruning tool TextPruner | Distillation-and-pruning integrated GRAIN
[2024/05/30] Released the Llama-3-Chinese-8B-Instruct-v3 instruction model, which achieves significant improvements on downstream tasks compared to v1/v2. See details: v3.0 release notes
[2024/05/08] Released the Llama-3-Chinese-8B-Instruct-v2 instruction model, fine-tuned directly on Meta-Llama-3-8B-Instruct with 5 million instruction samples. See details: v2.0 release notes
[2024/05/07] Added pre-training scripts and instruction fine-tuning scripts. See details: v1.1 release notes
[2024/04/30] Released the Llama-3-Chinese-8B base model and the Llama-3-Chinese-8B-Instruct instruction model. See details: v1.0 release notes
[2024/04/19] Officially launched the Chinese-LLaMA-Alpaca-3 project
| Chapter | Description |
|---|---|
| Model Introduction | Briefly introduces the technical features of the models in this project |
| ⏬ Model Download | Download links for the Chinese Llama-3 models |
| Inference and Deployment | Describes how to quantize the models and deploy/experience them on a personal computer |
| Model Performance | Describes model performance on a selection of tasks |
| Training and Fine-tuning | Describes how to train and fine-tune the Chinese Llama-3 models |
| ❓ FAQ | Answers to some frequently asked questions |
This project releases the open-source Chinese models Llama-3-Chinese and Llama-3-Chinese-Instruct, built on Meta Llama-3; their main features are described below.
The following table compares the models in this project and their recommended usage scenarios. For chat interaction, choose the Instruct version.
| Comparison item | Llama-3-Chinese-8B | Llama-3-Chinese-8B-Instruct |
|---|---|---|
| Model type | Base model | Instruction/Chat model (ChatGPT-like) |
| Model size | 8B | 8B |
| Training type | Causal LM (CLM) | Instruction fine-tuning |
| Training method | LoRA + full emb/lm-head (see the sketch below this table) | LoRA + full emb/lm-head |
| Initialized from | Original Meta-Llama-3-8B | v1: Llama-3-Chinese-8B; v2: original Meta-Llama-3-8B-Instruct; v3: fusion of inst/inst-v2/inst-meta |
| Training data | Unlabeled general corpus (approx. 120GB) | Labeled instruction data (approx. 5 million samples) |
| Vocabulary size | Original vocabulary (128,256) | Original vocabulary (128,256) |
| Supported context length | 8K | 8K |
| Input template | Not required | Requires the Llama-3-Instruct template |
| Suitable scenarios | Text continuation: given preceding text, the model generates the continuation | Instruction following: Q&A, writing, chat, interaction, etc. |
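The "LoRA + full emb/lm-head" training method in the table above can be illustrated, very roughly, with the 🤗 peft library: LoRA adapters are attached to the transformer projection layers while the embeddings and LM head are trained in full. This is only a minimal sketch; the target module names and hyperparameters are assumptions, not the project's released training configuration (see the project's pre-training and fine-tuning scripts for the actual setup).

```python
# Minimal sketch, assuming the peft library; not the project's released config.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

lora_config = LoraConfig(
    r=64,                      # assumed LoRA rank
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    modules_to_save=["embed_tokens", "lm_head"],  # trained in full, not via LoRA
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```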
The following compares the Instruct versions. If you have no specific preference, use the Instruct-v3 version first.
| Comparison item | Instruct-v1 | Instruct-v2 | Instruct-v3 |
|---|---|---|---|
| Release date | 2024/4/30 | 2024/5/8 | 2024/5/30 |
| Base model | Original Meta-Llama-3-8B | Original Meta-Llama-3-8B-Instruct | (see training method) |
| Training method | Stage 1: pre-training on 120GB Chinese corpus; Stage 2: fine-tuning on 5 million instruction samples | Fine-tuned directly on 5 million instruction samples | Model fusion of inst-v1, inst-v2, and inst-meta, followed by fine-tuning on a small amount of instruction data (~5K samples); see the sketch after the note below |
| Chinese ability [1] | 49.3 / 51.5 | 51.6 / 51.6 | 55.2 / 54.8 🏆 |
| English ability [1] | 63.21 | 66.68 | 66.81 🏆 |
| Long-text ability [1] | 29.6 | 46.4 🏆 | 40.5 |
| Model Arena win rate / Elo rating [2] | 49.4% / 1430 | 66.1% / 1559 | 83.6% / 1627 🏆 |
Note
[1] Chinese ability is measured on C-Eval (valid set); English ability on the Open LLM Leaderboard (avg); long-text ability on LongBench (avg). See the model performance section for details. [2] Model Arena results were collected on 2024/5/30 and are for reference only.
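The v3 model fusion described above is not spelled out in this README. Purely as an illustration of the general idea, the sketch below averages the weights of several checkpoints that share the same architecture; the local paths, the uniform averaging, and the omission of the follow-up ~5K-sample fine-tuning step are all assumptions, not the project's actual fusion recipe.

```python
# Minimal sketch of naive weight averaging across same-architecture checkpoints.
# NOT the project's actual fusion recipe; paths are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM

paths = [
    "path/to/llama-3-chinese-8b-instruct",      # inst-v1
    "path/to/llama-3-chinese-8b-instruct-v2",   # inst-v2
    "path/to/meta-llama-3-8b-instruct",         # inst-meta
]

models = [AutoModelForCausalLM.from_pretrained(p, torch_dtype=torch.float32) for p in paths]
merged = AutoModelForCausalLM.from_pretrained(paths[0], torch_dtype=torch.float32)

state_dicts = [m.state_dict() for m in models]
merged_state = merged.state_dict()
for key in merged_state:
    # Average the corresponding parameter tensors from all checkpoints.
    merged_state[key] = torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
merged.load_state_dict(merged_state)
merged.save_pretrained("llama-3-chinese-8b-instruct-merged")
```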
| Model name | Full version | LoRA version | GGUF version |
|---|---|---|---|
| Llama-3-Chinese-8B-Instruct-v3 (instruction model) | [Hugging Face] [ModelScope] [wisemodel] | N/A | [Hugging Face] [ModelScope] |
| Llama-3-Chinese-8B-Instruct-v2 (instruction model) | [Hugging Face] [ModelScope] [wisemodel] | [Hugging Face] [ModelScope] [wisemodel] | [Hugging Face] [ModelScope] |
| Llama-3-Chinese-8B-Instruct (instruction model) | [Hugging Face] [ModelScope] [wisemodel] | [Hugging Face] [ModelScope] [wisemodel] | [Hugging Face] [ModelScope] |
| Llama-3-Chinese-8B (base model) | [Hugging Face] [ModelScope] [wisemodel] | [Hugging Face] [ModelScope] [wisemodel] | [Hugging Face] [ModelScope] |
Model type description:
- Models with the `-im` suffix are quantized using an importance matrix (imatrix), which usually yields lower PPL and is recommended; usage is identical to the regular version.

Note

If you cannot access Hugging Face, consider using a mirror site (such as hf-mirror.com); please look up and follow the specific instructions yourself.
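As one possible approach (an assumption on our part, not an official recommendation of the mirror), the huggingface_hub library respects the HF_ENDPOINT environment variable, so downloads can be redirected like this:

```python
# Minimal sketch: point huggingface_hub at a mirror endpoint before downloading.
# hf-mirror.com is used only as an example; verify its availability yourself.
import os

os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # must be set before importing huggingface_hub

from huggingface_hub import snapshot_download

# Hypothetical repo id; replace with the model repository you actually want.
snapshot_download(
    repo_id="hfl/llama-3-chinese-8b-instruct-v3",
    local_dir="./llama-3-chinese-8b-instruct-v3",
)
```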
The models in this project mainly support the following quantization, inference, and deployment methods. For details, please refer to the corresponding tutorial.
| Tool | Features | CPU | GPU | Quantization | GUI | API | vLLM | Tutorial |
|---|---|---|---|---|---|---|---|---|
| llama.cpp | Rich GGUF quantization options and efficient local inference | ✅ | ✅ | ✅ | ✅ | ✅ | | [link] |
| 🤗transformers | Native transformers inference interface (see the sketch after this table) | ✅ | ✅ | ✅ | ✅ | ✅ | | [link] |
| OpenAI API demos | Server demo that emulates the OpenAI API interface | ✅ | ✅ | ✅ | ✅ | ✅ | | [link] |
| text-generation-webui | Front-end web UI deployment | ✅ | ✅ | ✅ | ✅ | ✅ | | [link] |
| LM Studio | Multi-platform chat software (with GUI) | ✅ | ✅ | ✅ | ✅ | ✅ | | [link] |
| Ollama | Local large-model inference | ✅ | ✅ | ✅ | ✅ | | | [link] |
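For the 🤗 transformers route in the table above, a minimal chat-inference sketch might look like the following. The repository id and generation parameters are assumptions; the tutorial linked in the table is authoritative.

```python
# Minimal sketch of chat inference with transformers, assuming the Instruct model
# is available under the (assumed) repo id below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hfl/llama-3-chinese-8b-instruct-v3"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "你好"},
]
# apply_chat_template renders the Llama-3-Instruct template for us.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```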
To evaluate the models, this project conducted both generative evaluation and objective (NLU-style) evaluation, assessing the large models from different angles. Users are advised to test on the tasks they care about and choose the models best suited to those tasks.
C-Eval is a comprehensive Chinese foundation-model evaluation suite; its validation and test sets contain 1.3K and 12.3K multiple-choice questions respectively, covering 52 subjects. Please refer to the project GitHub Wiki for the C-Eval inference code.
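The project's actual C-Eval/CMMLU/MMLU scripts live in the Wiki linked above. Purely as an illustration of how 0-shot multiple-choice scoring can work, the sketch below compares the model's next-token logits for the option letters; the repo id, prompt format, and the assumption that each option letter maps to a single token are ours, not the project's.

```python
# Illustrative 0-shot multiple-choice scoring: pick the option letter whose token
# receives the highest next-token logit. Not the project's evaluation script.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hfl/llama-3-chinese-8b-instruct-v3"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

def score_choice(question: str, options: dict[str, str]) -> str:
    prompt = question + "\n" + "\n".join(f"{k}. {v}" for k, v in options.items()) + "\n答案:"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        next_token_logits = model(input_ids).logits[0, -1]
    letter_ids = {k: tokenizer(k, add_special_tokens=False).input_ids[-1] for k in options}
    return max(letter_ids, key=lambda k: next_token_logits[letter_ids[k]].item())

print(score_choice("中国的首都是哪座城市?", {"A": "上海", "B": "北京", "C": "广州", "D": "深圳"}))
```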
| Models | Valid (0-shot) | Valid (5-shot) | Test (0-shot) | Test (5-shot) |
|---|---|---|---|---|
| Llama-3-Chinese-8B-Instruct-v3 | 55.2 | 54.8 | 52.1 | 52.4 |
| Llama-3-Chinese-8B-Instruct-v2 | 51.6 | 51.6 | 49.7 | 49.8 |
| Llama-3-Chinese-8B-Instruct | 49.3 | 51.5 | 48.3 | 49.4 |
| Llama-3-Chinese-8B | 47.0 | 50.5 | 46.1 | 49.0 |
| Meta-Llama-3-8B-Instruct | 51.3 | 51.3 | 49.5 | 51.0 |
| Meta-Llama-3-8B | 49.3 | 51.2 | 46.1 | 49.4 |
| Chinese-Mixtral-Instruct (8x7B) | 51.7 | 55.0 | 50.0 | 51.5 |
| Chinese-Mixtral (8x7B) | 45.8 | 54.2 | 43.1 | 49.1 |
| Chinese-Alpaca-2-13B | 44.3 | 45.9 | 42.6 | 44.0 |
| Chinese-LLaMA-2-13B | 40.6 | 42.7 | 38.0 | 41.6 |
CMMLU is another comprehensive Chinese evaluation dataset, designed to assess the knowledge and reasoning ability of language models in a Chinese context. It covers 67 topics ranging from basic subjects to advanced professional levels, with 11.5K multiple-choice questions in total. Please refer to the project GitHub Wiki for the CMMLU inference code.
| Models | Test (0-shot) | Test (5-shot) |
|---|---|---|
| Llama-3-Chinese-8B-Instruct-v3 | 54.4 | 54.8 |
| Llama-3-Chinese-8B-Instruct-v2 | 51.8 | 52.4 |
| Llama-3-Chinese-8B-Instruct | 49.7 | 51.5 |
| Llama-3-Chinese-8B | 48.0 | 50.9 |
| Meta-Llama-3-8B-Instruct | 53.0 | 53.5 |
| Meta-Llama-3-8B | 47.8 | 50.8 |
| Chinese-Mixtral-Instruct (8x7B) | 50.0 | 53.0 |
| Chinese-Mixtral (8x7B) | 42.5 | 51.0 |
| Chinese-Alpaca-2-13B | 43.2 | 45.5 |
| Chinese-LLaMA-2-13B | 38.9 | 42.5 |
MMLU is an English evaluation dataset for natural language understanding and one of the main benchmarks for evaluating large-model capabilities today. Its validation and test sets contain 1.5K and 14.1K multiple-choice questions respectively, covering 57 subjects. Please refer to the project GitHub Wiki for the MMLU inference code.
| Models | Valid (0-shot) | Valid (5-shot) | Test (0-shot) | Test (5-shot) |
|---|---|---|---|---|
| Llama-3-Chinese-8B-Instruct-v3 | 64.7 | 65.0 | 64.8 | 65.9 |
| Llama-3-Chinese-8B-Instruct-v2 | 62.1 | 63.9 | 62.6 | 63.7 |
| Llama-3-Chinese-8B-Instruct | 60.1 | 61.3 | 59.8 | 61.8 |
| Llama-3-Chinese-8B | 55.5 | 58.5 | 57.3 | 61.1 |
| Meta-Llama-3-8B-Instruct | 63.4 | 64.8 | 65.1 | 66.4 |
| Meta-Llama-3-8B | 58.6 | 62.5 | 60.5 | 65.0 |
| Chinese-Mixtral-Instruct (8x7B) | 65.1 | 69.6 | 67.5 | 69.8 |
| Chinese-Mixtral (8x7B) | 63.2 | 67.1 | 65.5 | 68.3 |
| Chinese-Alpaca-2-13B | 49.6 | 53.2 | 50.9 | 53.5 |
| Chinese-LLaMA-2-13B | 46.8 | 50.0 | 46.6 | 51.8 |
LongBench is a benchmark for evaluating the long-text understanding ability of large models, consisting of 6 major categories and 20 tasks. Most tasks have an average length of 5K-15K, with about 4.75K test samples in total. The following are the results of this project's models on the Chinese tasks (including code tasks). Please refer to the project GitHub Wiki for the LongBench inference code.
| Models | Single-doc QA | Multi-doc QA | Summarization | Few-shot Learning | Code | Synthetic Tasks | Average |
|---|---|---|---|---|---|---|---|
| Llama-3-Chinese-8B-Instruct-v3 | 20.3 | 28.8 | 24.5 | 28.1 | 59.4 | 91.9 | 40.5 |
| Llama-3-Chinese-8B-Instruct-v2 | 57.3 | 27.1 | 13.9 | 30.3 | 60.6 | 89.5 | 46.4 |
| Llama-3-Chinese-8B-Instruct | 44.1 | 24.0 | 12.4 | 33.5 | 51.8 | 11.5 | 29.6 |
| Llama-3-Chinese-8B | 16.4 | 19.3 | 4.3 | 28.7 | 14.3 | 4.6 | 14.6 |
| Meta-Llama-3-8B-Instruct | 55.1 | 15.1 | 0.1 | 24.0 | 51.3 | 94.5 | 40.0 |
| Meta-Llama-3-8B | 21.2 | 22.9 | 2.7 | 35.8 | 65.9 | 40.8 | 31.6 |
| Chinese-Mixtral-Instruct (8x7B) | 50.3 | 34.2 | 16.4 | 42.0 | 56.1 | 89.5 | 48.1 |
| Chinese-Mixtral (8x7B) | 32.0 | 23.7 | 0.4 | 42.5 | 27.4 | 14.0 | 23.3 |
| Chinese-Alpaca-2-13B-16K | 47.9 | 26.7 | 13.0 | 22.3 | 46.6 | 21.5 | 29.7 |
| Chinese-LLaMA-2-13B-16K | 36.7 | 17.7 | 3.1 | 29.8 | 13.8 | 3.0 | 17.3 |
| Chinese-Alpaca-2-7B-64K | 44.7 | 28.1 | 14.4 | 39.0 | 44.6 | 5.0 | 29.3 |
| Chinese-LLaMA-2-7B-64K | 27.2 | 16.4 | 6.5 | 33.0 | 7.8 | 5.0 | 16.0 |
The Open LLM Leaderboard is a comprehensive English benchmark for large models initiated by the HuggingFaceH4 team, comprising 6 tests: ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, and GSM8K. The following are the results of this project's models on this leaderboard.
| Models | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | Average |
|---|---|---|---|---|---|---|---|
| Llama-3-Chinese-8B-Instruct-v3 | 63.40 | 80.51 | 67.90 | 53.57 | 76.24 | 59.21 | 66.81 |
| Llama-3-Chinese-8B-Instruct-v2 | 62.63 | 79.72 | 66.48 | 53.93 | 76.72 | 60.58 | 66.68 |
| Llama-3-Chinese-8B-Instruct | 61.26 | 80.24 | 63.10 | 55.15 | 75.06 | 44.43 | 63.21 |
| Llama-3-Chinese-8B | 55.88 | 79.53 | 63.70 | 41.14 | 77.03 | 37.98 | 59.21 |
| Meta-Llama-3-8B-Instruct | 60.75 | 78.55 | 67.07 | 51.65 | 74.51 | 68.69 | 66.87 |
| Meta-Llama-3-8B | 59.47 | 82.09 | 66.69 | 43.90 | 77.35 | 45.79 | 62.55 |
| Chinese-Mixtral-Instruct (8x7B) | 67.75 | 85.67 | 71.53 | 57.46 | 83.11 | 55.65 | 70.19 |
| Chinese-Mixtral (8x7B) | 67.58 | 85.34 | 70.38 | 46.86 | 82.00 | 0.00 | 58.69 |
Note: The MMLU results here differ from those in the MMLU section above mainly because different evaluation scripts are used.
The quantization performance of Llama-3-Chinese-8B (base model) under llama.cpp is shown in the table below. The measured speed is slightly slower than that of the second-generation Llama-2-7B.
| Metric | F16 | Q8_0 | Q6_K | Q5_K | Q5_0 | Q4_K | Q4_0 | Q3_K | Q2_K |
|---|---|---|---|---|---|---|---|---|---|
| Size (GB) | 14.97 | 7.95 | 6.14 | 5.34 | 5.21 | 4.58 | 4.34 | 3.74 | 2.96 |
| BPW | 16.00 | 8.50 | 6.56 | 5.70 | 5.57 | 4.89 | 4.64 | 4.00 | 3.16 |
| PPL | 5.130 | 5.135 | 5.148 | 5.181 | 5.222 | 5.312 | 5.549 | 5.755 | 11.859 |
| PP Speed | 5.99 | 6.10 | 7.17 | 7.34 | 6.65 | 6.38 | 6.00 | 6.85 | 6.43 |
| TG Speed | 44.03 | 26.08 | 21.61 | 22.33 | 20.93 | 18.93 | 17.09 | 22.50 | 19.21 |
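As a sanity check on the table above, bits per weight (BPW) can be recovered from the file size and the parameter count. The sketch below assumes the sizes are in GiB and that Llama-3-8B has roughly 8.03 billion parameters (an assumption, not a figure from this README).

```python
# Recompute BPW from quantized file size and parameter count.
# Assumptions: table sizes are GiB; Llama-3-8B has ~8.03e9 parameters.
N_PARAMS = 8.03e9

def bits_per_weight(size_gib: float, n_params: float = N_PARAMS) -> float:
    return size_gib * (1024 ** 3) * 8 / n_params

for name, size in [("F16", 14.97), ("Q8_0", 7.95), ("Q4_K", 4.58), ("Q2_K", 2.96)]:
    print(f"{name}: {bits_per_weight(size):.2f} bpw")  # ≈ 16.0, 8.5, 4.9, 3.2
```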
The Llama-3-Chinese-Instruct models in this project use the original Llama-3-Instruct instruction template. Below is an example conversation:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>
Hello<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Hello! Is there anything I can help you with?<|eot_id|>
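When a framework does not apply the chat template for you, the template above can be rendered as a raw prompt string. The helper below is hypothetical (not part of this project's code) and follows the Llama-3-Instruct format, including the blank line after each header and the open assistant header at the end.

```python
# Hypothetical helper that renders the Llama-3-Instruct template as a raw prompt string.
def build_llama3_prompt(messages: list[dict[str, str]]) -> str:
    prompt = "<|begin_of_text|>"
    for msg in messages:
        prompt += f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n{msg['content']}<|eot_id|>"
    # Leave the assistant header open so the model generates the reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

print(build_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "你好"},
]))
```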
The following lists some of the open-source instruction data released by this project. For details, please see: Instruction data
| Dataset name | Description | Quantity |
|---|---|---|
| alpaca_zh_51k | Alpaca data translated using gpt-3.5 | 51K |
| stem_zh_instruction | STEM data crawled using gpt-3.5, including physics, chemistry, medicine, biology, and earth sciences | 256K |
| ruozhiba_gpt4 | Ruozhiba Q&A data obtained using GPT-4o and GPT-4T | 2449 |
Before submitting an issue, please make sure to check whether a solution already exists in the FAQ. For specific questions and answers, please refer to the project GitHub Wiki.
Question 1: Why was the vocabulary not expanded, as it was in the Phase I and Phase II projects?
Question 2: Will a 70B version be released?
Question 3: Why is the instruction model no longer called Alpaca?
Question 4: Can the models in this repository be used commercially?
Question 5: Why use LoRA instead of full-parameter pre-training?
Question 6: Why does Llama-3-Chinese perform poorly in conversation?
Question 7: Why does the instruction model reply that it is ChatGPT?
Question 8: What is the difference between v1 (the original) and v2 of the Instruct model?
If you have used the resources of this project, please cite the project's technical report: https://arxiv.org/abs/2304.08177
@article{chinese-llama-alpaca,
title={Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca},
author={Cui, Yiming and Yang, Ziqing and Yao, Xin},
journal={arXiv preprint arXiv:2304.08177},
url={https://arxiv.org/abs/2304.08177},
year={2023}
}
For an analysis of whether to expand the vocabulary, please refer to the following citation: https://arxiv.org/abs/2403.01851
@article{chinese-mixtral,
title={Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral},
author={Cui, Yiming and Yao, Xin},
journal={arXiv preprint arXiv:2403.01851},
url={https://arxiv.org/abs/2403.01851},
year={2024}
}
This project is developed based on the Llama-3 model released by Meta. Please strictly abide by the Llama-3 open-source license agreement when using it. If third-party code is involved, be sure to comply with the relevant open-source licenses. Content generated by the models may be inaccurate due to computation methods, random factors, and loss of quantization precision; therefore, this project makes no guarantee about the accuracy of model outputs and accepts no liability for any losses arising from the use of the related resources and outputs. If the models of this project are used for commercial purposes, developers shall comply with local laws and regulations and ensure the compliance of the models' output content. This project assumes no liability for any products or services derived from it.
If you have any questions, please submit them via GitHub Issues. Please ask questions politely and help build a harmonious discussion community.
Cui and Yao, 2024. Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral ↩