中文 | English | Docs | ❓ Issues | Discussions | ⚔️ Arena

Hugging Face • ModelScope • Machine Heart (机器之心) SOTA! Model • wisemodel • Online Demo
This project is built on Meta's new-generation open-source model Llama-3 and is the third phase of the Chinese-LLaMA-Alpaca series of open-source model projects (Phase I, Phase II). It open-sources the Chinese Llama-3 base model and the Chinese Llama-3-Instruct instruction-tuned model. These models perform incremental pre-training on large-scale Chinese data on top of the original Llama-3 and are further fine-tuned on curated instruction data, improving basic Chinese language understanding and instruction-following ability and delivering significant performance gains over the second-generation models.
Chinese Mixtral Model | Chinese LLaMA-2 & Alpaca-2 Models | Chinese LLaMA & Alpaca Models | Multimodal Chinese LLaMA & Alpaca Models | Multimodal VLE | Chinese MiniRBT | Chinese LERT | Chinese-English PERT | Chinese MacBERT | Chinese ELECTRA | Chinese XLNet | Chinese BERT | Knowledge distillation tool TextBrewer | Model pruning tool TextPruner | Distillation-and-pruning integrated GRAIN
[2024/05/30] Released the Llama-3-Chinese-8B-Instruct-v3 instruction model, which achieves significant improvements on downstream tasks compared to v1/v2. See details: v3.0 release notes
[2024/05/08] Released the Llama-3-Chinese-8B-Instruct-v2 instruction model, fine-tuned directly on Meta-Llama-3-8B-Instruct with 5 million instruction samples. See details: v2.0 release notes
[2024/05/07] Added pre-training scripts and instruction fine-tuning scripts. See details: v1.1 release notes
[2024/04/30] Released the Llama-3-Chinese-8B base model and the Llama-3-Chinese-8B-Instruct instruction model. See details: v1.0 release notes
[2024/04/19] Officially launched the Chinese-LLaMA-Alpaca-3 project
| Chapter | Description |
|---|---|
| Model Introduction | Briefly introduces the technical features of the models in this project |
| ⏬ Model Download | Download links for the Chinese Llama-3 models |
| Inference and Deployment | Describes how to quantize the models and deploy/experience them on a personal computer |
| Model Performance | Describes model performance on a selection of tasks |
| Training and Fine-tuning | Describes how to train and fine-tune the Chinese Llama-3 models |
| ❓ FAQ | Answers to some frequently asked questions |
This project releases the open-source Chinese models Llama-3-Chinese and Llama-3-Chinese-Instruct, built on Meta Llama-3; their main features are described below.
The following table compares the models in this project and their recommended usage scenarios. For chat interaction, choose the Instruct version.
| Comparison item | Llama-3-Chinese-8B | Llama-3-Chinese-8B-Instruct |
|---|---|---|
| Model type | Base model | Instruction/Chat model (ChatGPT-like) |
| Model size | 8B | 8B |
| Training type | Causal LM (CLM) | Instruction fine-tuning |
| Training method | LoRA + full emb/lm-head (see the sketch below this table) | LoRA + full emb/lm-head |
| Initialized from | Original Meta-Llama-3-8B | v1: Llama-3-Chinese-8B; v2: original Meta-Llama-3-8B-Instruct; v3: fusion of inst/inst-v2/inst-meta |
| Training data | Unlabeled general corpus (approx. 120GB) | Labeled instruction data (approx. 5 million samples) |
| Vocabulary size | Original vocabulary (128,256) | Original vocabulary (128,256) |
| Supported context length | 8K | 8K |
| Input template | Not required | Requires the Llama-3-Instruct template |
| Suitable scenarios | Text continuation: given preceding text, the model generates the continuation | Instruction following: Q&A, writing, chat, interaction, etc. |
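The "LoRA + full emb/lm-head" training method in the table above can be illustrated, very roughly, with the 🤗 peft library: LoRA adapters are attached to the transformer projection layers while the embeddings and LM head are trained in full. This is only a minimal sketch; the target module names and hyperparameters are assumptions, not the project's released training configuration (see the project's pre-training and fine-tuning scripts for the actual setup).

```python
# Minimal sketch, assuming the peft library; not the project's released config.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

lora_config = LoraConfig(
    r=64,                      # assumed LoRA rank
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    modules_to_save=["embed_tokens", "lm_head"],  # trained in full, not via LoRA
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```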
The following compares the Instruct versions. If you have no specific preference, use the Instruct-v3 version first.
| Comparison item | Instruct-v1 | Instruct-v2 | Instruct-v3 |
|---|---|---|---|
| Release date | 2024/4/30 | 2024/5/8 | 2024/5/30 |
| Base model | Original Meta-Llama-3-8B | Original Meta-Llama-3-8B-Instruct | (see training method) |
| Training method | Stage 1: pre-training on 120GB Chinese corpus; Stage 2: fine-tuning on 5 million instruction samples | Fine-tuned directly on 5 million instruction samples | Model fusion of inst-v1, inst-v2, and inst-meta, followed by fine-tuning on a small amount of instruction data (~5K samples); see the sketch after the note below |
| Chinese ability [1] | 49.3 / 51.5 | 51.6 / 51.6 | 55.2 / 54.8 🏆 |
| English ability [1] | 63.21 | 66.68 | 66.81 🏆 |
| Long-text ability [1] | 29.6 | 46.4 🏆 | 40.5 |
| Model Arena win rate / Elo rating [2] | 49.4% / 1430 | 66.1% / 1559 | 83.6% / 1627 🏆 |
Note
[1] Chinese ability is measured on C-Eval (valid set); English ability on the Open LLM Leaderboard (avg); long-text ability on LongBench (avg). See the model performance section for details. [2] Model Arena results were collected on 2024/5/30 and are for reference only.
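The v3 model fusion described above is not spelled out in this README. Purely as an illustration of the general idea, the sketch below averages the weights of several checkpoints that share the same architecture; the local paths, the uniform averaging, and the omission of the follow-up ~5K-sample fine-tuning step are all assumptions, not the project's actual fusion recipe.

```python
# Minimal sketch of naive weight averaging across same-architecture checkpoints.
# NOT the project's actual fusion recipe; paths are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM

paths = [
    "path/to/llama-3-chinese-8b-instruct",      # inst-v1
    "path/to/llama-3-chinese-8b-instruct-v2",   # inst-v2
    "path/to/meta-llama-3-8b-instruct",         # inst-meta
]

models = [AutoModelForCausalLM.from_pretrained(p, torch_dtype=torch.float32) for p in paths]
merged = AutoModelForCausalLM.from_pretrained(paths[0], torch_dtype=torch.float32)

state_dicts = [m.state_dict() for m in models]
merged_state = merged.state_dict()
for key in merged_state:
    # Average the corresponding parameter tensors from all checkpoints.
    merged_state[key] = torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
merged.load_state_dict(merged_state)
merged.save_pretrained("llama-3-chinese-8b-instruct-merged")
```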
| Model name | Full version | LoRA version | GGUF version |
|---|---|---|---|
| Llama-3-Chinese-8B-Instruct-v3 (instruction model) | [Hugging Face] [ModelScope] [wisemodel] | N/A | [Hugging Face] [ModelScope] |
| Llama-3-Chinese-8B-Instruct-v2 (instruction model) | [Hugging Face] [ModelScope] [wisemodel] | [Hugging Face] [ModelScope] [wisemodel] | [Hugging Face] [ModelScope] |
| Llama-3-Chinese-8B-Instruct (instruction model) | [Hugging Face] [ModelScope] [wisemodel] | [Hugging Face] [ModelScope] [wisemodel] | [Hugging Face] [ModelScope] |
| Llama-3-Chinese-8B (base model) | [Hugging Face] [ModelScope] [wisemodel] | [Hugging Face] [ModelScope] [wisemodel] | [Hugging Face] [ModelScope] |
Model type description:
- Models with the `-im` suffix are quantized using an importance matrix (imatrix), which usually yields lower PPL and is recommended; usage is identical to the regular version.

Note

If you cannot access Hugging Face, consider using a mirror site (such as hf-mirror.com); please look up and follow the specific instructions yourself.
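As one possible approach (an assumption on our part, not an official recommendation of the mirror), the huggingface_hub library respects the HF_ENDPOINT environment variable, so downloads can be redirected like this:

```python
# Minimal sketch: point huggingface_hub at a mirror endpoint before downloading.
# hf-mirror.com is used only as an example; verify its availability yourself.
import os

os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # must be set before importing huggingface_hub

from huggingface_hub import snapshot_download

# Hypothetical repo id; replace with the model repository you actually want.
snapshot_download(
    repo_id="hfl/llama-3-chinese-8b-instruct-v3",
    local_dir="./llama-3-chinese-8b-instruct-v3",
)
```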
The models in this project mainly support the following quantization, inference, and deployment methods. For details, please refer to the corresponding tutorial.
| Tool | Features | CPU | GPU | Quantization | GUI | API | vLLM | Tutorial |
|---|---|---|---|---|---|---|---|---|
| llama.cpp | Rich GGUF quantization options and efficient local inference | ✅ | ✅ | ✅ | ✅ | ✅ | | [link] |
| 🤗transformers | Native transformers inference interface (see the sketch after this table) | ✅ | ✅ | ✅ | ✅ | ✅ | | [link] |
| OpenAI API demos | Server demo that emulates the OpenAI API interface | ✅ | ✅ | ✅ | ✅ | ✅ | | [link] |
| text-generation-webui | Front-end web UI deployment | ✅ | ✅ | ✅ | ✅ | ✅ | | [link] |
| LM Studio | Multi-platform chat software (with GUI) | ✅ | ✅ | ✅ | ✅ | ✅ | | [link] |
| Ollama | Local large-model inference | ✅ | ✅ | ✅ | ✅ | | | [link] |
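For the 🤗 transformers route in the table above, a minimal chat-inference sketch might look like the following. The repository id and generation parameters are assumptions; the tutorial linked in the table is authoritative.

```python
# Minimal sketch of chat inference with transformers, assuming the Instruct model
# is available under the (assumed) repo id below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hfl/llama-3-chinese-8b-instruct-v3"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "你好"},
]
# apply_chat_template renders the Llama-3-Instruct template for us.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```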
To evaluate the models, this project conducted both generative evaluation and objective (NLU-style) evaluation, assessing the large models from different angles. Users are advised to test on the tasks they care about and choose the models best suited to those tasks.
C-Eval is a comprehensive Chinese foundation-model evaluation suite; its validation and test sets contain 1.3K and 12.3K multiple-choice questions respectively, covering 52 subjects. Please refer to the project GitHub Wiki for the C-Eval inference code.
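The project's actual C-Eval/CMMLU/MMLU scripts live in the Wiki linked above. Purely as an illustration of how 0-shot multiple-choice scoring can work, the sketch below compares the model's next-token logits for the option letters; the repo id, prompt format, and the assumption that each option letter maps to a single token are ours, not the project's.

```python
# Illustrative 0-shot multiple-choice scoring: pick the option letter whose token
# receives the highest next-token logit. Not the project's evaluation script.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hfl/llama-3-chinese-8b-instruct-v3"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

def score_choice(question: str, options: dict[str, str]) -> str:
    prompt = question + "\n" + "\n".join(f"{k}. {v}" for k, v in options.items()) + "\n答案:"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        next_token_logits = model(input_ids).logits[0, -1]
    letter_ids = {k: tokenizer(k, add_special_tokens=False).input_ids[-1] for k in options}
    return max(letter_ids, key=lambda k: next_token_logits[letter_ids[k]].item())

print(score_choice("中国的首都是哪座城市?", {"A": "上海", "B": "北京", "C": "广州", "D": "深圳"}))
```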
| Models | Valid (0-shot) | Valid (5-shot) | Test (0-shot) | Test (5-shot) |
|---|---|---|---|---|
| Llama-3-Chinese-8B-Instruct-v3 | 55.2 | 54.8 | 52.1 | 52.4 |
| Llama-3-Chinese-8B-Instruct-v2 | 51.6 | 51.6 | 49.7 | 49.8 |
| Llama-3-Chinese-8B-Instruct | 49.3 | 51.5 | 48.3 | 49.4 |
| Llama-3-Chinese-8B | 47.0 | 50.5 | 46.1 | 49.0 |
| Meta-Llama-3-8B-Instruct | 51.3 | 51.3 | 49.5 | 51.0 |
| Meta-Llama-3-8B | 49.3 | 51.2 | 46.1 | 49.4 |
| Chinese-Mixtral-Instruct (8x7B) | 51.7 | 55.0 | 50.0 | 51.5 |
| Chinese-Mixtral (8x7B) | 45.8 | 54.2 | 43.1 | 49.1 |
| Chinese-Alpaca-2-13B | 44.3 | 45.9 | 42.6 | 44.0 |
| Chinese-LLaMA-2-13B | 40.6 | 42.7 | 38.0 | 41.6 |
CMMLU is another comprehensive Chinese evaluation dataset, designed to assess the knowledge and reasoning ability of language models in a Chinese context. It covers 67 topics ranging from basic subjects to advanced professional levels, with 11.5K multiple-choice questions in total. Please refer to the project GitHub Wiki for the CMMLU inference code.
| Models | Test (0-shot) | Test (5-shot) |
|---|---|---|
| Llama-3-Chinese-8B-Instruct-v3 | 54.4 | 54.8 |
| Llama-3-Chinese-8B-Instruct-v2 | 51.8 | 52.4 |
| Llama-3-Chinese-8B-Instruct | 49.7 | 51.5 |
| Llama-3-Chinese-8B | 48.0 | 50.9 |
| Meta-Llama-3-8B-Instruct | 53.0 | 53.5 |
| Meta-Llama-3-8B | 47.8 | 50.8 |
| Chinese-Mixtral-Instruct (8x7B) | 50.0 | 53.0 |
| Chinese-Mixtral (8x7B) | 42.5 | 51.0 |
| Chinese-Alpaca-2-13B | 43.2 | 45.5 |
| Chinese-LLaMA-2-13B | 38.9 | 42.5 |
MMLU is an English evaluation dataset for natural language understanding and one of the main benchmarks for evaluating large-model capabilities today. Its validation and test sets contain 1.5K and 14.1K multiple-choice questions respectively, covering 57 subjects. Please refer to the project GitHub Wiki for the MMLU inference code.
| Models | Valid (0-shot) | Valid (5-shot) | Test (0-shot) | Test (5-shot) |
|---|---|---|---|---|
| Llama-3-Chinese-8B-Instruct-v3 | 64.7 | 65.0 | 64.8 | 65.9 |
| Llama-3-Chinese-8B-Instruct-v2 | 62.1 | 63.9 | 62.6 | 63.7 |
| Llama-3-Chinese-8B-Instruct | 60.1 | 61.3 | 59.8 | 61.8 |
| Llama-3-Chinese-8B | 55.5 | 58.5 | 57.3 | 61.1 |
| Meta-Llama-3-8B-Instruct | 63.4 | 64.8 | 65.1 | 66.4 |
| Meta-Llama-3-8B | 58.6 | 62.5 | 60.5 | 65.0 |
| Chinese-Mixtral-Instruct (8x7B) | 65.1 | 69.6 | 67.5 | 69.8 |
| Chinese-Mixtral (8x7B) | 63.2 | 67.1 | 65.5 | 68.3 |
| Chinese-Alpaca-2-13B | 49.6 | 53.2 | 50.9 | 53.5 |
| Chinese-LLaMA-2-13B | 46.8 | 50.0 | 46.6 | 51.8 |
LongBench is a benchmark for evaluating the long-text understanding ability of large models, consisting of 6 major categories and 20 tasks. Most tasks have an average length of 5K-15K, with about 4.75K test samples in total. The following are the results of this project's models on the Chinese tasks (including code tasks). Please refer to the project GitHub Wiki for the LongBench inference code.
| Models | Single-doc QA | Multi-doc QA | Summarization | Few-shot Learning | Code | Synthetic Tasks | Average |
|---|---|---|---|---|---|---|---|
| Llama-3-Chinese-8B-Instruct-v3 | 20.3 | 28.8 | 24.5 | 28.1 | 59.4 | 91.9 | 40.5 |
| Llama-3-Chinese-8B-Instruct-v2 | 57.3 | 27.1 | 13.9 | 30.3 | 60.6 | 89.5 | 46.4 |
| Llama-3-Chinese-8B-Instruct | 44.1 | 24.0 | 12.4 | 33.5 | 51.8 | 11.5 | 29.6 |
| Llama-3-Chinese-8B | 16.4 | 19.3 | 4.3 | 28.7 | 14.3 | 4.6 | 14.6 |
| Meta-Llama-3-8B-Instruct | 55.1 | 15.1 | 0.1 | 24.0 | 51.3 | 94.5 | 40.0 |
| Meta-Llama-3-8B | 21.2 | 22.9 | 2.7 | 35.8 | 65.9 | 40.8 | 31.6 |
| Chinese-Mixtral-Instruct (8x7B) | 50.3 | 34.2 | 16.4 | 42.0 | 56.1 | 89.5 | 48.1 |
| Chinese-Mixtral (8x7B) | 32.0 | 23.7 | 0.4 | 42.5 | 27.4 | 14.0 | 23.3 |
| Chinese-Alpaca-2-13B-16K | 47.9 | 26.7 | 13.0 | 22.3 | 46.6 | 21.5 | 29.7 |
| Chinese-LLaMA-2-13B-16K | 36.7 | 17.7 | 3.1 | 29.8 | 13.8 | 3.0 | 17.3 |
| Chinese-Alpaca-2-7B-64K | 44.7 | 28.1 | 14.4 | 39.0 | 44.6 | 5.0 | 29.3 |
| Chinese-LLaMA-2-7B-64K | 27.2 | 16.4 | 6.5 | 33.0 | 7.8 | 5.0 | 16.0 |
The Open LLM Leaderboard is a comprehensive English benchmark for large models initiated by the HuggingFaceH4 team, comprising 6 tests: ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, and GSM8K. The following are the results of this project's models on this leaderboard.
| Models | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | Average |
|---|---|---|---|---|---|---|---|
| Llama-3-Chinese-8B-Instruct-v3 | 63.40 | 80.51 | 67.90 | 53.57 | 76.24 | 59.21 | 66.81 |
| Llama-3-Chinese-8B-Instruct-v2 | 62.63 | 79.72 | 66.48 | 53.93 | 76.72 | 60.58 | 66.68 |
| Llama-3-Chinese-8B-Instruct | 61.26 | 80.24 | 63.10 | 55.15 | 75.06 | 44.43 | 63.21 |
| Llama-3-Chinese-8B | 55.88 | 79.53 | 63.70 | 41.14 | 77.03 | 37.98 | 59.21 |
| Meta-Llama-3-8B-Instruct | 60.75 | 78.55 | 67.07 | 51.65 | 74.51 | 68.69 | 66.87 |
| Meta-Llama-3-8B | 59.47 | 82.09 | 66.69 | 43.90 | 77.35 | 45.79 | 62.55 |
| Chinese-Mixtral-Instruct (8x7B) | 67.75 | 85.67 | 71.53 | 57.46 | 83.11 | 55.65 | 70.19 |
| Chinese-Mixtral (8x7B) | 67.58 | 85.34 | 70.38 | 46.86 | 82.00 | 0.00 | 58.69 |
Note: The MMLU results here differ from those in the MMLU section above mainly because different evaluation scripts are used.
The quantization performance of Llama-3-Chinese-8B (base model) under llama.cpp is shown in the table below. The measured speed is slightly slower than that of the second-generation Llama-2-7B.
| Metric | F16 | Q8_0 | Q6_K | Q5_K | Q5_0 | Q4_K | Q4_0 | Q3_K | Q2_K |
|---|---|---|---|---|---|---|---|---|---|
| Size (GB) | 14.97 | 7.95 | 6.14 | 5.34 | 5.21 | 4.58 | 4.34 | 3.74 | 2.96 |
| BPW | 16.00 | 8.50 | 6.56 | 5.70 | 5.57 | 4.89 | 4.64 | 4.00 | 3.16 |
| PPL | 5.130 | 5.135 | 5.148 | 5.181 | 5.222 | 5.312 | 5.549 | 5.755 | 11.859 |
| PP Speed | 5.99 | 6.10 | 7.17 | 7.34 | 6.65 | 6.38 | 6.00 | 6.85 | 6.43 |
| TG Speed | 44.03 | 26.08 | 21.61 | 22.33 | 20.93 | 18.93 | 17.09 | 22.50 | 19.21 |
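As a sanity check on the table above, bits per weight (BPW) can be recovered from the file size and the parameter count. The sketch below assumes the sizes are in GiB and that Llama-3-8B has roughly 8.03 billion parameters (an assumption, not a figure from this README).

```python
# Recompute BPW from quantized file size and parameter count.
# Assumptions: table sizes are GiB; Llama-3-8B has ~8.03e9 parameters.
N_PARAMS = 8.03e9

def bits_per_weight(size_gib: float, n_params: float = N_PARAMS) -> float:
    return size_gib * (1024 ** 3) * 8 / n_params

for name, size in [("F16", 14.97), ("Q8_0", 7.95), ("Q4_K", 4.58), ("Q2_K", 2.96)]:
    print(f"{name}: {bits_per_weight(size):.2f} bpw")  # ≈ 16.0, 8.5, 4.9, 3.2
```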
The Llama-3-Chinese-Instruct models in this project use the original Llama-3-Instruct instruction template. Below is an example conversation:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>
Hello<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Hello! Is there anything I can help you with?<|eot_id|>
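When a framework does not apply the chat template for you, the template above can be rendered as a raw prompt string. The helper below is hypothetical (not part of this project's code) and follows the Llama-3-Instruct format, including the blank line after each header and the open assistant header at the end.

```python
# Hypothetical helper that renders the Llama-3-Instruct template as a raw prompt string.
def build_llama3_prompt(messages: list[dict[str, str]]) -> str:
    prompt = "<|begin_of_text|>"
    for msg in messages:
        prompt += f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n{msg['content']}<|eot_id|>"
    # Leave the assistant header open so the model generates the reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

print(build_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "你好"},
]))
```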
The following lists some of the open-source instruction data released by this project. For details, please see: Instruction data
| Dataset name | Description | Quantity |
|---|---|---|
| alpaca_zh_51k | Alpaca data translated using gpt-3.5 | 51K |
| stem_zh_instruction | STEM data crawled using gpt-3.5, including physics, chemistry, medicine, biology, and earth sciences | 256K |
| ruozhiba_gpt4 | Ruozhiba Q&A data obtained using GPT-4o and GPT-4T | 2449 |
Before submitting an issue, please make sure to check whether a solution already exists in the FAQ. For specific questions and answers, please refer to the project GitHub Wiki.
Question 1: Why was the vocabulary not expanded, as it was in the Phase I and Phase II projects?
Question 2: Will a 70B version be released?
Question 3: Why is the instruction model no longer called Alpaca?
Question 4: Can the models in this repository be used commercially?
Question 5: Why use LoRA instead of full-parameter pre-training?
Question 6: Why does Llama-3-Chinese perform poorly in conversation?
Question 7: Why does the instruction model reply that it is ChatGPT?
Question 8: What is the difference between v1 (the original) and v2 of the Instruct model?
If you have used the resources of this project, please cite the project's technical report: https://arxiv.org/abs/2304.08177
@article{chinese-llama-alpaca,
title={Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca},
author={Cui, Yiming and Yang, Ziqing and Yao, Xin},
journal={arXiv preprint arXiv:2304.08177},
url={https://arxiv.org/abs/2304.08177},
year={2023}
}
For an analysis of whether to expand the vocabulary, please refer to the following citation: https://arxiv.org/abs/2403.01851
@article{chinese-mixtral,
title={Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral},
author={Cui, Yiming and Yao, Xin},
journal={arXiv preprint arXiv:2403.01851},
url={https://arxiv.org/abs/2403.01851},
year={2024}
}
This project is developed based on the Llama-3 model released by Meta. Please strictly abide by the Llama-3 open-source license agreement when using it. If third-party code is involved, be sure to comply with the relevant open-source licenses. Content generated by the models may be inaccurate due to computation methods, random factors, and loss of quantization precision; therefore, this project makes no guarantee about the accuracy of model outputs and accepts no liability for any losses arising from the use of the related resources and outputs. If the models of this project are used for commercial purposes, developers shall comply with local laws and regulations and ensure the compliance of the models' output content. This project assumes no liability for any products or services derived from it.
If you have any questions, please submit them via GitHub Issues. Please ask questions politely and help build a harmonious discussion community.
Cui and Yao, 2024. Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral ↩