Although pre-trained language models are widely used across NLP, their high time and compute costs remain a pressing problem. This calls for models that achieve better metrics under a given compute budget.
Our goal is not to pursue ever-larger model sizes, but lightweight yet more powerful models that are easier to deploy and friendlier to industrial adoption.
Based on techniques such as linguistic-information integration and training acceleration, we developed the Mengzi family of models. Because the model structure is consistent with BERT, Mengzi models can quickly replace existing pre-trained models.
For the detailed technical report, please refer to:
Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese
Added two open-source GPT-architecture models (@huajingyun).
@hululuzhu trained a Chinese AI writing model on top of mengzi-t5-base to generate poetry and couplets. For the model and usage details, please refer to: chinese-ai-writing-share
Some generation examples:
上: 不待鸣钟已汗颜,重来试手竟何艰
下: 何堪击鼓频催泪?一别伤心更枉然
上: 北国风光,千里冰封,万里雪飘
下: 南疆气象,五湖浪涌,三江潮来
標題: 作诗:中秋
詩歌: 秋氣侵肌骨,寒光入鬢毛。雲收千里月,風送一帆高。
標題: 作诗:中秋 模仿:苏轼
詩歌: 月從海上生,照我庭下影。不知此何夕,但見天宇靜。
Thanks to the PaddlePaddle team (@yingyibiao) for providing the PaddleNLP version of the models and documentation.
Note: The PaddleNLP version models are not produced by Langboat Technology, and we take no responsibility for their outputs or results.
| Model | Parameters | Applicable scenarios | Features | Download links |
|---|---|---|---|---|
| Mengzi-BERT-base | 110M | Natural language understanding tasks such as text classification, entity recognition, relation extraction, and reading comprehension | Same structure as BERT; can directly replace existing BERT weights | HuggingFace, Domestic ZIP download, PaddleNLP |
| Mengzi-BERT-L6-H768 | 60M | Natural language understanding tasks such as text classification, entity recognition, relation extraction, and reading comprehension | Distilled from Mengzi-BERT-large | HuggingFace |
| Mengzi-BERT-base-fin | 110M | Natural language understanding tasks in the financial domain | Further trained on a financial corpus on top of Mengzi-BERT-base | HuggingFace, Domestic ZIP download, PaddleNLP |
| Mengzi-T5-base | 220M | Controllable text generation tasks such as marketing copy generation and news generation | Same structure as T5; no downstream tasks included, so it must be fine-tuned on a specific task before use. Unlike GPT-style models, it is not suited to open-ended text continuation | HuggingFace, Domestic ZIP download, PaddleNLP |
| Mengzi-T5-base-MT | 220M | Zero-shot and few-shot capabilities | Multi-task model; various tasks can be completed via prompts (see the sketch after this table) | HuggingFace |
| Mengzi-Oscar-base | 110M | Tasks such as image captioning and image-text retrieval | Multimodal model based on Mengzi-BERT-base, trained on millions of image-text pairs | HuggingFace |
| Mengzi-GPT-neo-base | 125M | Text continuation | Trained on a Chinese corpus; suitable as a baseline model for related work | HuggingFace |
| BLOOM-389m-zh | 389M | Text continuation | Multilingual BLOOM model pruned to a Chinese vocabulary, reducing GPU memory requirements | HuggingFace |
| BLOOM-800m-zh | 800M | Text continuation | Multilingual BLOOM model pruned to a Chinese vocabulary, reducing GPU memory requirements | HuggingFace |
| BLOOM-1b4-zh | 1400M | Text continuation | Multilingual BLOOM model pruned to a Chinese vocabulary, reducing GPU memory requirements | HuggingFace |
| BLOOM-2b5-zh | 2500M | Text continuation | Multilingual BLOOM model pruned to a Chinese vocabulary, reducing GPU memory requirements | HuggingFace |
| BLOOM-6b4-zh | 6400M | Text continuation | Multilingual BLOOM model pruned to a Chinese vocabulary, reducing GPU memory requirements | HuggingFace |
| ReGPT-125M-200G | 125M | Text continuation | Trained from GPT-Neo-125M via https://github.com/Langboat/mengzi-retrieval-lm | HuggingFace |
| Guohua-Diffusion | - | Chinese-painting-style text-to-image generation | Trained with DreamBooth on Stable Diffusion v1.5 | HuggingFace |
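The multi-task checkpoint Mengzi-T5-base-MT is driven purely by prompts. Below is a minimal zero-shot sketch; the repository id `Langboat/mengzi-t5-base-mt` and the prompt wording are assumptions for illustration, and the HuggingFace model card documents the prompt templates actually used in training.

```python
# Minimal zero-shot sketch for Mengzi-T5-base-MT (prompt-driven multi-task model).
# The repo id and the prompt wording below are illustrative assumptions; see the
# HuggingFace model card for the prompt templates used in training.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("Langboat/mengzi-t5-base-mt")
model = T5ForConditionalGeneration.from_pretrained("Langboat/mengzi-t5-base-mt")

# Hypothetical sentiment-classification prompt.
prompt = "评论:这家餐厅的菜很好吃,服务也不错。该评论的情感是积极还是消极?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```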
```python
# Load with Huggingface Transformers
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("Langboat/mengzi-bert-base")
model = BertModel.from_pretrained("Langboat/mengzi-bert-base")
```

or

```python
# Load with PaddleNLP
from paddlenlp.transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("Langboat/mengzi-bert-base")
model = BertModel.from_pretrained("Langboat/mengzi-bert-base")
```

Integrated into Huggingface Spaces with Gradio. See the demo:
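Beyond the hosted demo, a quick forward pass is enough to check that the checkpoint behaves as a drop-in BERT. This is only a sanity-check sketch with an arbitrary example sentence:

```python
# Sanity-check sketch: encode one sentence and inspect the encoder output shape.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("Langboat/mengzi-bert-base")
model = BertModel.from_pretrained("Langboat/mengzi-bert-base")

inputs = tokenizer("孟子轻量化预训练模型。", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```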
```python
# Load with Huggingface Transformers
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("Langboat/mengzi-t5-base")
model = T5ForConditionalGeneration.from_pretrained("Langboat/mengzi-t5-base")
```

or

```python
# Load with PaddleNLP
from paddlenlp.transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("Langboat/mengzi-t5-base")
model = T5ForConditionalGeneration.from_pretrained("Langboat/mengzi-t5-base")
```

Reference Documents
```bash
# To load with Huggingface Transformers
pip install transformers
```

or

```bash
# To load with PaddleNLP
pip install paddlenlp
```

| Model | AFQMC | TNEWS | IFLYTEK | CMNLI | WSC | CSL | CMRC2018 | C3 | CHID |
|---|---|---|---|---|---|---|---|---|---|
| RoBERTa-wwm-ext | 74.30 | 57.51 | 60.80 | 80.70 | 67.20 | 80.67 | 77.59 | 67.06 | 83.78 |
| Mengzi-BERT-base | 74.58 | 57.97 | 60.68 | 82.12 | 87.50 | 85.40 | 78.54 | 71.70 | 84.16 |
| Mengzi-BERT-L6-H768 | 74.75 | 56.68 | 60.22 | 81.10 | 84.87 | 85.77 | 78.06 | 65.49 | 80.59 |
The RoBERTa-wwm-ext scores are taken from the CLUE baseline.
| Task | Learning rate | Global batch size | Epochs |
|---|---|---|---|
| AFQMC | 3e-5 | 32 | 10 |
| TNEWS | 3e-5 | 128 | 10 |
| IFLYTEK | 3e-5 | 64 | 10 |
| CMNLI | 3e-5 | 512 | 10 |
| WSC | 8e-6 | 64 | 50 |
| CSL | 5e-5 | 128 | 5 |
| CMRC2018 | 5e-5 | 8 | 5 |
| C3 | 1e-4 | 240 | 3 |
| CHID | 5e-5 | 256 | 5 |
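For reference, the hyperparameters above drop into a standard Huggingface fine-tuning loop. The sketch below uses the TNEWS row (learning rate 3e-5, global batch size 128, 10 epochs); `train_dataset` and `eval_dataset` are assumed to be already tokenized TNEWS splits, and gradient accumulation may be needed to reach the global batch size.

```python
# Sketch: fine-tune Mengzi-BERT-base on TNEWS (15 classes in CLUE) with the
# hyperparameters from the table above. train_dataset / eval_dataset are assumed
# to be pre-tokenized datasets with a "labels" column.
from transformers import (
    BertForSequenceClassification,
    Trainer,
    TrainingArguments,
)

model = BertForSequenceClassification.from_pretrained(
    "Langboat/mengzi-bert-base", num_labels=15
)

args = TrainingArguments(
    output_dir="mengzi-bert-tnews",
    learning_rate=3e-5,
    per_device_train_batch_size=128,  # combine with gradient_accumulation_steps if memory is tight
    num_train_epochs=10,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # assumed to exist
    eval_dataset=eval_dataset,    # assumed to exist
)
trainer.train()
```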

wangyulong[at]langboat[dot]com
Q: The saved mengzi-bert-base checkpoint is about 196 MB, while bert-base is usually around 389 MB. Is the base architecture defined differently, or is some unnecessary content dropped when saving?
A: This is because Mengzi-BERT-base is trained with FP16, so the saved weights are half-precision.
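A rough way to check this yourself is to multiply the parameter count by the bytes per parameter (2 for FP16, 4 for FP32); the sketch below is only an approximation of the on-disk size.

```python
# Approximate checkpoint size from the parameter count and storage precision.
from transformers import BertModel

model = BertModel.from_pretrained("Langboat/mengzi-bert-base")
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.0f}M")
print(f"approx. size in FP16: {n_params * 2 / 1024**2:.0f} MB")
print(f"approx. size in FP32: {n_params * 4 / 1024**2:.0f} MB")
```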
Q: What is the source of the data for the financial pre-trained model?
A: Financial news, announcements, and research reports crawled from the web.
Q: Is there a TensorFlow version of the models?
A: You can convert the weights yourself.
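A minimal conversion sketch with the transformers TensorFlow classes (both torch and tensorflow need to be installed; the output directory name is arbitrary):

```python
# Load the PyTorch weights into the TensorFlow class and save a TF checkpoint.
from transformers import TFBertModel

tf_model = TFBertModel.from_pretrained("Langboat/mengzi-bert-base", from_pt=True)
tf_model.save_pretrained("./mengzi-bert-base-tf")
```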
Q: Will the training code be open sourced?
A: Because it is tightly coupled with our internal infrastructure, there is currently no plan to do so.
Q: How can we achieve the same text generation quality as on the Langboat official website?
A: Our core text generation model is based on the T5 architecture; the basic algorithm is described in Google's T5 paper: https://arxiv.org/pdf/1910.10683.pdf. The open-source Mengzi-T5 model shares Google's T5 pre-trained architecture: it is a general pre-trained model and does not ship with any specific text generation task. Our marketing copywriting feature fine-tunes it on a large amount of data for specific downstream tasks. On top of that, to achieve controllable generation, we have built a complete text generation pipeline: data cleaning, knowledge extraction, training data construction, and generation quality evaluation. Most of it is customized for commercial deployment scenarios, with different pre-training and fine-tuning tasks constructed for different business needs and data formats. Because this part involves relatively complex software architecture and specific business scenarios, we have not open-sourced it yet.
Q: Can Mengzi-T5-base be used for inference directly?
A: We follow T5 v1.1 and do not include downstream tasks, so the checkpoint needs to be fine-tuned before inference.
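In practice this means the checkpoint needs a seq2seq fine-tuning step before it produces task-specific output. A minimal sketch, where `train_dataset`, the learning rate, and the epoch count are placeholder assumptions for your own task:

```python
# Sketch: fine-tune Mengzi-T5-base on a custom seq2seq task, then generate.
# train_dataset is assumed to yield input_ids / attention_mask / labels.
from transformers import (
    T5Tokenizer,
    T5ForConditionalGeneration,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = T5Tokenizer.from_pretrained("Langboat/mengzi-t5-base")
model = T5ForConditionalGeneration.from_pretrained("Langboat/mengzi-t5-base")

args = Seq2SeqTrainingArguments(
    output_dir="mengzi-t5-finetuned",
    learning_rate=3e-4,   # placeholder
    num_train_epochs=3,   # placeholder
)
trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()

# After fine-tuning, inference works as usual:
inputs = tokenizer("你的任务输入", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```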
Q: What should I do if loading fails with Huggingface Transformers?
A: Try adding force_download=True.
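For example:

```python
# Force a fresh download, bypassing a possibly corrupted local cache.
from transformers import BertModel

model = BertModel.from_pretrained("Langboat/mengzi-bert-base", force_download=True)
```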
Q: In constrained generation, Mengzi-T5-base tends to produce word-level candidates, while mT5 does the opposite and prefers character-level candidates. Is this because word granularity was used during training?
A: Instead of reusing mT5's vocabulary, we retrained the tokenizer on our corpus, which includes more word-level entries. As a result, encoding text of the same length produces fewer tokens, which lowers memory usage and speeds up training.
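The difference is easy to observe by tokenizing the same sentence with both vocabularies; the sketch below uses google/mt5-base for comparison and an arbitrary example sentence.

```python
# Compare token counts: the retrained Mengzi vocabulary usually produces fewer
# tokens for Chinese text than mT5's multilingual vocabulary.
from transformers import AutoTokenizer, T5Tokenizer

mengzi_tok = T5Tokenizer.from_pretrained("Langboat/mengzi-t5-base")
mt5_tok = AutoTokenizer.from_pretrained("google/mt5-base")

text = "基于中文语料重新训练的词表可以减少编码后的token数量。"
print("mengzi-t5:", len(mengzi_tok.tokenize(text)))
print("mt5:      ", len(mt5_tok.tokenize(text)))
```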
The content of this project is provided for technical research reference only and should not be used as the basis for any conclusions. Users may use the models within the scope of the license, but we are not responsible for any direct or indirect loss caused by using the content of this project. The experimental results presented in the technical report only reflect performance under specific datasets and hyperparameter combinations and do not characterize each model in general; results may vary with random seeds and computing devices.
When using these models in any way (including but not limited to modifying them, using them directly, or using them through third parties), users must not directly or indirectly engage in acts that violate the laws, regulations, or social morality of their jurisdiction. Users are responsible for their own actions and bear all legal liability for any dispute arising from the use of these models; we assume no legal or joint liability.
We reserve the right to interpret, modify, and update this disclaimer.
@misc{zhang2021mengzi,
title={Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese},
author={Zhuosheng Zhang and Hanqing Zhang and Keming Chen and Yuhang Guo and Jingyun Hua and Yulong Wang and Ming Zhou},
year={2021},
eprint={2110.06696},
archivePrefix={arXiv},
primaryClass={cs.CL}
}