Python toolkit for the Chinese Language Understanding Evaluation (CLUE) benchmark.
With this toolkit you can quickly evaluate representative datasets against baseline (pre-trained) models, and select a suitable baseline (pre-trained) model to apply rapidly to your own data.
Datasets, baselines, pre-trained models, corpora, and a leaderboard.
Chinese language understanding evaluation benchmarks, including representative datasets, baseline (pre-trained) models, corpora, and rankings.
We selected a series of datasets corresponding to representative tasks as our benchmark; these datasets cover different tasks, data volumes, and difficulty levels.
Now, PyCLUE can be installed via pip:
pip install --upgrade PyCLUE
Or install PyCLUE directly via git clone:
pip install git+https://www.github.com/CLUEBenchmark/PyCLUE.git
Pre-trained language models are supported.
Waiting for support
Note: The datasets are consistent with those provided by CLUEBenchmark; only the format has been adjusted to fit the PyCLUE project.
Data size: train (34,334), dev (4,316), test (3,861)
Example:
{"sentence1": "双十一花呗提额在哪", "sentence2": "里可以提花呗额度", "label": "0"}
Each record has three fields, in order: sentence1, sentence2, and a sentence-similarity label. A label of 1 means sentence1 and sentence2 have similar meanings; 0 means the two sentences have different meanings.
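The JSON-lines format above can be parsed with the Python standard library; a minimal sketch using the example record shown:

```python
import json

# One line of the AFQMC train/dev file, as shown in the example above.
line = '{"sentence1": "双十一花呗提额在哪", "sentence2": "里可以提花呗额度", "label": "0"}'

record = json.loads(line)
is_similar = record["label"] == "1"  # "1" = similar meaning, "0" = different
print(record["sentence1"], record["sentence2"], is_similar)
```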
Link: https://pan.baidu.com/s/1It1SiMJbsrNl1dEOBoOGXg Extraction code: ksd1
Training model script location: PyCLUE/clue/sentence_pair/afqmc/train.ipynb
Reference: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/sentence_pair/afqmc/train.ipynb
Submit file script location: PyCLUE/clue/sentence_pair/afqmc/predict.ipynb
Reference: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/sentence_pair/afqmc/predict.ipynb
This dataset comes from the news section of Toutiao; 15 news categories were extracted in total, including tourism, education, finance, and military.
Data size: train (266,000), dev (57,000), test (57,000)
Example:
{"label": "102", "label_des": "news_entertainment", "sentence": "江疏影甜甜圈自拍,迷之角度竟这么好看,美吸引一切事物"}
Each record has three fields, in order: category ID, category name, and the news string (title only).
Link: https://pan.baidu.com/s/1Rs9oXoloKgwI-RgNS_GTQQ Extraction code: s9go
Training model script location: PyCLUE/clue/classification/tnews/train.ipynb
Reference: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/classification/tnews/train.ipynb
Submit file script location: PyCLUE/clue/classification/tnews/predict.ipynb
Reference: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/classification/tnews/predict.ipynb
This dataset contains more than 17,000 labeled long texts describing mobile apps, covering a wide range of topics related to daily life, in 119 categories: "Taxi": 0, "Map Navigation": 1, "Free WIFI": 2, "Car Rental": 3, ...., "Female": 115, "Business": 116, "Cash Collection": 117, "Others": 118 (represented by 0-118 respectively).
Data size: train (12,133), dev (2,599), test (2,600)
Example:
{"label": "110", "label_des": "社区超市", "sentence": "朴朴快送超市创立于2016年,专注于打造移动端30分钟即时配送一站式购物平台,商品品类包含水果、蔬菜、肉禽蛋奶、海鲜水产、粮油调味、酒水饮料、休闲食品、日用品、外卖等。朴朴公司希望能以全新的商业模式,更高效快捷的仓储配送模式,致力于成为更快、更好、更多、更省的在线零售平台,带给消费者更好的消费体验,同时推动中国食品安全进程,成为一家让社会尊敬的互联网公司。,朴朴一下,又好又快,1.配送时间提示更加清晰友好2.保障用户隐私的一些优化3.其他提高使用体验的调整4.修复了一些已知bug"}
Each record has three fields, in order: category ID, category name, and text content.
Link: https://pan.baidu.com/s/1EKtHXmgt1t038QTO9VKr3A Extraction code: u00v
Training model script location: PyCLUE/clue/classification/iflytek/train.ipynb
Reference: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/classification/iflytek/train.ipynb
Submit file script location: PyCLUE/clue/classification/iflytek/predict.ipynb
Reference: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/classification/iflytek/predict.ipynb
CMNLI consists of two parts: XNLI and MNLI. The data comes from fiction, telephone conversations, travel, government, slate, etc. The original English MNLI and XNLI data were translated into Chinese. The original training sets are retained; the XNLI dev set and the MNLI matched set are combined as the CMNLI dev set, the XNLI test set and the MNLI mismatched set are combined as the CMNLI test set, and the order is shuffled. This dataset is used to determine whether the relationship between two given sentences is entailment, neutral, or contradiction.
Data size: train (391,782), matched (12,426), mismatched (13,880)
Example:
{"sentence1": "新的权利已经足够好了", "sentence2": "每个人都很喜欢最新的福利", "label": "neutral"}
Each record has three fields, in order: sentence1, sentence2, and an entailment label. The label takes one of three values: neutral, entailment, contradiction.
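For training, the three string labels are typically mapped to integer ids; a minimal sketch using the example record above (the particular id order here is an assumption, not fixed by PyCLUE):

```python
import json

# Hypothetical label-to-id mapping; any consistent order works.
LABEL2ID = {"entailment": 0, "neutral": 1, "contradiction": 2}

line = '{"sentence1": "新的权利已经足够好了", "sentence2": "每个人都很喜欢最新的福利", "label": "neutral"}'
record = json.loads(line)
label_id = LABEL2ID[record["label"]]
```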
Link: https://pan.baidu.com/s/1mFT31cBs2G6e69As6H65dQ Extraction code: kigh
Training model script location: PyCLUE/clue/sentence_pair/cmnli/train.ipynb
Reference: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/sentence_pair/cmnli/train.ipynb
Submit file script location: PyCLUE/clue/sentence_pair/cmnli/predict.ipynb
Reference: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/sentence_pair/cmnli/predict.ipynb
A diagnostic set used to evaluate model performance on Chinese linguistic phenomena summarized by 9 linguists.
Use a model trained on CMNLI to predict directly on this diagnostic set; the submission format is consistent with CMNLI. You can see the results on the leaderboard details page. (Note: this dataset package also contains the CMNLI training and test sets.)
Link: https://pan.baidu.com/s/1DYDUGO6xN_4xAT0Y4aNsiw Extraction code: u194
Training model script location: PyCLUE/clue/sentence_pair/diagnostics/train.ipynb
Reference: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/sentence_pair/diagnostics/train.ipynb
Submit file script location: PyCLUE/clue/sentence_pair/diagnostics/predict.ipynb
Reference: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/sentence_pair/diagnostics/predict.ipynb
Supplementary usage.
Multi-class classification tasks, such as text classification and sentiment classification, accept two input forms: single-sentence input and sentence-pair input.
The data directory should contain at least train.txt, dev.txt, and labels.txt; a test.txt file may optionally be added.
Data format reference:
Single-sentence input (corresponding to task_type = 'single' in the evaluation script): PyCLUE/examples/classification/single_data_templates/, https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/classification/single_data_templates
Sentence-pair input (corresponding to task_type = 'pairs' in the evaluation script): PyCLUE/examples/classification/pairs_data_templates/, https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/classification/pairs_data_templates
Note: \t (tab) should be used as the field separator.
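A minimal sketch of writing and reading a tab-separated train.txt in the single-sentence form (the two-column text/label layout is an assumption based on the templates linked above):

```python
import csv
import io

# Toy examples: (text, label) pairs.
rows = [
    ("这家店的服务很好", "positive"),
    ("排队等了一个小时", "negative"),
]

# Write tab-separated lines, one example per line.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerows(rows)
content = buf.getvalue()

# Read the file back the same way.
parsed = list(csv.reader(io.StringIO(content), delimiter="\t"))
```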
Training model script location: PyCLUE/examples/classification/train.ipynb
Reference: https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/classification/train.ipynb
Predicted script location: PyCLUE/examples/classification/predict.ipynb
Reference: https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/classification/predict.ipynb
Sentence-pair tasks (Siamese network), such as similar-sentence matching. This differs from sentence-pair input in multi-class classification tasks: there, the two sentences are concatenated BERT-style into a single input, whereas this task encodes each sentence with a Siamese (twin) network.
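The difference in input construction can be sketched at the string level; this is a conceptual illustration, not PyCLUE's actual preprocessing code:

```python
def cross_encoder_input(s1: str, s2: str) -> str:
    # Multi-class sentence-pair task: BERT-style concatenation;
    # both sentences pass through the encoder together.
    return f"[CLS] {s1} [SEP] {s2} [SEP]"

def siamese_inputs(s1: str, s2: str) -> tuple:
    # Siamese (twin) network: each sentence is encoded independently
    # by the same encoder, and the two embeddings are then compared.
    return f"[CLS] {s1} [SEP]", f"[CLS] {s2} [SEP]"

joint = cross_encoder_input("双十一花呗提额在哪", "里可以提花呗额度")
left, right = siamese_inputs("双十一花呗提额在哪", "里可以提花呗额度")
```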
The data directory should contain at least train.txt, dev.txt, and labels.txt; a test.txt file may optionally be added.
Data format reference:
Input: PyCLUE/examples/sentence_pair/data_templates/, https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/sentence_pair/data_templates
Note: \t (tab) should be used as the field separator.
Training model script location: PyCLUE/examples/sentence_pair/train.ipynb
Reference: https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/sentence_pair/train.ipynb
Predicted script location: PyCLUE/examples/sentence_pair/predict.ipynb
Reference: https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/sentence_pair/predict.ipynb
Text matching tasks (Siamese network), such as FAQ retrieval and question-question matching, use a Siamese network to generate embeddings for input sentences, and use hnswlib to retrieve the most similar sentences.
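The retrieval step operates on sentence embeddings; the sketch below substitutes a brute-force cosine search (pure Python) for hnswlib to illustrate the idea, with made-up 3-dimensional embeddings:

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy embedding index: sentence -> vector. In practice the Siamese
# encoder produces these vectors and hnswlib indexes them for fast
# approximate nearest-neighbor search.
index = {
    "怎么提高花呗额度": [0.9, 0.1, 0.2],
    "如何申请退款": [0.1, 0.8, 0.3],
    "花呗额度在哪里提升": [0.85, 0.15, 0.25],
}

query = [0.88, 0.12, 0.22]  # embedding of the incoming question
best = max(index, key=lambda s: cosine(index[s], query))
```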
The data directory should contain at least cache.txt, train.txt, dev.txt, and labels.txt; a test.txt file may optionally be added.
Data format reference:
Input: PyCLUE/examples/text_matching/data_templates/, https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/text_matching/data_templates
Note: \t (tab) should be used as the field separator.
Training model script location: PyCLUE/examples/text_matching/train.ipynb
Reference: https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/text_matching/train.ipynb
Predicted script location: PyCLUE/examples/text_matching/predict.ipynb
Reference: https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/text_matching/predict.ipynb
The model directory contains the 10 most recent checkpoint files and pb model files (the 10 checkpoints that perform best on the dev set dev.txt).
The metrics chart (train_metrics.png) generated during training plots accuracy, total_loss, batch_loss, precision, recall, and f1.
If a test file test.txt is present and each of its lines starts with the true label (true_label), the best model's metrics on that file are printed.
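Assuming each test.txt line carries the true label as its first tab-separated field (an assumption based on the description above), checking predictions could be sketched as:

```python
# Lines of a hypothetical test.txt: true label first, then the text.
lines = [
    "positive\t这家店的服务很好",
    "negative\t排队等了一个小时",
    "positive\t味道不错还会再来",
]
predictions = ["positive", "positive", "positive"]  # model outputs

true_labels = [line.split("\t", 1)[0] for line in lines]
accuracy = sum(t == p for t, p in zip(true_labels, predictions)) / len(lines)
```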
Updated.
Official address: https://github.com/CLUEBenchmark/PyCLUE
Debugging address: https://github.com/liushaoweihua/PyCLUE