
This project integrates a variety of NLP tasks implemented on top of the Hugging Face transformers library.
transformers is an excellent open-source framework that makes loading and training transformer models very convenient. You can find the library's installation instructions and introductory usage here. The library also makes it very easy to fine-tune a model.
In this project, we have integrated some mainstream NLP tasks. You can find the corresponding task and replace the training dataset in the code with the dataset for your own task to train a model that fits it.
The NLP tasks implemented so far are as follows (continually updated):
Compute the similarity between texts; mostly used for tasks such as search recall, text retrieval, and entailment recognition.
| Model | Portal |
|---|---|
| 【Supervised】Overview | [here] |
| 【Supervised】PointWise (single tower) | [here] |
| 【Supervised】DSSM (dual tower) | [here] |
| 【Supervised】Sentence-BERT (dual tower) | [here] |
| 【Unsupervised】SimCSE | [here] |
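The dual-tower models above all follow the same recipe: encode each text into a vector independently, then rank candidates by vector similarity. A minimal sketch of that idea, with a toy bag-of-characters "encoder" standing in for the transformer encoders these models actually use:

```python
import math

def embed(text):
    # Toy "encoder": a bag-of-characters vector over a fixed alphabet.
    # A real dual-tower model (DSSM / Sentence-BERT / SimCSE) would run
    # each text through a shared transformer encoder instead.
    vocab = "abcdefghijklmnopqrstuvwxyz "
    return [text.lower().count(ch) for ch in vocab]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

query = "how to train a model"
candidates = ["how do i train a model", "weather forecast today"]
scores = [cosine_similarity(embed(query), embed(c)) for c in candidates]
best = candidates[scores.index(max(scores))]
print(best)  # the semantically closer candidate ranks first
```

Because each text is embedded independently, candidate vectors can be precomputed offline, which is what makes the dual-tower layout attractive for search recall.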
Extract target information from a given text passage; mostly used for tasks such as named entity recognition (NER) and relation extraction (RE).
| Model | Portal |
|---|---|
| Universal Information Extraction (UIE) | [here] |
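UIE is schema-driven: each entity or relation type you want becomes its own prompt, and the model predicts answer spans in the text for that prompt. The sketch below only shows the prompt construction; the prompt format and helper name are illustrative, not the repo's exact implementation:

```python
def build_uie_prompts(schema, text):
    # One prompt per schema label; a span-prediction model would then
    # read each prompt and mark the matching spans inside `text`.
    return [f"[{label}] {text}" for label in schema]

schema = ["person", "organization"]
text = "Tim Cook is the CEO of Apple."
prompts = build_uie_prompts(schema, text)
print(prompts[0])  # "[person] Tim Cook is the CEO of Apple."
```

Changing the extraction target is then just a matter of editing the schema list, with no change to the model itself.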
By designing prompt templates, we can achieve better results with a pretrained model using a smaller amount of data; mostly used for few-shot, zero-shot, and similar tasks.
| Model | Portal |
|---|---|
| PET (based on manually defined prompt patterns) | [here] |
| p-tuning (prompt patterns learned automatically by the machine) | [here] |
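PET's core trick is to turn classification into a cloze task: wrap the input in a hand-written template containing a mask slot, let a masked LM fill the slot, and map the predicted label word back to a class via a verbalizer. A minimal sketch (the template and label words are illustrative):

```python
def pet_wrap(text, template="{text} Overall, it was [MASK]."):
    # Wrap the raw input in a cloze template; a masked LM would then
    # predict the token at the [MASK] position.
    return template.format(text=text)

# The verbalizer maps label words the LM might predict back to class labels.
verbalizer = {"great": "positive", "terrible": "negative"}

cloze = pet_wrap("The movie was fun.")
predicted_token = "great"  # stand-in for the masked-LM's actual prediction
label = verbalizer[predicted_token]
print(cloze, "->", label)
```

p-tuning keeps the same cloze framing but replaces the hand-written template tokens with continuous embeddings learned during training.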
Classify a given text; mostly used for tasks such as sentiment analysis and article/topic classification.
| Model | Portal |
|---|---|
| BERT-CLS (BERT-based classifier) | [here] |
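A BERT-based classifier feeds the encoder's [CLS] vector through a linear layer to get one logit per class, then applies softmax to turn logits into probabilities. A sketch of that final step, with hard-coded logits standing in for the linear layer's output:

```python
import math

def softmax(logits):
    # Subtracting the max keeps exp() numerically stable.
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["sports", "finance", "entertainment"]
logits = [2.1, 0.3, -1.2]   # stand-in for the [CLS] classification head output
probs = softmax(logits)
prediction = labels[probs.index(max(probs))]
print(prediction)  # "sports"
```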
RLHF (Reinforcement Learning from Human Feedback) uses reinforcement learning (RL) to update a language model (LM) based on human feedback, thereby achieving better generation results (a representative example: ChatGPT). It usually includes two stages:
reward model training and reinforcement learning training.
| Model | Portal |
|---|---|
| RLHF (reward model training, PPO updates GPT2) | [here] |
Text generation (NLG); usually used for tasks such as novel continuation, intelligent question answering, and chatbots.
| Model | Portal |
|---|---|
| Chinese Q&A Model (T5-Based) | [here] |
| Text infilling model (T5-Based) | [here] |
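T5 frames infilling as span corruption: the masked span is replaced by a sentinel token (`<extra_id_0>`) in the input, and the target asks the model to emit the sentinel followed by the missing tokens. A sketch of that input/target format (the helper is illustrative):

```python
def t5_infill_example(tokens, span_start, span_end):
    # Replace tokens[span_start:span_end] with a sentinel in the source;
    # the target is the sentinel followed by the removed tokens.
    sentinel = "<extra_id_0>"
    source = tokens[:span_start] + [sentinel] + tokens[span_end:]
    target = [sentinel] + tokens[span_start:span_end]
    return " ".join(source), " ".join(target)

src, tgt = t5_infill_example("the cat sat on the mat".split(), 2, 4)
print(src)  # "the cat <extra_id_0> the mat"
print(tgt)  # "<extra_id_0> sat on"
```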
Building the zero-shot prompt patterns needed for a large language model (LLM) to solve multiple tasks.
| Model | Portal |
|---|---|
| Text classification (chatglm-6b-Based) | [here] |
| Text matching (chatglm-6b-Based) | [here] |
| Information extraction (chatglm-6b-Based) | [here] |
| Big Model Personality Test (LLMs MBTI) | [here] |
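Zero-shot prompting casts each of these tasks as a plain-language instruction: describe the task and the allowed answers, append the input, and let the LLM respond directly with no task-specific training. A sketch for the classification case (the wording is illustrative, not the repo's template):

```python
def zero_shot_classify_prompt(text, labels):
    # Build a single instruction-style prompt; the LLM's free-text reply
    # is then parsed back into one of the allowed labels.
    return (
        f"Classify the following sentence into one of {labels}.\n"
        f"Sentence: {text}\n"
        "Label:"
    )

prompt = zero_shot_classify_prompt(
    "The stock market fell 3% today.", ["sports", "finance"]
)
print(prompt)
```

Text matching and information extraction follow the same pattern, differing only in how the task and the expected answer format are described.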
Everything related to large-model training, covering pre-training, instruction fine-tuning, reward modeling, and reinforcement learning.
| Model | Portal |
|---|---|
| ChatGLM-6B Finetune | [here] |
| Training large models from scratch | [here] |
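In the instruction fine-tuning stage, training data is commonly stored as instruction/input/output triples that get assembled into a prompt/response pair. A sketch of that assembly; the field names and layout vary by repo, so treat these as assumptions:

```python
def build_sft_sample(instruction, inp, output):
    # Join the instruction and (optional) input into one prompt; the
    # model is trained to produce `output` given that prompt.
    prompt = instruction + ("\n" + inp if inp else "")
    return {"prompt": prompt, "response": output}

sample = build_sft_sample(
    instruction="Translate to English.",
    inp="你好",
    output="Hello",
)
print(sample["prompt"])
```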
A collection of common tools.
| Tool name | Portal |
|---|---|
| Tokenizer Viewer | [here] |