Project Description
Implementations of common NLP tasks: new word discovery, plus PyTorch-based word vectors, Chinese text classification, entity recognition, text generation, sentence similarity judgment, triple extraction, pre-trained models, and more.
Dependencies
python 3.7
pytorch 1.8.0
torchtext 0.9.1
optuna 2.6.0
transformers 3.0.2
Table of contents
0. New word discovery algorithm
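New word discovery is commonly done by scoring candidate n-grams on two signals: internal cohesion (how much more often the parts co-occur than chance, measured as pointwise mutual information) and boundary freedom (the entropy of the characters seen immediately to the left and right). The sketch below assumes this approach; the repository's own scoring and thresholds may differ. Function names are hypothetical, and normalizing every n-gram count by the text length is a simplification.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (in nats) of a neighbor-character distribution."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in counter.values())


def score_candidates(text, max_len=4, min_count=2):
    """Hypothetical helper: return {ngram: (count, cohesion, freedom)} for frequent n-grams.

    cohesion = minimum PMI over all binary splits of the n-gram
    freedom  = min(left-neighbor entropy, right-neighbor entropy)
    """
    counts = Counter()
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    n = len(text)
    for size in range(1, max_len + 1):
        for i in range(n - size + 1):
            gram = text[i:i + size]
            counts[gram] += 1
            if size >= 2:
                if i > 0:
                    left[gram][text[i - 1]] += 1
                if i + size < n:
                    right[gram][text[i + size]] += 1

    results = {}
    for gram, count in counts.items():
        if len(gram) < 2 or count < min_count:
            continue
        p_gram = count / n
        # Every substring of a counted n-gram is itself counted, so no zero division here.
        cohesion = min(
            math.log(p_gram / ((counts[gram[:k]] / n) * (counts[gram[k:]] / n)))
            for k in range(1, len(gram))
        )
        freedom = min(entropy(left[gram]), entropy(right[gram]))
        results[gram] = (count, cohesion, freedom)
    return results
```

Candidates with both high cohesion and high freedom are then kept as new words; the cut-off values are tuning choices not specified here.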
1. Word vector
- 1-1. Word2Vec (Skip-gram)
- 1-2. GloVe
2. Text classification (hyperparameters are tuned with Optuna)
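Skip-gram learns word vectors by predicting each context word from its center word. The training pairs can be generated as in the minimal sketch below. It assumes a fixed `window`; word2vec implementations often sample a smaller window per position, and the repository's data pipeline may differ. `skipgram_pairs` is a hypothetical name.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every context word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


print(skipgram_pairs(["i", "love", "natural", "language"], window=1))
# → [('i', 'love'), ('love', 'i'), ('love', 'natural'), ('natural', 'love'), ('natural', 'language'), ('language', 'natural')]
```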
- 2-1. TextCNN
- 2-2. FastText
- 2-3. TextRCNN
- 2-4. TextRNN_Att
- 2-5. DPCNN
- 2-6. XGBoost
- 2-7. Distilling & fine-tuning BERT
- 2-8. Pattern-Exploiting Training (using MLM to classify text)
- 2-9. R-Drop
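R-Drop passes each input through the model twice with dropout active, so the two passes produce slightly different predictions. The loss is the cross-entropy of both passes plus `alpha` times the symmetric KL divergence between the two predicted distributions. The sketch below shows only the loss on the two logit vectors, in plain Python; in training these would come from two PyTorch forward passes. The `alpha=1.0` default and the function names are assumptions, not the repository's settings.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def r_drop_loss(logits1, logits2, label, alpha=1.0):
    """Hypothetical helper: NLL of both dropout passes + alpha * symmetric KL between them."""
    p1, p2 = softmax(logits1), softmax(logits2)
    nll = -math.log(p1[label]) - math.log(p2[label])
    sym_kl = 0.5 * (kl(p1, p2) + kl(p2, p1))
    return nll + alpha * sym_kl
```

When both passes agree, the KL term vanishes and only the cross-entropy remains. Disagreement between the two dropout sub-models is what gets penalized.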
Dataset (data folder): a public opinion dataset with binary labels, split as follows:
| Split | Number of samples |
|---|---|
| Training set | 56700 |
| Validation set | 7000 |
| Test set | 6300 |
3. Named Entity Recognition (NER)
- 3-1. Bert-MRC
- 3-2. Bert-CRF
- 3-3. Bert-Label-Semantics
- 3-4. Bert-MLM
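In a BERT-CRF tagger, BERT produces per-token emission scores for each tag and the CRF layer adds learned tag-to-tag transition scores. At inference the highest-scoring tag sequence is found with Viterbi decoding. The plain-Python sketch below assumes score matrices as nested lists and omits the start/end transitions some CRF implementations add; `viterbi_decode` is a hypothetical name, not the repository's API.

```python
def viterbi_decode(emissions, transitions):
    """Return (best_tag_path, best_score).

    emissions[t][j]   : score of tag j at position t (e.g. from BERT)
    transitions[i][j] : score of moving from tag i to tag j (CRF parameters)
    """
    num_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag i for landing on tag j at step t.
            candidates = [score[i] + transitions[i][j] for i in range(num_tags)]
            best_i = max(range(num_tags), key=candidates.__getitem__)
            new_score.append(candidates[best_i] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final tag.
    last = max(range(num_tags), key=score.__getitem__)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]
```

A large negative transition score (for example O to I in a BIO scheme) effectively forbids that tag pair in the decoded sequence.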
4. Text summarization
1). Abstractive
- 4-1. Seq2seq model
- 4-2. Seq2seq model + attention mechanism
- 4-3. Transformer model
- 4-4. GPT summary generation
- 4-5. Bert-seq2seq
2). Extractive
- 4-6. Bert-extractive-summarizer
5. Sentence similarity discrimination
6. Multi-label classification
- 6-1. MultiLabel-Classification
7. Triple extraction
8. Pre-trained model (ELECTRA + SimCSE)
- 8-1. Pretrained-Language-Model
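Unsupervised SimCSE encodes each sentence twice with different dropout masks and trains with a contrastive (InfoNCE) loss. Each sentence's second view is its positive, and the other sentences in the batch are negatives, compared by cosine similarity divided by a temperature. The plain-Python sketch below computes that loss for two lists of embeddings. The temperature of 0.05 follows the SimCSE paper's default, but the repository's value and function names are not given in this README and are assumed here.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def simcse_loss(h1, h2, temperature=0.05):
    """Hypothetical helper: mean InfoNCE loss where h1[i] and h2[i] are two views of sentence i."""
    n = len(h1)
    total = 0.0
    for i in range(n):
        sims = [cosine(h1[i], h2[j]) / temperature for j in range(n)]
        m = max(sims)  # log-sum-exp trick for numerical stability
        log_denominator = m + math.log(sum(math.exp(s - m) for s in sims))
        total += log_denominator - sims[i]  # -log softmax of the positive pair
    return total / n
```

The loss is near zero when each sentence's two views are closer to each other than to any other sentence in the batch, and large when positives are mismatched.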
9. Learning tips
10. PaperwithCode
This folder collects several papers together with code for their models:
- 10.1. Co-Interactive-Transformer
- 10.2. Lattice_LSTM
11. QA
This folder contains brief summaries of selected machine learning / deep learning concepts.