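NLP paper implementations related to classification with PyTorch
The papers were implemented using a Korean corpus.
Preliminary and usage
- Preliminary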
pyenv virtualenv 3.7.7 nlp
pyenv activate nlp
pip install -r requirements.txt
- Usage
python build_dataset.py
python build_vocab.py
python train.py # default training parameter
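python evaluate.py # default evaluation parameter

Single sentence classification (sentiment classification task)
- Uses the Naver sentiment movie corpus v1.0 (a.k.a. nsmc)
- Configuration
  - conf/model/{type}.json (e.g. type = ["sencnn", "charcnn", ...])
  - conf/dataset/nsmc.json
- Structure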
# example: Convolutional_Neural_Networks_for_Sentence_Classification
├── build_dataset.py
├── build_vocab.py
├── conf
│   ├── dataset
│   │   └── nsmc.json
│   └── model
│       └── sencnn.json
├── evaluate.py
├── experiments
│   └── sencnn
│       └── epochs_5_batch_size_256_learning_rate_0.001
├── model
│   ├── data.py
│   ├── __init__.py
│   ├── metric.py
│   ├── net.py
│   ├── ops.py
│   ├── split.py
│   └── utils.py
├── nsmc
│   ├── ratings_test.txt
│   ├── ratings_train.txt
│   ├── test.txt
│   ├── train.txt
│   ├── validation.txt
│   └── vocab.pkl
├── train.py
└── utils.py

| Model \ Accuracy | Train (120,000) | Validation (30,000) | Test (50,000) | Date |
|---|---|---|---|---|
| SenCNN | 91.95% | 86.54% | 85.84% | 20/05/30 |
| CharCNN | 86.29% | 81.69% | 81.38% | 20/05/30 |
| ConvRec | 86.23% | 82.93% | 82.43% | 20/05/30 |
| VDCNN | 86.59% | 84.29% | 84.10% | 20/05/30 |
| SAN | 90.71% | 86.70% | 86.37% | 20/05/30 |
| ETRIBERT | 91.12% | 89.24% | 88.98% | 20/05/30 |
| SKTBERT | 92.20% | 89.08% | 88.96% | 20/05/30 |
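The train and validation counts in the table add up to 150,000, the number of reviews in NSMC's ratings_train.txt, which is consistent with holding out 20% of that file for validation. A minimal sketch of such a split follows. The function name `split_train_validation`, the 20% ratio, and the fixed seed are assumptions for illustration; they are not taken from this repository's model/split.py.

```python
import random


def split_train_validation(examples, validation_ratio=0.2, seed=42):
    """Shuffle examples and hold out a fraction for validation.

    The name, ratio, and seed are hypothetical; they only illustrate
    how a 150,000-row file could yield a 120,000 / 30,000 split.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_ratio)
    # First n_validation shuffled items go to validation, the rest to train.
    return shuffled[n_validation:], shuffled[:n_validation]


# Integers stand in for the 150,000 labeled reviews in ratings_train.txt.
rows = range(150_000)
train, validation = split_train_validation(rows)
print(len(train), len(validation))
```

Seeding a local `random.Random` instance keeps the split reproducible without touching the global random state.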
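- Convolutional Neural Networks for Sentence Classification (as SenCNN)
  - https://arxiv.org/abs/1408.5882
- Character-level Convolutional Networks for Text Classification (as CharCNN)
  - https://arxiv.org/abs/1509.01626
- Efficient Character-level Document Classification by Combining Convolution and Recurrent Layers (as ConvRec)
  - https://arxiv.org/abs/1602.00367
- Very Deep Convolutional Networks for Text Classification (as VDCNN)
  - https://arxiv.org/abs/1606.01781
- A Structured Self-attentive Sentence Embedding (as SAN)
  - https://arxiv.org/abs/1703.03130
- BERT_single_sentence_classification (as ETRIBERT, SKTBERT)
  - https://arxiv.org/abs/1810.04805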
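To make the first entry in the list concrete, here is a rough PyTorch sketch of a sentence-classification CNN in the style of that paper: parallel 1D convolutions over token embeddings, max-over-time pooling, and a linear classifier. The class name `SenCNNSketch` and every hyperparameter value are assumptions chosen for illustration; this is not the code in this repository's model/net.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SenCNNSketch(nn.Module):
    """Illustrative CNN classifier: parallel convolutions + max-over-time pooling."""

    def __init__(self, vocab_size, num_classes, embedding_dim=128,
                 kernel_sizes=(3, 4, 5), num_filters=100, dropout=0.5):
        super().__init__()
        # Index 0 is reserved for padding in this sketch.
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embedding_dim, num_filters, k) for k in kernel_sizes]
        )
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):
        # (batch, seq_len) -> (batch, embedding_dim, seq_len), as Conv1d expects.
        embedded = self.embedding(token_ids).transpose(1, 2)
        # One pooled feature vector per kernel size (max over the time axis).
        pooled = [F.relu(conv(embedded)).max(dim=2)[0] for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return self.classifier(features)


model = SenCNNSketch(vocab_size=1000, num_classes=2)
batch = torch.randint(1, 1000, (4, 20))  # 4 sentences of 20 token ids each
logits = model(batch)
print(logits.shape)
```

Input sequences must be at least as long as the largest kernel size (5 here), so short sentences would need padding before being passed in.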
Pairwise text classification (paraphrase detection task)
- Dataset created from https://github.com/songys/question_pair
- Configuration
  - conf/model/{type}.json (e.g. type = ["siam", "san", ...])
  - conf/dataset/qpair.json
- Structure
# example: Siamese_recurrent_architectures_for_learning_sentence_similarity
├── build_dataset.py
├── build_vocab.py
├── conf
│   ├── dataset
│   │   └── qpair.json
│   └── model
│       └── siam.json
├── evaluate.py
├── experiments
│   └── siam
│       └── epochs_5_batch_size_64_learning_rate_0.001
├── model
│   ├── data.py
│   ├── __init__.py
│   ├── metric.py
│   ├── net.py
│   ├── ops.py
│   ├── split.py
│   └── utils.py
├── qpair
│   ├── kor_pair_test.csv
│   ├── kor_pair_train.csv
│   ├── test.txt
│   ├── train.txt
│   ├── validation.txt
│   └── vocab.pkl
├── train.py
└── utils.py

| Model \ Accuracy | Train (6,136) | Validation (682) | Test (758) | Date |
|---|---|---|---|---|
| Siam | 93.00% | 83.13% | 83.64% | 20/05/30 |
| SAN | 89.47% | 82.11% | 81.53% | 20/05/30 |
| Stochastic | 89.26% | 82.69% | 80.07% | 20/05/30 |
| ETRIBERT | 95.07% | 94.42% | 94.06% | 20/05/30 |
| SKTBERT | 95.43% | 92.52% | 93.93% | 20/05/30 |
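The structure listing stores each run under a directory named after its hyperparameters, for example experiments/siam/epochs_5_batch_size_64_learning_rate_0.001, and model settings live in conf/model/{type}.json. The sketch below shows one way such a path could be built and a config file read. The helper names `load_config` and `experiment_dir` and the `num_classes` key are hypothetical; the actual JSON schema and the logic in train.py are not shown in this README.

```python
import json
import tempfile
from pathlib import Path


def load_config(conf_dir, kind, name):
    """Read <conf_dir>/<kind>/<name>.json, e.g. kind="model", name="siam"."""
    with open(Path(conf_dir) / kind / f"{name}.json", encoding="utf-8") as f:
        return json.load(f)


def experiment_dir(model_type, epochs, batch_size, learning_rate, root="experiments"):
    """Build a directory path that encodes the training hyperparameters."""
    run_name = f"epochs_{epochs}_batch_size_{batch_size}_learning_rate_{learning_rate}"
    return Path(root) / model_type / run_name


# Demonstrate with a throwaway config tree; the "num_classes" key is made up.
with tempfile.TemporaryDirectory() as tmp:
    model_conf = Path(tmp) / "model"
    model_conf.mkdir()
    (model_conf / "siam.json").write_text(json.dumps({"num_classes": 2}))
    config = load_config(tmp, "model", "siam")

run_dir = experiment_dir("siam", epochs=5, batch_size=64, learning_rate=0.001)
print(config, run_dir.as_posix())
```

Encoding the hyperparameters in the directory name keeps results from different runs side by side without overwriting each other.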
- A Structured Self-attentive Sentence Embedding (as SAN)
  - https://arxiv.org/abs/1703.03130
- Siamese Recurrent Architectures for Learning Sentence Similarity (as Siam)
  - https://www.aaai.org/ocs/index.php/aaai/aaai16/paper/paper/viewpaper/12195
- Stochastic Answer Networks for Natural Language Inference (as Stochastic)
  - https://arxiv.org/abs/1804.07888
- BERT_pairwise_text_classification (as ETRIBERT, SKTBERT)
  - https://arxiv.org/abs/1810.04805
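The Siamese recurrent architecture paper listed above scores a sentence pair as the exponential of the negative L1 (Manhattan) distance between the two sentence encodings, which yields a value in (0, 1]. Below is a small pure-Python sketch of that scoring function only. The shared encoder is omitted, the example vectors are made up, and the Siam model in this repository may compute its output differently.

```python
import math


def manhattan_similarity(h_a, h_b):
    """Return exp(-L1 distance) between two equal-length vectors."""
    if len(h_a) != len(h_b):
        raise ValueError("encodings must have the same length")
    l1 = sum(abs(a - b) for a, b in zip(h_a, h_b))
    return math.exp(-l1)


# Made-up encodings standing in for the outputs of a shared sentence encoder.
same = manhattan_similarity([0.2, -0.1, 0.5], [0.2, -0.1, 0.5])
different = manhattan_similarity([0.2, -0.1, 0.5], [-0.7, 0.9, 0.0])
print(same, different)
```

Identical encodings give a score of exactly 1, and the score decays toward 0 as the encodings move apart.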
Download the source code
Clone the project from the command line:
git clone https://github.com/seopbo/nlp_classification.git