PyTorchText

Chinese users: please see readme-zh.md

This is the solution for the Zhihu Machine Learning Challenge 2017. We won first place out of 963 teams.

1. Setup

Install PyTorch from pytorch.org (Python 2, CUDA).
Install the other dependencies:
```
pip2 install -r requirements.txt
```

Data preprocessing may require tf.contrib.keras.preprocessing.sequence.pad_sequences.
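
For readers without TensorFlow at hand, the padding step can be illustrated in plain Python. This is only a sketch of the behaviour, not the Keras implementation: the function name `pad_sequences_sketch` is hypothetical, and the defaults (pad and truncate at the front, fill with 0) are chosen to mirror the documented Keras defaults.

```python
def pad_sequences_sketch(sequences, maxlen, value=0, padding="pre", truncating="pre"):
    """Pad or truncate each list of token ids to exactly `maxlen` items."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # Drop tokens from the front ("pre") or from the back ("post").
            seq = seq[-maxlen:] if truncating == "pre" else seq[:maxlen]
        fill = [value] * (maxlen - len(seq))
        # Put the fill values before ("pre") or after ("post") the tokens.
        padded.append(fill + seq if padding == "pre" else seq + fill)
    return padded


# Two hypothetical questions, already converted to token ids.
batch = pad_sequences_sketch([[5, 8], [1, 2, 3, 4, 9]], maxlen=4)
print(batch)
```

Every row comes out with the same length, which is what allows a whole question set to be stored as one rectangular array.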

Start visdom for visualization:
```
python2 -m visdom.server
```

2. Data preprocessing

Modify the data paths in the relevant files.

2.1 Word vector file -> NumPy file

python scripts/data_process/embedding2matrix.py main char_embedding.txt char_embedding.npz 
python scripts/data_process/embedding2matrix.py main word_embedding.txt word_embedding.npz
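
What such a conversion involves can be sketched as follows. The sketch assumes the embedding file uses the common word2vec text layout (a header line with the token count and dimension, then one token and its vector per line); the README does not state the format, and `load_embedding_text` is a hypothetical helper, not the code in embedding2matrix.py.

```python
import io


def load_embedding_text(handle):
    """Parse a word2vec-style text file into (words, vectors)."""
    # Assumed header: "<number of tokens> <vector dimension>".
    _, dim = (int(field) for field in handle.readline().split())
    words, vectors = [], []
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        words.append(parts[0])
        vectors.append([float(x) for x in parts[1:]])
    return words, vectors


# Hypothetical two-token file with 3-dimensional vectors.
sample = io.StringIO(u"2 3\nw11 0.1 0.2 0.3\nw54 -0.5 0.0 1.5\n")
words, vectors = load_embedding_text(sample)
word2id = {word: index for index, word in enumerate(words)}
print(word2id["w54"], vectors[word2id["w54"]])
```

The vector list and the token-to-index mapping could then be written to an .npz archive, for example with numpy.savez_compressed.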

2.2 Question set -> NumPy file

This step is memory-intensive; make sure you have more than 32 GB of RAM.

python scripts/data_process/question2array.py main question_train_set.txt train.npz
python scripts/data_process/question2array.py main question_eval_set.txt test.npz

2.3 Labels -> JSON

python scripts/data_process/label2id.py main question_topic_train_set.txt labels.json
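
A label-to-index mapping of this kind can be sketched as below. The input layout (one question per line, a tab, then comma-separated topic ids) and the JSON key names are assumptions made for illustration; `build_label_index` is hypothetical and may differ from what label2id.py actually writes.

```python
import io
import json


def build_label_index(handle):
    """Map raw topic ids to dense indices and questions to label indices."""
    label2id, question2labels = {}, {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        question, topics = line.split("\t")
        ids = []
        for topic in topics.split(","):
            # Assign the next free index the first time a topic is seen.
            ids.append(label2id.setdefault(topic, len(label2id)))
        question2labels[question] = ids
    return {"label2id": label2id, "question2labels": question2labels}


# Hypothetical two-question label file.
sample = io.StringIO(u"q1\tt9,t3\nq2\tt3\n")
result = build_label_index(sample)
print(json.dumps(result, sort_keys=True))
```

Dense indices let each question's topics be turned into a fixed-size multi-label target vector during training.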

2.4 Validation data

python scripts/data_process/get_val.py
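
The README does not describe how get_val.py selects the validation set. As a generic illustration only, a reproducible random hold-out split over example indices might look like this; `split_indices`, `val_size`, and `seed` are hypothetical names.

```python
import random


def split_indices(num_examples, val_size, seed=0):
    """Return (train_indices, val_indices) for a random hold-out split."""
    indices = list(range(num_examples))
    # A fixed seed keeps the split identical across runs.
    random.Random(seed).shuffle(indices)
    return sorted(indices[val_size:]), sorted(indices[:val_size])


train_idx, val_idx = split_indices(num_examples=10, val_size=3)
print(len(train_idx), len(val_idx))
```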

3. Training

Modify config.py to set the model paths.

Paths to the models we used:

CNN : models/MultiCNNTextBNDeep.py
RNN (LSTM) : models/LSTMText.py
RCNN : models/RCNN.py
Inception : models/CNNText_inception.py
FastText : models/FastText3.py

3.1 Train models without data augmentation

# LSTM char
python2 main.py main --max_epoch=5 --plot_every=100 --env='lstm_char' --weight=1 --model='LSTMText' --batch-size=128 --lr=0.001 --lr2=0 --lr_decay=0.5 --decay_every=10000 --type_='char' --zhuge=True --linear-hidden-size=2000 --hidden-size=256 --kmax-pooling=3 --num-layers=3 --augument=False

# LSTM word
python2 main.py main --max_epoch=5 --plot_every=100 --env='lstm_word' --weight=1 --model='LSTMText' --batch-size=128 --lr=0.001 --lr2=0.0000 --lr_decay=0.5 --decay_every=10000 --type_='word' --zhuge=True --linear-hidden-size=2000 --hidden-size=320 --kmax-pooling=2 --augument=False

# RCNN char
python2 main.py main --max_epoch=5 --plot_every=100 --env='rcnn_char' --weight=1 --model='RCNN' --batch-size=128 --lr=0.001 --lr2=0 --lr_decay=0.5 --decay_every=5000 --title-dim=1024 --content-dim=1024 --type_='char' --zhuge=True --kernel-size=3 --kmax-pooling=2 --linear-hidden-size=2000 --debug-file='/tmp/debugrcnn' --hidden-size=256 --num-layers=3 --augument=False

# RCNN word
python2 main.py main --max_epoch=5 --plot_every=100 --env='RCNN-word' --weight=1 --model='RCNN' --zhuge=True --num-workers=4 --batch-size=128 --model-path=None --lr2=0 --lr=1e-3 --lr-decay=0.8 --decay-every=5000 --title-dim=1024 --content-dim=512 --kernel-size=3 --debug-file='/tmp/debugrc' --kmax-pooling=1 --type_='word' --augument=False

# CNN word
python main.py main --max_epoch=5 --plot_every=100 --env='MultiCNNText' --weight=1 --model='MultiCNNTextBNDeep' --batch-size=64 --lr=0.001 --lr2=0.000 --lr_decay=0.8 --decay_every=10000 --title-dim=250 --content-dim=250 --weight-decay=0 --type_='word' --debug-file='/tmp/debug' --linear-hidden-size=2000 --zhuge=True --augument=False

# inception word
python2 main.py main --max_epoch=5 --plot_every=100 --env='inception-word' --weight=1 --model='CNNText_inception' --zhuge=True --num-workers=4 --batch-size=512 --model-path=None --lr2=0 --lr=1e-3 --lr-decay=0.8 --decay-every=2500 --title-dim=1200 --content-dim=1200 --type_='word' --augument=False

# inception char
python2 main.py main --max_epoch=5 --plot_every=100 --env='inception-char' --weight=1 --model='CNNText_inception' --zhuge=True --num-workers=4 --batch-size=512 --model-path=None --lr2=0 --lr=1e-3 --lr-decay=0.8 --decay-every=2500 --title-dim=1200 --content-dim=1200 --type_='char' --augument=False

# FastText3 word
python2 main.py main --max_epoch=5 --plot_every=100 --env='fasttext3-word' --weight=5 --model='FastText3' --zhuge=True --num-workers=4 --batch-size=512 --lr2=1e-4 --lr=1e-3 --lr-decay=0.8 --decay-every=2500 --linear_hidden_size=2000 --type_='word' --debug-file=/tmp/debugf --augument=False

In most cases, fine-tuning improves the score. For example:

python2 main.py main --max_epoch=2 --plot_every=100 --env='LSTMText-word-ft' --model='LSTMText' --zhuge=True --num-workers=4 --batch-size=256 --model-path=None --lr2=5e-5 --lr=5e-5 --decay-every=5000 --type_='word' --model-path='checkpoints/LSTMText_word_0.409196378421'
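
Several of the training commands in this section pass a --kmax-pooling value. As commonly defined, k-max pooling keeps the k largest activations along the sequence dimension while preserving their original order. The plain-Python sketch below illustrates that idea on a single feature sequence; it is not taken from the repository's model code, and `kmax_pooling` is a hypothetical name.

```python
def kmax_pooling(values, k):
    """Return the k largest entries of `values`, kept in their original order."""
    # Rank positions by value (largest first) and keep the first k of them.
    top_positions = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    # Restore sequence order so relative positions are preserved.
    return [values[i] for i in sorted(top_positions)]


# Hypothetical activations of one feature map over five time steps.
features = [0.2, 1.5, -0.3, 0.9, 1.1]
print(kmax_pooling(features, k=3))
```

With k=1 this reduces to ordinary max pooling over time. On tensors, the same effect is usually obtained with torch.topk followed by sorting the returned indices.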

3.2 Train models with data augmentation

Add --augument to the training command (the commands in 3.1 pass --augument=False).

3.3 Scores

Model	Score
CNN_word	0.4103
RNN_word	0.4119
RCNN_word	0.4115
Inception_word	0.4109
FastText_word	0.4091
RNN_char	0.4031
RCNN_char	0.4037
Inception_char	0.4024
RCNN_word_aug	0.41344
CNN_word_aug	0.41051
RNN_word_aug	0.41368
Inception_word_aug	0.41254
FastText3_word_aug	0.40853
CNN_char_aug	0.38738
RCNN_char_aug	0.39854

With model ensembling, the score can reach up to 0.433.

4. Test and submit

4.1 Test

model : one of LSTMText, RCNN, MultiCNNTextBNDeep, FastText3, CNNText_inception
model-path : path to the pretrained model
result-path : where to save the result
val : whether to test on the validation set or the test set

# LSTM
python2 test.1.py main --model='LSTMText' --batch-size=512 --model-path='checkpoints/LSTMText_word_0.411994005382' --result-path='/data_ssd/zhihu/result/LSTMText0.4119_word_test.pth' --val=False --zhuge=True

python2 test.1.py main --model='LSTMText' --batch-size=256 --type_=char --model-path='checkpoints/LSTMText_char_0.403192339135' --result-path='/data_ssd/zhihu/result/LSTMText0.4031_char_test.pth' --val=False --zhuge=True

# RCNN
python2 test.1.py main --model='RCNN' --batch-size=512 --model-path='checkpoints/RCNN_word_0.411511574999' --result-path='/data_ssd/zhihu/result/RCNN_0.4115_word_test.pth' --val=False --zhuge=True

python2 test.1.py main --model='RCNN' --batch-size=512 --model-path='checkpoints/RCNN_char_0.403710422571' --result-path='/data_ssd/zhihu/result/RCNN_0.4037_char_test.pth' --val=False --zhuge=True

# DeepText
python2 test.1.py main --model='MultiCNNTextBNDeep' --batch-size=512 --model-path='checkpoints/MultiCNNTextBNDeep_word_0.410330780091' --result-path='/data_ssd/zhihu/result/DeepText0.4103_word_test.pth' --val=False --zhuge=True

# more to go ...

4.2 Ensemble

See notebooks/val_ensemble.ipynb and notebooks/test_ensemble.ipynb for details.
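
The notebooks hold the actual ensembling procedure. As a rough illustration of the general idea, the sketch below combines per-label scores from several models with a weighted average and keeps the highest-ranked labels. The weights, the value of k, and the function name `ensemble_top_k` are assumptions for illustration and are not taken from the notebooks.

```python
def ensemble_top_k(model_scores, weights, k):
    """Weighted-average per-label scores and return (top-k label indices, averages)."""
    num_labels = len(model_scores[0])
    total = float(sum(weights))
    combined = [
        sum(w * scores[j] for w, scores in zip(weights, model_scores)) / total
        for j in range(num_labels)
    ]
    # Rank label indices by their combined score, best first.
    ranked = sorted(range(num_labels), key=lambda j: combined[j], reverse=True)
    return ranked[:k], combined


# Hypothetical scores for one question over four labels from two models.
lstm_scores = [0.1, 0.7, 0.2, 0.6]
rcnn_scores = [0.3, 0.5, 0.8, 0.1]
top, combined = ensemble_top_k([lstm_scores, rcnn_scores], weights=[1, 1], k=2)
print(top)
```

Unequal weights let stronger single models contribute more to the final ranking; suitable weights would normally be chosen on the validation set.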

5. Main files

main.py : main entry point (training)
config.py : configuration file
test.1.py : testing
data/ : data loaders
scripts/ : data preprocessing
utils/ : score calculation and a wrapper for visualization
models/ : models
- models/BasicModel : base class for the models
- models/MultiCNNTextBNDeep : CNN
- models/LSTMText : RNN
- models/RCNN : RCNN
- models/CNNText_inception : Inception
- models/MultiModelALL and models/MultiModelAll2
- other models
rep.py : code for reproducing the results
del/ : methods that failed or are no longer used
notebooks/ : notebooks

Pretrained models

https://pan.baidu.com/s/1mjvtjgs passwd : tayb
