PBLM Domain Adaptation
1.0.0
Authors: Yftah Ziser, Roi Reichart (Technion - Israel Institute of Technology)
This repository contains the code used to produce the results reported in Pivot Based Language Modeling for Improved Neural Domain Adaptation.
If you use this implementation in your paper, please cite :)
@inproceedings{ziser2018pivot,
  title = {Pivot Based Language Modeling for Improved Neural Domain Adaptation},
  author = {Ziser, Yftah and Reichart, Roi},
  booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
  volume = {1},
  pages = {1241--1251},
  year = {2018}
}

PBLM requires the following packages:
Python >= 2.7
NumPy
SciPy
Theano/TensorFlow
Keras
scikit-learn
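Assuming a standard pip setup (an assumption on our part; the repository does not pin versions, and Python 2.7 compatibility may require older releases), the dependencies can typically be installed under their usual PyPI names:

pip install numpy scipy keras scikit-learn
pip install theano  # or: pip install tensorflow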
A documented example can be found in Run.py:
import tr
import sentiment
import pre
import os
import itertools
if __name__ == '__main__':
    domain = []
    domain.append("books")
    domain.append("kitchen")
    domain.append("dvd")
    domain.append("electronics")
    # training the PBLM model in order to create a structure-aware representation
    # for domain adaptation, shared by the source and target domains
    # input:
    # first param: the source domain
    # second param: the target domain
    # third param: number of pivots
    # fourth param: appearance threshold for pivots in the source and target domains
    # fifth param: the embedding dimension
    # sixth param: maximum number of words to work with
    # seventh param: maximum review length to work with
    # eighth param: number of hidden units for the PBLM model
    # output: the software will create a corresponding directory containing the model
    tr.train_PBLM(domain[0], domain[1], 500, 10, 256, 10000, 500, 256)
    # training the sentiment CNN using PBLM's representation
    # (shared by the source and target domains)
    # this phase needs a corresponding trained PBLM model in order to work
    # input:
    # first param: the source domain
    # second param: the target domain
    # third param: number of pivots
    # fourth param: maximum review length to work with
    # fifth param: the embedding dimension
    # sixth param: maximum number of words to work with
    # seventh param: number of hidden units for the PBLM model
    # eighth param: the number of filters for the CNN
    # ninth param: the kernel size for the CNN
    # output: the results file will be created in the model's directory,
    # under the results directory, in the "cnn" dir
    sentiment.PBLM_CNN(domain[0], domain[1], 500, 500, 256, 10000, 256, 250, 3)
    # training the sentiment LSTM using PBLM's representation
    # (shared by the source and target domains)
    # this phase needs a corresponding trained PBLM model in order to work
    # input:
    # first param: the source domain
    # second param: the target domain
    # third param: number of pivots
    # fourth param: maximum review length to work with
    # fifth param: the embedding dimension
    # sixth param: maximum number of words to work with
    # seventh param: number of hidden units for the PBLM model
    # eighth param: number of hidden units for the LSTM model
    # output: the results file will be created in the model's directory,
    # under the results directory, in the "lstm" dir
    sentiment.PBLM_LSTM(domain[0], domain[1], 500, 500, 256, 10000, 256, 256)
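
Run.py also imports itertools, which makes it easy to extend the example above from a single domain pair to every ordered (source, target) pair. The loop below is a minimal sketch of ours, not part of the original script; it reuses only the calls and hyperparameter values shown above:

    # hypothetical extension (not in the original Run.py): run the full
    # pipeline for every ordered (source, target) pair of the four domains,
    # using the same hyperparameters as the single-pair calls above
    for source, target in itertools.permutations(domain, 2):
        tr.train_PBLM(source, target, 500, 10, 256, 10000, 500, 256)
        sentiment.PBLM_CNN(source, target, 500, 500, 256, 10000, 256, 250, 3)
        sentiment.PBLM_LSTM(source, target, 500, 500, 256, 10000, 256, 256)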