chars2vec

Character-based word embedding model built on an RNN

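The chars2vec library can be very useful if you work with text that contains abbreviations, slang, typos, or other non-standard vocabulary. The chars2vec language model is based on the symbolic representation of words: it maps each word to a vector of fixed length. These vector representations are produced by a custom neural network that is trained on pairs of similar and non-similar words. The network includes an LSTM that reads the sequence of characters in a word. The model maps similarly written words to nearby vectors, so an embedding can be created for any sequence of characters. chars2vec models do not keep a dictionary of embeddings; they generate embedding vectors on the fly using the pretrained model.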

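Pretrained models of dimensions 50, 100, 150, 200 and 300 are available for English. The library also provides a convenient API for training a model on an arbitrary set of characters. Read more about the chars2vec architecture in the article "Character-based language model for handling real world texts with spelling errors and human slang" on Hacker Noon.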

The model is available for Python 2.7 and 3.0+.

Installation

1. Build and install from source

Download the project source and run in your command line

 >> python setup.py install

2. Via pip

Run in your command line

 >> pip install chars2vec

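Usage

The function chars2vec.load_model(str path) initializes a model from a directory and returns a chars2vec.Chars2Vec object. There are 5 pretrained English models, with dimensions 50, 100, 150, 200 and 300. To load one of these pretrained models: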

import chars2vec

# Load an Intuition Engineering pretrained model
# Model names: 'eng_50', 'eng_100', 'eng_150', 'eng_200', 'eng_300'
c2v_model = chars2vec.load_model('eng_50')

The method chars2vec.Chars2Vec.vectorize_words(words) returns a numpy.ndarray of shape (n_words, dim) containing the word embeddings.

words = ['list', 'of', 'words']

# Create word embeddings
word_embeddings = c2v_model.vectorize_words(words)
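
Because similarly written words are mapped to nearby vectors, the embeddings can be used to rank candidate words by how closely their spelling matches a query. The following is a minimal sketch of that idea. The helper nearest_words is hypothetical and not part of chars2vec, and Euclidean distance is an assumed choice of metric. The helper relies only on the vectorize_words method described above.

```python
import math


def nearest_words(c2v_model, query, candidates):
    """Rank candidates by Euclidean distance to query in embedding space.

    c2v_model is any object with a vectorize_words(words) method, such as
    the object returned by chars2vec.load_model. This helper is
    illustrative and is not part of the chars2vec library.
    """
    # One call embeds the query together with all candidates
    vectors = c2v_model.vectorize_words([query] + list(candidates))
    query_vec, candidate_vecs = vectors[0], vectors[1:]
    distances = [math.dist(query_vec, vec) for vec in candidate_vecs]
    # Smallest distance first: the most similarly spelled candidates lead
    return sorted(zip(candidates, distances), key=lambda pair: pair[1])
```

With a loaded model, a call such as nearest_words(c2v_model, 'mechanizing', ['mecbanizing', 'afear']) returns (word, distance) tuples ordered from closest to farthest.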

Training

The function chars2vec.train_model(int emb_dim, X_train, y_train, model_chars) creates and trains a new chars2vec model and returns a chars2vec.Chars2Vec object.

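The parameter emb_dim is the dimension of the model's embedding vectors.

The parameter X_train is a list or numpy.ndarray of word pairs. The parameter y_train is a list or numpy.ndarray of target values that describe how close the words in each pair are.

The training set (X_train, y_train) consists of pairs of "similar" and "not similar" words; a pair of "similar" words is labeled with target value 0, and a pair of "not similar" words with 1.

The parameter model_chars is the list of characters the model works with. Characters that are not in the model_chars list are ignored by the model.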
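
Since characters missing from model_chars are ignored, it can help to check a training set against the character list before training. The sketch below is one possible check. The helper uncovered_chars is hypothetical and not part of chars2vec, and it compares characters exactly as they appear in the words.

```python
def uncovered_chars(word_pairs, model_chars):
    """Return the sorted characters used in word_pairs but absent from model_chars."""
    allowed = set(model_chars)
    # Collect every character that occurs in either word of every pair
    used = {ch for pair in word_pairs for word in pair for ch in word}
    return sorted(used - allowed)
```

An empty result means every character in the training pairs is covered by model_chars.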

Read more about chars2vec training and about generating a training dataset in the article about chars2vec.
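
As a rough illustration of what a generated training set could look like, the sketch below builds (X_train, y_train) from a plain word list: each word is paired with a randomly corrupted copy (target 0) and with a different word from the list (target 1). This is an assumed, simplified scheme and not necessarily the procedure described in the article. The helpers make_typo and build_training_set are hypothetical and not part of chars2vec.

```python
import random


def make_typo(word, alphabet, rng):
    """Return word with one random character substituted, deleted, or inserted.

    Assumes word is non-empty. A substitution may pick the original
    character, in which case the word comes back unchanged.
    """
    pos = rng.randrange(len(word))
    op = rng.choice(("substitute", "delete", "insert"))
    if op == "substitute":
        return word[:pos] + rng.choice(alphabet) + word[pos + 1:]
    if op == "delete" and len(word) > 1:
        return word[:pos] + word[pos + 1:]
    # Insertion, also used instead of deleting from a one-character word
    return word[:pos] + rng.choice(alphabet) + word[pos:]


def build_training_set(words, alphabet, n_pairs_per_word=1, seed=0):
    """Build (X_train, y_train): typo pairs get target 0, random pairs get 1.

    Assumes words contains at least two distinct, non-empty words.
    """
    rng = random.Random(seed)
    X_train, y_train = [], []
    for word in words:
        for _ in range(n_pairs_per_word):
            # "Similar" pair: the word and a corrupted copy of it
            X_train.append((word, make_typo(word, alphabet, rng)))
            y_train.append(0)
            # "Not similar" pair: the word and a different word from the list
            other = rng.choice([w for w in words if w != word])
            X_train.append((word, other))
            y_train.append(1)
    return X_train, y_train
```

The returned lists have the shape expected by chars2vec.train_model. Randomly chosen "not similar" words can still happen to be spelled alike, so a real dataset would likely need a stricter filter.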

The function chars2vec.save_model(c2v_model, str path_to_model) saves the trained model to a directory.

import chars2vec

dim = 50
path_to_model = 'path/to/model/directory'

X_train = [('mecbanizing', 'mechanizing'),  # similar words, target is 0
           ('dicovery', 'dis7overy'),  # similar words, target is 0
           ('prot$oplasmatic', 'prtoplasmatic'),  # similar words, target is 0
           ('copulateng', 'lzateful'),  # not similar words, target is 1
           ('estry', 'evadin6'),  # not similar words, target is 1
           ('cirrfosis', 'afear')  # not similar words, target is 1
          ]

y_train = [0, 0, 0, 1, 1, 1]

model_chars = ['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.',
               '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<',
               '=', '>', '?', '@', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
               'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
               'x', 'y', 'z']

# Create and train a chars2vec model using the given training data
my_c2v_model = chars2vec.train_model(dim, X_train, y_train, model_chars)

# Save your trained model
chars2vec.save_model(my_c2v_model, path_to_model)

# Load your trained model
c2v_model = chars2vec.load_model(path_to_model)
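
After a save and load round trip, one way to confirm that the reloaded model behaves like the original is to compare the embeddings both produce for the same words. The sketch below shows such a check. The helper embeddings_match and its tolerance are assumptions for illustration and not part of chars2vec. It uses only the vectorize_words method.

```python
import math


def embeddings_match(model_a, model_b, words, tol=1e-6):
    """Return True if both models embed every word to nearly the same vector."""
    vecs_a = model_a.vectorize_words(list(words))
    vecs_b = model_b.vectorize_words(list(words))
    # Compare the two embedding matrices coordinate by coordinate
    return all(
        math.isclose(a, b, abs_tol=tol)
        for row_a, row_b in zip(vecs_a, vecs_b)
        for a, b in zip(row_a, row_b)
    )
```

For the example above, the call would be embeddings_match(my_c2v_model, c2v_model, ['mechanizing', 'afear']).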