Text-Classification

Project Overview

Trains on texts that already have labels so that new texts can be classified.

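Update Notes

2019-03-25: The project began as a public-opinion analysis task at my company, and I later added a few small features while taking part in some competitions. At the time the goal was simply to bundle a few simple machine-learning and deep-learning models together as an exercise in engineering. After talking with other users, I decided there was no need to build a general-purpose module (nobody uses it anyway, haha). Having had some free time recently, I removed the flashy but useless parameters and functions in the spirit of keeping things as simple as possible, leaving only the preprocessing and the convolutional network.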

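Loading the datasets: load_data

Two datasets are provided: a little over 4,000 single-label e-commerce reviews and a little over 15,000 multi-label criminal-charge records. The data is for academic research only; commercial distribution is prohibited.

The single-label e-commerce data is in .csv format and comes from real e-commerce reviews. It has two fields, 'evaluation' and 'label', holding the user review and the positive/negative label respectively. Reading it with pandas is recommended; the result is a DataFrame.
The multi-label criminal-charge data is in .json format and comes from the 2018 "Fayan Cup" legal AI challenge (CAIL2018). It has two fields, 'fact' and 'accusation', holding the statement of facts and the charges respectively. Once read in, it is a list.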

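from TextClassification.load_data import load_data

# Single-label data: a DataFrame with 'evaluation' and 'label' columns
data = load_data('single')
x = data['evaluation']
y = [[i] for i in data['label']]  # wrap each label in a list

# Multi-label data: a list of records with 'fact' and 'accusation' fields
data = load_data('multiple')
x = [i['fact'] for i in data]
y = [i['accusation'] for i in data]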
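
The training example further down uses x_train, y_train, x_test and y_test, which this README does not define. One way to produce them from x and y is a seeded shuffle-and-split. The following is only a sketch: the helper name split_train_test, the 20% test ratio and the toy data are assumptions for illustration, not part of the project.

```python
import random


def split_train_test(x, y, test_ratio=0.2, seed=42):
    """Shuffle paired samples and split them into train and test parts."""
    x, y = list(x), list(y)  # also accepts a pandas Series
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)  # local RNG, global state untouched
    n_test = int(len(indices) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [x[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    x_test = [x[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return x_train, y_train, x_test, y_test


# Toy stand-ins for the loaded data (made up, not taken from the real datasets)
x = ["text %d" % i for i in range(10)]
y = [[i % 2] for i in range(10)]
x_train, y_train, x_test, y_test = split_train_test(x, y)
print(len(x_train), len(x_test))
```

Fixing the seed keeps the split reproducible between runs, which makes scores easier to compare.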

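Text preprocessing: DataPreprocess.py

Preprocesses the raw text data. It includes methods for word segmentation, encoding conversion, and length unification, and is already wrapped inside TextClassification.py.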

preprocess = DataPreprocess()

# Process the texts: segment, fit the tokenizer, then convert to fixed-length sequences
texts_cut = preprocess.cut_texts(texts, word_len)
preprocess.train_tokenizer(texts_cut, num_words)
texts_seq = preprocess.text2seq(texts_cut, sentence_len)

# Build the label set, then encode the labels
preprocess.creat_label_set(labels)
labels = preprocess.creat_labels(labels)
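
The README does not show what these methods do internally. As a rough, dependency-free approximation of the same pipeline (tokenize, keep the most frequent tokens, map to ids, unify the length, encode the labels), something like the sketch below could be used. Every function name here is hypothetical, character-level splitting stands in for real word segmentation, and padding at the end with 0 is an arbitrary choice that may differ from the project's behavior.

```python
from collections import Counter


def cut_chars(texts):
    """Split each text into single characters, dropping whitespace."""
    return [[ch for ch in text if not ch.isspace()] for text in texts]


def build_vocab(texts_cut, num_words):
    """Map the num_words most frequent tokens to ids from 1; 0 is padding."""
    counts = Counter(tok for text in texts_cut for tok in text)
    most_common = counts.most_common(num_words)
    return {tok: idx for idx, (tok, _) in enumerate(most_common, start=1)}


def to_sequences(texts_cut, vocab, sentence_len):
    """Convert tokens to ids, drop unknown tokens, then truncate or zero-pad."""
    seqs = []
    for text in texts_cut:
        ids = [vocab[tok] for tok in text if tok in vocab][:sentence_len]
        seqs.append(ids + [0] * (sentence_len - len(ids)))
    return seqs


def make_label_set(labels):
    """Collect every distinct label in a stable (sorted) order."""
    return sorted({lab for sample in labels for lab in sample})


def to_multi_hot(labels, label_set):
    """Encode each sample's labels as a 0/1 vector over label_set."""
    index = {lab: i for i, lab in enumerate(label_set)}
    rows = []
    for sample in labels:
        row = [0] * len(label_set)
        for lab in sample:
            row[index[lab]] = 1
        rows.append(row)
    return rows


# Toy inputs, made up for illustration
texts = ["good good", "bad"]
labels = [["pos"], ["neg"]]

texts_cut = cut_chars(texts)
vocab = build_vocab(texts_cut, num_words=3)
texts_seq = to_sequences(texts_cut, vocab, sentence_len=5)
label_set = make_label_set(labels)
texts_labels = to_multi_hot(labels, label_set)
print(texts_seq)     # [[3, 1, 1, 2, 3], [2, 0, 0, 0, 0]]
print(texts_labels)  # [[0, 1], [1, 0]]
```

The 0/1 label vectors cover both cases: a single-label sample has exactly one 1, while a multi-label sample can have several.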

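Model training and prediction: TextClassification.py

Combines preprocessing, network training, and network prediction. For demos, see the two demo scripts.

The methods are:

fit: takes the raw texts and labels; training can continue from an existing model, and if no model is passed in, training starts from scratch;
predict: takes the raw texts;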

import pickle

import numpy as np
from TextClassification import TextClassification

# x_train, y_train, x_test, y_test and data_type are assumed to be defined
# already (see the demo scripts)
clf = TextClassification()
texts_seq, texts_labels = clf.get_preprocess(x_train, y_train,
                                             word_len=1,
                                             num_words=2000,
                                             sentence_len=50)
clf.fit(texts_seq=texts_seq,
        texts_labels=texts_labels,
        output_type=data_type,
        epochs=10,
        batch_size=64,
        model=None)  # None: train from scratch

# Save the whole module, including the preprocessing and the neural network
with open('./%s.pkl' % data_type, 'wb') as f:
    pickle.dump(clf, f)

# Load the module saved above
with open('./%s.pkl' % data_type, 'rb') as f:
    clf = pickle.load(f)
y_predict = clf.predict(x_test)
# Single-label decoding: keep the highest-scoring label for each sample
y_predict = [[clf.preprocess.label_set[i.argmax()]] for i in y_predict]
score = np.mean(np.array(y_predict) == np.array(y_test))
print(score)  # 0.9288
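
The argmax decoding above only suits the single-label case. For multi-label output, a common alternative is to keep every label whose score passes a threshold. The sketch below shows both decodings and an exact-match accuracy in plain Python. The function names, the 0.5 threshold and the toy scores are assumptions for illustration and are not taken from the project.

```python
def decode_single(probabilities, label_set):
    """Pick the highest-scoring label for each sample."""
    decoded = []
    for row in probabilities:
        best = max(range(len(row)), key=lambda i: row[i])
        decoded.append([label_set[best]])
    return decoded


def decode_multiple(probabilities, label_set, threshold=0.5):
    """Keep every label whose score reaches the threshold."""
    return [[label_set[i] for i, p in enumerate(row) if p >= threshold]
            for row in probabilities]


def exact_match_accuracy(y_pred, y_true):
    """Fraction of samples whose predicted label set equals the true one."""
    hits = sum(set(p) == set(t) for p, t in zip(y_pred, y_true))
    return hits / len(y_true)


# Toy scores and labels, made up for illustration
label_set = ["neg", "pos"]
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
y_true = [["neg"], ["pos"], ["pos"]]

y_pred = decode_single(probs, label_set)
accuracy = exact_match_accuracy(y_pred, y_true)
print(y_pred)    # [['neg'], ['pos'], ['neg']]
print(accuracy)  # two of the three samples match
```

Exact match is a strict metric for multi-label data, since a sample only counts when every label is right; per-label precision and recall are often reported alongside it.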

Additional Information

Version: 1.0.0
Type: other source code
Updated: 2025-04-17
Size: 7.02MB
Source: GitHub
