Text-Classification

Project Introduction

Trains a classifier on text that already has labels, then uses it to classify new text.

Update notes

2019.3.25: This project started as part of my company's public opinion analysis work. Later I used it in a few competitions and added some small features. At the time I just wanted to bundle some simple machine learning and deep learning models to practice my engineering skills. After talking with other users online, I concluded that there was no need to build a general-purpose module (no one uses it anyway, haha~). I have had some free time recently, so I deleted all the unnecessary fancy parameters and functions to keep things simple. Only preprocessing and the convolutional network remain.

Import dataset: load_data

Two datasets are included: more than 4,000 single-label e-commerce reviews and more than 15,000 multi-label judicial crime records. The data are for academic research only; commercial distribution is prohibited.

The 4,000 single-label e-commerce records are in .csv format and come from real e-commerce reviews. Each record has two fields, 'evaluation' and 'label', holding the user comment and its positive or negative tag. Reading the file with pandas is recommended; the result is a DataFrame.
The 15,000 multi-label judicial crime records are in .json format and come from the 2018 Fayan Cup Legal Intelligence Challenge (CAIL2018). Each record has two fields, 'fact' and 'accusation', holding the statement of facts and the charges. The data is a list after loading.

from TextClassification.load_data import load_data

# Single-label: wrap each label in a list so y is a list of label lists
data = load_data('single')
x = data['evaluation']
y = [[i] for i in data['label']]

# Multi-label: each record already carries its own labels
data = load_data('multiple')
x = [i['fact'] for i in data]
y = [i['accusation'] for i in data]
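
To make the shape of x and y concrete, here is a small sketch that builds both from in-memory stand-ins for the two files. It is an illustration only: the sample rows and label values are invented, the helper names split_single and split_multiple are hypothetical, and it assumes each 'accusation' field holds a list of charges. It uses the standard library so it runs without dependencies; with pandas, pandas.read_csv would return the DataFrame described above.

```python
import csv
import io
import json

# Toy stand-ins for the two data files; the rows are invented for illustration.
SINGLE_CSV = (
    "evaluation,label\n"
    "Fast delivery and good quality,positive\n"
    "Broke after two days,negative\n"
)
MULTI_JSON = json.dumps([
    {"fact": "The defendant took a phone and injured the owner.",
     "accusation": ["theft", "intentional injury"]},
    {"fact": "The defendant took a bicycle.",
     "accusation": ["theft"]},
])


def split_single(csv_text):
    """Return (texts, labels), wrapping each single label in a list."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    texts = [row["evaluation"] for row in rows]
    labels = [[row["label"]] for row in rows]
    return texts, labels


def split_multiple(json_text):
    """Return (texts, labels); each record already holds a list of labels."""
    records = json.loads(json_text)
    texts = [rec["fact"] for rec in records]
    labels = [rec["accusation"] for rec in records]
    return texts, labels


x_single, y_single = split_single(SINGLE_CSV)
x_multi, y_multi = split_multiple(MULTI_JSON)
print(y_single)  # [['positive'], ['negative']]
print(y_multi)   # [['theft', 'intentional injury'], ['theft']]
```

In both cases y ends up as a list of label lists, so the single-label and multi-label data can be handled the same way downstream.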

Text preprocessing: DataPreprocess.py

Preprocesses the raw text: word segmentation, encoding the text as sequences, and bringing the sequences to a uniform length. These steps are already wrapped by TextClassification.py.

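# Import path assumed from the module name DataPreprocess.py
from TextClassification.DataPreprocess import DataPreprocess

preprocess = DataPreprocess()

# Process the text: segment, fit the tokenizer, convert to fixed-length sequences
texts_cut = preprocess.cut_texts(texts, word_len)
preprocess.train_tokenizer(texts_cut, num_words)
texts_seq = preprocess.text2seq(texts_cut, sentence_len)

# Build the label set, then encode the labels
preprocess.creat_label_set(labels)
labels = preprocess.creat_labels(labels)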
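
For readers who want to see what these steps amount to, the following is a dependency-free sketch of the same pipeline. It is not the project's implementation. The function names are hypothetical, whitespace splitting stands in for a real word segmenter, word_len is treated as a minimum token length, index 0 is used for both padding and unknown words, and padding is added at the end. All of these are assumptions made for illustration.

```python
from collections import Counter


def segment(texts, word_len=1):
    """Split on whitespace and keep tokens with at least word_len characters."""
    return [[tok for tok in text.split() if len(tok) >= word_len]
            for text in texts]


def build_vocab(texts_cut, num_words):
    """Map the num_words most frequent tokens to the indices 1..num_words."""
    counts = Counter(tok for tokens in texts_cut for tok in tokens)
    most_common = counts.most_common(num_words)
    return {tok: idx for idx, (tok, _) in enumerate(most_common, start=1)}


def to_padded_ids(texts_cut, vocab, sentence_len):
    """Encode tokens as integers, then truncate or zero-pad to sentence_len."""
    sequences = []
    for tokens in texts_cut:
        ids = [vocab.get(tok, 0) for tok in tokens][:sentence_len]
        sequences.append(ids + [0] * (sentence_len - len(ids)))
    return sequences


def build_label_set(labels):
    """Collect every distinct label, sorted for a stable order."""
    return sorted({label for row in labels for label in row})


def encode_labels(labels, label_set):
    """Turn each list of labels into a multi-hot vector over label_set."""
    position = {label: i for i, label in enumerate(label_set)}
    vectors = []
    for row in labels:
        vec = [0] * len(label_set)
        for label in row:
            vec[position[label]] = 1
        vectors.append(vec)
    return vectors


texts = ["good phone good price good", "bad phone"]
labels = [["positive"], ["negative"]]

texts_cut = segment(texts, word_len=1)
vocab = build_vocab(texts_cut, num_words=2)
texts_seq = to_padded_ids(texts_cut, vocab, sentence_len=4)
label_set = build_label_set(labels)
encoded = encode_labels(labels, label_set)

print(vocab)      # {'good': 1, 'phone': 2}
print(texts_seq)  # [[1, 2, 1, 0], [0, 2, 0, 0]]
print(label_set)  # ['negative', 'positive']
print(encoded)    # [[0, 1], [1, 0]]
```

Every sequence has the same length after this step, which is what lets the texts be stacked into one array for the network.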

Model training and prediction: TextClassification.py

Integrates preprocessing, network training, and network prediction. See the two demo scripts for complete examples.

The methods are as follows:

fit: takes the texts and labels. If an existing model is passed in, training continues from it; otherwise training starts from scratch;
predict: takes the raw text.

import pickle

import numpy as np
from TextClassification import TextClassification

# data_type, x_train, y_train, x_test and y_test are assumed to be defined
# beforehand, as in the demo scripts.
clf = TextClassification()
texts_seq, texts_labels = clf.get_preprocess(x_train, y_train,
                                             word_len=1,
                                             num_words=2000,
                                             sentence_len=50)
clf.fit(texts_seq=texts_seq,
        texts_labels=texts_labels,
        output_type=data_type,
        epochs=10,
        batch_size=64,
        model=None)  # pass an existing model to continue training from it

# Save the whole module, including preprocessing and the neural network
with open('./%s.pkl' % data_type, 'wb') as f:
    pickle.dump(clf, f)

# Load the module that was just saved
with open('./%s.pkl' % data_type, 'rb') as f:
    clf = pickle.load(f)
y_predict = clf.predict(x_test)
# Map each prediction to its highest-scoring label
y_predict = [[clf.preprocess.label_set[i.argmax()]] for i in y_predict]
# Fraction of test samples whose predicted label matches the true label
score = np.mean(np.array(y_predict) == np.array(y_test))
print(score)  # 0.9288
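
The decoding step in the example keeps only the highest-scoring label, which suits the single-label case. The sketch below shows that rule next to one possible multi-label rule, plus an exact-match accuracy. It is a sketch under stated assumptions, not the project's code: the function names are hypothetical, the scores are invented, and the 0.5 threshold for multi-label output is an assumed convention rather than something the project documents.

```python
def decode_single(scores, label_set):
    """Pick the highest-scoring label for each sample (argmax)."""
    decoded = []
    for row in scores:
        best = max(range(len(row)), key=row.__getitem__)
        decoded.append([label_set[best]])
    return decoded


def decode_multiple(scores, label_set, threshold=0.5):
    """Keep every label whose score reaches the threshold (assumed rule)."""
    return [[label for label, score in zip(label_set, row) if score >= threshold]
            for row in scores]


def exact_match_accuracy(predicted, expected):
    """Share of samples whose predicted labels equal the expected labels."""
    hits = sum(set(p) == set(e) for p, e in zip(predicted, expected))
    return hits / len(expected)


# Single-label example with invented scores
single_labels = ["negative", "positive"]
single_scores = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.6]]
single_truth = [["positive"], ["negative"], ["negative"]]
single_pred = decode_single(single_scores, single_labels)
print(single_pred)  # [['positive'], ['negative'], ['positive']]
print(round(exact_match_accuracy(single_pred, single_truth), 4))  # 0.6667

# Multi-label example with invented scores
multi_labels = ["fraud", "theft"]
multi_scores = [[0.7, 0.6], [0.2, 0.9]]
multi_truth = [["theft", "fraud"], ["theft"]]
multi_pred = decode_multiple(multi_scores, multi_labels)
print(multi_pred)  # [['fraud', 'theft'], ['theft']]
print(exact_match_accuracy(multi_pred, multi_truth))  # 1.0
```

Comparing label sets rather than lists means the order of labels within a sample does not affect the score.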


Additional Information

Version: 1.0.0
Type: Other source code
Update time: 2025-04-17
Size: 7.02 MB
Source: GitHub
