ChineseNER Download - ChineseNER Source code download

ChineseNER

Python

1.0.0

Download

ChineseNER

This project uses

python 2.7
tensorflow 1.7.0
pytorch 0.4.0

If you don’t understand named entity recognition, you can read this article first. By the way, please star~

This is the easiest way to identify a BiLSTM+CRF model for a named entity.

data

There are three open source data sets available in the data folder, Boson Data (https://bosonnlp.com), 1998 People's Daily Labeled Data, and MSRA Microsoft Asia Research Institute open source data. Among them, there are 6 entity types in the boson data set. People's Daily Corpus and MSRA generally only extract three entity types: person name, place name, and organization name.

First run the python file in the data to process the data for the model to use.

tensorflow version

Start training

Start training with python train.py , and the trained model will be saved in the model folder.

Use pre-trained word vectors

Using python train.py pretrained will start training using pretrained word vectors. vec.txt is a smaller pretrained word vector found online. You can refer to my code to modify it to use other better pretrained word vectors.

Test the trained model

Use python train.py test for testing, and the latest model in the model folder will be automatically read and enter Chinese to test. The quality of the test results depends on the accuracy of the model.

File-level entity extraction

Use python train.py input_file output_file for file-level entity extraction.

It can automatically read the latest model in the model folder, extract the entities in input_file and write them into output_file . First there is the original sentence, then the entity type and entity (can be modified as needed).

For example, python train.py test1.txt res.txt , res.txt content is as follows:

Additional modifications are added from time to time. .

pytorch version

Use the Bilstm+crf model in pytorch tutorial directly.

Just run train.py training. Since we use CPU and do not use batch, the training speed is super slow. If you want to simply run the code, it is recommended to use only some data to run it. pytorch will not be updated for the time being.

Accuracy

The parameters were not adjusted too carefully. The f value of the boson data set was around 70%~75%, and the f value of the People's Daily and MSRA data sets was around 85%~90%. (After all, boson has 6 entity types, and the other two have only 3 types)

Update log

2018-9-15 Added tensorflow version.

2018-9-17 Added the People's Daily data set in 1998 and the MSRA Microsoft Asia Research Institute data set.

2018-9-19 The code style was simply modified and the model was extracted for future expansion.

2018-9-22 Added python train.py test function.

2018-10-6 Add the use parameters to determine whether to use pre-trained word vectors for training.

2018-10-11 Added function: It can extract entities from a text file and write them to another file.

Expand

Additional Information

Version 1.0.0
Type Python
Update Time 2025-07-13
size 13.52MB
From Github

Related Applications

ToDo Co

2025-03-22
Python Portfolio

2024-11-10
Redash open source data chart tool v24.10.0

2024-11-27
datamule python

2024-11-08
smartchart data visualization platform v6.9

2024-11-27
Locust load testing tool v2.32.0

2024-11-27

Recommended for You

chat.petals.dev

Other source code

1.0.0
GPT Prompt Templates

Other source code

1.0.0
GPTyped

Other source code

GPTyped 1.0.5
ToDo Co

Python

1.0.0
Python Portfolio

Python
datamule python

Python
Google Dorks

Other source code

1.0
shepherd

Other source code

v6.1.6-react-shepherd: Prepare Release (#3063)
mongo express

Other source code

v1.1.0-rc-3

Related Information All