Chinese Word Segmentation in NLP

Chinese Word Segmentation

State-of-the-art Chinese Word Segmentation with Bi-LSTMs (Ji Ma, Kuzman Ganchev and David Weiss, EMNLP 2018) - (https://aclweb.org/anthology/D18-1529)

Python 3.6.x, TensorFlow 1.12.0

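This project uses four Chinese datasets (AS, CityU, MSR and PKU) to train a deep learning model for the Chinese word segmentation task. The datasets can be downloaded from: http://sighan.cs.uchicago.edu/bakeoff2005/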

Run: python3 train.py

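input_file_path is the path to the file containing the Chinese sequences without spaces.

label_file_path is the path to the file containing the labels of the Chinese sequences in BIES format.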
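
To illustrate how the two training files relate, here is a minimal sketch. It assumes each line of the input file is paired with a line of equal length in the label file, with one BIES tag per character (B = beginning of a word, I = inside, E = end, S = single-character word). The function name is_valid_pair is hypothetical and not part of this repository.

```python
def is_valid_pair(chars, tags):
    """Return True if `tags` is a well-formed BIES string aligned with `chars`."""
    if len(chars) != len(tags):
        return False
    inside_word = False  # True while a multi-character word is still open
    for tag in tags:
        if tag == "B" and not inside_word:
            inside_word = True
        elif tag == "I" and inside_word:
            pass  # still inside the same word
        elif tag == "E" and inside_word:
            inside_word = False
        elif tag == "S" and not inside_word:
            pass  # a complete one-character word
        else:
            return False  # unknown tag or illegal transition
    return not inside_word  # the last word must be closed


# "我 / 喜歡 / 北京" is labelled S, BE, BE.
print(is_valid_pair("我喜歡北京", "SBEBE"))
```

A check like this can catch misaligned lines before training starts.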

Run: python3 preprocess.py original_file_path input_file_path output_file_path

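original_file_path is the file containing the original Chinese sequences.

input_file_path is the path where the Chinese sequences without spaces are saved.

label_file_path is the path where the labels of the Chinese sequences are saved.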
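
The preprocessing step can be sketched as follows. This is an illustration only, assuming the original file holds one sentence per line with words separated by whitespace, as in the SIGHAN 2005 training data. The helper names words_to_bies and preprocess_line are hypothetical and may differ from what preprocess.py actually does.

```python
def words_to_bies(words):
    """Map a list of words to one BIES tag per character."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # B for the first character, E for the last, I for any in between
            tags.append("B" + "I" * (len(word) - 2) + "E")
    return "".join(tags)


def preprocess_line(line):
    """Split a segmented sentence into (unsegmented text, BIES labels)."""
    words = line.split()  # no argument: splits on any whitespace, including U+3000
    return "".join(words), words_to_bies(words)


text, labels = preprocess_line("我 喜歡 北京大學")
print(text)    # 我喜歡北京大學
print(labels)  # SBEBIIE
```

Writing text to the input file and labels to the label file, line by line, gives two files whose lines have matching lengths.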

Run: python3 predict.py input_path output_path resources_path

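input_path is the file containing the Chinese sequences without spaces.

output_path is the path where the predictions are saved.

resources_path is the path where the model is saved.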

The saved model and its accompanying files can be downloaded from http://bit.ly/2pkgzbg and placed in the resources folder.
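
Since the predictions are tag sequences, turning them back into segmented text takes one more step. The sketch below assumes one BIES tag per input character. The function name bies_to_words is hypothetical and is not provided by predict.py.

```python
def bies_to_words(chars, tags):
    """Rebuild the word list from characters and their predicted BIES tags."""
    words = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # both tags close the current word
            words.append(current)
            current = ""
    if current:
        # Tolerate a prediction that ends without a closing E tag
        words.append(current)
    return words


print(" ".join(bies_to_words("我喜歡北京大學", "SBEBIIE")))  # 我 喜歡 北京大學
```

Closing a word on E or S alone keeps the decoder robust to occasional ill-formed predictions, such as an I tag with no preceding B.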

Run: python3 score.py prediction_file gold_file

prediction_file is the file containing the predictions in BIES format from the previous step.

gold_file is the path to the gold file in BIES format.
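
The exact metric computed by score.py is not described here. As one possible way to compare the two files, the sketch below computes word-level precision, recall and F1 for a single sentence, assuming both tag strings are in BIES format and cover the same characters. The function names tags_to_spans and word_prf are hypothetical.

```python
def tags_to_spans(tags):
    """Convert a BIES tag string into a set of (start, end) word spans."""
    spans = set()
    start = 0
    for i, tag in enumerate(tags):
        if tag in ("E", "S"):  # a word ends at position i
            spans.add((start, i + 1))
            start = i + 1
    return spans


def word_prf(pred_tags, gold_tags):
    """Return word-level (precision, recall, F1) for one sentence."""
    pred = tags_to_spans(pred_tags)
    gold = tags_to_spans(gold_tags)
    correct = len(pred & gold)  # words with identical boundaries in both
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# The prediction splits the last four characters into two words instead of one.
print(word_prf("SBEBEBE", "SBEBIIE"))
```

A word counts as correct only when both of its boundaries match the gold segmentation, which is stricter than per-character tag accuracy.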
