text segmentation

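Text Segmentation as a Supervised Learning Task

This repository contains the code and supplementary materials required to train and evaluate the model described in the paper Text Segmentation as a Supervised Learning Task.

Download the required resources

Wiki-727K and Wiki-50 datasets: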

https://www.dropbox.com/sh/k3jh0fjbyr0gw0a/aadzad9sdtrbnvs1qlcjy5cza?dl=0

Word2Vec：

https://drive.google.com/a/audioburst.com/uc?export=Download&confirm=zrin&id=0b7xkcwpi5kdynlnuttlss21pqmm

Fill in the relevant paths in configgenerator.py, then run the script (the git repository includes the Choi dataset).
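
As a rough illustration of the "fill in the paths" step, the sketch below writes three dataset paths to a JSON file and reports which of them do not exist on disk. The function name `write_config`, the key names, and the output file name `config.json` are assumptions made for this example and are not taken from configgenerator.py itself; check the script in the repository for the names it actually uses.

```python
import json
import os


def write_config(word2vec_path, choi_path, wiki_path, out_path="config.json"):
    """Write dataset paths to a JSON file.

    Returns the config keys whose paths do not exist on disk, so that
    typos are caught before a long training run starts.
    """
    # NOTE: these key names are illustrative assumptions, not confirmed
    # against the repository's configgenerator.py.
    config = {
        "word2vecfile": word2vec_path,
        "choidataset": choi_path,
        "wikidataset": wiki_path,
    }
    with open(out_path, "w") as f:
        json.dump(config, f, indent=2)
    # Report missing paths in a stable (alphabetical) order.
    return [key for key, path in sorted(config.items())
            if not os.path.exists(path)]
```

Calling `write_config(...)` with your own paths and getting back an empty list means every path was found; any returned key points at a path that still needs fixing.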

Create the environment:

 conda create -n textseg python=2.7 numpy scipy gensim ipython
 source activate textseg
 pip install http://download.pytorch.org/whl/cu80/torch-0.3.0-cp27-cp27mu-linux_x86_64.whl
 pip install tqdm pathlib2 segeval tensorboard_logger flask flask_wtf nltk
 pip install pandas xlrd xlsxwriter termcolor
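
After running the commands above, it can be useful to confirm that everything is importable from inside the textseg environment. The following is a minimal sketch, not part of the repository: the helper `missing_modules` and the list `REQUIRED_MODULES` are names invented here, and the list assumes each package is imported under the same name it is installed under (with PyTorch imported as `torch`).

```python
import importlib

# Import names assumed to correspond to the packages installed above.
REQUIRED_MODULES = [
    "numpy", "scipy", "gensim", "torch", "tqdm", "pathlib2", "segeval",
    "tensorboard_logger", "flask", "flask_wtf", "nltk", "pandas", "xlrd",
    "xlsxwriter", "termcolor",
]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Running `print(missing_modules(REQUIRED_MODULES))` in the activated environment should print an empty list if the installation succeeded.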

How to run the training process?

 python run.py --help

Example:

 python run.py --cuda --model max_sentence_embedding --wiki

How to evaluate a trained model (on the Wiki-727K/Choi datasets)?

 python test_accuracy.py --help

Example:

 python test_accuracy.py --cuda --model <path_to_model> --wiki

How to create a new Wikipedia dataset:

 python wiki_processor.py --input <input> --temp <temp_files_folder> --output <output_folder> --train <ratio> --test <ratio>

input is the full path to the Wikipedia dump, temp is the path to a folder for temporary files, and output is the path to the newly generated Wikipedia dataset.
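
The --train and --test ratios are not described further here. One plausible reading, shown below purely as an assumption, is that they give the fractions of articles assigned to the training and test sets, with the remainder left for development. The function `split_by_ratio` is a hypothetical helper written for this illustration and is not the logic of wiki_processor.py.

```python
import random


def split_by_ratio(items, train_ratio, test_ratio, seed=0):
    """Shuffle `items` and split them into (train, dev, test) lists.

    ASSUMPTION: whatever is not assigned to train or test becomes dev.
    """
    if train_ratio < 0 or test_ratio < 0 or train_ratio + test_ratio > 1:
        raise ValueError("ratios must be non-negative and sum to at most 1")
    shuffled = list(items)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    # int() rounds down, so any leftover items end up in dev.
    n_train = int(len(shuffled) * train_ratio)
    n_test = int(len(shuffled) * test_ratio)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    dev = shuffled[n_train + n_test:]
    return train, dev, test
```

For example, `split_by_ratio(article_paths, 0.8, 0.1)` would place roughly 80% of the paths in train, 10% in test, and the rest in dev, where `article_paths` stands for any list of article files.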

The Wikipedia dump can be downloaded from the following URL:

https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
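
Because the dump is a large bz2-compressed XML file, a quick sanity check before passing it to wiki_processor.py can save time. The sketch below streams the file without decompressing it to disk and counts opening `<page>` tags. It assumes each `<page>` tag sits on its own line, as is usual for MediaWiki XML exports; `count_pages` is a helper made up for this example.

```python
import bz2


def count_pages(dump_path, limit=None):
    """Count lines containing an opening <page> tag in a bz2-compressed dump.

    If `limit` is given, stop as soon as that many pages have been seen,
    which is enough to confirm the file is readable.
    """
    count = 0
    with bz2.BZ2File(dump_path) as dump:
        for line in dump:  # lines are yielded as bytes
            if b"<page>" in line:
                count += 1
                if limit is not None and count >= limit:
                    break
    return count
```

Calling `count_pages(path_to_dump, limit=100)` returns quickly on a valid dump and raises an error on a truncated or corrupted download.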

Additional information

Version: 1.0.0
Type: Other source code
Updated: 2025-04-19
Size: 5.04 MB
Source: GitHub
