text segmentation

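Text Segmentation as a Supervised Learning Task

This repository contains the code and supplementary materials needed to train and evaluate the model described in the paper Text Segmentation as a Supervised Learning Task.

Download required resources

Wiki-727K, Wiki-50 datasets: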

https://www.dropbox.com/sh/k3jh0fjbyr0gw0a/aadzad9sdtrbnvs1qlcjy5cza?dl=0

Word2Vec：

https://drive.google.com/a/audioburst.com/uc?export=Download&confirm=zrin&id=0b7xkcwpi5kdynlnuttlss21pqmm

Fill in the relevant paths in configgenerator.py, then execute the script (the git repository includes the Choi dataset).
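Before running the script, it can help to confirm that the paths you are about to enter actually exist. The following is a minimal sketch, not part of the repository: the dictionary keys and the placeholder paths are hypothetical, and the real names expected by configgenerator.py should be taken from that file.

```python
import os

# Hypothetical keys and placeholder paths (assumptions for illustration only);
# replace them with the locations of the resources downloaded above.
RESOURCE_PATHS = {
    "word2vec": "/path/to/word2vec.bin",
    "wiki": "/path/to/wiki_727",
    "choi": "/path/to/choi",
}


def missing_paths(paths):
    """Return the sorted keys whose path does not exist on disk."""
    return sorted(key for key, path in paths.items() if not os.path.exists(path))


if __name__ == "__main__":
    missing = missing_paths(RESOURCE_PATHS)
    if missing:
        print("paths not found for: " + ", ".join(missing))
    else:
        print("all resource paths exist")
```

Running this first avoids discovering a mistyped path only after training has started.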

Create the environment:

 conda create -n textseg python=2.7 numpy scipy gensim ipython 
source activate textseg
pip install http://download.pytorch.org/whl/cu80/torch-0.3.0-cp27-cp27mu-linux_x86_64.whl 
pip install tqdm pathlib2 segeval tensorboard_logger flask flask_wtf nltk
pip install pandas xlrd xlsxwriter termcolor
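Once the environment is active, a quick import check can show whether every package installed. This is a minimal sketch that is not part of the repository; the list holds the import names of the packages from the commands above (ipython is left out because it is only an interactive shell), and the script is written to run on Python 2.7 as well as Python 3.

```python
import importlib

# Import names of the packages installed by the commands above.
REQUIRED = [
    "numpy", "scipy", "gensim", "torch", "tqdm", "pathlib2", "segeval",
    "tensorboard_logger", "flask", "flask_wtf", "nltk", "pandas",
    "xlrd", "xlsxwriter", "termcolor",
]


def unavailable(names):
    """Return the module names that cannot be imported.

    Any exception during import counts as unavailable, so a broken
    install is reported the same way as a missing one.
    """
    failed = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            failed.append(name)
    return failed


if __name__ == "__main__":
    failed = unavailable(REQUIRED)
    if failed:
        print("could not import: " + ", ".join(failed))
    else:
        print("all imports OK")
```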

How to run the training process?

 python run.py --help

Example:

 python run.py --cuda --model max_sentence_embedding --wiki

How to evaluate a trained model (on the Wiki-727K/Choi datasets)?

 python test_accuracy.py  --help

Example:

 python test_accuracy.py --cuda --model <path_to_model> --wiki

How to create a new Wikipedia dataset:

 python wiki_processor.py --input <input> --temp <temp_files_folder> --output <output_folder> --train <ratio> --test <ratio>

Input is the full path to the Wikipedia dump, temp is the path to the temporary files folder, and output is the path to the newly generated Wikipedia dataset.
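The command also takes --train and --test ratios, which are not described further here. The sketch below illustrates one way a ratio-based split can work. It assumes that whatever is left after the train and test portions becomes a dev set, and it is an illustration only: the function name, the seed parameter, and the dev-remainder rule are hypothetical and may not match what wiki_processor.py does.

```python
import random


def split_by_ratio(items, train_ratio, test_ratio, seed=0):
    """Shuffle items and split them into (train, dev, test) lists.

    Assumption: the remainder after train and test goes to dev.
    """
    if train_ratio < 0 or test_ratio < 0 or train_ratio + test_ratio > 1:
        raise ValueError("ratios must be non-negative and sum to at most 1")
    shuffled = list(items)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    n_test = int(len(shuffled) * test_ratio)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    dev = shuffled[n_train + n_test:]
    return train, dev, test


if __name__ == "__main__":
    train, dev, test = split_by_ratio(range(10), 0.8, 0.1)
    print("train=%d dev=%d test=%d" % (len(train), len(dev), len(test)))
```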

The Wikipedia dump can be downloaded from the following URL:

https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
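The dump is a large bz2-compressed XML file, so it can be worth checking that the download is readable before passing it to wiki_processor.py. The following is a minimal sketch using only the standard library; the helper name peek_dump and the local file name are hypothetical.

```python
import bz2
import itertools


def peek_dump(path, num_lines=5):
    """Return the first num_lines lines of a bz2-compressed text file.

    Only the start of the file is decompressed, so this stays fast
    even for a multi-gigabyte dump.
    """
    with bz2.BZ2File(path, "rb") as dump:
        head = itertools.islice(dump, num_lines)
        return [line.decode("utf-8").rstrip("\n") for line in head]


if __name__ == "__main__":
    # Hypothetical local file name; point it at the downloaded dump.
    for line in peek_dump("enwiki-latest-pages-articles.xml.bz2"):
        print(line)
```

A truncated or corrupted download typically fails here with an error from the bz2 module instead of failing partway through processing.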

Additional information

Version: 1.0.0
Type: Other source code
Updated: 2025-04-19
Size: 5.04MB
Source: GitHub
