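paraphrase-id-tensorflow

Various models and code for paraphrase identification implemented in TensorFlow (1.1.0).

I took great care to document the code and explain what I'm doing at various steps throughout the models; hopefully it will be didactic example code for those looking to get started with TensorFlow!

So far, this repo has implemented: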

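A basic Siamese LSTM baseline, loosely based on the model in Mueller, Jonas and Aditya Thyagarajan. "Siamese Recurrent Architectures for Learning Sentence Similarity." AAAI (2016).
A Siamese LSTM model with an added "matching layer", as described in Liu, Yang et al. "Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention." CoRR abs/1605.09090 (2016).
The more-or-less state-of-the-art Bilateral Multi-Perspective Matching model from Wang, Zhiguo et al. "Bilateral Multi-Perspective Matching for Natural Language Sentences." CoRR abs/1702.03814 (2017).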
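
To make the first and third items a little more concrete, here is a minimal pure-Python sketch of two similarity functions described in the cited papers: the exponentiated negative Manhattan distance from Mueller and Thyagarajan, and a per-perspective weighted cosine match in the spirit of Wang et al. This is not the repo's TensorFlow code; the function names and the plain-list vectors are hypothetical and chosen only for illustration.

```python
import math


def manhattan_similarity(h1, h2):
    """exp(-||h1 - h2||_1): 1.0 for identical vectors, closer to 0 as they diverge."""
    return math.exp(-sum(abs(a - b) for a, b in zip(h1, h2)))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def multi_perspective_match(v1, v2, perspectives):
    """Return one cosine score per perspective.

    Each perspective is a weight vector that rescales every dimension of
    both inputs before the cosine is taken.
    """
    return [
        cosine([w * a for w, a in zip(weights, v1)],
               [w * b for w, b in zip(weights, v2)])
        for weights in perspectives
    ]
```

In a real model the inputs would be LSTM hidden states and the perspective weights would be learned parameters rather than fixed lists.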

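PRs to add more models or to optimize or patch existing ones are more than welcome! The bulk of the model code resides in duplicate_questions/models.

A lot of the data processing code is taken from or inspired by allenai/deep_qa; go check them out if you like how this project is structured!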

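Installation

This project was developed in and has been tested on Python 3.5 (it may not work with other versions of Python), and the package requirements are in requirements.txt.

To install the requirements: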

 pip install -r requirements.txt

Note that after installing the requirements, you have to download the necessary NLTK data by running (in your shell):

 python -m nltk.downloader punkt
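
If you want to confirm from Python that the download worked, a small check along these lines may help. It is only a sketch: the helper name `punkt_available` is hypothetical, and it relies on `nltk.data.find` raising `LookupError` when a resource cannot be located.

```python
def punkt_available():
    """Return True if NLTK can locate the punkt tokenizer data."""
    try:
        import nltk
    except ImportError:
        # NLTK itself is missing; install the requirements first.
        return False
    try:
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        # NLTK is installed, but the punkt data has not been downloaded.
        return False
    return True


if __name__ == "__main__":
    print("punkt available:", punkt_available())
```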

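GPU Training and Inference

Note that the requirements.txt file specifies tensorflow as a dependency, which is a CPU-bound version of TensorFlow. If you have a GPU, you should uninstall this CPU TensorFlow and install the GPU version by running: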

 pip uninstall tensorflow
 pip install tensorflow-gpu
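
To see which of the two pip packages is currently installed without importing TensorFlow, something like the following could be used. This is a sketch that assumes Python 3.8 or newer (for `importlib.metadata`), which is newer than the Python 3.5 this project targets; the helper name is hypothetical.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_tensorflow_packages():
    """Map each TensorFlow pip package name to its installed version, or None."""
    versions = {}
    for name in ("tensorflow", "tensorflow-gpu"):
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for package, version in installed_tensorflow_packages().items():
        print(package, "->", version or "not installed")
```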

Getting / Processing the Data

To begin, run the following to generate the auxiliary directories for storing data, trained models, and logs:

 make aux_dirs
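
For readers without `make`, a rough Python equivalent is sketched below. The directory list is an assumption taken from the project tree in this README, not from the Makefile itself, so treat the Makefile as authoritative; `make_aux_dirs` is a hypothetical helper name.

```python
import os

# Assumed from the "Project Organization" tree in this README;
# the actual Makefile target may create a different set.
AUX_DIRS = [
    "data/external",
    "data/interim",
    "data/processed",
    "data/raw",
    "logs",
    "models",
]


def make_aux_dirs(root="."):
    """Create the auxiliary directories under root and return their paths."""
    created = []
    for relative in AUX_DIRS:
        path = os.path.join(root, *relative.split("/"))
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```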

In addition, if you want to use pretrained GloVe vectors, run:

 make glove

which will download pretrained GloVe vectors to data/external/. Extract the files in that same directory.
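
Once extracted, GloVe files are plain text with one token per line followed by its space-separated vector components. The sketch below shows one way to read such a file into a dictionary; it is not the repo's loader, and both the `load_glove` name and the optional `vocabulary` filter are illustrative assumptions.

```python
def load_glove(path, vocabulary=None):
    """Read a GloVe text file into a {word: [float, ...]} dictionary.

    If vocabulary is given, only words contained in it are kept, which
    keeps memory use down when the dataset vocabulary is small.
    """
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if not values:
                continue  # skip blank or malformed lines
            if vocabulary is not None and word not in vocabulary:
                continue
            vectors[word] = [float(x) for x in values]
    return vectors
```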

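Quora Question Pairs

To use the Quora Question Pairs data, download the dataset from Kaggle (may require an account). Place the downloaded zip archives in data/raw/, and extract the files to that same directory.

Then, run: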

 make quora_data

to automatically clean and process the data with the scripts in scripts/data/quora.
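
As a rough idea of what a cleaning pass over the raw CSV can look like, here is a small standard-library sketch. It is not the code in scripts/data/quora; it assumes the Kaggle training file has `question1`, `question2`, and `is_duplicate` columns, and the `read_question_pairs` name is hypothetical.

```python
import csv


def read_question_pairs(path):
    """Return (question1, question2, label) tuples, skipping unusable rows."""
    pairs = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            q1 = (row.get("question1") or "").strip()
            q2 = (row.get("question2") or "").strip()
            label = row.get("is_duplicate")
            # Drop rows with an empty question or a label that is not 0/1.
            if not q1 or not q2 or label not in ("0", "1"):
                continue
            pairs.append((q1, q2, int(label)))
    return pairs
```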

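Running models

To train a model or load + predict with a model, run the scripts in scripts/run_model/ with python <script_path>. You can get additional documentation about the parameters they take by running python <script_path> -h

Here's an example run command for the baseline Siamese BiLSTM: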
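
For orientation, a command-line interface of this shape can be built with `argparse`, which also provides the `-h` output. The sketch below is hypothetical and is not copied from the run scripts: the flag names mirror the example commands in this section, while the `predict` mode, the defaults, and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build an illustrative parser resembling the run scripts' interface."""
    parser = argparse.ArgumentParser(
        description="Train or predict with a paraphrase model (illustrative).")
    # Positional mode; "predict" is an assumed counterpart to "train".
    parser.add_argument("mode", choices=["train", "predict"],
                        help="whether to train a model or load one and predict")
    parser.add_argument("--model_name", required=True,
                        help="name used to group saved models and logs")
    parser.add_argument("--run_id", required=True,
                        help="identifier for this particular run")
    parser.add_argument("--share_encoder_weights", action="store_true",
                        help="use the same encoder for both sentences")
    parser.add_argument("--early_stopping_patience", type=int, default=0,
                        help="epochs to wait for improvement before stopping")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```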

 python scripts/run_model/run_siamese.py train --share_encoder_weights --model_name=baseline_siamese --run_id=0

Here's an example run command for the Siamese BiLSTM with matching layer:

 python scripts/run_model/run_siamese_matching_bilstm.py train --share_encoder_weights --model_name=siamese_matching --run_id=0

Here's an example run command for the BiMPM model:

 python scripts/run_model/run_bimpm.py train --early_stopping_patience=5 --model_name=biMPM --run_id=0

Note that the defaults might not be ideal for your use, so feel free to turn some knobs.
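
One knob that appears in the commands above is --early_stopping_patience. A common reading of "patience" is the number of consecutive epochs without validation improvement that are tolerated before training stops. The sketch below illustrates that general idea on a list of validation losses; whether the repo's training loop behaves exactly this way is an assumption, and the function name is hypothetical.

```python
def best_epoch_with_patience(val_losses, patience):
    """Simulate early stopping over a non-empty list of validation losses.

    Returns (best_epoch, last_epoch_run). Training stops once `patience`
    consecutive epochs fail to improve on the best loss seen so far.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch  # stop early at this epoch
    return best_epoch, len(val_losses) - 1
```

With a small patience, a temporary plateau can end training before a later improvement is reached; a larger patience trades extra epochs for a better chance of finding it.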

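Contributors

Nelson Liu
Omar Khan

Contributing

Do you have ideas on how to improve this repo? Have a feature request, bug report, or patch? Feel free to open an issue or PR, as I'm happy to address issues and look at pull requests.

Project Organization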

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- Original immutable data (e.g. Quora Question Pairs).
│
├── logs               <- Logs from training or prediction, including TF model summaries.
│
├── models             <- Serialized models.
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment
│
├── duplicate_questions <- Module with source code for models and data.
│   ├── data           <- Methods and classes for manipulating data.
│   │
│   ├── models         <- Methods and classes for training models.
│   │
│   └── util           <- Various helper methods and classes for use in models.
│
├── scripts            <- Scripts for generating the data
│   ├── data           <- Scripts to clean and split data
│   │
│   └── run_model      <- Scripts to train and predict with models.
│
└── tests              <- Directory with unit tests.
