This project was a major assignment for an undergraduate course. It was written in an ad-hoc way: at the time I was unfamiliar with the relevant APIs, so many parts are implemented poorly, and to save effort I chose the simplest possible model. It has little reference value, either as code or academically.

In addition, since I am not authorized to redistribute the data, I cannot publish the dataset. Please understand.
# Chinese Q&A system based on LSTM
This project builds a bidirectional LSTM (BiLSTM) model that, given a question and multiple candidate sentences, finds the sentence containing the answer. With the help of third-party Internet resources, a model trained on `training.data` can be evaluated on `develop.data`, reaching an MRR of 0.75 or higher.
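The core of the model is the LSTM cell. As a minimal illustration of what a single LSTM time step computes, here is a toy scalar-state sketch in plain Python; the weight dictionary `W` and scalar sizes are purely illustrative and are not the project's TensorFlow implementation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    # One LSTM time step with scalar input and state (toy sizes for clarity).
    # W holds weights for the input (i), forget (f), and output (o) gates and
    # the candidate cell value (g); each sees [x, h_prev] plus a bias term.
    i = sigmoid(W["i"][0] * x + W["i"][1] * h_prev + W["i"][2])
    f = sigmoid(W["f"][0] * x + W["f"][1] * h_prev + W["f"][2])
    o = sigmoid(W["o"][0] * x + W["o"][1] * h_prev + W["o"][2])
    g = math.tanh(W["g"][0] * x + W["g"][1] * h_prev + W["g"][2])
    c = f * c_prev + i * g          # new cell state
    h = o * math.tanh(c)            # new hidden state
    return h, c
```

A bidirectional LSTM simply runs one such pass left-to-right and another right-to-left over the token sequence and concatenates the resulting states.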
## How to run
### Environment dependencies
| Program    | Version        |
| ---------- | -------------- |
| Python     | 3.5.2          |
| TensorFlow | 1.2.1          |
| jieba      | 0.38           |
| CUDA       | 8.0 (8.0.61.2) |
| cuDNN      | 5.1            |

CUDA and cuDNN are dependencies of TensorFlow; see the official TensorFlow documentation for installation instructions. The rest can be installed with the `pip install` command.

### Notes on third-party resources
- Chinese text is segmented with jieba.
- Segmented words are encoded with word embeddings, which avoids the performance loss of one-hot encoding. The word vectors come from a 50-dimensional embedding file trained offline on Chinese Wikipedia.
### Running the program
After installing the dependencies, simply run `main.py`. If a trained model already exists, the program will ask whether to load it directly or to train from scratch.

`main.py` takes no command-line arguments; to change the configuration, edit the code directly. The file contains detailed Chinese comments to guide any modifications.
`taevaluation.py` is an evaluation script that computes MRR, MAP, and ACC@1. It was written by the course teaching assistant; I only modified its input and output formats.
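For reference, the three metrics can be computed as follows. This is a standalone sketch of the standard definitions, not the teaching assistant's script; the input format (per-question 0/1 relevance labels ordered by model score) is an assumption for illustration:

```python
def rank_metrics(results):
    """Each item in results is a list of 0/1 relevance labels for one
    question's candidate sentences, ordered by model score (best first).
    Returns (MRR, MAP, ACC@1) averaged over all questions."""
    mrr = ap_sum = acc1 = 0.0
    for labels in results:
        # reciprocal rank of the first relevant candidate
        first = next((i for i, l in enumerate(labels) if l), None)
        if first is not None:
            mrr += 1.0 / (first + 1)
        # average precision over all relevant candidates
        hits, precisions = 0, []
        for i, l in enumerate(labels):
            if l:
                hits += 1
                precisions.append(hits / (i + 1))
        if precisions:
            ap_sum += sum(precisions) / len(precisions)
        # accuracy at rank 1
        acc1 += 1.0 if labels and labels[0] else 0.0
    n = len(results)
    return mrr / n, ap_sum / n, acc1 / n
```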
## About training
If you choose not to use a trained model, or none exists, the program trains a model on the data in `training.data` and `develop.data`. With the default parameters, training uses up to about 8 GB of RAM plus 2 GB of GPU memory; make sure your machine has sufficient resources beforehand to avoid errors. A full training run took about 12 hours on my GTX 850M + i5-4210H.
Also, while tuning, I found that even with identical parameters the MRR of different training runs can fluctuate by up to 0.03; the cause is unclear. Due to limited hardware and time, only very rough parameter tuning has been done, and most parameters still have room for optimization. If you are interested, feel free to try tuning them further.