speechT
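Open-source speech-to-text software written in TensorFlow. It achieves a letter error rate of 8% and a word error rate of 20% on the LibriSpeech test corpus.
Requires Python 3, portaudio19-dev and ffmpeg.
On Ubuntu, install with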
sudo apt install python3-pip portaudio19-dev ffmpeg
pip3 install git+https://github.com/timediv/speechT
Currently, speechT is based on the Wav2Letter paper and the CTC loss function.
The speech corpus from http://www.openslr.org/12/ is downloaded automatically.
Note: the corpus is about 30 GB!
The data must be preprocessed before training
speecht-cli preprocess
Then, to run training, execute
speecht-cli train
Use --help for more details.
You can monitor training and view additional logs in TensorBoard
tensorboard --logdir log/
To evaluate on the whole test set, run
speecht-cli evaluate
To evaluate a single batch, run
speecht-cli evaluate --step-count 1
By default, greedy decoding is used. See the section Using a language model for how to decode with KenLM.
Use --help for more details.
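The greedy decoding mentioned above can be illustrated with a minimal, library-free sketch. It shows the usual best-path CTC procedure: take the highest-scoring class in each frame, collapse consecutive repeats, then drop the blank symbol. The alphabet, the position of the blank index, and the function names are assumptions made for illustration only; speechT's actual label set and implementation may differ.

```python
from itertools import groupby

# Hypothetical alphabet for illustration; speechT's real label set may differ.
ALPHABET = "abcdefghijklmnopqrstuvwxyz' "
BLANK = len(ALPHABET)  # assumption: the CTC blank is the last class index


def greedy_ctc_decode(frame_scores):
    """Best-path CTC decoding of a list of per-frame score vectors."""
    # 1. Pick the highest-scoring class in every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]
    # 2. Collapse runs of the same class into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Drop blanks and map the remaining labels to characters.
    return "".join(ALPHABET[label] for label in collapsed if label != BLANK)


def one_hot(index):
    """Build a toy score vector that puts all mass on one class."""
    return [1.0 if i == index else 0.0 for i in range(len(ALPHABET) + 1)]


# Toy frame sequence "hh-el-lo", where "-" stands for the blank.
path = [BLANK if c == "-" else ALPHABET.index(c) for c in "hh-el-lo"]
frames = [one_hot(i) for i in path]
print(greedy_ctc_decode(frames))  # prints "hello"
```

The blank between the two "l" frames is what keeps the double letter: without it, the repeated "l" would be collapsed into one.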
To record with your microphone and print the prediction, run
speecht-cli record
Use --help for more details.
Don't have the resources to train the model yourself? Download the weights from here
mkdir train
tar xf speechT-weights.tgz -C train/
Then you can use the model with, e.g., evaluate
speecht-cli evaluate --run-name best_run
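If you want to use KenLM as the language model for decoding, you need to compile and install tensorflow-with-kenlm. If you only need the CPU version of TensorFlow for Linux, you can also download it here.
Download all the necessary files from here, then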
tar xf kenlm-english.tgz
speecht-cli evaluate --language-model kenlm-english/
With the default parameters, the model was trained for about 5 to 6 days on an NVIDIA Titan X.

Overall statistics
Average Letter Edit Distance: 7.7125
Average Letter Error Rate: 8%
Average Word Edit Distance: 3.801953125
Average Word Error Rate: 20%
Expected and decoded transcriptions for a few examples
expected: but that is kaffar's knife
decoded: but that is caffr's klife
LED: 4 LER: 0.15 WED: 2 WER: 0.40
expected: he moved uneasily and his chair creaked
decoded: he moved uneasily in his chair creet
LED: 5 LER: 0.13 WED: 2 WER: 0.29
expected: it is indeed true that the importance of tact and skill in the training of the young and of cultivating their reason and securing their affection can not be overrated
decoded: it is indeed true that the importance of tact and skill in the training of the young and of cultivating their reason and so carrying their affection can not be o rated
LED: 8 LER: 0.05 WED: 4 WER: 0.13
expected: she pressed his hand gently in gratitude
decoded: she pressed his hand gently in gratitude
LED: 0 LER: 0.00 WED: 0 WER: 0.00
expected: don't worry sizzle dear it'll all come right pretty soon
decoded: don't worry i l dear it all come riprety soon
LED: 13 LER: 0.23 WED: 5 WER: 0.50
expected: may we see gates at once asked kenneth
decoded: may we see gates at once asked keneth
LED: 2 LER: 0.05 WED: 1 WER: 0.12
The whole evaluation log can be found here.
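The LED, LER, WED and WER values reported above can be approximated with a short sketch. It assumes that LED and WED are plain Levenshtein distances over characters and whitespace-separated words, and that each rate is the distance divided by the length of the expected transcription. These definitions and the function names are assumptions for illustration; speechT's own evaluation code may normalize text or compute the rates differently.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (strings or word lists)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def error_rates(expected, decoded):
    """Letter and word edit distances, plus rates relative to the reference."""
    expected_words = expected.split()
    led = edit_distance(expected, decoded)
    wed = edit_distance(expected_words, decoded.split())
    return {
        "LED": led,
        "LER": led / len(expected),
        "WED": wed,
        "WER": wed / len(expected_words),
    }


rates = error_rates("he moved uneasily and his chair creaked",
                    "he moved uneasily in his chair creet")
print("LED: {LED} LER: {LER:.2f} WED: {WED} WER: {WER:.2f}".format(**rates))
# prints "LED: 5 LER: 0.13 WED: 2 WER: 0.29"
```

Dividing by the reference length means a rate can exceed 1.0 when the decoded text contains many insertions.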