speechT
An open-source speech-to-text software written in TensorFlow. It achieves a letter error rate of 8% and a word error rate of 20% on the LibriSpeech test corpus.
Requires Python 3, portaudio19-dev and FFmpeg.
On Ubuntu, install with
sudo apt install python3-pip portaudio19-dev ffmpeg
pip3 install git+https://github.com/timediv/speechT
SpeechT is currently based on the Wav2Letter paper and the CTC loss function.
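For readers new to CTC, the following is a minimal, self-contained sketch of what the CTC loss computes. It is not speechT's code: TensorFlow provides this as `tf.nn.ctc_loss`. The sketch assumes per-frame probabilities that are already softmax-normalised and a caller-chosen blank index. It works in plain probabilities rather than log space for readability, so it is only suitable for short inputs. `ctc_prob_brute_force` spells out the definition, which sums over every frame-level path that collapses to the target. `ctc_loss` computes the same quantity efficiently with the forward (alpha) recursion.

```python
import itertools
import math


def collapse(path, blank):
    """CTC collapse rule: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for k in path:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


def ctc_prob_brute_force(probs, labels, blank):
    """Definition of P(labels | probs): sum over all paths that collapse to labels.
    Exponential in the number of frames; only for checking tiny cases."""
    total = 0.0
    for path in itertools.product(range(len(probs[0])), repeat=len(probs)):
        if collapse(path, blank) == list(labels):
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return total


def ctc_loss(probs, labels, blank):
    """Negative log-likelihood of labels via the CTC forward recursion.

    probs[t][k]: probability of symbol k at frame t (rows sum to 1).
    labels: target symbol indices without blanks.
    """
    # Extended target: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S = len(ext)

    # alpha[s]: probability of all path prefixes that end in ext[s] at frame t
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]         # advance by one position
            # Skipping over a blank is allowed only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # A valid path ends on the last label or on the trailing blank
    p = alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
    return -math.log(p)
```

Comparing `ctc_loss(probs, labels, blank)` with `-math.log(ctc_prob_brute_force(probs, labels, blank))` on a handful of frames is a quick way to convince yourself that the recursion and the definition agree. Note how a repeated label such as `[0, 0]` forces a blank between the two symbols.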
The speech corpus from http://www.openslr.org/12/ is downloaded automatically.
Note: the corpus is about 30 GB!
The data must be preprocessed before training:
speecht-cli preprocess
Then, to train, execute
speecht-cli train
Use --help for more details.
You can monitor the training and view other logs in TensorBoard:
tensorboard --logdir log/
To evaluate on the whole test set, run
speecht-cli evaluate
To evaluate a single batch, run
speecht-cli evaluate --step-count 1
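By default, greedy decoding is used. See the section "Using a language model" on how to use KenLM for decoding.
Use --help for more details.
To record with the microphone and print the predictions, run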
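Greedy (best-path) CTC decoding, the default mentioned above, can be sketched in a few lines. This is an illustrative sketch, not speechT's implementation. It picks the most likely symbol in every frame, merges consecutive repeats, and then removes blanks. The alphabet and the frame scores below are hypothetical. The blank is assumed to be the last class, which is the convention of TensorFlow's `tf.nn.ctc_greedy_decoder`.

```python
def greedy_ctc_decode(frames, alphabet, blank):
    """Best-path CTC decoding.

    frames: list of per-frame score lists (probabilities, logits or log-probs).
    alphabet: maps a non-blank symbol index to its character (hypothetical here).
    blank: index of the CTC blank class.
    """
    # 1. Most likely symbol in every frame
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frames]
    # 2. Merge consecutive repeats and 3. drop blanks
    chars, prev = [], None
    for k in best:
        if k != prev and k != blank:
            chars.append(alphabet[k])
        prev = k
    return "".join(chars)


# Hypothetical 3-letter alphabet with the blank as class 3
example = [
    [0.9, 0.0, 0.0, 0.1],  # a
    [0.8, 0.1, 0.0, 0.1],  # a (repeat, merged)
    [0.1, 0.0, 0.0, 0.9],  # blank separates the two a's
    [0.7, 0.1, 0.1, 0.1],  # a
    [0.1, 0.8, 0.0, 0.1],  # b
    [0.0, 0.6, 0.1, 0.3],  # b (repeat, merged)
    [0.0, 0.0, 0.1, 0.9],  # blank
]
print(greedy_ctc_decode(example, "abc", blank=3))  # → aab
```

Greedy decoding ignores any knowledge about which words are plausible. That is why a language model such as KenLM can improve the transcripts.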
speecht-cli record
Use --help for more details.
Don't have the resources to train on your own? Download the weights from here:
mkdir train
tar xf speechT-weights.tgz -C train/
Then you can use the model, e.g. with evaluate:
speecht-cli evaluate --run-name best_run
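If you want to use KenLM as a language model for decoding, you need to compile and install tensorflow-with-kenlm. If you only need the CPU version of TensorFlow for Linux, you can also download it here.
Download all necessary files from here and then run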
tar xf kenlm-english.tgz
speecht-cli evaluate --language-model kenlm-english/
With the default parameters, the model was trained for about 5 to 6 days on an NVIDIA TITAN X.

Overall statistics:
Average Letter Edit Distance: 7.7125
Average Letter Error Rate: 8%
Average Word Edit Distance: 3.801953125
Average Word Error Rate: 20%
Expected and decoded transcripts for a few examples:
expected: but that is kaffar's knife
decoded: but that is caffr's klife
LED: 4 LER: 0.15 WED: 2 WER: 0.40
expected: he moved uneasily and his chair creaked
decoded: he moved uneasily in his chair creet
LED: 5 LER: 0.13 WED: 2 WER: 0.29
expected: it is indeed true that the importance of tact and skill in the training of the young and of cultivating their reason and securing their affection can not be overrated
decoded: it is indeed true that the importance of tact and skill in the training of the young and of cultivating their reason and so carrying their affection can not be o rated
LED: 8 LER: 0.05 WED: 4 WER: 0.13
expected: she pressed his hand gently in gratitude
decoded: she pressed his hand gently in gratitude
LED: 0 LER: 0.00 WED: 0 WER: 0.00
expected: don't worry sizzle dear it'll all come right pretty soon
decoded: don't worry i l dear it all come riprety soon
LED: 13 LER: 0.23 WED: 5 WER: 0.50
expected: may we see gates at once asked kenneth
decoded: may we see gates at once asked keneth
LED: 2 LER: 0.05 WED: 1 WER: 0.12
The complete evaluation log can be found here.
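The LED/LER/WED/WER columns above can be reproduced approximately with a plain Levenshtein distance. This is a sketch under stated assumptions, not the project's own metric code. It assumes that LED counts character edits and WED counts word edits (words split on whitespace). It also assumes that LER and WER divide by the length of the expected transcript, which is consistent with the second example in the log. The real implementation may differ in details, and `error_rates` is a hypothetical helper name.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def error_rates(expected, decoded):
    """Return (LED, LER, WED, WER); assumes a non-empty expected transcript."""
    led = edit_distance(expected, decoded)
    ref_words = expected.split()
    wed = edit_distance(ref_words, decoded.split())
    return led, led / len(expected), wed, wed / len(ref_words)
```

For "he moved uneasily and his chair creaked" decoded as "he moved uneasily in his chair creet", this gives LED 5, LER ≈ 0.13, WED 2 and WER ≈ 0.29, matching the log entry above. Because the word error rate counts a whole word as wrong even for a one-letter mistake, WER is typically much higher than LER, as in the 20% vs 8% overall figures.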