Automatic Speech Recognition

1.0.0

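This project aims to distill research on automatic speech recognition. To start, you can load a ready-to-use pipeline with a pre-trained model. Because it builds on eager execution in TensorFlow 2.0, you can freely inspect the model's weights, activations, or gradients.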

import automatic_speech_recognition as asr

file = 'to/test/sample.wav'  # sample rate 16 kHz, and 16 bit depth
sample = asr.utils.read_audio(file)
pipeline = asr.load('deepspeech2', lang='en')
pipeline.model.summary()     # TensorFlow model
sentences = pipeline.predict([sample])
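
The comment in the snippet above says the input should be sampled at 16 kHz with 16-bit depth. A minimal sketch of how you might check a WAV file before passing it to the pipeline, using only the Python standard library; the helper name `check_wav_format` is made up for this illustration and is not part of the package:

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_sample_width=2):
    """Report whether a PCM WAV file matches the expected format.

    expected_sample_width is given in bytes, so 2 means 16-bit samples.
    Returns (ok, details), where details holds the values read from the file.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
    ok = rate == expected_rate and width == expected_sample_width
    details = {"sample_rate": rate, "bit_depth": 8 * width, "channels": channels}
    return ok, details
```

Calling `check_wav_format('to/test/sample.wav')` returns a flag plus the values it found, so a mismatching file can be resampled before prediction. Whether `asr.utils.read_audio` performs a similar check itself is not stated here.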

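We support English (thanks to Open Seq2Seq). The table below shows evaluation results on the English benchmark LibriSpeech dev-clean. For reference, the DeepSpeech implementation (Mozilla) reaches a word error rate (WER) of around 7.5%, whereas the state of the art (RWTH Aachen University) reaches 2.3% (the original page links to more recent evaluation results). Both use an external language model to improve their results. By comparison, humans reach 5.83% on the same set (LibriSpeech dev-clean).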

Model name	Decoder	WER-dev (%)
`deepspeech2`	greedy	6.71
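
The WER figures above are word error rates: the number of word substitutions, deletions, and insertions needed to turn the prediction into the reference, divided by the number of reference words. A minimal sketch of that definition for a single sentence pair, written in plain Python; the function name is illustrative, and the package's own evaluation code may tokenize or aggregate over a dataset differently:

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Multiplying the result by 100 gives a percentage comparable to the table. Note that the value can exceed 1.0 when the hypothesis contains many extra words.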

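Before long, you will need to adjust the pipeline a little. Take a look at the CTC pipeline. It is responsible for connecting the neural network model with all non-differentiable transformations (feature extraction or prediction decoding). The pipeline components are independent, so you can adapt each one to your needs: for example, use more sophisticated feature extraction, apply different data augmentation, or add a language model decoder (static n-grams or large transformers). You can go further still, for example by distributing training with a Strategy or experimenting with a mixed precision policy.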

import numpy as np
import tensorflow as tf
import automatic_speech_recognition as asr

# Training and validation data are described by CSV files
dataset = asr.dataset.Audio.from_csv('train.csv', batch_size=32)
dev_dataset = asr.dataset.Audio.from_csv('dev.csv', batch_size=32)
alphabet = asr.text.Alphabet(lang='en')
# Filter bank features: 20 ms windows with a 10 ms step
features_extractor = asr.features.FilterBanks(
    features_num=160,
    winlen=0.02,
    winstep=0.01,
    winfunc=np.hanning
)
# input_dim must match features_num above
model = asr.model.get_deepspeech2(
    input_dim=160,
    output_dim=29,
    rnn_units=800,
    is_mixed_precision=False
)
# `learning_rate` replaces the deprecated `lr` alias
optimizer = tf.optimizers.Adam(
    learning_rate=1e-4,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-8
)
decoder = asr.decoder.GreedyDecoder()
pipeline = asr.pipeline.CTCPipeline(
    alphabet, features_extractor, model, optimizer, decoder
)
pipeline.fit(dataset, dev_dataset, epochs=25)
pipeline.save('/checkpoint')

# Evaluate word and character error rates on held-out data
test_dataset = asr.dataset.Audio.from_csv('test.csv')
wer, cer = asr.evaluate.calculate_error_rates(pipeline, test_dataset)
print(f'WER: {wer}   CER: {cer}')
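
The pipeline above ends with a greedy decoder, which is one of the non-differentiable steps mentioned earlier. As a rough illustration of what greedy CTC decoding generally does (take the best label per frame, merge consecutive repeats, drop the blank symbol), here is a dependency-free sketch. The alphabet and the position of the blank label are assumptions chosen so that there are 29 outputs; they are not taken from the package's `Alphabet` or `GreedyDecoder`:

```python
from itertools import groupby

# Hypothetical alphabet: space, a-z, and apostrophe (28 characters),
# with the CTC blank assumed to be the last index, giving 29 outputs.
CHARS = " abcdefghijklmnopqrstuvwxyz'"
BLANK = len(CHARS)


def greedy_ctc_decode(frame_scores, chars=CHARS, blank=BLANK):
    """Decode per-frame scores (one row of label scores per frame) to text."""
    # 1. Pick the highest-scoring label for every frame.
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    # 2. Merge runs of the same label into a single label.
    collapsed = [label for label, _ in groupby(best)]
    # 3. Remove blanks; a blank between two equal labels keeps both.
    return "".join(chars[label] for label in collapsed if label != blank)
```

Because repeats are merged before blanks are removed, a doubled letter such as the "ll" in "hello" only survives when the model emits a blank between the two frames. A language model decoder would replace step 1 with a search over several candidate label sequences.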

Installation

You can use pip:

pip install automatic-speech-recognition

Otherwise, clone the code and create a new environment with conda:

git clone https://github.com/rolczynski/Automatic-Speech-Recognition.git
cd Automatic-Speech-Recognition
conda env create -f=environment.yml     # or use: environment-gpu.yml
conda activate Automatic-Speech-Recognition