TensorflowASR

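A Conformer-based end-to-end speech recognition model for Tensorflow 2, with a real-time factor (RTF) of about 0.1 on CPU.

The current branch is V2, which uses a CTC+translate structure.

You are welcome to use it and report bugs.

For the old version, see the V1 version.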

Project Comparison

Training results on Aishell-1:

Offline results

| Name | Params | Chinese CER | Epochs | online/offline | Test data | Decoding |
|---|---|---|---|---|---|---|
| Wenet(Conformer) | 9.5M | 6.48% | 100 | offline | aishell1-test | ctc_greedy |
| Wenet(transformer) | 9.7M | 8.68% | 100 | offline | aishell1-test | ctc_greedy |
| Wenet(Paraformer) | 9.0M | 6.99% | 100 | offline | aishell1-test | paraformer_greedy |
| FunASR(Paraformer) | 9.5M | 6.37% | 100 | offline | aishell1-test | paraformer_greedy |
| FunASR(Conformer) | 9.5M | 6.64% | 100 | offline | aishell1-test | ctc_greedy |
| FunASR(e_branchformer) | 10.1M | 6.65% | 100 | offline | aishell1-test | ctc_greedy |
| repo(ConformerCTC) | 10.1M | 6.8% | 100 | offline | aishell1-test | ctc_greedy |

Streaming results

| Name | Params | Chinese CER | Epochs | online/offline | Test data | Decoding |
|---|---|---|---|---|---|---|
| Wenet(U2++Conformer) | 10.6M | 8.18% | 100 | online | aishell1-test | ctc_greedy |
| Wenet(U2++transformer) | 10.3M | 9.88% | 100 | online | aishell1-test | ctc_greedy |
| repo(StreamingConformerCTC) | 10.1M | 7.2% | 100 | online | aishell1-test | ctc_greedy |
| repo(ChunkConformer) | 10.7M | 8.9% | 100 | online | aishell1-test | ctc_greedy |
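Most rows in the comparison tables use ctc_greedy decoding. As a rough illustration of what that decoding step generally does (not this repository's code), the sketch below takes per-frame scores, picks the best token for each frame, merges consecutive repeats, and drops the blank token. The function name and the choice of `blank_id=0` are assumptions made for the example.

```python
def ctc_greedy_decode(frame_scores, blank_id=0):
    """Greedy CTC decoding: best token per frame, merge repeats, drop blanks.

    frame_scores: list of per-frame score lists (one score per token id).
    blank_id: id of the CTC blank token (assumed to be 0 in this sketch).
    """
    # Index of the highest score in each frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]

    decoded = []
    previous = None
    for token in best_path:
        # Keep a token only when it differs from the previous frame
        # and is not the blank symbol.
        if token != previous and token != blank_id:
            decoded.append(token)
        previous = token
    return decoded
```

A blank frame between two identical tokens keeps both of them, which is how CTC represents a genuinely repeated symbol.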

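Features

VAD + noise reduction
Online streaming recognition / offline recognition
Punctuation restoration
TTS data augmentation
Voice conversion data augmentation
Far-field / near-field data augmentation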

Other Projects

TTS：https://github.com/Z-yq/TensorflowTTS

NLU: -

BOT: -

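TTS Data Augmentation System

Even without any data, you can reach a reasonable level of ASR performance.

TTS for ASR: the training data is aishell1 and aishell3, whose data type is well suited to ASR.

tips:

There are 500 voices in total
Only Chinese is supported
If the text to be synthesized contains punctuation, remove it manually
To add a pause, insert sil in the middle of the text

step1: Prepare a list of the texts to be synthesized, for example named text.list, e.g.: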

这是第一句话
这是第二句话
这是一句sil有停顿的话
...
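Since punctuation has to be removed before synthesis, a small helper can clean sentences before they are written to text.list. This is a minimal sketch using only the Python standard library; `strip_punctuation` and `write_text_list` are hypothetical helper names, not part of the repository, and the sketch assumes that "punctuation" means any character in a Unicode punctuation category.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Remove every Unicode punctuation character, keeping everything else.

    Latin letters such as the pause marker "sil" are left untouched.
    """
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


def write_text_list(sentences, path="text.list"):
    """Write one cleaned sentence per line, skipping sentences that end up empty."""
    cleaned = [strip_punctuation(s).strip() for s in sentences]
    cleaned = [s for s in cleaned if s]
    with open(path, "w", encoding="utf-8") as f:
        for sentence in cleaned:
            f.write(sentence + "\n")
    return cleaned
```

For example, `strip_punctuation("这是，第一句话。")` yields the sentence without the comma and the full stop, ready to be used as a line of text.list.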

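step2: Download the models

Link: https://pan.baidu.com/s/1deN1PmJ4olkRKw8ceQrUNA extraction code: c0tp

Download both of them and put them under the directory ./augmentations/tts_for_asr/models

step3: Then run the script from the repository root: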

python ./augmentations/tts_for_asr/tts_augment.py -f text.list -o save_dir --voice_num 10 --vc_num 3

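Where:

-f is the list prepared in step1

-o is the path where the synthesized corpus is saved; an absolute path is recommended.

--voice_num is the number of voices used to synthesize each sentence

--vc_num is the number of times each sentence is augmented with voice conversion

When the run finishes, a wavs directory and utterance.txt are generated under the -o path.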

Mel Layer

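Following the librosa library, a layer for speech spectral feature extraction is implemented in TF2.

Alternatively, you can use Leaf, which has fewer parameters.

Usage:

am_data.yml

mel_layer_type: Melspectrogram  # or Spectrogram / leaf
trainable_kernel: True  # lets the layer be trained with the model; not recommended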

Cpp Inference

The ONNX-based C++ project has been updated.

See CppInference ONNX for details.

Python Inference

An ONNX-based Python inference solution; see python inference for details.

Streaming Conformer

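Streaming Conformer structures are now supported.

Two approaches are currently implemented:

Block Conformer + Global CTC
- Suitable for short-utterance recognition systems with VAD; the global CTC builds the contextual information.

Chunk Conformer + CTC Picker
- Following Baidu's SMLTA2, a phoneme CTC first samples the effective features, which are then passed to a lookahead chunk conformer that builds the contextual information and makes the prediction. Suitable for long-duration streaming recognition systems.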

Pretrained Model

All results are tested on the AISHELL test set.

RTF (real-time factor) is measured on a single-core CPU decoding task.
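For reference, the real-time factor is the processing time divided by the duration of the audio being processed, so a value below 1 means decoding runs faster than real time. The sketch below shows one way to measure it; `measure_rtf` and its `decode_fn` argument are hypothetical names for illustration and stand in for whatever decoding call you use.

```python
import time


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the decoded audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(decode_fn, audio, audio_seconds: float):
    """Time a single call to decode_fn(audio) and return (result, rtf)."""
    start = time.perf_counter()
    result = decode_fn(audio)
    elapsed = time.perf_counter() - start
    return result, real_time_factor(elapsed, audio_seconds)
```

With this definition, decoding 10 seconds of audio in 0.56 seconds gives an RTF of 0.056.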

AM:

| Model Name | Mel layer (USE/TRAIN) | link | code | train data | phoneme CER(%) | Params Size | RTF |
|---|---|---|---|---|---|---|---|
| ConformerCTC(S) | True/False | pan.baidu.com/s/1k6miY1yNgLrT0cB-xsqqag | 8s53 | aishell-1(50 epochs) | 6.4 | 10M | 0.056 |
| StreamingConformerCTC | True/False | pan.baidu.com/s/1Rc0x7LOiExaAC0GNhURkHw | zwh9 | aishell-1(50 epochs) | 7.2 | 15M | 0.08 |
| ChunkConformer | True/False | pan.baidu.com/s/1o_x677WUyWNld-8sNbydxg | ujmg | aishell-1(50 epochs) | 11.4 | 15M | 0.1 |

VAD:

| Model Name | link | code | train data | params size | RTF |
|---|---|---|---|---|---|
| 8k_online_vad | pan.baidu.com/s/1ag9VwTxIqW4C2AgF-6nIgg | ofc9 | openslr open-source data | 80K | 0.0001 |

Punc:

| Model Name | link | code | train data | acc | params size | RTF |
|---|---|---|---|---|---|---|
| PuncModel | pan.baidu.com/s/1gtvRKYIE2cAbfiqBn9bhaw | 515t | open-source NLP data | 95% | 600K | 0.0001 |

Usage:

In test_asr.py, convert the model to an ONNX file and place it in pythonInference.

Community

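You are welcome to join to discuss and share problems. The group has reached 200 members, so joining requires an invitation; please add the note "TensorflowASR".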

What's New?

Supported Structure

CTC + Streaming

Supported Models

Conformer
BlockConformer
ChunkConformer

Requirements

Python 3.6+
Tensorflow 2.8+: pip install tensorflow-gpu (see https://www.bilibili.com/read/cv14876435 for reference)
librosa
pypinyin, if you need to use the default phonemes
keras-bert
addons, for the LAS structure: pip install tensorflow-addons
tqdm
tf2onnx
rir_generator: pip install rir-generator
onnxruntime: pip install onnxruntime or pip install onnxruntime-gpu

Usage

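Prepare train_list and test_list.

asr_train_list format, where '\t' is a tab character. It is recommended to write the text file programmatically, as path + '\t' + text:

wav_path = "xxx/xx/xx/xxx.wav"
wav_label = "这是个例子"
# Each line is: wav path, a tab, then the transcript.
with open('train.list', 'w', encoding='utf-8') as f:
    f.write(wav_path + '\t' + wav_label + '\n')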

For example, the resulting train.list:

 /opt/data/test.wav	这个是一个例子
......
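Because a missing tab is an easy mistake to make, it can help to check the list before training. The sketch below is a hypothetical helper (`parse_train_list` is not part of the repository) that assumes exactly one tab per line, separating the wav path from the transcript.

```python
def parse_train_list(lines):
    """Split train.list lines into (wav_path, text) pairs.

    Returns the parsed pairs and the 1-based numbers of malformed lines,
    i.e. lines that do not contain exactly one tab with text on both sides.
    """
    entries, bad_lines = [], []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 2 or not parts[0].strip() or not parts[1].strip():
            bad_lines.append(number)
            continue
        entries.append((parts[0].strip(), parts[1].strip()))
    return entries, bad_lines
```

To check a file, open it with `encoding="utf-8"` and pass the file object to `parse_train_list`; any numbers in the second return value point at lines to fix.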

The following are the training data formats for VAD and punctuation restoration (optional):

vad_train_list format:

 wav_path1
wav_path2
……

For example:

 /opt/data/test.wav

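VAD training internally uses signal energy to build its training samples, so make sure the training audio you prepare was recorded in quiet conditions.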
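To see why quiet recordings matter, consider how energy-based labeling works in general: background noise raises the energy of non-speech frames and blurs the boundary. The sketch below is only an illustration of that general idea, not the repository's actual labeling rule; the frame length of 160 samples (20 ms at 8 kHz) and the threshold are assumed values.

```python
def frame_energies(samples, frame_length=160):
    """Mean squared amplitude of each full, non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_length + 1, frame_length):
        frame = samples[start:start + frame_length]
        energies.append(sum(x * x for x in frame) / frame_length)
    return energies


def energy_labels(samples, frame_length=160, threshold=1e-3):
    """Label a frame 1 (speech) when its energy exceeds the threshold, else 0."""
    return [
        1 if energy > threshold else 0
        for energy in frame_energies(samples, frame_length)
    ]
```

With clean audio, silent frames sit well below the threshold; with noisy audio, they can exceed it and be mislabeled as speech.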

punc_train_list format:

 text1
 text2
 ……

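Same as the LM format: the text on each line includes punctuation. Currently each character may be followed by only one punctuation mark; consecutive punctuation marks are treated as invalid.

For example:

这是：一个例子哦。 √ (correct format)

这是：“一个例子哦”。 × (incorrect format)

这是：一个例子哦“。 × (incorrect format)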
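The rule above can be checked automatically before training. The following sketch is a hypothetical validator, not code from the repository; it assumes that "punctuation" means any Unicode punctuation character and that a line starting with punctuation is also invalid, since that mark does not follow a character.

```python
import unicodedata


def is_punctuation(ch: str) -> bool:
    """True for any character in a Unicode punctuation category."""
    return unicodedata.category(ch).startswith("P")


def is_valid_punc_line(text: str) -> bool:
    """True when every punctuation mark directly follows a non-punctuation character."""
    previous_was_text = False
    for ch in text.strip():
        if ch.isspace():
            continue  # whitespace neither counts as text nor resets the state
        if is_punctuation(ch):
            if not previous_was_text:
                return False  # leading or consecutive punctuation
            previous_was_text = False
        else:
            previous_was_text = True
    return True
```

Applied to the three examples, the first line passes and the other two are rejected because they contain two punctuation marks in a row.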

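Download the pretrained BERT model, which assists the training of the punctuation restoration model. You can skip this if you do not need punctuation restoration: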
```
 https://pan.baidu.com/s/1_HDAhfGZfNhXS-cYoLQucA extraction code: 4hsa
```
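Edit the configuration file am_data.yml (in ./asr/configs) to set the training options, and edit the name parameter in the model yaml (e.g. ./asr/configs/conformer.yml) to choose the model structure.

Then run the command: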

python train_asr.py --data_config ./asr/configs/am_data.yml --model_config ./asr/configs/ConformerS.yml

To test, refer to the demo in ./test_asr.py; you can of course modify the stt method to suit your needs:
```
python ./test_asr.py
```

You can also use the Tester to evaluate your model on a large batch of test data:

Run:

python eval_am.py --data_config ./asr/configs/am_data.yml --model_config ./asr/configs/ConformerS.yml

This script reports the SER/CER/DEL/INS/SUB metrics.
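For readers unfamiliar with these metrics, the sketch below shows how they are commonly computed from a minimum edit-distance alignment between reference and hypothesis. It is an illustration under stated assumptions, not the code in eval_am.py: CER is taken as (SUB + DEL + INS) divided by the number of reference characters, and SER as the fraction of sentences containing at least one error.

```python
def edit_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) of a minimum edit alignment."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # cost[i][j] = (total, subs, dels, ins) for reference[:i] vs hypothesis[:j]
    cost = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = (i, 0, i, 0)  # delete every reference character
    for j in range(1, cols):
        cost[0][j] = (j, 0, 0, j)  # insert every hypothesis character
    for i in range(1, rows):
        for j in range(1, cols):
            t, s, d, n = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diagonal = (t, s, d, n)          # match, no cost
            else:
                diagonal = (t + 1, s + 1, d, n)  # substitution
            t, s, d, n = cost[i - 1][j]
            deletion = (t + 1, s, d + 1, n)
            t, s, d, n = cost[i][j - 1]
            insertion = (t + 1, s, d, n + 1)
            cost[i][j] = min(diagonal, deletion, insertion, key=lambda c: c[0])
    _, subs, dels, ins = cost[-1][-1]
    return subs, dels, ins


def error_rates(pairs):
    """Aggregate CER, SER and edit counts over (reference, hypothesis) pairs."""
    subs = dels = ins = total_chars = wrong_sentences = 0
    for reference, hypothesis in pairs:
        s, d, i = edit_counts(reference, hypothesis)
        subs, dels, ins = subs + s, dels + d, ins + i
        total_chars += len(reference)
        wrong_sentences += 1 if (s + d + i) > 0 else 0
    return {
        "CER": (subs + dels + ins) / max(total_chars, 1),
        "SER": wrong_sentences / max(len(pairs), 1),
        "SUB": subs,
        "DEL": dels,
        "INS": ins,
    }
```

When several alignments have the same total cost, the split between SUB, DEL and INS depends on the tie-breaking order, so these three counts can differ slightly between tools even when CER agrees.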

To train the VAD or punctuation restoration model, follow the same steps as above.

Tips

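If you want to use your own phonemes, you need to adapt the corresponding conversion method in am_dataloader.py.

def init_text_to_vocab(self):  # keep the name

    def text_to_vocab_func(txt):
        # your_convert_function is a placeholder for your own text-to-phoneme conversion
        return your_convert_function(txt)

    self.text_to_vocab = text_to_vocab_func  # here self.text_to_vocab is a function, not a call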

Do not forget to start your phoneme list with <S> and </S>, e.g.:

    <S>
    </S>
    de
    shì
    ……
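A quick check of a custom phoneme list can catch a missing `<S>` or `</S>` before training starts. The helper below is a hypothetical sketch, not part of the repository; it assumes one token per line and that token ids follow line order, which may differ from how the project builds its vocabulary.

```python
def load_phoneme_vocab(lines):
    """Build a token-to-id mapping from phoneme list lines.

    Raises ValueError when the list does not start with <S> and </S>
    or when a token appears more than once.
    """
    tokens = [line.strip() for line in lines if line.strip()]
    if tokens[:2] != ["<S>", "</S>"]:
        raise ValueError("the phoneme list must start with <S> and </S>")
    if len(set(tokens)) != len(tokens):
        raise ValueError("the phoneme list contains duplicate tokens")
    # Assumption for this sketch: ids are assigned in line order.
    return {token: index for index, token in enumerate(tokens)}
```

Pass an open file (with `encoding="utf-8"`) or any list of lines; the returned dictionary maps each phoneme to its position in the list.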

References

This project draws on the following excellent projects:

https://github.com/usimarit/TiramisuASR

https://github.com/noahchalifour/warp-transducer

https://github.com/PaddlePaddle/DeepSpeech

https://github.com/baidu-research/warp-ctc

Licence

Overall, almost all models here are licensed under Apache 2.0 for all countries in the world.

You are welcome to use this project for academic research and commercial product development; both commercial and non-commercial use are unrestricted.

However, trading this project itself as a commodity is prohibited.
