NeuralSP: Neural network based Speech Processing
How to install
cd tools
make KALDI=/path/to/kaldi TOOL=/path/to/save/tools
Key features
Corpus
ASR
- AISHELL-1
- AISHELL-2
- AMI
- CSJ
- LaboroTVSpeech
- Librispeech
- Switchboard (+Fisher)
- TEDLIUM2/TEDLIUM3
- TIMIT
- WSJ
LM
- Penn Tree Bank
- WikiText2
Front-end
- Frame stacking
- Sequence summary network [link]
- SpecAugment [link]
- Adaptive SpecAugment [link]
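To illustrate two of the front-end items above, here is a minimal sketch of frame stacking and SpecAugment-style time masking on plain Python lists. The names `stack_frames` and `time_mask`, the parameters `n_stack`, `n_skip`, and `max_width`, and the choice to drop trailing frames that do not fill a window are all assumptions made for this example; they are not taken from NeuralSP's code.

```python
import random


def stack_frames(feats, n_stack=3, n_skip=3):
    """Concatenate n_stack consecutive frames, advancing n_skip frames per step.

    feats: list of T frames, each a list of D floats.
    Returns frames of length D * n_stack. Trailing frames that cannot fill a
    full window are dropped (an assumption; padding is another valid choice).
    """
    stacked = []
    for start in range(0, len(feats) - n_stack + 1, n_skip):
        window = feats[start:start + n_stack]
        stacked.append([x for frame in window for x in frame])
    return stacked


def time_mask(feats, max_width, rng):
    """Zero out one random contiguous span of at most max_width frames."""
    width = rng.randint(0, min(max_width, len(feats)))
    start = rng.randint(0, len(feats) - width)
    return [
        [0.0] * len(frame) if start <= t < start + width else list(frame)
        for t, frame in enumerate(feats)
    ]


if __name__ == "__main__":
    feats = [[float(2 * t), float(2 * t + 1)] for t in range(7)]
    print(stack_frames(feats))  # 7 frames of dim 2 become 2 frames of dim 6
    print(time_mask(feats, max_width=2, rng=random.Random(0)))
```

Stacking with `n_stack == n_skip` shortens the sequence by that factor, which is the usual motivation for applying it before a recurrent encoder.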
Encoder
- RNN encoder
  - (CNN-)BLSTM, (CNN-)LSTM, (CNN-)BLGRU, (CNN-)LGRU
  - Latency-controlled BRNN [link]
  - Random state passing (RSP) [link]
- Transformer encoder [link]
  - Chunk hopping mechanism [link]
  - Relative positional encoding [link]
  - Causal mask
- Conformer encoder [link]
- Time-depth separable (TDS) convolution encoder [link] [LINE]
- Gated CNN encoder (GLU) [link]
Connectionist Temporal Classification (CTC) decoder
- Beam search
- Shallow fusion
- Forced alignment
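As a rough illustration of CTC beam search, the sketch below implements textbook CTC prefix beam search over per-frame log-probabilities, without an external LM. The function name `ctc_prefix_beam_search`, its parameters, and the convention that index 0 is the blank symbol are assumptions for this example and do not describe NeuralSP's own decoder.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_prefix_beam_search(log_probs, beam_size=4, blank=0):
    """log_probs: list of T frames, each a list of V log-probabilities.

    Each prefix keeps two scores: paths ending in blank and in non-blank.
    Returns (best token sequence, its total log-probability).
    """
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for token, lp in enumerate(frame):
                if token == blank:
                    # Blank keeps the prefix and moves mass to "ends in blank".
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, p_b + lp, p_nb + lp), nb)
                    continue
                new_prefix = prefix + (token,)
                b, nb = nxt[new_prefix]
                if prefix and prefix[-1] == token:
                    # A repeat extends the prefix only after a blank ...
                    nxt[new_prefix] = (b, logsumexp(nb, p_b + lp))
                    # ... otherwise it collapses into the same prefix.
                    sb, snb = nxt[prefix]
                    nxt[prefix] = (sb, logsumexp(snb, p_nb + lp))
                else:
                    nxt[new_prefix] = (b, logsumexp(nb, p_b + lp, p_nb + lp))
        ranked = sorted(nxt.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    best_prefix, scores = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))
    return list(best_prefix), logsumexp(*scores)


if __name__ == "__main__":
    # Two frames, vocabulary {blank, token 1}; greedy decoding would output
    # nothing, but summing over alignments favors the sequence [1].
    frames = [[math.log(0.6), math.log(0.4)]] * 2
    print(ctc_prefix_beam_search(frames))
```

The two-score bookkeeping is what lets the search merge all alignments of the same output sequence, which best-path (greedy) decoding cannot do.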
RNN-Transducer (RNN-T) decoder [link]
- Beam search
- Shallow fusion
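Shallow fusion, listed for several decoders in this README, is commonly defined as a log-linear interpolation of the ASR score and an external LM score at each decoding step. The sketch below assumes that standard definition; the function name `shallow_fusion` and the `lm_weight` parameter are hypothetical and chosen for illustration only.

```python
import math


def shallow_fusion(asr_log_probs, lm_log_probs, lm_weight=0.3):
    """Return per-token fused scores: log p_asr + lm_weight * log p_lm."""
    if len(asr_log_probs) != len(lm_log_probs):
        raise ValueError("ASR and LM distributions must share a vocabulary")
    return [a + lm_weight * l for a, l in zip(asr_log_probs, lm_log_probs)]


if __name__ == "__main__":
    asr = [math.log(p) for p in (0.5, 0.4, 0.1)]
    lm = [math.log(p) for p in (0.1, 0.8, 0.1)]
    fused = shallow_fusion(asr, lm, lm_weight=0.5)
    # The ASR model alone prefers token 0; the LM evidence shifts it to token 1.
    print(max(range(3), key=fused.__getitem__))
```

Because the LM is only consulted at decoding time, the weight can be tuned on a development set without retraining either model.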
Attention-based decoder
- RNN decoder
  - Shallow fusion
  - Cold fusion [link]
  - Deep fusion [link]
  - Forward-backward attention decoding [link]
  - Ensemble decoding
  - Internal LM estimation [link]
- Attention type
  - location-based
  - content-based
  - dot-product
  - GMM attention
- Streaming RNN decoder specific
  - Hard monotonic attention [link]
  - Monotonic chunkwise attention (MoChA) [link]
  - Delay constrained training (DeCoT) [link]
  - Minimum latency training (MinLT) [link]
  - CTC-synchronous training (CTC-ST) [link]
- Transformer decoder [link]
- Streaming Transformer decoder specific
  - Monotonic Multihead Attention [link] [link]
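Of the attention types listed above, dot-product attention is the simplest to write down. The sketch below computes a single decoder step in pure Python, assuming the scaled variant (scores divided by the square root of the dimension); whether NeuralSP scales its scores this way is not stated here, and the function name and argument layout are hypothetical.

```python
import math


def dot_product_attention(query, keys, values):
    """One attention step for a single query vector.

    query: list of D floats; keys: T lists of D floats; values: T lists of floats.
    Returns (context vector, attention weights over the T encoder frames).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Softmax with max-subtraction for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights


if __name__ == "__main__":
    context, weights = dot_product_attention(
        query=[1.0, 0.0],
        keys=[[10.0, 0.0], [0.0, 10.0]],
        values=[[1.0, 0.0], [0.0, 1.0]],
    )
    print(weights)  # nearly all weight falls on the first frame
```

Location-based and GMM attention differ mainly in how the scores are computed; the softmax and the weighted sum over values are shared.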
Language model (LM)
- RNNLM (recurrent neural network language model)
- Gated convolutional LM [link]
- Transformer LM
- Transformer-XL LM [link]
- Adaptive softmax [link]
Output units
- Phoneme
- Grapheme
- Wordpiece (BPE, sentencepiece)
- Word
- Word-char mix
Multi-task learning (MTL)
Multi-task learning (MTL) with different units is supported to alleviate data sparseness.
- Hybrid CTC/attention [link]
- Hierarchical attention (e.g., word attention + character attention) [link]
- Hierarchical CTC (e.g., word CTC + character CTC) [link]
- Hierarchical CTC + attention (e.g., word attention + character CTC) [link]
- Forward-backward attention [link]
- LM objective
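The hybrid CTC/attention objective is usually written as a convex combination of the two losses, L = w * L_ctc + (1 - w) * L_att. The sketch below assumes that standard form; the function name `multitask_loss` and the parameter `ctc_weight` are hypothetical, and the hierarchical variants in the list would add further weighted terms in the same way.

```python
def multitask_loss(loss_ctc, loss_att, ctc_weight=0.3):
    """Combine CTC and attention losses with a single interpolation weight."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


if __name__ == "__main__":
    # With ctc_weight=0 this reduces to pure attention training,
    # and with ctc_weight=1 to pure CTC training.
    print(multitask_loss(10.0, 20.0, ctc_weight=0.25))
```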
ASR performance
AISHELL-1 (CER)
| Model | dev | test |
|---|---|---|
| Conformer LAS | 4.1 | 4.5 |
| Transformer | 5.0 | 5.4 |
| Streaming MMA | 5.5 | 6.1 |
AISHELL-2 (CER)
| Model | test_android | test_ios | test_mic |
|---|---|---|---|
| Conformer LAS | 6.1 | 5.5 | 5.9 |
CSJ (WER)
| Model | eval1 | eval2 | eval3 |
|---|---|---|---|
| Conformer LAS | 5.7 | 4.4 | 4.9 |
| BLSTM LAS | 6.5 | 5.1 | 5.6 |
| LC-BLSTM MoChA | 7.4 | 5.6 | 6.4 |
Switchboard 300h (WER)
| Model | SWB | CH |
|---|---|---|
| BLSTM LAS | 9.1 | 18.8 |
Switchboard+Fisher 2000h (WER)
| Model | SWB | CH |
|---|---|---|
| BLSTM LAS | 7.8 | 13.8 |
LaboroTVSpeech (CER)
| Model | dev_4k | dev | tedx-jp-10k |
|---|---|---|---|
| Conformer LAS | 7.8 | 10.1 | 12.4 |
Librispeech (WER)
| Model | dev-clean | dev-other | test-clean | test-other |
|---|---|---|---|---|
| Conformer LAS | 1.9 | 4.6 | 2.1 | 4.9 |
| Transformer | 2.1 | 5.3 | 2.4 | 5.7 |
| BLSTM LAS | 2.5 | 7.2 | 2.6 | 7.5 |
| BLSTM RNN-T | 2.9 | 8.5 | 3.2 | 9.0 |
| UniLSTM RNN-T | 3.7 | 11.7 | 4.0 | 11.6 |
| UniLSTM MoChA | 4.1 | 11.0 | 4.2 | 11.2 |
| LC-BLSTM RNN-T | 3.3 | 9.8 | 3.5 | 10.2 |
| LC-BLSTM MoChA | 3.3 | 8.8 | 3.5 | 9.1 |
| Streaming MMA | 2.5 | 6.9 | 2.7 | 7.1 |
TEDLIUM2 (WER)
| Model | dev | test |
|---|---|---|
| Conformer LAS | 7.0 | 6.8 |
| BLSTM LAS | 8.1 | 7.5 |
| LC-BLSTM RNN-T | 8.0 | 7.7 |
| LC-BLSTM MoChA | 10.3 | 8.6 |
| UniLSTM RNN-T | 10.7 | 10.7 |
| UniLSTM MoChA | 13.5 | 11.6 |
WSJ (WER)
| Model | test_dev93 | test_eval92 |
|---|---|---|
| BLSTM LAS | 8.8 | 6.2 |
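The WER and CER figures in the tables above are error rates in percent: the edit distance between hypothesis and reference, summed over utterances and divided by the total reference length. The sketch below shows that computation under common conventions; the function names and the choice to strip spaces before character scoring are assumptions, and real scoring pipelines usually apply corpus-specific text normalization first.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level WER (unit='word') or CER (unit='char') in percent."""
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        if unit == "word":
            r, h = ref.split(), hyp.split()
        else:
            r, h = list(ref.replace(" ", "")), list(hyp.replace(" ", ""))
        errors += edit_distance(r, h)
        total += len(r)
    return 100.0 * errors / total


if __name__ == "__main__":
    refs = ["the cat sat on the mat"]
    hyps = ["the cat sit on mat"]
    print(error_rate(refs, hyps, unit="word"))
    print(error_rate(refs, hyps, unit="char"))
```

Summing errors over the whole test set before dividing, rather than averaging per-utterance rates, keeps long and short utterances weighted by their length.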
LM performance
Penn Tree Bank (PPL)
| Model | valid | test |
|---|---|---|
| RNNLM | 87.99 | 86.06 |
| + cache=100 | 79.58 | 79.12 |
| + cache=500 | 77.36 | 76.94 |
WikiText2 (PPL)
| Model | valid | test |
|---|---|---|
| RNNLM | 104.53 | 98.73 |
| + cache=100 | 90.86 | 85.87 |
| + cache=2000 | 76.10 | 72.77 |
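Perplexity (PPL), the metric in the LM tables above, is the exponential of the average negative log-likelihood per token. The sketch below assumes natural-log token probabilities as input; the function name `perplexity` is chosen for illustration, and it does not model the cache rows, whose exact configuration is not described here.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-probability over all scored tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    # A model that assigns probability 1/4 to every token has perplexity 4,
    # i.e. it is as uncertain as a uniform choice among four words.
    print(perplexity([math.log(0.25)] * 3))
```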
References
- https://github.com/kaldi-asr/kaldi
- https://github.com/espnet/espnet
- https://github.com/awni/speech
- https://github.com/hawkaaron/e2e-asr
Dependencies
- https://github.com/seannaren/warp-ctc
- https://github.com/hawkaaron/warp-transducer
- https://github.com/1ytic/warp-rnnt