Deepang Raval 1 | Vyom Pathak 1 | Muktan Patel 1 | Brijesh Bhatt 1
Dharmsinh Desai University
We present a novel approach for improving the performance of an End-to-End speech recognition system for the Gujarati language. We follow a deep learning based approach which includes Convolutional Neural Network (CNN), Bi-directional Long Short Term Memory (BiLSTM) layers, Dense layers, and Connectionist Temporal Classification (CTC) as a loss function. In order to improve the performance of the system with the limited size of the dataset, we present a combined language model (WLM and CLM) based prefix decoding technique and a Bidirectional Encoder Representations from Transformers (BERT) based post-processing technique. To gain key insights from our Automatic Speech Recognition (ASR) system, we propose different analysis methods. These insights help to understand our ASR system with respect to a particular language (Gujarati), and can also guide ASR systems to improve performance for low-resource languages. We have trained the model on the Microsoft Speech Corpus, and we observe a 5.11% decrease in Word Error Rate (WER) with respect to the base-model WER.
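The CTC loss mentioned above marginalizes over all frame-level alignments that collapse to the target transcript. As a rough illustration of that idea only (not the repository's implementation, which trains the full CNN + BiLSTM + Dense network), here is a minimal pure-Python sketch of the standard CTC forward (alpha) recursion:

```python
import math

def ctc_loss(probs, labels, blank=0):
    """Negative log-likelihood of `labels` given per-frame distributions
    `probs` (probs[t][k] = P(symbol k at frame t)), via the CTC forward
    recursion over the blank-extended label sequence."""
    # Extend the label sequence with blanks: b, l1, b, l2, b, ...
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    T, S = len(probs), len(ext)

    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][blank]       # start with a blank ...
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]  # ... or with the first label

    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]                     # stay on the same symbol
            if s > 0:
                a += alpha[t - 1][s - 1]            # advance by one symbol
            # Skip the intermediate blank, but only between distinct labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * probs[t][ext[s]]

    # Valid alignments end on the final label or the trailing blank.
    total = alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)
    return -math.log(total)
```

For example, with two frames, a two-symbol vocabulary {blank, 'a'}, and uniform per-frame probabilities, the label "a" is produced by the paths (a,-), (-,a), and (a,a), so its total probability is 0.75.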
If you find this work useful, please cite it using the following BibTeX:
@inproceedings{raval-etal-2020-end,
    title = "End-to-End Automatic Speech Recognition for {G}ujarati",
    author = "Raval, Deepang  and
      Pathak, Vyom  and
      Patel, Muktan  and
      Bhatt, Brijesh",
    booktitle = "Proceedings of the 17th International Conference on Natural Language Processing (ICON)",
    month = dec,
    year = "2020",
    address = "Indian Institute of Technology Patna, Patna, India",
    publisher = "NLP Association of India (NLPAI)",
    url = "https://aclanthology.org/2020.icon-main.56",
    pages = "409--419",
    abstract = "We present a novel approach for improving the performance of an End-to-End speech recognition system for the Gujarati language. We follow a deep learning based approach which includes Convolutional Neural Network (CNN), Bi-directional Long Short Term Memory (BiLSTM) layers, Dense layers, and Connectionist Temporal Classification (CTC) as a loss function. In order to improve the performance of the system with the limited size of the dataset, we present a combined language model (WLM and CLM) based prefix decoding technique and Bidirectional Encoder Representations from Transformers (BERT) based post-processing technique. To gain key insights from our Automatic Speech Recognition (ASR) system, we proposed different analysis methods. These insights help to understand our ASR system based on a particular language (Gujarati) as well as can govern ASR systems{'} to improve the performance for low resource languages. We have trained the model on the Microsoft Speech Corpus, and we observe a 5.11{%} decrease in Word Error Rate (WER) with respect to base-model WER.",
}

To install the requirements, clone the repository and create a virtual environment:

```shell
git clone https://github.com/01-vyom/End_2_End_Automatic_Speech_Recognition_For_Gujarati.git
python -m venv asr_env
source $PWD/asr_env/bin/activate
```

Change directory to the root of the repository, then run:

```shell
pip install --upgrade pip
pip install -r requirements.txt
```
To train the model in the paper, run this command:

```shell
python ./Train/train.py
```

Notes:
- Change `PathDataAudios` and `PathDataTranscripts` to point to the appropriate paths of the audio files and the transcript files.
- Change `currmodel` to change the name under which the trained model is saved.

To run inference with the trained model, run:

```shell
python ./Eval/inference.py
```

Notes:
- Change `PathDataAudios` and `PathDataTranscripts` to point to the appropriate paths of the audio files and the transcript files used for testing.
- Change the `model` variable to change the model file used for testing, and change the `test_data` variable to select the test data.
- The `.pickle` files of ground truths and hypotheses are saved in `./Eval/`.

To decode the inferred output, run:

```shell
python ./Eval/decode.py
```

Notes:
- To select the model-specific `.pickle` file, change the `model` variable.
- The output file for the model, containing all types of decodings along with the actual text, is saved in `./Eval/`.
- To post-process the decoded output, follow the steps mentioned in this README.
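For intuition about what decoding does: the simplest way to turn per-frame CTC outputs into text is greedy (best-path) decoding, which takes the argmax symbol at each frame, collapses repeats, and drops blanks. The repository's decoding goes further, using prefix decoding scored with word- and character-level language models (WLM/CLM), but the greedy baseline below (an illustrative sketch, not the repository's code) shows the collapsing rule these decoders share:

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding: per-frame argmax, collapse repeated
    symbols, then remove blanks. `frame_probs[t][k]` is P(symbol k at
    frame t); `alphabet` maps non-blank indices to characters."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded = []
    prev = None
    for idx in best_path:
        if idx != prev and idx != blank:  # collapse repeats, skip blanks
            decoded.append(alphabet[idx])
        prev = idx
    return "".join(decoded)
```

For example, the argmax path [a, a, blank, a, b] decodes to "aab": the first repeated 'a' is collapsed, while the blank separates the second 'a' so it survives as a new symbol.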
To perform system analysis, run:

```shell
python "./System Analysis/system_analysis.py"
```

Notes:
- To select the model-specific decoding `.csv` file for analysis, change the `model` variable.
- To select a particular type of column (hypothesis type) for analysis, change the `type` variable.
- The output files are saved in `./System Analysis/`, specific to the model and decoding type.
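The WER reported in these analyses is the word-level Levenshtein distance (substitutions + insertions + deletions) between hypothesis and reference, divided by the number of reference words. A minimal reference implementation (an illustrative sketch; the repository's script may compute it differently):

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)
```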
Our algorithm achieves the following performance:

| Technique Name | Reduction in WER (%) |
|---|---|
| Prefix decoding with LMs | 2.42 |
| Prefix decoding with LMs + BERT-based post-processing | 5.11 |
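The reduction column is relative to the base model's WER, i.e. (base − new) / base × 100. A one-liner (with illustrative numbers, not values from the paper):

```python
def relative_wer_reduction(base_wer, new_wer):
    """Relative WER reduction (%) with respect to the base model."""
    return (base_wer - new_wer) / base_wer * 100.0
```

For example, a base WER of 0.50 that improves to 0.45 is a 10% relative reduction.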
Notes:
- The prefix decoding code is based on the open-source implementations 1 and 2.
- The code for the BERT-based spell corrector is based on this open-source implementation.

Licensed under the MIT License.