Deepang Raval 1 | Vyom Pathak 1 | Muktan Patel 1 | Brijesh Bhatt 1
Dharmsinh Desai University
We present a novel approach for improving the performance of an End-to-End speech recognition system for the Gujarati language. We follow a deep learning based approach which includes Convolutional Neural Network (CNN), Bi-directional Long Short Term Memory (BiLSTM) layers, Dense layers, and Connectionist Temporal Classification (CTC) as a loss function. In order to improve the performance of the system with the limited size of the dataset, we present a combined language model (WLM and CLM) based prefix decoding technique and a Bidirectional Encoder Representations from Transformers (BERT) based post-processing technique. To gain key insights from our Automatic Speech Recognition (ASR) system, we propose different analysis methods. These insights help us understand our ASR system with respect to a particular language (Gujarati), and can also guide ASR systems toward better performance for low-resource languages. We have trained the model on the Microsoft Speech Corpus, and we observe a 5.11% decrease in Word Error Rate (WER) with respect to the base-model WER.
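CTC, used above as the loss function, trains the network to emit a per-frame label distribution that includes a blank symbol; a transcript is recovered by collapsing repeated labels and dropping blanks. The following greedy decoder is an illustrative sketch of that collapse step only, not the repository's code (the paper uses prefix decoding with language models instead of greedy decoding):

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse repeated labels, then remove blanks (greedy CTC decoding)."""
    out = []
    prev = None
    for label in frame_ids:
        # Keep a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# Frames '1 1 blank 1 2 2' collapse to '1 1 2': the blank separates
# the two 1s, so they are kept as distinct output labels.
print(ctc_greedy_decode([1, 1, 0, 1, 2, 2]))  # -> [1, 1, 2]
```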
If you find this work useful, please cite it using the following BibTeX:
```bibtex
@inproceedings{raval-etal-2020-end,
    title = "End-to-End Automatic Speech Recognition for {G}ujarati",
    author = "Raval, Deepang and
      Pathak, Vyom and
      Patel, Muktan and
      Bhatt, Brijesh",
    booktitle = "Proceedings of the 17th International Conference on Natural Language Processing (ICON)",
    month = dec,
    year = "2020",
    address = "Indian Institute of Technology Patna, Patna, India",
    publisher = "NLP Association of India (NLPAI)",
    url = "https://aclanthology.org/2020.icon-main.56",
    pages = "409--419",
    abstract = "We present a novel approach for improving the performance of an End-to-End speech recognition system for the Gujarati language. We follow a deep learning based approach which includes Convolutional Neural Network (CNN), Bi-directional Long Short Term Memory (BiLSTM) layers, Dense layers, and Connectionist Temporal Classification (CTC) as a loss function. In order to improve the performance of the system with the limited size of the dataset, we present a combined language model (WLM and CLM) based prefix decoding technique and Bidirectional Encoder Representations from Transformers (BERT) based post-processing technique. To gain key insights from our Automatic Speech Recognition (ASR) system, we proposed different analysis methods. These insights help to understand our ASR system based on a particular language (Gujarati) as well as can govern ASR systems{'} to improve the performance for low resource languages. We have trained the model on the Microsoft Speech Corpus, and we observe a 5.11{%} decrease in Word Error Rate (WER) with respect to base-model WER.",
}
```

To set up the environment, clone the repository:

```sh
git clone https://github.com/01-vyom/End_2_End_Automatic_Speech_Recognition_For_Gujarati.git
```
Create and activate a virtual environment:

```sh
python -m venv asr_env
source $PWD/asr_env/bin/activate
```

Change the directory to the root of the repository, then install the dependencies:

```sh
pip install --upgrade pip
pip install -r requirements.txt
```
To train the model in the paper, run this command:
```sh
python ./Train/train.py
```

Note:
- Change `PathDataAudios` and `PathDataTranscripts` to point to the appropriate paths of the audio files and the transcript files.
- Change `currmodel` to change the name under which the model is saved.

To run inference using the trained model, run:
```sh
python ./Eval/inference.py
```

Note:
- Change `PathDataAudios` and `PathDataTranscripts` to point to the appropriate paths of the audio files and the transcript files used for testing.
- To change the model file name used for testing, change the `model` variable; to change the test set, change the `test_data` variable.
- The references and hypotheses are saved as `.pickle` files in `./Eval/`.

To decode the inferred output, run:
```sh
python ./Eval/decode.py
```

Note:
- To select the `.pickle` file, change the `model` variable.
- The output is saved in `./Eval/` as a file containing all types of decoding for that model along with the actual text.

To post-process the decoded output, follow the steps mentioned in this README.
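The combined language model scoring used during prefix decoding can be illustrated by interpolating a word-level (WLM) and a character-level (CLM) score for each candidate transcript. The sketch below uses hypothetical toy unigram models and an illustrative interpolation weight; the repository itself performs full prefix beam search rather than this simple rescoring:

```python
import math

# Toy unigram LMs (hypothetical probabilities, for illustration only).
word_lm = {"hello": 0.6, "world": 0.4}
char_lm = {c: 0.1 for c in "helowrd "}

def wlm_score(text):
    # Word-level LM: sum of word log-probs, with a tiny floor for OOV words.
    return sum(math.log(word_lm.get(w, 1e-6)) for w in text.split())

def clm_score(text):
    # Character-level LM: sum of character log-probs.
    return sum(math.log(char_lm.get(c, 1e-6)) for c in text)

def combined_score(text, alpha=0.7):
    # Interpolate the two LM scores; alpha is an illustrative weight.
    return alpha * wlm_score(text) + (1 - alpha) * clm_score(text)

# The in-vocabulary candidate scores higher than the misspelled one,
# because the OOV word "wrold" is heavily penalized by the word LM.
candidates = ["hello world", "hello wrold"]
best = max(candidates, key=combined_score)
print(best)  # -> hello world
```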
To perform the system analysis, run:
```sh
python "./System Analysis/system_analysis.py"
```

Note:
- To select the model-specific decoding `.csv` file for analysis, change the `model` variable.
- To select a specific type of column (hypothesis type) for analysis, change the `type` variable.

The output files will be saved in `./System Analysis/`, specific to the model and decoding type.
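The Word Error Rate reported by the analysis is the word-level edit distance between hypothesis and reference, divided by the reference length. A minimal sketch (the function name and layout are illustrative, not the repository's implementation):

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("a b c d", "a x c"))  # 1 substitution + 1 deletion -> 0.5
```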
Our model achieves the following performance:
| Technique | Decrease in WER (%) |
|---|---|
| Prefix decoding with LMs | 2.42 |
| Prefix decoding with LMs + BERT-based post-processing | 5.11 |
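The decrease column above is a relative reduction with respect to the base model's WER, i.e. (WER_base − WER_new) / WER_base × 100. A tiny sketch; the WER values in the example are made up for illustration:

```python
def relative_wer_decrease(wer_base, wer_new):
    # Relative reduction in WER, expressed as a percentage of the base WER.
    return (wer_base - wer_new) / wer_base * 100

# Hypothetical example: a base WER of 40.0% improved to 37.956%
# corresponds to the 5.11% relative decrease reported above.
print(round(relative_wer_decrease(40.0, 37.956), 2))  # -> 5.11
```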
Note:

The prefix decoding code is based on open-source implementations 1 and 2. The code for the BERT-based spell corrector is adapted from this open-source implementation.

Licensed under the MIT License.