klaam

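Arabic speech recognition, classification, and text-to-speech using advanced models such as wav2vec 2.0 and FastSpeech2. This repository supports both training and prediction with pretrained models.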

1. Usage

1.1 Speech Classification

from klaam import SpeechClassification
model = SpeechClassification()
model.classify(wav_file)

1.2 Speech Recognition

from klaam import SpeechRecognition
model = SpeechRecognition()
model.transcribe(wav_file)

1.3 Text To Speech

from klaam import TextToSpeech
prepare_tts_model_path = "../cfgs/FastSpeech2/config/Arabic/preprocess.yaml"
model_config_path = "../cfgs/FastSpeech2/config/Arabic/model.yaml"
train_config_path = "../cfgs/FastSpeech2/config/Arabic/train.yaml"
vocoder_config_path = "../cfgs/FastSpeech2/model_config/hifigan/config.json"
speaker_pre_trained_path = "../data/model_weights/hifigan/generator_universal.pth.tar"

model = TextToSpeech(prepare_tts_model_path, model_config_path, train_config_path, vocoder_config_path, speaker_pre_trained_path)

model.synthesize(sample_text)

Two recognition models are available: one for Modern Standard Arabic (MSA) and one for the Egyptian dialect (EGY). Select either of them with the lang attribute.

from klaam import SpeechRecognition
model = SpeechRecognition(lang='msa')
model.transcribe('file.wav')
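
The recognition and classification models listed in section 3 are wav2vec 2.0 XLSR-53 checkpoints, which were pretrained on audio sampled at 16 kHz. Whether klaam resamples input automatically is not stated here, so it can help to inspect a file before passing it to transcribe or classify. The sketch below uses only the standard library; describe_wav is a hypothetical helper and is not part of klaam.

```python
import wave

EXPECTED_RATE = 16_000  # wav2vec 2.0 XLSR-53 was pretrained on 16 kHz audio


def describe_wav(path):
    """Return basic properties of a PCM WAV file (hypothetical helper, not a klaam API)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
            # True when the file would need resampling to match the pretraining rate
            "needs_resampling": rate != EXPECTED_RATE,
        }
```

Calling describe_wav('file.wav') before model.transcribe('file.wav') gives a quick check that the recording is in the expected format.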

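2. Datasets

Dataset	Description	Link
MGB-3	Egyptian Arabic speech recognition in the wild. Every sentence was annotated by four annotators. More than 15 hours were collected from YouTube.	here [registration required]
ADI-5	More than 50 hours collected from Aljazeera TV. Four regional dialects, Egyptian (EGY), Levantine (LAV), Gulf (GLF), and North African (NOR), plus Modern Standard Arabic (MSA). This dataset is part of the MGB-3 challenge.	here [registration required]
Common Voice	Multilingual dataset available on Hugging Face	here
Arabic Speech Corpus	Arabic dataset with alignments and transcriptions	here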

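3. Models

Our project currently supports four models, three of which are available through transformers.

Language	Description	Source
Egyptian	Speech recognition	wav2vec2-large-xlsr-53-arabic-egyptian
Standard Arabic	Speech recognition	wav2vec2-large-xlsr-53-arabic
EGY, NOR, LAV, GLF, MSA	Speech classification	wav2vec2-large-xlsr-dialect-classification
Standard Arabic	Text-to-speech	fastspeech2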

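4. Example Notebooks

Name	Description	Notebook
Demo	Code for classification, recognition, and text-to-speech.
Demo with mic	Audio recognition and classification from a recording.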

5. Training

The scripts are modified from jqueguiner/wav2vec2-sprint.

5.1 Classification

This script is used for the classification task on the five classes.

python run_classifier.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --output_dir=/path/to/output \
    --cache_dir=/path/to/cache/ \
    --freeze_feature_extractor \
    --num_train_epochs="50" \
    --per_device_train_batch_size="32" \
    --preprocessing_num_workers="1" \
    --learning_rate="3e-5" \
    --warmup_steps="20" \
    --evaluation_strategy="steps" \
    --save_steps="100" \
    --eval_steps="100" \
    --save_total_limit="1" \
    --logging_steps="100" \
    --do_eval \
    --do_train
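
For scripted or repeated runs, the same command can be assembled from a dictionary of hyperparameters. This is a sketch rather than part of the repository: build_command is a hypothetical helper, and it only reuses flags that appear in the command above.

```python
import shlex


def build_command(script, options, flags=()):
    """Build an argv list: --key=value for options, bare --flag for switches."""
    argv = ["python", script]
    argv += [f"--{key}={value}" for key, value in options.items()]
    argv += [f"--{flag}" for flag in flags]
    return argv


options = {
    "model_name_or_path": "facebook/wav2vec2-large-xlsr-53",
    "output_dir": "/path/to/output",
    "cache_dir": "/path/to/cache/",
    "num_train_epochs": 50,
    "per_device_train_batch_size": 32,
    "learning_rate": "3e-5",
    "warmup_steps": 20,
}
flags = ("freeze_feature_extractor", "do_eval", "do_train")

argv = build_command("run_classifier.py", options, flags)
# shlex.join gives a shell-ready string; pass argv to subprocess.run to execute it.
print(shlex.join(argv))
```

Keeping the values in a dictionary makes it easier to vary one setting at a time, such as the learning rate, without retyping the whole command.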

5.2 Recognition

This script trains the recognition model on the Egyptian dialect dataset.

python run_mgb3.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --output_dir=/path/to/output \
    --cache_dir=/path/to/cache/ \
    --freeze_feature_extractor \
    --num_train_epochs="50" \
    --per_device_train_batch_size="32" \
    --preprocessing_num_workers="1" \
    --learning_rate="3e-5" \
    --warmup_steps="20" \
    --evaluation_strategy="steps" \
    --save_steps="100" \
    --eval_steps="100" \
    --save_total_limit="1" \
    --logging_steps="100" \
    --do_eval \
    --do_train

This script can be used for training on the Arabic subset of Common Voice.

python run_common_voice.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --dataset_config_name="ar" \
    --output_dir=/path/to/output/ \
    --cache_dir=/path/to/cache \
    --overwrite_output_dir \
    --num_train_epochs="1" \
    --per_device_train_batch_size="32" \
    --per_device_eval_batch_size="32" \
    --evaluation_strategy="steps" \
    --learning_rate="3e-4" \
    --warmup_steps="500" \
    --fp16 \
    --freeze_feature_extractor \
    --save_steps="10" \
    --eval_steps="10" \
    --save_total_limit="1" \
    --logging_steps="10" \
    --group_by_length \
    --feat_proj_dropout="0.0" \
    --layerdrop="0.1" \
    --gradient_checkpointing \
    --do_train --do_eval \
    --max_train_samples 100 --max_val_samples 100
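
The --learning_rate and --warmup_steps flags interact. Assuming the script keeps the default linear schedule with warmup from the Hugging Face Trainer (an assumption, not confirmed here), the rate climbs from 0 to the peak during warmup and then decays linearly to 0. The sketch below reproduces that shape; total_steps is a hypothetical value, since the real number depends on dataset size, batch size, and device count.

```python
def linear_warmup_lr(step, peak_lr, warmup_steps, total_steps):
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Values from the command above; total_steps=1000 is illustrative only.
for step in (0, 250, 500, 750, 1000):
    print(step, linear_warmup_lr(step, peak_lr=3e-4, warmup_steps=500, total_steps=1000))
```

With --max_train_samples 100, a batch size of 32, a single device, and no gradient accumulation, one epoch is only 4 optimizer steps, so a 500-step warmup never completes. Under those assumptions the command above works as a quick smoke test of the pipeline rather than a full training run.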

5.3 Text To Speech

We use ming024's implementation of FastSpeech2.

The procedure is as follows:

Download the dataset and unzip it.

wget http://en.arabicspeechcorpus.com/arabic-speech-corpus.zip
unzip arabic-speech-corpus.zip

mkdir -p raw_data/Arabic/Arabic preprocessed_data/Arabic/TextGrid/Arabic
cp arabic-speech-corpus/textgrid/* preprocessed_data/Arabic/TextGrid/Arabic

Prepare the metadata.

import os

base_dir = '/content/arabic-speech-corpus'
lines = []
# Each file in lab/ holds the transcript of one utterance; [:-4] drops the extension.
for lab_file in os.listdir(f'{base_dir}/lab'):
    with open(f'{base_dir}/lab/{lab_file}', 'r') as f:
        lines.append(lab_file[:-4] + '|' + f.read())

# Write one "name|transcript" entry per line.
with open(f'{base_dir}/metadata.csv', 'w') as f:
    f.write('\n'.join(lines))
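
The metadata step can also be wrapped in a function so that it works from any directory and produces the same file on every run. This is a sketch under a few assumptions that the original snippet does not state: transcript files end in .lab, they are UTF-8 encoded, and surrounding whitespace can be stripped. build_metadata is a hypothetical name.

```python
from pathlib import Path


def build_metadata(base_dir):
    """Write <base_dir>/metadata.csv with one 'name|transcript' line per .lab file."""
    base = Path(base_dir)
    lines = []
    # Sort so the output order does not depend on the filesystem listing order.
    for lab_file in sorted((base / "lab").glob("*.lab")):
        text = lab_file.read_text(encoding="utf-8").strip()
        lines.append(f"{lab_file.stem}|{text}")
    out_path = base / "metadata.csv"
    out_path.write_text("\n".join(lines), encoding="utf-8")
    return out_path
```

For the layout used above, the call would be build_metadata('/content/arabic-speech-corpus').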

Clone my FastSpeech2 repository and install the required dependencies.

git clone --depth 1 https://github.com/zaidalyafeai/FastSpeech2
cd FastSpeech2
pip install -r requirements.txt

Prepare the alignments and preprocess the data.

python3 prepare_align.py config/Arabic/preprocess.yaml
python3 preprocess.py config/Arabic/preprocess.yaml

Unzip the vocoders.

unzip hifigan/generator_LJSpeech.pth.tar.zip -d hifigan
unzip hifigan/generator_universal.pth.tar.zip -d hifigan

Start training.

python3 train.py -p config/Arabic/preprocess.yaml -m config/Arabic/model.yaml -t config/Arabic/train.yaml

This repository was created by the ARBML team. If you have any suggestions or contributions, feel free to open a pull request.


Additional information

Version 1.0.0
Type AI source code
Updated 2025-08-21
Size 134.33MB
Source GitHub
