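klaam

Arabic speech recognition, classification, and text-to-speech using advanced models such as wav2vec 2.0 and FastSpeech2. This repository supports both training and prediction with pretrained models.

1. Usage

1.1 Speech Classification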

from klaam import SpeechClassification
model = SpeechClassification()
model.classify(wav_file)
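
The snippet above leaves wav_file undefined. As a minimal sketch, the standard-library wave module can confirm that a file is a readable WAV and report its sample rate before it is handed to the model. The helpers write_silence and describe_wav are hypothetical and not part of klaam. The 16 kHz rate is an assumption based on the wav2vec 2.0 checkpoints, which were trained on 16 kHz audio; klaam may resample internally.

```python
import os
import tempfile
import wave


def write_silence(path, seconds=1.0, sample_rate=16000):
    """Write a mono 16-bit PCM WAV file containing silence (for testing only)."""
    n_frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)


def describe_wav(path):
    """Return (channels, sample_rate, duration_in_seconds) for a WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return channels, rate, duration


# Hypothetical file path, used only for this illustration.
wav_file = os.path.join(tempfile.mkdtemp(), "sample.wav")
write_silence(wav_file, seconds=0.5)
info = describe_wav(wav_file)
print(info)

# With klaam installed, the file could then be passed on:
# from klaam import SpeechClassification
# SpeechClassification().classify(wav_file)
```

A check like this catches unreadable or unexpectedly formatted files before the model is loaded.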

1.2 Speech Recognition

from klaam import SpeechRecognition
model = SpeechRecognition()
model.transcribe(wav_file)

1.3 Text-to-Speech

from klaam import TextToSpeech
prepare_tts_model_path = "../cfgs/FastSpeech2/config/Arabic/preprocess.yaml"
model_config_path = "../cfgs/FastSpeech2/config/Arabic/model.yaml"
train_config_path = "../cfgs/FastSpeech2/config/Arabic/train.yaml"
vocoder_config_path = "../cfgs/FastSpeech2/model_config/hifigan/config.json"
speaker_pre_trained_path = "../data/model_weights/hifigan/generator_universal.pth.tar"

model = TextToSpeech(prepare_tts_model_path, model_config_path, train_config_path, vocoder_config_path, speaker_pre_trained_path)

model.synthesize(sample_text)
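
TextToSpeech takes five file paths, and a wrong relative path is an easy mistake to make. The following is a minimal sketch of a pre-flight check using only the standard library. The helper missing_paths is hypothetical and not part of klaam, and the temporary directory merely stands in for the repository layout.

```python
import os
import tempfile


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk, in order."""
    return [p for p in paths if not os.path.exists(p)]


# Illustration with a temporary directory standing in for the repository.
root = tempfile.mkdtemp()
present = os.path.join(root, "model.yaml")
absent = os.path.join(root, "train.yaml")
with open(present, "w") as handle:
    handle.write("# placeholder config\n")

problems = missing_paths([present, absent])
print(problems)
```

In practice the list would hold the five paths defined above, checked before the TextToSpeech object is constructed.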

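Two speech recognition models are available: one for Modern Standard Arabic (MSA) and one for the Egyptian dialect (EGY). You can select either one with the lang attribute.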

from klaam import SpeechRecognition
model = SpeechRecognition(lang='msa')
model.transcribe('file.wav')
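
The lang selection can be pictured as a lookup from a language code to a checkpoint name. The sketch below is not klaam's actual implementation: resolve_checkpoint is a hypothetical helper, the checkpoint names are those listed in the Models section, and the code 'egy' is an assumption, since only lang='msa' appears in the usage example.

```python
# Checkpoint names as listed in the Models section; the "egy" key is an
# assumption, since only lang='msa' appears in the usage example.
CHECKPOINTS = {
    "msa": "wav2vec2-large-xlsr-53-arabic",
    "egy": "wav2vec2-large-xlsr-53-arabic-egyptian",
}


def resolve_checkpoint(lang):
    """Map a language code to a checkpoint name (hypothetical helper)."""
    key = lang.strip().lower()
    if key not in CHECKPOINTS:
        supported = ", ".join(sorted(CHECKPOINTS))
        raise ValueError(f"unsupported lang {lang!r}; expected one of: {supported}")
    return CHECKPOINTS[key]


print(resolve_checkpoint("msa"))
```

Failing early on an unknown code gives a clearer error than a failed model download later on.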

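2. Datasets

Dataset	Description	Link
MGB-3	Egyptian Arabic speech recognition in the wild. Every sentence was annotated by four annotators. More than 15 hours were collected from YouTube.	here [registration required]
ADI-5	More than 50 hours collected from Aljazeera TV. Covers four regional dialects, Egyptian (EGY), Levantine (LAV), Gulf (GLF), and North African (NOR), plus Modern Standard Arabic (MSA). This dataset is part of the MGB-3 challenge.	here [registration required]
Common Voice	Multilingual dataset available on Hugging Face	here
Arabic Speech Corpus	Arabic dataset with alignments and transcriptions	here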

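3. Models

The project currently supports four models, three of which are available on Hugging Face Transformers.

Language	Description	Source
Egyptian	Speech recognition	wav2vec2-large-xlsr-53-arabic-egyptian
Standard Arabic	Speech recognition	wav2vec2-large-xlsr-53-arabic
EGY, NOR, LAV, GLF, MSA	Speech classification	wav2vec2-large-xlsr-dialect-classification
Standard Arabic	Text-to-speech	fastspeech2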

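4. Example Notebooks

Name	Description	Notebook
Demo	Code for classification, recognition, and text-to-speech.
Demo with mic	Audio recognition and classification from a recording.

5. Training

These scripts are modified from jqueguiner/wav2vec2-sprint.

5.1 Classification

This script is used for the classification task over the 5 classes.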

python run_classifier.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --output_dir=/path/to/output \
    --cache_dir=/path/to/cache/ \
    --freeze_feature_extractor \
    --num_train_epochs="50" \
    --per_device_train_batch_size="32" \
    --preprocessing_num_workers="1" \
    --learning_rate="3e-5" \
    --warmup_steps="20" \
    --evaluation_strategy="steps" \
    --save_steps="100" \
    --eval_steps="100" \
    --save_total_limit="1" \
    --logging_steps="100" \
    --do_eval \
    --do_train
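
The same command can also be assembled from Python, which keeps the hyperparameters in one place. This is a minimal sketch that only builds and prints the argument list; it does not run training. The helper build_command is hypothetical, and only a subset of the options above is shown for brevity.

```python
import shlex
import sys


def build_command(script, options, flags):
    """Build an argument list: `--key=value` options followed by bare flags."""
    command = [sys.executable, script]
    command += [f"--{key}={value}" for key, value in options.items()]
    command += [f"--{flag}" for flag in flags]
    return command


# Values copied from the command above (subset only).
options = {
    "model_name_or_path": "facebook/wav2vec2-large-xlsr-53",
    "output_dir": "/path/to/output",
    "cache_dir": "/path/to/cache/",
    "num_train_epochs": 50,
    "per_device_train_batch_size": 32,
    "learning_rate": "3e-5",
}
flags = ["freeze_feature_extractor", "do_eval", "do_train"]

command = build_command("run_classifier.py", options, flags)
print(shlex.join(command))

# To launch training (requires the script and its dependencies):
# import subprocess
# subprocess.run(command, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems with values such as the model name.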

5.2 Recognition

This script trains on the Egyptian dialect dataset (MGB-3), starting from the pretrained checkpoint.

python run_mgb3.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --output_dir=/path/to/output \
    --cache_dir=/path/to/cache/ \
    --freeze_feature_extractor \
    --num_train_epochs="50" \
    --per_device_train_batch_size="32" \
    --preprocessing_num_workers="1" \
    --learning_rate="3e-5" \
    --warmup_steps="20" \
    --evaluation_strategy="steps" \
    --save_steps="100" \
    --eval_steps="100" \
    --save_total_limit="1" \
    --logging_steps="100" \
    --do_eval \
    --do_train

This script can be used to train on the Arabic subset of Common Voice.

python run_common_voice.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --dataset_config_name="ar" \
    --output_dir=/path/to/output/ \
    --cache_dir=/path/to/cache \
    --overwrite_output_dir \
    --num_train_epochs="1" \
    --per_device_train_batch_size="32" \
    --per_device_eval_batch_size="32" \
    --evaluation_strategy="steps" \
    --learning_rate="3e-4" \
    --warmup_steps="500" \
    --fp16 \
    --freeze_feature_extractor \
    --save_steps="10" \
    --eval_steps="10" \
    --save_total_limit="1" \
    --logging_steps="10" \
    --group_by_length \
    --feat_proj_dropout="0.0" \
    --layerdrop="0.1" \
    --gradient_checkpointing \
    --do_train --do_eval \
    --max_train_samples 100 --max_val_samples 100
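
With 100 training samples and a single epoch, this command reads as a quick smoke test rather than a full run. The sketch below estimates how many optimizer steps such a run takes. It assumes a single device and no gradient accumulation, neither of which is stated above, and optimizer_steps is a hypothetical helper.

```python
import math


def optimizer_steps(num_samples, batch_size, epochs=1, devices=1):
    """Estimate optimizer steps, assuming no gradient accumulation."""
    batches_per_epoch = math.ceil(num_samples / (batch_size * devices))
    return batches_per_epoch * epochs


steps = optimizer_steps(num_samples=100, batch_size=32, epochs=1)
print(steps)  # 4
```

Under these assumptions the run lasts about 4 steps, so the 10-step save, eval, and logging intervals would likely not be reached during training. A full run would drop the sample limits and raise the number of epochs.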

5.3 Text-to-Speech

We use the FastSpeech2 implementation by ming024.

The procedure is as follows:

Download the dataset and unzip it.

wget http://en.arabicspeechcorpus.com/arabic-speech-corpus.zip
unzip arabic-speech-corpus.zip

mkdir -p raw_data/Arabic/Arabic preprocessed_data/Arabic/TextGrid/Arabic
cp arabic-speech-corpus/textgrid/* preprocessed_data/Arabic/TextGrid/Arabic

Prepare the metadata.

import os

base_dir = '/content/arabic-speech-corpus'
lines = []
for lab_file in os.listdir(f'{base_dir}/lab'):
    # Each line has the form "<file name without .lab>|<transcript>".
    with open(f'{base_dir}/lab/{lab_file}', 'r') as f:
        lines.append(lab_file[:-4] + '|' + f.read())

# Write one entry per line.
with open(f'{base_dir}/metadata.csv', 'w') as f:
    f.write('\n'.join(lines))
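
Each line of metadata.csv therefore has the form name|transcript. The following is a minimal, self-contained sketch of the same step on a throwaway directory, so the output format can be inspected without downloading the corpus. The helper build_metadata, the file names, and the transcripts are made up for illustration. Unlike the script above, it sorts the files and strips trailing whitespace to make the output deterministic.

```python
import tempfile
from pathlib import Path


def build_metadata(base_dir):
    """Return `name|transcript` lines, one per .lab file under base_dir/lab."""
    lines = []
    for lab_file in sorted(Path(base_dir, "lab").glob("*.lab")):
        transcript = lab_file.read_text(encoding="utf-8").strip()
        lines.append(f"{lab_file.stem}|{transcript}")
    return lines


# Throwaway corpus with two made-up transcripts, for illustration only.
base_dir = Path(tempfile.mkdtemp())
(base_dir / "lab").mkdir()
(base_dir / "lab" / "utt_0002.lab").write_text("marhaban\n", encoding="utf-8")
(base_dir / "lab" / "utt_0001.lab").write_text("salam\n", encoding="utf-8")

lines = build_metadata(base_dir)
(base_dir / "metadata.csv").write_text("\n".join(lines), encoding="utf-8")
print(lines)
```

Reading the files with an explicit UTF-8 encoding matters here because the transcripts in the real corpus are not plain ASCII.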

Clone my repository (FastSpeech2) and install the required dependencies.

git clone --depth 1 https://github.com/zaidalyafeai/FastSpeech2
cd FastSpeech2
pip install -r requirements.txt

Prepare the alignments and preprocess the data.

python3 prepare_align.py config/Arabic/preprocess.yaml
python3 preprocess.py config/Arabic/preprocess.yaml

Unzip the vocoders.

unzip hifigan/generator_LJSpeech.pth.tar.zip -d hifigan
unzip hifigan/generator_universal.pth.tar.zip -d hifigan

Start training.

python3 train.py -p config/Arabic/preprocess.yaml -m config/Arabic/model.yaml -t config/Arabic/train.yaml

This repository was created by the ARBML team. If you have any suggestions or contributions, feel free to open a pull request.

Additional information

Version 1.0.0
Type AI source code
Updated 2025-08-21
Size 134.33 MB
Source GitHub
