Klaam

Arabic speech recognition, classification, and text-to-speech using many advanced models such as Wav2Vec2 and FastSpeech2. This repository supports both training and prediction with pretrained models.

1. Usage

1.1 Speech Classification

from klaam import SpeechClassification

wav_file = "file.wav"  # path to the audio file to classify
model = SpeechClassification()
model.classify(wav_file)
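The classifier and the recognizers in the models table are built on wav2vec2 XLSR-53, which was pretrained on 16 kHz audio. The sketch below uses only Python's standard wave module to check a file's format before it is passed to classify. The helper name check_wav is hypothetical. The 16 kHz mono expectation is also an assumption, since klaam may resample or downmix internally.

```python
import wave
from typing import List

# Assumption: wav2vec2 XLSR-53 based models expect 16 kHz mono input.
EXPECTED_RATE = 16_000


def check_wav(path: str, expected_rate: int = EXPECTED_RATE) -> List[str]:
    """Return human-readable problems found in a WAV file (empty list if none)."""
    problems = []
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        if rate != expected_rate:
            problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
        if w.getnchannels() != 1:
            problems.append(f"{w.getnchannels()} channels, expected mono")
        if w.getnframes() == 0:
            problems.append("file contains no audio frames")
    return problems
```

For example, call model.classify(wav_file) only when check_wav(wav_file) returns an empty list. Otherwise, resample or downmix the file first with an audio tool of your choice.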

1.2 Speech Recognition

from klaam import SpeechRecognition

wav_file = "file.wav"  # path to the audio file to transcribe
model = SpeechRecognition()
model.transcribe(wav_file)

1.3 Text-to-Speech

from klaam import TextToSpeech

# FastSpeech2 configs and HiFi-GAN vocoder config/weights
prepare_tts_model_path = "../cfgs/FastSpeech2/config/Arabic/preprocess.yaml"
model_config_path = "../cfgs/FastSpeech2/config/Arabic/model.yaml"
train_config_path = "../cfgs/FastSpeech2/config/Arabic/train.yaml"
vocoder_config_path = "../cfgs/FastSpeech2/model_config/hifigan/config.json"
speaker_pre_trained_path = "../data/model_weights/hifigan/generator_universal.pth.tar"

model = TextToSpeech(prepare_tts_model_path, model_config_path, train_config_path,
                     vocoder_config_path, speaker_pre_trained_path)

sample_text = "..."  # placeholder: replace with the text to synthesize
model.synthesize(sample_text)
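TextToSpeech takes five relative paths. Assuming they are resolved against the current working directory, running from the wrong directory is an easy mistake. The following is a minimal pre-flight check using only the standard library. missing_paths and require_paths are hypothetical helpers, not part of klaam.

```python
import os
from typing import Dict, List


def missing_paths(paths: Dict[str, str]) -> List[str]:
    """Return the labels of entries whose path is not an existing file."""
    return [label for label, path in paths.items() if not os.path.isfile(path)]


def require_paths(paths: Dict[str, str]) -> None:
    """Raise FileNotFoundError listing every missing file, so all problems surface at once."""
    missing = missing_paths(paths)
    if missing:
        details = ", ".join(f"{label}={paths[label]}" for label in missing)
        raise FileNotFoundError(f"missing TTS files: {details}")
```

Usage: call require_paths({"preprocess": prepare_tts_model_path, "model": model_config_path, "train": train_config_path, "vocoder": vocoder_config_path, "weights": speaker_pre_trained_path}) before constructing TextToSpeech.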

There are two models available for recognition, targeting Modern Standard Arabic (MSA) and the Egyptian dialect (EGY). You can select either one with the lang attribute:

from klaam import SpeechRecognition
model = SpeechRecognition(lang='msa')
model.transcribe('file.wav')
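To transcribe many recordings, transcribe can be applied in a loop. The sketch below takes the transcription function as a parameter, so it works with either lang setting and can be tested without downloading a model. transcribe_folder is a hypothetical helper. The sketch also assumes that transcribe returns the transcript as a string.

```python
from pathlib import Path
from typing import Callable, Dict


def transcribe_folder(folder: str, transcribe: Callable[[str], str]) -> Dict[str, str]:
    """Apply `transcribe` to every .wav file in `folder`, keyed by file name.

    Files are processed in sorted order so results are reproducible.
    """
    results: Dict[str, str] = {}
    for wav_path in sorted(Path(folder).glob("*.wav")):
        results[wav_path.name] = transcribe(str(wav_path))
    return results
```

For example, with a hypothetical folder named clips: transcripts = transcribe_folder("clips", SpeechRecognition(lang='msa').transcribe).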

2. Datasets

Dataset	Description	Link
MGB-3	Egyptian Arabic speech recognition in the wild. Every sentence was annotated by four annotators. More than 15 hours were collected from YouTube.	here [registration required]
ADI-5	More than 50 hours collected from Aljazeera TV, covering 4 regional dialects (Egyptian (EGY), Levantine (LAV), Gulf (GLF), North African (NOR)) plus Modern Standard Arabic (MSA). This dataset is part of the MGB-3 challenge.	here [registration required]
Common Voice	Multilingual dataset available on Hugging Face.	here
Arabic Speech Corpus	Arabic dataset with alignments and transcriptions.	here

3. Models

Currently our project supports four models; three of them are available in Hugging Face Transformers.

Language	Description	Source
Egyptian	Speech Recognition	wav2vec2-large-xlsr-53-arabic-egyptian
Standard Arabic	Speech Recognition	wav2vec2-large-xlsr-53-arabic
EGY, NOR, LAV, GLF, MSA	Speech Classification	wav2vec2-large-xlsr-dialect-classification
Standard Arabic	Text-to-Speech	FastSpeech2

4. Example Notebooks

Name	Description	Notebook
Demo	Classification, recognition, and text-to-speech in a few lines of code
Demo with mic	Audio recognition and classification with recording

5. Training

The scripts are a modification of jqueguiner/wav2vec2-sprint.

5.1. Classification

This script is used for the classification task on the 5 classes.

python run_classifier.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --output_dir=/path/to/output \
    --cache_dir=/path/to/cache/ \
    --freeze_feature_extractor \
    --num_train_epochs="50" \
    --per_device_train_batch_size="32" \
    --preprocessing_num_workers="1" \
    --learning_rate="3e-5" \
    --warmup_steps="20" \
    --evaluation_strategy="steps" \
    --save_steps="100" \
    --eval_steps="100" \
    --save_total_limit="1" \
    --logging_steps="100" \
    --do_eval \
    --do_train
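The five classes are the dialects listed in the models table: EGY, NOR, LAV, GLF, and MSA. Hugging Face classification configs store such a mapping as label2id/id2label. The sketch below shows that mapping and an argmax decode of one row of classifier scores. The class order here is an assumption; the real order comes from the trained model's config.

```python
from typing import Dict, List, Sequence

# Assumption: illustrative class order; read the real one from the model config.
DIALECTS: List[str] = ["EGY", "NOR", "LAV", "GLF", "MSA"]
label2id: Dict[str, int] = {label: i for i, label in enumerate(DIALECTS)}
id2label: Dict[int, str] = {i: label for label, i in label2id.items()}


def decode(logits: Sequence[float]) -> str:
    """Map one row of 5 classifier scores to its dialect label via argmax."""
    if len(logits) != len(DIALECTS):
        raise ValueError(f"expected {len(DIALECTS)} scores, got {len(logits)}")
    best = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best]
```

Argmax over raw logits gives the same label as argmax over softmax probabilities, so no normalization is needed just to pick the predicted class.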

5.2. Recognition

This script is used for training on the Egyptian dialect dataset (MGB-3).

python run_mgb3.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --output_dir=/path/to/output \
    --cache_dir=/path/to/cache/ \
    --freeze_feature_extractor \
    --num_train_epochs="50" \
    --per_device_train_batch_size="32" \
    --preprocessing_num_workers="1" \
    --learning_rate="3e-5" \
    --warmup_steps="20" \
    --evaluation_strategy="steps" \
    --save_steps="100" \
    --eval_steps="100" \
    --save_total_limit="1" \
    --logging_steps="100" \
    --do_eval \
    --do_train

This script can be used for training on the Arabic subset of Common Voice.

python run_common_voice.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --dataset_config_name="ar" \
    --output_dir=/path/to/output/ \
    --cache_dir=/path/to/cache \
    --overwrite_output_dir \
    --num_train_epochs="1" \
    --per_device_train_batch_size="32" \
    --per_device_eval_batch_size="32" \
    --evaluation_strategy="steps" \
    --learning_rate="3e-4" \
    --warmup_steps="500" \
    --fp16 \
    --freeze_feature_extractor \
    --save_steps="10" \
    --eval_steps="10" \
    --save_total_limit="1" \
    --logging_steps="10" \
    --group_by_length \
    --feat_proj_dropout="0.0" \
    --layerdrop="0.1" \
    --gradient_checkpointing \
    --do_train --do_eval \
    --max_train_samples 100 --max_val_samples 100
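Recognition fine-tuning of wav2vec2 uses a CTC head, which needs a character vocabulary. The sketch below follows the approach commonly used in XLSR fine-tuning scripts:

- collect every character in the transcripts;
- use "|" as the word delimiter instead of a space;
- append [UNK] and [PAD].

Whether run_common_voice.py builds its vocabulary exactly this way is an assumption. Text normalization (for example, removing diacritics or punctuation) is left out.

```python
from typing import Dict, Iterable


def build_ctc_vocab(transcripts: Iterable[str]) -> Dict[str, int]:
    """Build a character-level CTC vocabulary with contiguous ids."""
    chars = set()
    for text in transcripts:
        chars.update(text)
    # wav2vec2 CTC tokenizers conventionally use "|" as the word delimiter.
    if " " in chars:
        chars.discard(" ")
        chars.add("|")
    vocab = {ch: i for i, ch in enumerate(sorted(chars))}
    # Special tokens go at the end so character ids stay stable.
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab
```

Sorting the characters keeps the ids deterministic across runs, which matters when the vocabulary is saved next to a checkpoint and reloaded later.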

5.3. Text-to-Speech

We use the PyTorch implementation of FastSpeech2 by ming024.

The steps are as follows:

Download the dataset and unzip it

wget http://en.arabicspeechcorpus.com/arabic-speech-corpus.zip
unzip arabic-speech-corpus.zip

Create the data directories

mkdir -p raw_data/Arabic/Arabic preprocessed_data/Arabic/TextGrid/Arabic
cp arabic-speech-corpus/textgrid/* preprocessed_data/Arabic/TextGrid/Arabic
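On systems without a POSIX shell, the same layout can be created from Python. The following sketch uses pathlib and shutil to mirror the mkdir and cp commands above. setup_dirs is a hypothetical helper, and like plain cp it copies only regular files.

```python
import shutil
from pathlib import Path


def setup_dirs(corpus_dir: str, root: str = ".") -> int:
    """Create the FastSpeech2 data directories and copy the TextGrid files.

    Returns the number of files copied.
    """
    root_path = Path(root)
    (root_path / "raw_data" / "Arabic" / "Arabic").mkdir(parents=True, exist_ok=True)
    textgrid_dst = root_path / "preprocessed_data" / "Arabic" / "TextGrid" / "Arabic"
    textgrid_dst.mkdir(parents=True, exist_ok=True)
    copied = 0
    for src in (Path(corpus_dir) / "textgrid").iterdir():
        if src.is_file():  # skip subdirectories, as `cp` without -r would
            shutil.copy(src, textgrid_dst / src.name)
            copied += 1
    return copied
```

Usage: setup_dirs("arabic-speech-corpus") run from the same directory as the shell commands.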

Prepare the metadata

import os

base_dir = '/content/arabic-speech-corpus'
lines = []
for lab_file in os.listdir(f'{base_dir}/lab'):
    # one "<utterance id>|<transcription>" line per .lab file; [:-4] drops the ".lab" extension
    with open(f'{base_dir}/lab/{lab_file}', 'r', encoding='utf-8') as f:
        lines.append(lab_file[:-4] + '|' + f.read())

with open(f'{base_dir}/metadata.csv', 'w', encoding='utf-8') as f:
    f.write('\n'.join(lines))
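Because each .lab file is read whole, a file that ends in a newline leaves an empty line in metadata.csv. The sketch below reads the file back as a sanity check: it skips blank lines and fails on any line that is not in the <id>|<text> form. read_metadata is a hypothetical helper. Whether FastSpeech2's preprocessing tolerates blank lines is not something this README states.

```python
from typing import List, Tuple


def read_metadata(path: str) -> List[Tuple[str, str]]:
    """Parse "<id>|<transcription>" lines, skipping blanks and rejecting malformed ones."""
    entries: List[Tuple[str, str]] = []
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # a .lab file ending in "\n" yields an empty line after joining
            utt_id, sep, text = line.partition("|")
            if not sep or not utt_id or not text:
                raise ValueError(f"line {line_no} is not '<id>|<text>': {line!r}")
            entries.append((utt_id, text))
    return entries
```

Usage: len(read_metadata('/content/arabic-speech-corpus/metadata.csv')) should equal the number of .lab files.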

Clone my FastSpeech2 repository and install the required dependencies

git clone --depth 1 https://github.com/zaidalyafeai/FastSpeech2
cd FastSpeech2
pip install -r requirements.txt

Prepare the alignments and the preprocessed data

python3 prepare_align.py config/Arabic/preprocess.yaml
python3 preprocess.py config/Arabic/preprocess.yaml

Unzip the vocoders

unzip hifigan/generator_LJSpeech.pth.tar.zip -d hifigan
unzip hifigan/generator_universal.pth.tar.zip -d hifigan

Start training

python3 train.py -p config/Arabic/preprocess.yaml -m config/Arabic/model.yaml -t config/Arabic/train.yaml

This repository was created by the ARBML team. If you have any suggestions or contributions, feel free to open a pull request.

More information

Version: 1.0.0
Type: AI source code
Updated: 2025-08-21
Size: 134.33MB
Source: GitHub