Klaam

Arabic speech recognition, classification, and text-to-speech using advanced models such as wav2vec2 and FastSpeech2. This repository supports both training and prediction with pretrained models.

1. Usage

1.1 Speech Classification

from klaam import SpeechClassification
model = SpeechClassification()
# wav_file: path to a .wav recording
model.classify(wav_file)
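
To label a whole folder of recordings, the call above can be wrapped in a loop. The following is a minimal sketch and not part of klaam: classify_directory and StubClassifier are hypothetical names, and the only klaam call assumed is classify(wav_file) as shown above. The stub exists only so the sketch runs without klaam installed; in real use you would pass SpeechClassification() instead.

```python
from pathlib import Path
import tempfile


def classify_directory(model, directory):
    """Run model.classify on every .wav file in directory, sorted by name.

    `model` is any object with a classify(path) method, for example the
    SpeechClassification instance created above.
    """
    results = {}
    for wav_path in sorted(Path(directory).glob("*.wav")):
        results[wav_path.name] = model.classify(str(wav_path))
    return results


class StubClassifier:
    """Hypothetical stand-in so the sketch runs without klaam installed."""

    def classify(self, wav_file):
        return "MSA"  # placeholder label, not a real prediction


with tempfile.TemporaryDirectory() as tmp:
    # Two empty .wav files and one unrelated file that should be skipped.
    for name in ("a.wav", "b.wav", "notes.txt"):
        (Path(tmp) / name).touch()
    labels = classify_directory(StubClassifier(), tmp)

print(labels)
```

Passing the model in as an argument keeps the loop independent of how the classifier is constructed, so the same helper works with any object that exposes classify.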

1.2 Speech Recognition

from klaam import SpeechRecognition
model = SpeechRecognition()
# wav_file: path to a .wav recording
model.transcribe(wav_file)
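
The wav2vec2 XLSR checkpoints were pretrained on audio sampled at 16 kHz, so it can help to inspect a recording before transcribing it. The sketch below uses only Python's standard wave module. wav_info is a hypothetical helper, and whether klaam resamples its input is not stated here, so treat the check as a precaution rather than a requirement.

```python
import os
import tempfile
import wave

EXPECTED_RATE = 16_000  # sampling rate used to pretrain wav2vec2 XLSR models


def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnchannels(), wav.getnframes() / rate


# Write one second of 16-bit mono silence to have something to inspect.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.wav")
with wave.open(sample_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
    wav.setframerate(EXPECTED_RATE)
    wav.writeframes(b"\x00\x00" * EXPECTED_RATE)

rate, channels, seconds = wav_info(sample_path)
if rate != EXPECTED_RATE:
    print(f"warning: {sample_path} is {rate} Hz, expected {EXPECTED_RATE} Hz")
print(rate, channels, seconds)
```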

1.3 Text To Speech

from klaam import TextToSpeech
prepare_tts_model_path = "../cfgs/FastSpeech2/config/Arabic/preprocess.yaml"
model_config_path = "../cfgs/FastSpeech2/config/Arabic/model.yaml"
train_config_path = "../cfgs/FastSpeech2/config/Arabic/train.yaml"
vocoder_config_path = "../cfgs/FastSpeech2/model_config/hifigan/config.json"
speaker_pre_trained_path = "../data/model_weights/hifigan/generator_universal.pth.tar"

model = TextToSpeech(prepare_tts_model_path, model_config_path, train_config_path, vocoder_config_path, speaker_pre_trained_path)

# sample_text: the Arabic text to synthesize
model.synthesize(sample_text)
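
All five paths are relative, so they resolve against the current working directory. A quick existence check before constructing the model can make a wrong working directory easier to spot. This is a minimal sketch using only the standard library; missing_paths is a hypothetical helper and is not part of klaam.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]


# The same five paths that are passed to TextToSpeech above.
tts_paths = [
    "../cfgs/FastSpeech2/config/Arabic/preprocess.yaml",
    "../cfgs/FastSpeech2/config/Arabic/model.yaml",
    "../cfgs/FastSpeech2/config/Arabic/train.yaml",
    "../cfgs/FastSpeech2/model_config/hifigan/config.json",
    "../data/model_weights/hifigan/generator_universal.pth.tar",
]

missing = missing_paths(tts_paths)
if missing:
    print("Missing files (check the working directory):")
    for path in missing:
        print("  " + path)
else:
    print("All TextToSpeech inputs found.")
```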

There are two pretrained models for recognition: one for Modern Standard Arabic (MSA) and one for the Egyptian dialect (EGY). Use the lang attribute to select one of them.

from klaam import SpeechRecognition
model = SpeechRecognition(lang='msa')
model.transcribe('file.wav')
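
If the language code comes from user input, it may be worth validating it before building the model. The sketch below is illustrative only: normalize_lang and RECOGNITION_MODELS are hypothetical names, the model names are taken from the table in section 3, and the 'egy' key is an assumption, since only 'msa' appears in the example above.

```python
# Model names come from the table in section 3; the 'egy' key is an assumption.
RECOGNITION_MODELS = {
    "msa": "wav2vec2-large-xlsr-53-arabic",
    "egy": "wav2vec2-large-xlsr-53-arabic-egyptian",
}


def normalize_lang(lang):
    """Return a lower-cased language code, or raise if it is unsupported."""
    code = lang.strip().lower()
    if code not in RECOGNITION_MODELS:
        supported = ", ".join(sorted(RECOGNITION_MODELS))
        raise ValueError(f"unsupported lang {lang!r}; expected one of: {supported}")
    return code


code = normalize_lang("MSA")
print(code, "->", RECOGNITION_MODELS[code])
```

The validated code could then be passed on as SpeechRecognition(lang=code).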

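2. Datasets

Dataset	Description	Link
MGB-3	Egyptian Arabic speech recognition in the wild. Every sentence was annotated by four annotators. More than 15 hours were collected from YouTube.	here [registration required]
ADI-5	More than 50 hours collected from Aljazeera TV. Four regional dialects: Egyptian (EGY), Levantine (LAV), Gulf (GLF), and North African (NOR), plus Modern Standard Arabic (MSA). This dataset is part of the MGB-3 challenge.	here [registration required]
Common Voice	Multilingual dataset available on Hugging Face	here
Arabic Speech Corpus	Arabic dataset with alignments and transcriptions	here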

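3. Models

Our project currently supports four models, three of which are available on transformers.

Language	Description	Source
Egyptian	Speech recognition	wav2vec2-large-xlsr-53-arabic-egyptian
Standard Arabic	Speech recognition	wav2vec2-large-xlsr-53-arabic
EGY, NOR, LAV, GLF, MSA	Dialect classification	wav2vec2-large-xlsr-dialect-classification
Standard Arabic	Text-to-speech	fastspeech2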

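4. Example Notebooks

Name	Description	Notebook
Demo	Classification, recognition, and text-to-speech in a few lines of code.
Demo with mic	Audio recognition and classification from a recording.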

5. Training

The scripts are a modification of jqueguiner/wav2vec2-sprint.

5.1. Classification

This script is used for the five-class classification task.

python run_classifier.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --output_dir=/path/to/output \
    --cache_dir=/path/to/cache/ \
    --freeze_feature_extractor \
    --num_train_epochs="50" \
    --per_device_train_batch_size="32" \
    --preprocessing_num_workers="1" \
    --learning_rate="3e-5" \
    --warmup_steps="20" \
    --evaluation_strategy="steps" \
    --save_steps="100" \
    --eval_steps="100" \
    --save_total_limit="1" \
    --logging_steps="100" \
    --do_eval \
    --do_train
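
When sweeping hyperparameters it can be easier to keep the flags in a dictionary and build the command line from it. The sketch below does this with the standard library only. build_command is a hypothetical helper, and the flag names are copied from the command above rather than checked against run_classifier.py.

```python
import shlex


def build_command(script, options):
    """Turn a dict of options into an argv list for `python script ...`.

    True produces a bare flag (--do_train), False omits the option,
    and any other value produces --name=value.
    """
    argv = ["python", script]
    for name, value in options.items():
        if value is True:
            argv.append(f"--{name}")
        elif value is False:
            continue
        else:
            argv.append(f"--{name}={value}")
    return argv


options = {
    "model_name_or_path": "facebook/wav2vec2-large-xlsr-53",
    "output_dir": "/path/to/output",
    "num_train_epochs": 50,
    "learning_rate": "3e-5",
    "freeze_feature_extractor": True,
    "do_train": True,
    "do_eval": True,
}

argv = build_command("run_classifier.py", options)
# To launch the run, pass argv to subprocess.run(argv, check=True).
print(shlex.join(argv))
```

Building a list instead of a single string avoids shell quoting problems when a path contains spaces.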

5.2. Recognition

This script fine-tunes a pretrained model on the Egyptian dialect dataset.

python run_mgb3.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --output_dir=/path/to/output \
    --cache_dir=/path/to/cache/ \
    --freeze_feature_extractor \
    --num_train_epochs="50" \
    --per_device_train_batch_size="32" \
    --preprocessing_num_workers="1" \
    --learning_rate="3e-5" \
    --warmup_steps="20" \
    --evaluation_strategy="steps" \
    --save_steps="100" \
    --eval_steps="100" \
    --save_total_limit="1" \
    --logging_steps="100" \
    --do_eval \
    --do_train

This script can be used to train on the Arabic Common Voice dataset.

python run_common_voice.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --dataset_config_name="ar" \
    --output_dir=/path/to/output/ \
    --cache_dir=/path/to/cache \
    --overwrite_output_dir \
    --num_train_epochs="1" \
    --per_device_train_batch_size="32" \
    --per_device_eval_batch_size="32" \
    --evaluation_strategy="steps" \
    --learning_rate="3e-4" \
    --warmup_steps="500" \
    --fp16 \
    --freeze_feature_extractor \
    --save_steps="10" \
    --eval_steps="10" \
    --save_total_limit="1" \
    --logging_steps="10" \
    --group_by_length \
    --feat_proj_dropout="0.0" \
    --layerdrop="0.1" \
    --gradient_checkpointing \
    --do_train --do_eval \
    --max_train_samples 100 --max_val_samples 100
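
The last two flags cap the run at 100 training and 100 validation samples, which makes this command look more like a quick smoke test than a full training run. A rough calculation of the resulting step count follows, assuming a single device, no gradient accumulation, and that no samples are filtered out during preprocessing:

```python
import math

# Values copied from the command above.
max_train_samples = 100
per_device_train_batch_size = 32
num_train_epochs = 1
eval_steps = 10

# The last, partially filled batch still counts as one step.
steps_per_epoch = math.ceil(max_train_samples / per_device_train_batch_size)
total_steps = steps_per_epoch * num_train_epochs

# How many times the eval_steps interval is reached during training.
step_evaluations = total_steps // eval_steps

print(steps_per_epoch, total_steps, step_evaluations)
```

Under these assumptions the run takes 4 optimizer steps, which is fewer than the interval of 10 set for eval_steps and save_steps. For a real training run, remove or raise the two sample caps.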

5.3. Text To Speech

We use the PyTorch implementation of FastSpeech2 by ming024.

The procedure is as follows:

Download the dataset and unzip it.

wget http://en.arabicspeechcorpus.com/arabic-speech-corpus.zip
unzip arabic-speech-corpus.zip

Create the directories for the data.

mkdir -p raw_data/Arabic/Arabic preprocessed_data/Arabic/TextGrid/Arabic
cp arabic-speech-corpus/textgrid/* preprocessed_data/Arabic/TextGrid/Arabic

Prepare the metadata.

import os

base_dir = '/content/arabic-speech-corpus'
lines = []
# Build one "<file id>|<transcript>" line per .lab file.
for lab_file in os.listdir(f'{base_dir}/lab'):
    with open(f'{base_dir}/lab/{lab_file}', 'r') as f:
        lines.append(lab_file[:-4] + '|' + f.read())

# Write all entries, one per line.
with open(f'{base_dir}/metadata.csv', 'w') as f:
    f.write('\n'.join(lines))
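
Each line of metadata.csv holds the file stem, a | separator, and the transcript. A quick sanity check can catch empty transcripts before preprocessing. The sketch below is self-contained and runs on a temporary directory with two made-up .lab files. check_metadata is a hypothetical helper, the sample transcripts are placeholders rather than corpus data, and write_metadata mirrors the snippet above except that it strips surrounding whitespace, which is an assumption about what preprocessing expects.

```python
import tempfile
from pathlib import Path


def write_metadata(base_dir):
    """Write base_dir/metadata.csv with one 'stem|transcript' line per .lab file."""
    base = Path(base_dir)
    lines = []
    for lab_path in sorted((base / "lab").glob("*.lab")):
        text = lab_path.read_text(encoding="utf-8").strip()
        lines.append(f"{lab_path.stem}|{text}")
    metadata_path = base / "metadata.csv"
    metadata_path.write_text("\n".join(lines), encoding="utf-8")
    return metadata_path


def check_metadata(metadata_path):
    """Return the line numbers whose id or transcript is empty or missing."""
    bad = []
    text = Path(metadata_path).read_text(encoding="utf-8")
    for number, line in enumerate(text.split("\n"), start=1):
        stem, sep, transcript = line.partition("|")
        if not (stem and sep and transcript.strip()):
            bad.append(number)
    return bad


with tempfile.TemporaryDirectory() as tmp:
    lab_dir = Path(tmp) / "lab"
    lab_dir.mkdir()
    (lab_dir / "utt-001.lab").write_text("placeholder one\n", encoding="utf-8")
    (lab_dir / "utt-002.lab").write_text("", encoding="utf-8")  # empty on purpose
    bad_lines = check_metadata(write_metadata(tmp))

print(bad_lines)
```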

Clone the FastSpeech2 repository (zaidalyafeai/FastSpeech2) and install the required dependencies.

git clone --depth 1 https://github.com/zaidalyafeai/FastSpeech2
cd FastSpeech2
pip install -r requirements.txt

Prepare the alignments and preprocess the data.

python3 prepare_align.py config/Arabic/preprocess.yaml
python3 preprocess.py config/Arabic/preprocess.yaml

Unzip the vocoders.

unzip hifigan/generator_LJSpeech.pth.tar.zip -d hifigan
unzip hifigan/generator_universal.pth.tar.zip -d hifigan

Start training.

python3 train.py -p config/Arabic/preprocess.yaml -m config/Arabic/model.yaml -t config/Arabic/train.yaml

This repository was created by the ARBML team. If you have any suggestions or contributions, feel free to open a pull request.

Additional information

Version 1.0.0
Type AI source code
Updated 2025-08-21
Size 134.33 MB
Source GitHub
