Automatic Speech Recognition

AI source code

1.0.0

Automatic Speech Recognition

The project aims to distill Automatic Speech Recognition research. To begin with, you can load a ready-to-use pipeline with a pre-trained model. You benefit from eager execution in TensorFlow 2.0 and can freely inspect model weights, activations, or gradients.

import automatic_speech_recognition as asr

file = 'to/test/sample.wav'  # sample rate 16 kHz, and 16 bit depth
sample = asr.utils.read_audio(file)
pipeline = asr.load('deepspeech2', lang='en')
pipeline.model.summary()     # TensorFlow model
sentences = pipeline.predict([sample])
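The comment in the snippet above says the input should be sampled at 16 kHz with 16-bit depth. As a minimal sketch, a file can be checked against that format before it is passed to the pipeline. This uses only Python's standard `wave` module; the helper name `check_wav_format` is hypothetical and is not part of the `automatic_speech_recognition` package.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_sample_width=2):
    """Check a PCM WAV file against an expected sample rate and bit depth.

    expected_sample_width is given in bytes, so 2 means 16-bit samples.
    Returns (ok, details), where details holds the values read from the header.
    """
    with wave.open(path, 'rb') as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
    ok = rate == expected_rate and width == expected_sample_width
    return ok, {'rate': rate, 'sample_width': width, 'channels': channels}
```

Calling `check_wav_format('to/test/sample.wav')` returns a flag plus the header values, so a mismatched file can be resampled before prediction instead of silently producing poor transcriptions.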

We support English (thanks to OpenSeq2Seq). The evaluation results on the English benchmark LibriSpeech dev-clean are in the table below. For reference, DeepSpeech (Mozilla) achieves around 7.5% WER, whereas the state of the art (RWTH Aachen University) reaches 2.3%. Both use an external language model to boost results. By comparison, humans achieve 5.83% on the same set (LibriSpeech dev-clean).

Model name	Decoder	WER-dev
`deepspeech2`	greedy	6.71
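The figures above are word error rates (WER). As a rough sketch of how such a number is usually defined, the function below divides the word-level edit distance between a reference and a hypothesis by the number of reference words; the character error rate (CER) is the same idea over characters. The function names are illustrative, and the package's own `calculate_error_rates` may normalize or aggregate differently.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference item
                current[j - 1] + 1,      # insertion of a hypothesis item
                previous[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        previous = current
    return previous[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError('reference must contain at least one word')
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError('reference must not be empty')
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, comparing the reference "the cat sat on the mat" with the hypothesis "the cat sit on mat" involves one substitution and one deletion over six reference words, so the WER is 2/6.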

You will soon find that you need to adjust the pipeline a little. Take a look at the CTC pipeline. The pipeline is responsible for connecting a neural network model with all non-differentiable transformations (feature extraction or prediction decoding). Pipeline components are independent. You can adapt them to your needs, for example by using more sophisticated feature extraction, different data augmentation, or a language model decoder (static n-grams or huge transformers). You can go further, for instance by distributing the training with a Strategy or experimenting with a mixed precision policy.

import numpy as np
import tensorflow as tf
import automatic_speech_recognition as asr

# Training and validation data are described by CSV files
dataset = asr.dataset.Audio.from_csv('train.csv', batch_size=32)
dev_dataset = asr.dataset.Audio.from_csv('dev.csv', batch_size=32)
alphabet = asr.text.Alphabet(lang='en')
features_extractor = asr.features.FilterBanks(
    features_num=160,
    winlen=0.02,
    winstep=0.01,
    winfunc=np.hanning
)
# input_dim matches features_num of the feature extractor
model = asr.model.get_deepspeech2(
    input_dim=160,
    output_dim=29,
    rnn_units=800,
    is_mixed_precision=False
)
optimizer = tf.optimizers.Adam(
    learning_rate=1e-4,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-8
)
decoder = asr.decoder.GreedyDecoder()
pipeline = asr.pipeline.CTCPipeline(
    alphabet, features_extractor, model, optimizer, decoder
)
pipeline.fit(dataset, dev_dataset, epochs=25)
pipeline.save('/checkpoint')

# Evaluate the trained pipeline on held-out data
test_dataset = asr.dataset.Audio.from_csv('test.csv')
wer, cer = asr.evaluate.calculate_error_rates(pipeline, test_dataset)
print(f'WER: {wer}   CER: {cer}')
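The pipeline above ends with a greedy decoder, one of the non-differentiable steps mentioned earlier. As a minimal sketch of what greedy CTC decoding generally means (not the package's `GreedyDecoder` implementation), the function below picks the best class per frame, collapses consecutive repeats, and drops the blank symbol. The toy alphabet and the position of the blank index are assumptions made for illustration only.

```python
def greedy_ctc_decode(frame_scores, alphabet, blank_index):
    """Greedy (best-path) CTC decoding.

    frame_scores: one list of class scores per time frame.
    alphabet: symbols for every non-blank class index.
    blank_index: class index reserved for the CTC blank (assumed here).
    """
    # Step 1: take the highest-scoring class in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Steps 2 and 3: collapse consecutive repeats, then drop blanks.
    decoded = []
    previous = None
    for index in best_path:
        if index != previous and index != blank_index:
            decoded.append(alphabet[index])
        previous = index
    return ''.join(decoded)


# Hypothetical three-symbol alphabet with the blank as the last class.
toy_alphabet = ['a', 'b', ' ']
BLANK = 3
frames = [
    [0.9, 0.0, 0.0, 0.1],  # 'a'
    [0.8, 0.1, 0.0, 0.1],  # 'a' again, collapsed into the previous one
    [0.1, 0.0, 0.0, 0.9],  # blank, separates the two 'a' symbols
    [0.7, 0.1, 0.1, 0.1],  # 'a', kept because a blank came before it
    [0.1, 0.8, 0.0, 0.1],  # 'b'
]
print(greedy_ctc_decode(frames, toy_alphabet, BLANK))  # prints "aab"
```

A language model decoder would replace this step by scoring several candidate paths instead of committing to the single best class per frame.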

Installation

You can use pip:

pip install automatic-speech-recognition

Otherwise, clone the code and create a new environment via conda:

git clone https://github.com/rolczynski/Automatic-Speech-Recognition.git
conda env create -f=environment.yml     # or use: environment-gpu.yml
conda activate Automatic-Speech-Recognition

References

The fundamental repositories:

Baidu - DeepSpeech2 - a PaddlePaddle implementation of the DeepSpeech2 architecture for ASR
NVIDIA - a toolkit for efficient experimentation with speech recognition, Text2Speech, and NLP
RWTH Aachen University - the RWTH training framework
TensorFlow - an implementation of the DeepSpeech2 model
Mozilla - DeepSpeech - a TensorFlow implementation of Baidu's DeepSpeech architecture
ESPnet - an end-to-end speech processing toolkit
Sean Naren - speech recognition using DeepSpeech2

Moreover, you can explore GitHub using key phrases such as ASR, DeepSpeech, or Speech-To-Text. The wer_are_we list, an attempt at tracking the state of the art, can be helpful too.

More information

Version: 1.0.0
Category: AI source code
Updated: 2025-09-09
Size: 162.91 KB
Source: GitHub
