Automatic Speech Recognition

1.0.0

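The goal of this project is to distill Automatic Speech Recognition research. To get started, you can load a ready-to-use pipeline with a pre-trained model. Because the library builds on eager TensorFlow 2.0, you can freely monitor model weights, activations, or gradients.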

import automatic_speech_recognition as asr

file = 'to/test/sample.wav'  # sample rate 16 kHz, and 16 bit depth
sample = asr.utils.read_audio(file)
pipeline = asr.load('deepspeech2', lang='en')
pipeline.model.summary()     # TensorFlow model
sentences = pipeline.predict([sample])
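
The comment in the snippet above says the input should be sampled at 16 kHz with 16-bit depth. As a minimal sketch, this could be checked up front with Python's standard `wave` module. The helper name `check_wav_format` is hypothetical and not part of the library; it only inspects the WAV header and does not resample anything.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_sample_width=2):
    """Return True if the WAV file is 16 kHz with 16-bit (2-byte) samples.

    Hypothetical helper for illustration; not part of
    automatic_speech_recognition.
    """
    with wave.open(path, 'rb') as wav_file:
        rate = wav_file.getframerate()          # samples per second
        sample_width = wav_file.getsampwidth()  # bytes per sample
    return rate == expected_rate and sample_width == expected_sample_width
```

A file that fails this check would presumably need to be converted before it is passed to `asr.utils.read_audio`.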

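We support English (thanks to OpenSeq2Seq). The evaluation results on the English benchmark LibriSpeech dev-clean are shown in the table. For reference, DeepSpeech (Mozilla) achieves around 7.5% WER, whereas the state of the art (RWTH Aachen University) reaches 2.3% WER (recent evaluation results can be found here). Both use an external language model to boost results. By comparison, humans achieve 5.83% WER on the same set (LibriSpeech dev-clean).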

Model name	Decoder	WER-dev (%)
`deepspeech2`	greedy	6.71
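
The WER figures above are word-level edit distances normalized by the reference length. The following is a minimal sketch of that metric in plain Python, written for illustration only. The function names are hypothetical, and the library's own `asr.evaluate.calculate_error_rates` may normalize or aggregate over a dataset differently.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError('reference must contain at least one word')
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError('reference must not be empty')
    return edit_distance(reference, hypothesis) / len(reference)
```

With these definitions, one wrong word in a three-word reference gives a WER of 1/3, so a reported value of 6.71 corresponds to roughly one word error per fifteen reference words.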

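Before long, you will probably need to adjust the pipeline a little. Take a look at the CTC pipeline. The pipeline is responsible for connecting a neural network model with all non-differentiable transformations (feature extraction or prediction decoding). Pipeline components are independent, so you can adapt them to your needs: for example, use more sophisticated feature extraction, apply different data augmentation, or add a language model decoder (static n-grams or huge transformers). You can also go further, such as distributing the training with a Strategy or experimenting with a mixed precision policy.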

import numpy as np
import tensorflow as tf
import automatic_speech_recognition as asr

# Training and validation data are described by CSV files
dataset = asr.dataset.Audio.from_csv('train.csv', batch_size=32)
dev_dataset = asr.dataset.Audio.from_csv('dev.csv', batch_size=32)
alphabet = asr.text.Alphabet(lang='en')
# 20 ms windows with a 10 ms step
features_extractor = asr.features.FilterBanks(
    features_num=160,
    winlen=0.02,
    winstep=0.01,
    winfunc=np.hanning
)
# input_dim must match features_num above
model = asr.model.get_deepspeech2(
    input_dim=160,
    output_dim=29,
    rnn_units=800,
    is_mixed_precision=False
)
# `learning_rate` replaces the deprecated `lr` alias
optimizer = tf.optimizers.Adam(
    learning_rate=1e-4,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-8
)
decoder = asr.decoder.GreedyDecoder()
pipeline = asr.pipeline.CTCPipeline(
    alphabet, features_extractor, model, optimizer, decoder
)
pipeline.fit(dataset, dev_dataset, epochs=25)
pipeline.save('/checkpoint')

# Evaluate on a held-out test set
test_dataset = asr.dataset.Audio.from_csv('test.csv')
wer, cer = asr.evaluate.calculate_error_rates(pipeline, test_dataset)
print(f'WER: {wer}   CER: {cer}')
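
The decoding step in the pipeline above is one of the non-differentiable transformations. As a rough sketch of what greedy CTC decoding generally does (take the most likely label per frame, collapse consecutive repeats, then drop blanks), here is a NumPy version. It is not the library's `GreedyDecoder` implementation. The alphabet (space, a-z, apostrophe) and the blank placed at the last index are assumptions chosen so that the output size matches `output_dim=29`.

```python
import numpy as np

# Assumed label set: 28 characters plus one CTC blank = 29 outputs.
ALPHABET = " abcdefghijklmnopqrstuvwxyz'"
BLANK = len(ALPHABET)  # assumption: blank is the last index (28)


def greedy_ctc_decode(frame_probs):
    """Decode a (time, labels) array of per-frame scores into text."""
    best_path = np.argmax(frame_probs, axis=-1)  # most likely label per frame
    decoded = []
    previous = None
    for label in best_path:
        label = int(label)
        # Keep a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != BLANK:
            decoded.append(ALPHABET[label])
        previous = label
    return ''.join(decoded)


# One-hot frames for the path h, h, blank, i, i, blank
path = [8, 8, BLANK, 9, 9, BLANK]
print(greedy_ctc_decode(np.eye(BLANK + 1)[path]))  # prints "hi"
```

Because repeats are collapsed before blanks are removed, a doubled letter such as "ll" only survives when a blank frame separates the two occurrences. A language model decoder would replace this per-frame argmax with a search over candidate transcriptions.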

Installation

You can install the package with pip:

pip install automatic-speech-recognition

Otherwise, clone the code and create a new environment via conda:

git clone https://github.com/rolczynski/Automatic-Speech-Recognition.git
conda env create -f=environment.yml     # or use: environment-gpu.yml
conda activate Automatic-Speech-Recognition