
Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks.
Enterprise-grade STT made refreshingly simple (seriously, see the benchmarks). We provide quality comparable to Google's STT (and sometimes even better), and we are not Google.
As a bonus:
Also, we publish TTS models that satisfy the following criteria:
Also, we publish a text repunctuation and recapitalization model that:
You can basically use our models in 3 flavors:
- `torch.hub.load()`;
- `pip install silero` and then `import silero`;
- pip and PyTorch Hub download models on demand; if you need caching, do it manually or by invoking the necessary model once (it will be downloaded to a cache folder).

Please see these docs for more information.
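For the caching mentioned above, one option is simply to call the loader once (e.g. at build time) so the repo and weights land in the torch.hub cache. A minimal sketch, assuming the default torch.hub cache behaviour; the path passed to `torch.hub.set_dir` is hypothetical and the call itself is optional:

```python
import torch

# optional: point the torch.hub cache at a persistent location,
# e.g. a volume baked into a Docker image (path is hypothetical)
torch.hub.set_dir('/opt/torch_hub_cache')

# calling the loader once downloads the repo and the model weights;
# later calls with the same arguments reuse the cached copies
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en')
```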
PyTorch Hub and the pip package are based on the same code. All `torch.hub.load` examples can be used with the pip package after this basic change:
```python
# before
torch.hub.load(repo_or_dir='snakers4/silero-models',
               model='silero_stt',  # or silero_tts or silero_te
               **kwargs)

# after
from silero import silero_stt, silero_tts, silero_te
silero_stt(**kwargs)
```
All of the provided models are listed in the models.yml file. Any metadata and newer versions will be added there.

Currently we provide the following checkpoints:
| | PyTorch | ONNX | Quantization | Quality | Colab |
|---|---|---|---|---|---|
| English (`en_v6`) | ✔️ | ✔️ | ✔️ | link | |
| English (`en_v5`) | ✔️ | ✔️ | ✔️ | link | |
| German (`de_v4`) | ✔️ | ✔️ | ⌛ | link | |
| English (`en_v3`) | ✔️ | ✔️ | ✔️ | link | |
| German (`de_v3`) | ✔️ | ⌛ | ⌛ | link | |
| German (`de_v1`) | ✔️ | ✔️ | ⌛ | link | |
| Spanish (`es_v1`) | ✔️ | ✔️ | ⌛ | link | |
| Ukrainian (`ua_v3`) | ✔️ | ✔️ | ✔️ | N/A | |
Model flavors:
| | jit | jit | jit | jit | jit_q | jit_q | onnx | onnx | onnx | onnx |
|---|---|---|---|---|---|---|---|---|---|---|
| | xsmall | small | large | xlarge | xsmall | small | xsmall | small | large | xlarge |
| English `en_v6` | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | | | | | |
| English `en_v5` | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | | | | | |
| English `en_v4_0` | ✔️ | ✔️ | | | | | | | | |
| English `en_v3` | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | | |
| German `de_v4` | ✔️ | ✔️ | | | | | | | | |
| German `de_v3` | ✔️ | | | | | | | | | |
| German `de_v1` | ✔️ | ✔️ | | | | | | | | |
| Spanish `es_v1` | ✔️ | ✔️ | | | | | | | | |
| Ukrainian `ua_v3` | ✔️ | ✔️ | ✔️ | | | | | | | |
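Which flavors exist for a given release can be read straight from models.yml (the same file used in the examples below). A minimal sketch that only inspects the file rather than assuming particular key names:

```python
import torch
from omegaconf import OmegaConf

# models.yml is the single list of published checkpoints
torch.hub.download_url_to_file(
    'https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml',
    'models.yml')
models = OmegaConf.load('models.yml')

# list the available languages and the flavors published for the
# latest English STT model (the flavor names correspond to the table above)
print(list(models.stt_models.keys()))
print(list(models.stt_models.en.latest.keys()))
```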
Basic dependencies for the Colab examples:
- `torch`, 1.8+ (used to clone the repo in the TensorFlow and ONNX examples); breaking changes for versions older than 1.6
- `torchaudio`, latest version bound to PyTorch should just work
- `omegaconf`, latest should just work
- `onnx`, latest should just work
- `onnxruntime`, latest should just work
- `tensorflow`, latest should just work
- `tensorflow_hub`, latest should just work

Please see the provided Colab for details on each example below. All examples are maintained to work with the latest major versions of the installed libraries.
```python
import torch
import zipfile
import torchaudio
from glob import glob

device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU

model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en',  # also available 'de', 'es'
                                       device=device)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils  # see function signature for details

# download a single file in any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
                               dst='speech_orig.wav', progress=True)
test_files = glob('speech_orig.wav')
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]),
                            device=device)

output = model(input)
for example in output:
    print(decoder(example.cpu()))
```
Our models will run anywhere you can import the ONNX model or run ONNX Runtime.
```python
import onnx
import torch
import onnxruntime
from omegaconf import OmegaConf

language = 'en'  # also available 'de', 'es'

# load provided utils
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual ONNX model
torch.hub.download_url_to_file(models.stt_models.en.latest.onnx, 'model.onnx', progress=True)
onnx_model = onnx.load('model.onnx')
onnx.checker.check_model(onnx_model)
ort_session = onnxruntime.InferenceSession('model.onnx')

# download a single file in any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]))

# actual ONNX inference and decoding
onnx_input = input.detach().cpu().numpy()
ort_inputs = {'input': onnx_input}
ort_outs = ort_session.run(None, ort_inputs)
decoded = decoder(torch.Tensor(ort_outs[0])[0])
print(decoded)
```

SavedModel example:
```python
import os
import torch
import subprocess
import tensorflow as tf
import tensorflow_hub as tf_hub
from omegaconf import OmegaConf

language = 'en'  # also available 'de', 'es'

# load provided utils using torch.hub for brevity
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual tf model
torch.hub.download_url_to_file(models.stt_models.en.latest.tf, 'tf_model.tar.gz')
subprocess.run('rm -rf tf_model && mkdir tf_model && tar xzfv tf_model.tar.gz -C tf_model', shell=True, check=True)
tf_model = tf.saved_model.load('tf_model')

# download a single file in any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]))

# tf inference
res = tf_model.signatures["serving_default"](tf.constant(input.numpy()))['output_0']
print(decoder(torch.Tensor(res.numpy())[0]))
```
All of the provided models are listed in the models.yml file. Any metadata and newer versions will be added there.
V4 models support SSML. Also see our Colab examples for the main SSML tags.
| ID | Speakers | Auto-stress | Language | SR | Colab |
|---|---|---|---|---|---|
| v4_ru | `aidar`, `baya`, `kseniya`, `xenia`, ... | yes | `ru` (Russian) | | |
| v4_cyrillic | `b_ava`, `marat_tt`, `kalmyk_erdni`, ... | no | `cyrillic` (Avar, Tatar, Kalmyk, ...) | | |
| v4_ua | `mykyta`, `random` | no | `ua` (Ukrainian) | | |
| v4_uz | `dilnavoz` | no | `uz` (Uzbek) | | |
| v4_indic | `hindi_male`, `hindi_female`, ..., `random` | no | `indic` (Hindi, Telugu, ...) | | |
V3 models support SSML. Also see our Colab examples for the main SSML tags.
| ID | Speakers | Auto-stress | Language | SR | Colab |
|---|---|---|---|---|---|
| v3_en | `en_0`, `en_1`, ..., `en_117`, `random` | no | `en` (English) | | |
| v3_en_indic | `tamil_female`, ..., `assamese_male`, `random` | no | `en` (English) | | |
| v3_de | `eva_k`, ..., `karlsson`, `random` | no | `de` (German) | | |
| v3_es | `es_0`, `es_1`, `es_2`, `random` | no | `es` (Spanish) | | |
| v3_fr | `fr_0`, ..., `fr_5`, `random` | no | `fr` (French) | | |
| v3_indic | `hindi_male`, `hindi_female`, ..., `random` | no | `indic` (Hindi, Telugu, ...) | | |
Basic dependencies for the Colab examples:
- `torch`, 1.10+
- `torchaudio`, latest version bound to PyTorch (required only because the models are hosted together with STT, not required for the models to work)
- `omegaconf`, latest (can also be removed if you do not load all of the configs)

```python
# V4
import torch

language = 'ru'
model_id = 'v4_ru'
sample_rate = 48000
speaker = 'xenia'
device = torch.device('cpu')

model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language=language,
                                     speaker=model_id)
model.to(device)  # gpu or cpu

audio = model.apply_tts(text=example_text,
                        speaker=speaker,
                        sample_rate=sample_rate)
```
```python
# V4
import os
import torch

device = torch.device('cpu')
torch.set_num_threads(4)
local_file = 'model.pt'

if not os.path.isfile(local_file):
    torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/v4_ru.pt',
                                   local_file)

model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")
model.to(device)

example_text = 'В недрах тундры выдры в г+етрах т+ырят в вёдра ядра кедров.'
sample_rate = 48000
speaker = 'baya'

audio_paths = model.save_wav(text=example_text,
                             speaker=speaker,
                             sample_rate=sample_rate)
```
Check out our TTS wiki page.
Supported tokenset: `!,-.:?iµöабвгдежзийклмнопрстуфхцчшщъыьэюяёђѓєіјњћќўѳғҕҗҙқҡңҥҫүұҳҷһӏӑӓӕӗәӝӟӥӧөӱӳӵӹ`
| Speaker ID | Language | Gender |
|---|---|---|
| b_ava | Avar | f |
| b_bashkir | Bashkir | m |
| b_bulb | Bulgarian | m |
| b_bulc | Bulgarian | m |
| b_che | Chechen | m |
| b_cv | Chuvash | m |
| cv_ekaterina | Chuvash | f |
| b_myv | Erzya | m |
| b_kalmyk | Kalmyk | m |
| b_krc | Karachay-Balkar | m |
| kz_m1 | Kazakh | m |
| kz_m2 | Kazakh | m |
| kz_f3 | Kazakh | f |
| kz_f1 | Kazakh | f |
| kz_f2 | Kazakh | f |
| b_kjh | Khakas | f |
| b_kpv | Komi-Ziryan | m |
| b_lez | Lezghian | m |
| b_mhr | Mari | f |
| b_mrj | Mari High | m |
| b_nog | Nogai | f |
| b_oss | Ossetic | m |
| b_ru | Russian | m |
| b_tat | Tatar | m |
| marat_tt | Tatar | m |
| b_tyv | Tuvinian | m |
| b_udm | Udmurt | m |
| b_uzb | Uzbek | m |
| b_sah | Yakut | m |
| kalmyk_erdni | Kalmyk | m |
| kalmyk_delghir | Kalmyk | f |
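A minimal sketch for synthesizing with one of the voices in the table above. The `language='cyrillic'` key is inferred from the V4 model table, and the 48 kHz sample rate mirrors the other V4 examples, so treat both as assumptions:

```python
import torch

device = torch.device('cpu')
model_id = 'v4_cyrillic'
speaker = 'b_ava'      # Avar voice from the table above
sample_rate = 48000    # assumption: same output rate as the other V4 models

# assumption: the language key matches the 'cyrillic' id shown in the V4 table
model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language='cyrillic',
                                     speaker=model_id)
model.to(device)

# example_text ships with the model; any text limited to the supported
# tokenset listed above should work as well
audio = model.apply_tts(text=example_text,
                        speaker=speaker,
                        sample_rate=sample_rate)
```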
(!!!) All input sentences must be romanized to ISO format using aksharamukha. An example for Hindi:
```python
# V3
import torch
from aksharamukha import transliterate

# Loading model
model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language='indic',
                                     speaker='v4_indic')

orig_text = "प्रसिद्द कबीर अध्येता, पुरुषोत्तम अग्रवाल का यह शोध आलेख, उस रामानंद की खोज करता है"
roman_text = transliterate.process('Devanagari', 'ISO', orig_text)
print(roman_text)

audio = model.apply_tts(roman_text,
                        speaker='hindi_male')
```

| Language | Speakers | Romanization function |
|---|---|---|
| Hindi | hindi_female, hindi_male | `transliterate.process('Devanagari', 'ISO', orig_text)` |
| Malayalam | malayalam_female, malayalam_male | `transliterate.process('Malayalam', 'ISO', orig_text)` |
| Manipuri | manipuri_female | `transliterate.process('Bengali', 'ISO', orig_text)` |
| Bengali | bengali_female, bengali_male | `transliterate.process('Bengali', 'ISO', orig_text)` |
| Rajasthani | rajasthani_female, rajasthani_female | `transliterate.process('Devanagari', 'ISO', orig_text)` |
| Tamil | tamil_female, tamil_male | `transliterate.process('Tamil', 'ISO', orig_text, pre_options=['TamilTranscribe'])` |
| Telugu | telugu_female, telugu_male | `transliterate.process('Telugu', 'ISO', orig_text)` |
| Gujarati | gujarati_female, gujarati_male | `transliterate.process('Gujarati', 'ISO', orig_text)` |
| Kannada | kannada_female, kannada_male | `transliterate.process('Kannada', 'ISO', orig_text)` |
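The same pattern as the Hindi example above applies to the other rows of the table. A minimal sketch for Tamil, which additionally needs the `pre_options` shown above; the sample sentence and the choice of `tamil_female` are illustrative:

```python
import torch
from aksharamukha import transliterate

# load the Indic model as in the Hindi example above
model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language='indic',
                                     speaker='v4_indic')

orig_text = "வணக்கம், இது ஒரு சோதனை வாக்கியம்."  # illustrative Tamil sentence
roman_text = transliterate.process('Tamil', 'ISO', orig_text,
                                   pre_options=['TamilTranscribe'])
print(roman_text)

audio = model.apply_tts(roman_text,
                        speaker='tamil_female')
```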
| Languages | Quantization | Quality | Colab |
|---|---|---|---|
| 'en', 'de', 'ru', 'es' | ✔️ | link | |
Basic dependencies for the Colab examples:
- `torch`, 1.9+
- `pyyaml`, but it is installed with torch itself

```python
import torch

model, example_texts, languages, punct, apply_te = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                                                  model='silero_te')

input_text = input('Enter input text\n')
apply_te(input_text, lan='en')
```
Denoise models attempt to reduce background noise along with various artifacts such as reverb, clipping, high/low-pass filtering, etc., while trying to preserve and/or enhance speech. They also try to improve audio quality and increase the sampling rate of the input up to 48 kHz.
All of the provided models are listed in the models.yml file.
| Model | JIT | Real input SR | Input SR | Output SR | Colab |
|---|---|---|---|---|---|
| small_slow | ✔️ | | 24000 | 48000 | |
| large_fast | ✔️ | | 24000 | 48000 | |
| small_fast | ✔️ | | 24000 | 48000 | |
Basic dependencies for the Colab examples:
- `torch`, 2.0+
- `torchaudio`, latest version bound to PyTorch should just work
- `omegaconf`, latest (can also be removed if you do not load all of the configs)

```python
import torch

name = 'small_slow'
device = torch.device('cpu')
model, samples, utils = torch.hub.load(
    repo_or_dir='snakers4/silero-models',
    model='silero_denoise',
    name=name,
    device=device)
(read_audio, save_audio, denoise) = utils

# first sample: run the model manually
i = 0
torch.hub.download_url_to_file(
    samples[i],
    dst=f'sample{i}.wav',
    progress=True
)
audio_path = f'sample{i}.wav'
audio = read_audio(audio_path).to(device)
output = model(audio)
save_audio(f'result{i}.wav', output.squeeze(1).cpu())

# second sample: use the provided denoise() helper
i = 1
torch.hub.download_url_to_file(
    samples[i],
    dst=f'sample{i}.wav',
    progress=True
)
output, sr = denoise(model, f'sample{i}.wav', f'result{i}.wav', device='cpu')
```
```python
import os
import torch

device = torch.device('cpu')
torch.set_num_threads(4)
local_file = 'model.pt'

if not os.path.isfile(local_file):
    torch.hub.download_url_to_file('https://models.silero.ai/denoise_models/sns_latest.jit',
                                   local_file)

model = torch.jit.load(local_file)
torch._C._jit_set_profiling_mode(False)
torch.set_grad_enabled(False)
model.to(device)

a = torch.rand((1, 48000))
a = a.to(device)
out = model(a)
```
Also check out our wiki.
Please see these wiki sections:
Please refer to the wiki.
Try our models, create an issue, join our chat, email us, and read the latest news.
For relevant information, please refer to our wiki and the licensing and tiers page, and email us.
```bibtex
@misc{Silero Models,
  author = {Silero Team},
  title = {Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/snakers4/silero-models}},
  commit = {insert_some_commit_here},
  email = {hello@silero.ai}
}
```
STT:
TTS:
VAD:
Text enhancement:
STT:
TTS:
VAD:
Text enhancement:
Please use the "Sponsor" button.