audiotoken
v0.3.1
Tokenize audio to get acoustic and semantic tokens.
pip install audiotoken
You can use either the acoustic or the semantic encoder to encode audio and get tokens.
from pathlib import Path
from audiotoken import AudioToken, Tokenizers
encoder = AudioToken(tokenizer=Tokenizers.acoustic, device='cuda:0')
encoded_audio = encoder.encode(Path('path/to/audio.wav'))
There are 1 acoustic and 2 semantic tokenizers available:
Tokenizers.acoustic
Tokenizers.semantic_s (small)
Tokenizers.semantic_m (medium)
You can decode acoustic tokens like this:
from pathlib import Path
from audiotoken import AudioToken, Tokenizers
tokenizer = AudioToken(tokenizer=Tokenizers.acoustic, device='cuda:0')
encoded_audio = tokenizer.encode(Path('path/to/audio.wav'))
decoded_audio = tokenizer.decode(encoded_audio)
# Save the decoded audio and compare it with the original audio
import torch
import torchaudio
torchaudio.save(
    'reconstructed.wav',
    decoded_audio,
    sample_rate=24000
)
You can decode semantic tokens like this:
from pathlib import Path
from audiotoken import AudioToken, Tokenizers
semantic_tokenizer = AudioToken(tokenizer=Tokenizers.semantic_s, device='cuda:0')
semantic_toks = semantic_tokenizer.encode(Path('path/to/audio.wav'))
decoded_audio = semantic_tokenizer.decode(semantic_toks)
# Save the decoded audio and compare it with the original audio
import torch
import torchaudio
torchaudio.save(
    'reconstructed.wav',
    decoded_audio,
    sample_rate=24000
)
For more usage examples, see examples/usage.ipynb.
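The comments in the decoding examples suggest comparing the reconstruction with the original audio. One rough way to do that is a signal-to-noise ratio over aligned samples. The sketch below is not part of audiotoken: `snr_db` is a hypothetical helper, and it assumes both signals are mono, share a sample rate, and are available as plain sequences of floats (for example via `tensor.flatten().tolist()`). Neural codecs aim for perceptual rather than sample-exact reconstruction, so a low value is a hint to listen more closely, not a verdict.

```python
import math
from typing import Sequence


def snr_db(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB over the overlapping samples.

    Hypothetical helper for illustration; not an audiotoken API.
    """
    # Compare only the samples both signals have in common.
    n = min(len(original), len(reconstructed))
    if n == 0:
        raise ValueError("both signals must contain at least one sample")

    signal_power = sum(x * x for x in original[:n])
    noise_power = sum(
        (x - y) ** 2 for x, y in zip(original[:n], reconstructed[:n])
    )

    if noise_power == 0:
        return math.inf  # identical over the compared range
    if signal_power == 0:
        return -math.inf  # silent original, non-silent reconstruction
    return 10.0 * math.log10(signal_power / noise_power)
```

Trimming to the shorter length is a deliberate simplification; a decoded signal may differ in length from the input by a few samples, and a more careful comparison would also align the two signals in time first.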
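Core class
from audiotoken import AudioToken, Tokenizers
tokenizer = AudioToken(tokenizer=Tokenizers.semantic_m, device='cuda:0')
For full API documentation, see audiotoken/core.py.
There are 3 APIs provided:
tokenizer.encode: Encode a single audio file or array at a time.
tokenizer.encode_batch_files: Encode multiple audio files in batches and save the results directly to disk.
Note: do not run encode_batch_files multiple times on the same list of files, because it can produce incorrect data. This will be fixed in a future release.
tokenizer.decode: Decode acoustic or semantic tokens.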
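Because running `encode_batch_files` more than once on the same file list can produce incorrect data, one way to stay safe is to record which files were already submitted and pass only new ones on later runs. The sketch below is an illustration under assumptions: `load_manifest`, `pending_files`, and `mark_done` are hypothetical helpers, the JSON manifest is not something audiotoken creates, and the actual `encode_batch_files` call appears only as a comment because its arguments are documented in the library source rather than here.

```python
import json
from pathlib import Path
from typing import Iterable, List, Set


def load_manifest(manifest: Path) -> Set[str]:
    """Return the set of resolved paths recorded as already submitted."""
    if not manifest.exists():
        return set()
    return set(json.loads(manifest.read_text()))


def pending_files(audio_files: Iterable[Path], manifest: Path) -> List[Path]:
    """Keep only files not yet recorded, dropping duplicates within the input."""
    done = load_manifest(manifest)
    seen: Set[str] = set()
    result: List[Path] = []
    for f in audio_files:
        key = str(Path(f).resolve())
        if key in done or key in seen:
            continue
        seen.add(key)
        result.append(Path(f))
    return result


def mark_done(files: Iterable[Path], manifest: Path) -> None:
    """Record files as submitted so later runs skip them."""
    done = load_manifest(manifest)
    done.update(str(Path(f).resolve()) for f in files)
    manifest.write_text(json.dumps(sorted(done), indent=2))


# Intended flow (the batch call itself is left out on purpose):
#   manifest = Path("encoded_manifest.json")      # hypothetical file name
#   todo = pending_files(Path("path/to/audio").glob("*.wav"), manifest)
#   ... pass `todo` to tokenizer.encode_batch_files ...
#   mark_done(todo, manifest)
```

Recording files only after the batch call returns means an interrupted run is retried in full; whether that is safe depends on how the library handles partially written output, which this sketch does not address.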