audiotoken
v0.3.1
Tokenize audio to get acoustic and semantic tokens.
pip install audiotoken

You can encode audio and get tokens with either the acoustic or a semantic encoder:

from pathlib import Path
from audiotoken import AudioToken, Tokenizers
encoder = AudioToken(tokenizer=Tokenizers.acoustic, device='cuda:0')
encoded_audio = encoder.encode(Path('path/to/audio.wav'))

Three tokenizers are available, one acoustic and two semantic:
- Tokenizers.acoustic
- Tokenizers.semantic_s (small)
- Tokenizers.semantic_m (medium)

You can decode acoustic tokens like this:

from pathlib import Path
from audiotoken import AudioToken, Tokenizers
tokenizer = AudioToken(tokenizer=Tokenizers.acoustic, device='cuda:0')
encoded_audio = tokenizer.encode(Path('path/to/audio.wav'))
decoded_audio = tokenizer.decode(encoded_audio)
# Save the decoded audio and compare it with the original audio
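import torchaudio
# torchaudio.save expects a 2D (channels, frames) tensor on the CPU;
# move or reshape decoded_audio first if needed.
torchaudio.save(
    'reconstructed.wav',
    decoded_audio,
    sample_rate=24000
)

You can decode semantic tokens like this:
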
from pathlib import Path
from audiotoken import AudioToken, Tokenizers
semantic_tokenizer = AudioToken(tokenizer=Tokenizers.semantic_s, device='cuda:0')
semantic_toks = semantic_tokenizer.encode(Path('path/to/audio.wav'))
decoded_audio = semantic_tokenizer.decode(semantic_toks)
# Save the decoded audio and compare it with the original audio
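import torchaudio
# torchaudio.save expects a 2D (channels, frames) tensor on the CPU;
# move or reshape decoded_audio first if needed.
torchaudio.save(
    'reconstructed.wav',
    decoded_audio,
    sample_rate=24000
)

See examples/usage.ipynb for more usage examples.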
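
The examples above save reconstructed.wav so it can be compared with the original recording. One quick, coarse check is the signal-to-noise ratio between the two waveforms. The sketch below is a hypothetical helper (`snr_db` is not part of audiotoken) that assumes both signals are mono, at the same sample rate (for example, resample the original to 24 kHz first), and passed as plain lists of samples such as `waveform[0].tolist()`. Waveform SNR is mostly meaningful for the acoustic tokenizer; decoding from semantic tokens is not expected to reproduce the original sample-for-sample, so a low value there does not by itself indicate a problem.

```python
import math
from typing import Sequence


def snr_db(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Return the signal-to-noise ratio of `estimate` against `reference`, in dB.

    Hypothetical helper, not part of audiotoken. Both inputs are mono sample
    sequences at the same sample rate. They are truncated to the shorter
    length because a codec round trip may pad or trim a few samples.
    """
    n = min(len(reference), len(estimate))
    if n == 0:
        raise ValueError("both signals must be non-empty")
    ref = [float(x) for x in reference[:n]]
    est = [float(y) for y in estimate[:n]]
    signal_power = sum(x * x for x in ref)
    noise_power = sum((x - y) ** 2 for x, y in zip(ref, est))
    if noise_power == 0.0:
        return math.inf  # identical signals
    if signal_power == 0.0:
        return -math.inf  # silent reference but non-zero error
    return 10.0 * math.log10(signal_power / noise_power)


# Synthetic check: a 440 Hz tone at 24 kHz versus a copy scaled by 0.9.
# The error is 0.1 * reference, so the power ratio is 100, i.e. 20 dB.
tone = [math.sin(2 * math.pi * 440 * t / 24000) for t in range(24000)]
print(round(snr_db(tone, [0.9 * x for x in tone]), 2))  # 20.0
```

Higher is better; treat the number as a sanity check rather than a perceptual quality score.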

Core class

from audiotoken import AudioToken, Tokenizers
tokenizer = AudioToken(tokenizer=Tokenizers.semantic_m, device='cuda:0')

See audiotoken/core.py for the full API documentation.
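Three APIs are provided:
- tokenizer.encode: encodes a single audio file or array at a time
- tokenizer.encode_batch_files: encodes multiple audio files in batches and saves the results directly to disk
  Warning: do not run encode_batch_files more than once on the same list of files, as this can produce incorrect data. This will be fixed in a future release.
- tokenizer.decode: decodes acoustic or semantic tokens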
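
Because encode_batch_files should not be run more than once on the same files, it can help to keep your own record of which files have already been submitted. The sketch below is one way to do that under stated assumptions: `pending_files`, `mark_done`, and the JSON manifest file are hypothetical and not part of audiotoken, and the sketch only filters the input list; it does not change how encode_batch_files itself behaves.

```python
import json
from pathlib import Path
from typing import Iterable, List, Set


def _load_manifest(manifest: Path) -> Set[str]:
    """Read the set of already-encoded file paths (empty if no manifest yet)."""
    if manifest.exists():
        return set(json.loads(manifest.read_text()))
    return set()


def pending_files(paths: Iterable[Path], manifest: Path) -> List[Path]:
    """Return paths not yet recorded in `manifest`, de-duplicated, in input order.

    Hypothetical helper: paths are compared by their resolved absolute form,
    so 'a.wav' and './a.wav' count as the same file.
    """
    done = _load_manifest(manifest)
    seen: Set[str] = set()
    pending: List[Path] = []
    for p in paths:
        key = str(Path(p).resolve())
        if key not in done and key not in seen:
            seen.add(key)
            pending.append(Path(p))
    return pending


def mark_done(paths: Iterable[Path], manifest: Path) -> None:
    """Record `paths` in the manifest once a batch run has finished."""
    done = _load_manifest(manifest)
    done.update(str(Path(p).resolve()) for p in paths)
    manifest.write_text(json.dumps(sorted(done)))
```

Call `pending_files` on your file list, pass only the returned paths to tokenizer.encode_batch_files (its arguments are documented in audiotoken/core.py), then call `mark_done` after the run completes, so a later rerun skips files that were already encoded.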