f5 tts swift
0.0.6
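An implementation of F5-TTS in Swift, using the MLX Swift framework.
You can listen to a sample here that was generated in 11 seconds on an M3 Max MacBook Pro.
See the Python repository for more details on the model architecture.
This repository is based on the original PyTorch implementation available here.
The F5TTS Swift package can be built from Xcode or SwiftPM.
A pretrained model is available on Hugging Face.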
import F5TTS

let f5tts = try await F5TTS.fromPretrained(repoId: "lucasnewman/f5-tts-mlx")
let generatedAudio = try await f5tts.generate(text: "The quick brown fox jumped over the lazy dog.")

The result is an MLXArray with 24 kHz audio samples.
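Since the result is raw 24 kHz samples rather than an audio file, a common next step is to write them to disk. The following is a minimal sketch, not part of the F5TTS package: `makeWAVData` is a hypothetical helper that uses only Foundation, and it assumes you have already copied the samples out of the MLXArray into a `[Float]` of mono values in the range -1 to 1 (MLX Swift's `asArray(Float.self)` is one likely way to do that, but treat it as an assumption and check the MLX Swift documentation).

```swift
import Foundation

/// Hypothetical helper (not part of F5TTS): encodes mono Float samples
/// in [-1, 1] as an in-memory 16-bit PCM WAV file.
func makeWAVData(samples: [Float], sampleRate: Int = 24_000) -> Data {
    let bytesPerSample = 2
    let dataSize = samples.count * bytesPerSample
    var data = Data(capacity: 44 + dataSize)

    // RIFF/WAVE stores all integers in little-endian byte order.
    func appendLE<T: FixedWidthInteger>(_ value: T) {
        withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
    }

    // RIFF container header.
    data.append(contentsOf: "RIFF".utf8)
    appendLE(UInt32(36 + dataSize))               // remaining file size
    data.append(contentsOf: "WAVE".utf8)

    // Format chunk: uncompressed PCM, one channel, 16 bits per sample.
    data.append(contentsOf: "fmt ".utf8)
    appendLE(UInt32(16))                          // format chunk size
    appendLE(UInt16(1))                           // audio format: PCM
    appendLE(UInt16(1))                           // channel count: mono
    appendLE(UInt32(sampleRate))
    appendLE(UInt32(sampleRate * bytesPerSample)) // bytes per second
    appendLE(UInt16(bytesPerSample))              // block alignment
    appendLE(UInt16(16))                          // bits per sample

    // Data chunk: clamp each sample, then scale it to the Int16 range.
    data.append(contentsOf: "data".utf8)
    appendLE(UInt32(dataSize))
    for sample in samples {
        let safe = sample.isNaN ? 0 : sample
        let clamped = max(-1, min(1, safe))
        appendLE(Int16((clamped * 32767).rounded()))
    }
    return data
}
```

The returned `Data` can be saved with `write(to:)` on a file URL of your choice. Clamping before the conversion avoids a runtime trap if a sample falls slightly outside the expected range.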
If you want to use your own reference audio sample, make sure it is a mono, 24 kHz WAV file of around 5-10 seconds:

let generatedAudio = try await f5tts.generate(
    text: "The quick brown fox jumped over the lazy dog.",
    referenceAudioURL: ...,
    referenceAudioText: "This is the caption for the reference audio."
)

You can convert an audio file to the correct format with ffmpeg like this:
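
ffmpeg -i /path/to/audio.wav -ac 1 -ar 24000 -sample_fmt s16 -t 10 /path/to/output_audio.wav

Thanks to Yushen Chen for the original PyTorch implementation of F5 TTS and the pretrained model.
Thanks to Phil Wang for the E2 TTS implementation that this model is based on.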

@article{chen-etal-2024-f5tts,
  title = {F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
  author = {Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
  journal = {arXiv preprint arXiv:2410.06885},
  year = {2024},
}

@inproceedings{Eskimez2024E2TE,
  title = {E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS},
  author = {Sefik Emre Eskimez and Xiaofei Wang and Manthan Thakker and Canrun Li and Chung-Hsien Tsai and Zhen Xiao and Hemin Yang and Zirun Zhu and Min Tang and Xu Tan and Yanqing Liu and Sheng Zhao and Naoyuki Kanda},
  year = {2024},
  url = {https://api.semanticscholar.org/CorpusID:270738197}
}

The code in this repository is released under the MIT license, as found in the LICENSE file.