F5 TTS for Swift

Version 0.0.6

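An implementation of F5-TTS in Swift, using the MLX Swift framework.

You can listen to a sample here that was generated in about 11 seconds on an M3 Max MacBook Pro.

See the Python repository for more details on the model architecture.

This repository is based on the original PyTorch implementation available here.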

Installation

The F5TTS Swift package can be built from Xcode or SwiftPM.

Pretrained models are available on Hugging Face.
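For the SwiftPM route, a consuming project's manifest might look roughly like the sketch below. This is an illustration under assumptions, not the project's documented setup: the repository URL, the `F5TTS` product name (inferred from the module name), the platform minimums, and the `MyTTSApp` target name are all assumed or hypothetical, so check them against the package itself before use.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyTTSApp", // hypothetical app name
    // Assumed platform minimums; adjust to whatever the package actually requires.
    platforms: [.macOS(.v14), .iOS(.v16)],
    dependencies: [
        // Assumed repository URL and version requirement.
        .package(url: "https://github.com/lucasnewman/f5-tts-swift", from: "0.0.6")
    ],
    targets: [
        .executableTarget(
            name: "MyTTSApp",
            dependencies: [
                // Product name assumed to match the `F5TTS` module.
                .product(name: "F5TTS", package: "f5-tts-swift")
            ]
        )
    ]
)
```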

Usage

import F5TTS

let f5tts = try await F5TTS.fromPretrained(repoId: "lucasnewman/f5-tts-mlx")

let generatedAudio = try await f5tts.generate(text: "The quick brown fox jumped over the lazy dog.")

The result is an MLXArray containing 24 kHz audio samples.
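To listen to the output you will usually want it in a file. The sketch below is one possible way to do that, using only Foundation: it encodes mono floating-point samples as a 16-bit PCM WAV at 24 kHz. `wavData` is a hypothetical helper, not part of F5TTS, and it assumes the samples are already a `[Float]` in the range -1 to 1.

```swift
import Foundation

/// Hypothetical helper (not part of F5TTS): encodes mono samples in [-1, 1]
/// as a 16-bit PCM WAV file image.
func wavData(samples: [Float], sampleRate: UInt32 = 24_000) -> Data {
    let channels: UInt16 = 1
    let bitsPerSample: UInt16 = 16
    let blockAlign = channels * (bitsPerSample / 8)   // bytes per frame
    let dataSize = UInt32(samples.count) * UInt32(blockAlign)

    var data = Data()

    // Appends an integer in little-endian byte order, as the WAV format requires.
    func append<T: FixedWidthInteger>(_ value: T) {
        withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
    }

    // RIFF header
    data.append(contentsOf: "RIFF".utf8)
    append(36 + dataSize)                     // size of everything after this field
    data.append(contentsOf: "WAVEfmt ".utf8)
    append(UInt32(16))                        // fmt chunk size for PCM
    append(UInt16(1))                         // audio format 1 = PCM
    append(channels)
    append(sampleRate)
    append(sampleRate * UInt32(blockAlign))   // byte rate
    append(blockAlign)
    append(bitsPerSample)

    // data chunk
    data.append(contentsOf: "data".utf8)
    append(dataSize)
    for sample in samples {
        let clamped = max(-1, min(1, sample)) // avoid overflow on out-of-range values
        append(Int16((clamped * 32767).rounded()))
    }
    return data
}
```

Assuming MLX Swift's `asArray(Float.self)` is available on the result, usage could look like `try wavData(samples: generatedAudio.asArray(Float.self)).write(to: outputURL)`, where `outputURL` is a file URL of your choosing.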

If you want to use your own reference audio sample, make sure it is a mono, 24 kHz WAV file of around 5-10 seconds:

let generatedAudio = try await f5tts.generate(
    text: "The quick brown fox jumped over the lazy dog.",
    referenceAudioURL: ...,
    referenceAudioText: "This is the caption for the reference audio."
)

You can convert an audio file to the correct format with ffmpeg like this:

ffmpeg -i /path/to/audio.wav -ac 1 -ar 24000 -sample_fmt s16 -t 10 /path/to/output_audio.wav
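Before passing a clip as reference audio, it can be useful to confirm that it really is mono and 24 kHz. The following is a minimal sketch using only Foundation that walks the RIFF chunks of a WAV file and reports its format. `WAVInfo`, `readWAVInfo`, and `looksLikeUsableReference` are hypothetical names, not part of F5TTS, and the strict 5-10 second bound is an assumption drawn from the guideline above.

```swift
import Foundation

// Hypothetical helper types; not part of the F5TTS package.
struct WAVInfo {
    let channels: Int
    let sampleRate: Int
    let bitsPerSample: Int
    let duration: Double // seconds of audio in the data chunk
}

/// Reads the format and duration from an in-memory WAV file, or returns nil
/// if the data is not a RIFF/WAVE file with a fmt chunk before its data chunk.
func readWAVInfo(from data: Data) -> WAVInfo? {
    let bytes = [UInt8](data)
    guard bytes.count >= 12,
          bytes[0..<4].elementsEqual("RIFF".utf8),
          bytes[8..<12].elementsEqual("WAVE".utf8) else { return nil }

    // WAV stores integers in little-endian order.
    func u16(_ i: Int) -> Int { Int(bytes[i]) | (Int(bytes[i + 1]) << 8) }
    func u32(_ i: Int) -> Int { u16(i) | (u16(i + 2) << 16) }

    var channels = 0, sampleRate = 0, bits = 0
    var dataSize: Int? = nil
    var offset = 12
    // Walk the chunk list; tools such as ffmpeg may add extra chunks (e.g. LIST).
    while offset + 8 <= bytes.count {
        let id = String(decoding: bytes[offset..<offset + 4], as: UTF8.self)
        let size = u32(offset + 4)
        let body = offset + 8
        if id == "fmt " && body + 16 <= bytes.count {
            channels = u16(body + 2)
            sampleRate = u32(body + 4)
            bits = u16(body + 14)
        } else if id == "data" {
            dataSize = min(size, bytes.count - body)
            break
        }
        offset = body + size + (size & 1) // chunks are padded to an even length
    }

    guard let dataSize = dataSize, channels > 0, sampleRate > 0, bits >= 8 else { return nil }
    let frames = dataSize / (channels * (bits / 8))
    return WAVInfo(channels: channels, sampleRate: sampleRate, bitsPerSample: bits,
                   duration: Double(frames) / Double(sampleRate))
}

/// Checks the reference-audio guideline: mono, 24 kHz, 5-10 seconds long.
func looksLikeUsableReference(_ info: WAVInfo) -> Bool {
    info.channels == 1 && info.sampleRate == 24_000 && (5.0...10.0).contains(info.duration)
}
```

A file on disk can be checked with `readWAVInfo(from: try Data(contentsOf: url))`. Compressed or floating-point WAV variants are not distinguished here, so treat a passing result as a sanity check rather than a guarantee.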

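Appreciation

Yushen Chen for the original PyTorch implementation of F5 TTS and the pretrained model.

Phil Wang for the E2 TTS implementation that this model is based on.

Citations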

@article{chen-etal-2024-f5tts,
    title   = {F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
    author  = {Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
    journal = {arXiv preprint arXiv:2410.06885},
    year    = {2024}
}

@inproceedings{Eskimez2024E2TE,
    title   = {E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS},
    author  = {Sefik Emre Eskimez and Xiaofei Wang and Manthan Thakker and Canrun Li and Chung-Hsien Tsai and Zhen Xiao and Hemin Yang and Zirun Zhu and Min Tang and Xu Tan and Yanqing Liu and Sheng Zhao and Naoyuki Kanda},
    year    = {2024},
    url     = {https://api.semanticscholar.org/CorpusID:270738197}
}