F5 TTS for Swift

Implementation of F5-TTS in Swift, using the MLX Swift framework.

You can listen to a sample here that was generated in ~11 seconds on an M3 Max MacBook Pro.

See the Python repository for additional details on the model architecture.

This repository is based on the original PyTorch implementation available here.

Installation

The F5TTS Swift package can be built and run from Xcode or SwiftPM.

A pretrained model is available on Hugging Face.

Usage

import F5TTS

let f5tts = try await F5TTS.fromPretrained(repoId: "lucasnewman/f5-tts-mlx")

let generatedAudio = try await f5tts.generate(text: "The quick brown fox jumped over the lazy dog.")

The result is an MLXArray containing 24 kHz audio samples.

If you want to use your own reference audio sample, make sure it is a mono, 24 kHz WAV file of around 5 to 10 seconds:
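
To save or play the result, the samples have to be wrapped in an audio container. The following is a minimal sketch, not part of the F5TTS package, that encodes mono samples as 16-bit PCM WAV data using only Foundation. The name `wavData` and its parameters are hypothetical. It assumes the MLXArray has already been converted to `[Float]` (MLX Swift's `asArray(Float.self)` is one way that may work) and that the samples lie in the range [-1, 1].

```swift
import Foundation

/// Encodes mono samples in [-1, 1] as a 16-bit PCM WAV file image.
/// Hypothetical helper for illustration, not part of the F5TTS package.
func wavData(samples: [Float], sampleRate: UInt32 = 24_000) -> Data {
    let channels: UInt16 = 1
    let bitsPerSample: UInt16 = 16
    let blockAlign = channels * (bitsPerSample / 8)   // bytes per frame
    let dataSize = UInt32(samples.count) * UInt32(blockAlign)

    var data = Data()

    // WAV stores every integer field in little-endian byte order.
    func append<T: FixedWidthInteger>(_ value: T) {
        withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
    }

    data.append(contentsOf: "RIFF".utf8)
    append(UInt32(36) + dataSize)            // bytes remaining after this field
    data.append(contentsOf: "WAVE".utf8)

    data.append(contentsOf: "fmt ".utf8)
    append(UInt32(16))                       // size of the PCM format chunk
    append(UInt16(1))                        // audio format 1 = uncompressed PCM
    append(channels)
    append(sampleRate)
    append(sampleRate * UInt32(blockAlign))  // byte rate
    append(blockAlign)
    append(bitsPerSample)

    data.append(contentsOf: "data".utf8)
    append(dataSize)
    for sample in samples {
        // Clamp to [-1, 1] and treat NaN as silence before scaling to Int16.
        let clamped = sample.isNaN ? 0 : max(-1, min(1, sample))
        append(Int16((clamped * 32767).rounded()))
    }
    return data
}
```

Under these assumptions, `try wavData(samples: samples).write(to: outputURL)` would write a playable file, where `samples` and `outputURL` are values you supply.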

let generatedAudio = try await f5tts.generate(
    text: "The quick brown fox jumped over the lazy dog.",
    referenceAudioURL: ...,
    referenceAudioText: "This is the caption for the reference audio."
)
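
Before passing a custom reference file, it can help to confirm that it meets the requirements above. The sketch below reads a WAV header with Foundation only and checks for mono, 24 kHz, and a 5 to 10 second duration. `WAVInfo`, `readWAVInfo`, and `isSuitableReference` are hypothetical names, not part of the F5TTS package. The sketch assumes uncompressed PCM with the `fmt ` chunk preceding the `data` chunk, and it treats "around 5 to 10 seconds" as a strict range, which is stricter than the text requires.

```swift
import Foundation

/// Basic properties read from a WAV header. Hypothetical type for illustration.
struct WAVInfo: Equatable {
    let channels: Int
    let sampleRate: Int
    let duration: Double   // seconds
}

/// Walks the RIFF chunks of a WAV file and returns its format,
/// or nil if the data cannot be parsed as a WAV file.
func readWAVInfo(_ data: Data) -> WAVInfo? {
    let bytes = [UInt8](data)
    // Little-endian readers; bounds are checked before each use below.
    func u16(_ o: Int) -> Int { Int(bytes[o]) | (Int(bytes[o + 1]) << 8) }
    func u32(_ o: Int) -> Int { u16(o) | (u16(o + 2) << 16) }
    func tag(_ o: Int) -> String { String(decoding: bytes[o..<(o + 4)], as: UTF8.self) }

    guard bytes.count >= 12, tag(0) == "RIFF", tag(8) == "WAVE" else { return nil }

    var channels = 0, sampleRate = 0, byteRate = 0
    var dataBytes: Int? = nil
    var offset = 12
    while offset + 8 <= bytes.count {
        let id = tag(offset)
        let size = u32(offset + 4)
        let body = offset + 8
        if id == "fmt ", size >= 16, body + 16 <= bytes.count {
            channels = u16(body + 2)
            sampleRate = u32(body + 4)
            byteRate = u32(body + 8)
        } else if id == "data" {
            // Use the bytes actually present if the header overstates the size.
            dataBytes = min(size, bytes.count - body)
            break
        }
        offset = body + size + (size & 1)   // chunks are padded to even sizes
    }

    guard let payload = dataBytes, byteRate > 0 else { return nil }
    return WAVInfo(channels: channels, sampleRate: sampleRate,
                   duration: Double(payload) / Double(byteRate))
}

/// Assumed check: mono, 24 kHz, between 5 and 10 seconds long.
func isSuitableReference(_ info: WAVInfo) -> Bool {
    return info.channels == 1 && info.sampleRate == 24_000
        && (5.0...10.0).contains(info.duration)
}
```

A file could be checked with `readWAVInfo(try Data(contentsOf: url)).map(isSuitableReference) ?? false`, where `url` is the file you intend to use as the reference.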

You can convert an audio file to the correct format with ffmpeg like this:

ffmpeg -i /path/to/audio.wav -ac 1 -ar 24000 -sample_fmt s16 -t 10 /path/to/output_audio.wav

Appreciation

Yushen Chen for the original PyTorch implementation of F5 TTS and the pretrained model.

Phil Wang for the E2 TTS implementation that this model is based on.

Citations

@article{chen-etal-2024-f5tts,
      title = {F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
      author = {Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
      journal = {arXiv preprint arXiv:2410.06885},
      year = {2024},
}

@inproceedings{Eskimez2024E2TE,
    title   = {E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS},
    author  = {Sefik Emre Eskimez and Xiaofei Wang and Manthan Thakker and Canrun Li and Chung-Hsien Tsai and Zhen Xiao and Hemin Yang and Zirun Zhu and Min Tang and Xu Tan and Yanqing Liu and Sheng Zhao and Naoyuki Kanda},
    year    = {2024},
    url     = {https://api.semanticscholar.org/CorpusID:270738197}
}