metavoice src download - metavoice src source code download

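MetaVoice-1B

MetaVoice-1B is a 1.2B parameter base model trained on 100K hours of speech for TTS (text-to-speech). It has been built with the following priorities:

Emotional speech rhythm and tone in English.
Zero-shot cloning for American and British voices, with 30s of reference audio.
Support for (cross-lingual) voice cloning with finetuning.
- We have had success with as little as 1 minute of training data for Indian speakers.
Synthesis of arbitrary length text.

We are releasing MetaVoice-1B under the Apache 2.0 license; it can be used without restrictions.

Quickstart - tl;dr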

Web UI

docker-compose up -d ui && docker-compose ps && docker-compose logs -f

Server

# navigate to <URL>/docs for API definitions
docker-compose up -d server && docker-compose ps && docker-compose logs -f

Installation

Pre-requisites:

GPU VRAM >= 12GB
Python >= 3.10, < 3.12
pipx (installation instructions)
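The Python version constraint above can be checked before installing anything. The following is a minimal sketch, not part of the project: the helper name `python_version_supported` is hypothetical, and it only covers the interpreter version, not the GPU VRAM requirement.

```python
import sys


def python_version_supported(version=None):
    """Return True if the (major, minor) version satisfies >=3.10,<3.12.

    `version` is any tuple-like starting with (major, minor); it defaults
    to the running interpreter's sys.version_info.
    """
    major, minor = tuple(version or sys.version_info)[:2]
    return (3, 10) <= (major, minor) < (3, 12)


if __name__ == "__main__":
    # Report whether the current interpreter is inside the supported range.
    print("supported" if python_version_supported() else "unsupported")
```

Running this with the interpreter you intend to use for `poetry install` gives a quick answer before any dependencies are downloaded.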

Environment setup

# install ffmpeg
wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz
wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz.md5
md5sum -c ffmpeg-git-amd64-static.tar.xz.md5
tar xvf ffmpeg-git-amd64-static.tar.xz
sudo mv ffmpeg-git-*-static/ffprobe ffmpeg-git-*-static/ffmpeg /usr/local/bin/
rm -rf ffmpeg-git-*

# install rust if not installed (ensure you've restarted your terminal after installation)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Project dependencies installation

Using poetry
Using pip/conda

Using poetry (recommended)

# install poetry if not installed (ensure you've restarted your terminal after installation)
pipx install poetry

# disable any conda envs that might interfere with poetry's venv
conda deactivate

# if running from Linux, keyring backend can hang on `poetry install`. This prevents that.
export PYTHON_KEYRING_BACKEND=keyring.backends.fail.Keyring

# pip's dependency resolver will complain, this is temporary expected behaviour
# full inference & finetuning functionality will still be available
poetry install && poetry run pip install torch==2.2.1 torchaudio==2.2.1

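Using pip/conda

NOTE 1: When raising issues, we will ask you to try with poetry first.
NOTE 2: All commands in this README use poetry by default, so with pip/conda you can simply remove any `poetry run` prefix.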

pip install -r requirements.txt
pip install torch==2.2.1 torchaudio==2.2.1
pip install -e .

Usage

Download it and use it anywhere (including locally) with our reference implementation

# You can use `--quantisation_mode int4` or `--quantisation_mode int8` for experimental faster inference. This will degrade the quality of the audio.
# Note: int8 is slower than bf16/fp16 for undebugged reasons. If you want fast, try int4 which is roughly 2x faster than bf16/fp16.
poetry run python -i fam/llm/fast_inference.py

# Run e.g. of API usage within the interactive python session
tts.synthesise(text="This is a demo of text to speech by MetaVoice-1B, an open-source foundational audio model.", spk_ref_path="assets/bria.mp3")

Note: The script takes 30-90s to start up (depending on hardware). This is because we torch.compile the model for fast inference.

On Ampere, Ada-Lovelace, and Hopper architecture GPUs, once compiled, the synthesise() API runs faster than real-time, with a Real-Time Factor (RTF) < 1.0.
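The Real-Time Factor is the time spent synthesising divided by the duration of the audio produced, so an RTF below 1.0 means the audio is generated faster than it plays back. The sketch below only illustrates that arithmetic; the function name and the example timings are hypothetical, not measurements from the model.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """Return synthesis time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Hypothetical numbers: 4 s of compute for a 10 s clip.
rtf = real_time_factor(4.0, 10.0)
print(rtf)        # 0.4
print(rtf < 1.0)  # True, i.e. faster than real time
```

To measure it in practice, one would time the synthesis call (for example with `time.perf_counter()`) and read the duration of the resulting audio file.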

Deploy it on any cloud (AWS/GCP/Azure), using our inference server or web UI

# You can use `--quantisation_mode int4` or `--quantisation_mode int8` for experimental faster inference. This will degrade the quality of the audio.
# Note: int8 is slower than bf16/fp16 for undebugged reasons. If you want fast, try int4 which is roughly 2x faster than bf16/fp16.

# navigate to <URL>/docs for API definitions
poetry run python serving.py

poetry run python app.py

Use it via Hugging Face
Google Colab Demo

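Finetuning

We support finetuning the first stage LLM (see the Architecture section).

In order to finetune, we expect a "|"-delimited CSV dataset in the following format:

audio_files|captions
./data/audio.wav|./data/caption.txt

Note that we do not perform any dataset overlap checks, so make sure that your train and val datasets are disjoint.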
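Because no overlap check is performed, it can be worth running one yourself before finetuning. The following is a minimal sketch under stated assumptions: it uses the `audio_files|captions` header shown above, the helper names and example paths are hypothetical, and it compares path strings only, so the same audio stored under two different paths would not be detected.

```python
import csv
import io


def read_audio_paths(csv_text):
    """Collect the `audio_files` column of a "|"-delimited dataset."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter="|")
    return {row["audio_files"] for row in reader}


def overlapping_audio(train_csv_text, val_csv_text):
    """Return audio paths that appear in both the train and val datasets."""
    return read_audio_paths(train_csv_text) & read_audio_paths(val_csv_text)


# Hypothetical datasets, inlined as strings to keep the example self-contained.
train = "audio_files|captions\n./data/a.wav|./data/a.txt\n./data/b.wav|./data/b.txt\n"
val = "audio_files|captions\n./data/b.wav|./data/b.txt\n./data/c.wav|./data/c.txt\n"

print(sorted(overlapping_audio(train, val)))  # ['./data/b.wav']
```

For real files, read each CSV with `open(path, newline="")` and pass the contents in; an empty result means the two splits share no audio paths.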

Try it out using our sample datasets via:

poetry run finetune --train ./datasets/sample_dataset.csv --val ./datasets/sample_val_dataset.csv

Once you have trained your model, you can use it for inference via:

poetry run python -i fam/llm/fast_inference.py --first_stage_path ./my-finetuned_model.pt

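Configuration

In order to set hyperparameters such as the learning rate, what to freeze, etc., you can edit the finetune_params.py file.

We have a light and optional integration with W&B that can be enabled by setting wandb_log = True and by installing the appropriate dependencies.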

poetry install -E observable

Upcoming

Faster inference ⚡
Fine-tuning code
Synthesis of arbitrary length text

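Architecture

We predict EnCodec tokens from text and speaker information. This is then diffused up to the waveform level, with post-processing applied to clean up the audio.

We use a causal GPT to predict the first two hierarchies of EnCodec tokens. Text and audio are part of the LLM context. Speaker information is passed via conditioning at the token embedding layer. This speaker conditioning is obtained from a separately trained speaker verification network.
- The two hierarchies are predicted in a "flattened interleaved" manner: we predict the first token of the first hierarchy, then the first token of the second hierarchy, then the second token of the first hierarchy, and so on.
- We use condition-free sampling to boost the cloning capability of the model.
- The text is tokenised using a custom trained BPE tokeniser with 512 tokens.
- Note that we have skipped predicting semantic tokens as done in other works, as we found that this is not strictly necessary.
We use a non-causal (encoder-style) transformer to predict the rest of the 6 hierarchies from the first two hierarchies. This is a super small model (~10Mn parameters), and has extensive zero-shot generalisation to most speakers we have tried. Since it is non-causal, we are also able to predict all the timesteps in parallel.
We use multi-band diffusion to generate waveforms from the EnCodec tokens. We noticed that the speech is clearer than when using the original RVQ decoder or VOCOS. However, the diffusion at waveform level leaves some background artifacts which are quite unpleasant to the ear. We clean these up in the next step.
We use DeepFilterNet to clear up the artifacts introduced by the multi-band diffusion.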
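The "flattened interleaved" ordering of the first two hierarchies can be illustrated with plain lists. This is only a sketch of the ordering described above, not the project's code: the function names are hypothetical, and any token-id offsets or special tokens the real model may use are left out.

```python
def flatten_interleave(first, second):
    """Interleave two equal-length token sequences as f0, s0, f1, s1, ..."""
    if len(first) != len(second):
        raise ValueError("both hierarchies must have the same number of timesteps")
    flat = []
    for f_tok, s_tok in zip(first, second):
        flat.extend((f_tok, s_tok))
    return flat


def unflatten(flat):
    """Split an interleaved sequence back into the two hierarchies."""
    if len(flat) % 2 != 0:
        raise ValueError("interleaved sequence must have an even length")
    return flat[0::2], flat[1::2]


# Hypothetical token ids for three timesteps.
h1 = [1, 2, 3]
h2 = [10, 20, 30]
flat = flatten_interleave(h1, h2)
print(flat)             # [1, 10, 2, 20, 3, 30]
print(unflatten(flat))  # ([1, 2, 3], [10, 20, 30])
```

A causal model generating `flat` one token at a time therefore emits both hierarchies for a timestep before moving to the next timestep.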

Optimizations

The model supports:

KV-caching via Flash Decoding
Batching (including texts of different lengths)
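The source does not say how texts of different lengths are batched. A common generic approach is to pad every token sequence to the longest one and keep a mask marking the real tokens; the sketch below shows that technique only, with a hypothetical `pad_batch` helper and a hypothetical `pad_id`, and should not be read as the model's actual implementation.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad token sequences to a common length and build a validity mask.

    Returns (padded, mask), where mask[i][j] is 1 for a real token and 0 for padding.
    """
    if not sequences:
        return [], []
    max_len = max(len(seq) for seq in sequences)
    padded = [list(seq) + [pad_id] * (max_len - len(seq)) for seq in sequences]
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return padded, mask


# Hypothetical token ids for two texts of different lengths.
padded, mask = pad_batch([[5, 6, 7], [8]])
print(padded)  # [[5, 6, 7], [8, 0, 0]]
print(mask)    # [[1, 1, 1], [1, 0, 0]]
```

The mask lets downstream code ignore padded positions, for example when computing attention or trimming generated output.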

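Contribute

See all active issues!

Acknowledgements

We are grateful to Together.ai for their 24/7 help in marshalling our cluster. We thank the teams of AWS, GCP and Hugging Face for support with their cloud platforms.

Défossez et al. for EnCodec.
RS Roman et al. for Multiband Diffusion.
@liusongxiang for the speaker encoder implementation.
@karpathy for NanoGPT, which our inference implementation is based on.
@rikorose for DeepFilterNet.

Apologies in advance if we have missed anyone out. Please let us know if we have.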

Additional information

Version: 1.0.0
Type: Other source code
Updated: 2025-02-24
Size: 1.16MB
Source: GitHub
