aivmlib Download - aivmlib Source code download

aivmlib

? aivmlib : Ai vis V oice M odel File (.aivm/.aivmx) Utility Lib rary

AIVM ( Ai vis V oice M odel) / AIVMX ( Ai vis V oice M odel for ONN X ) is an open file format for AI speech synthesis models that combines pre-trained models, hyperparameters, style vectors, speaker metadata (names, overviews, licenses, icons, voice samples, etc.) into one file .

Note

"AIVM" is also a general term for both AIVM/AIVMX format specifications and metadata specifications.
Specifically, the AIVM file is a model file in "Safetensors format with AIVM metadata added", and the AIVMX file is a model file in "ONNX format with AIVM metadata added".
"AIVM Metadata" refers to various metadata that is linked to a trained model as defined in the AIVM specification.

You can easily use AI speech synthesis models by adding AIVM/AIVMX files to software that supports AIVM specifications, including AivisSpeech/AivisSpeech-Engine.

aivmlib/aivmlib-web provides a utility for reading and writing metadata in AIVM/AIVMX files.
This aivmlib is a reference implementation of the AIVM specification written in Python. If you are using a web browser, please use aivmlib-web.

Tip

AIVM Generator allows you to easily generate and edit AIVM/AIVMX files using the GUI on your browser.
We recommend using AIVM Generator when manually generating and editing AIVM/AIVMX files.

aivmlib
- Installation
- Usage
- License
AIVM Specification
- Overview
- AIVM File Format Specification
  - Compatible with Safetensors format
  - References
- AIVMX File Format Specification
  - ONNX format compatibility
  - References
- AIVM Manifest Specification (Version 1.0)
  - Supported model architectures
  - Field definitions for AIVM manifests
- FAQ
  - Q. Why are there two formats defined: AIVM and AIVMX?
  - Q. Can I load AIVM/AIVMX files with existing tools?
  - Q. How do I convert an existing AI speech synthesis model to AIVM/AIVMX?
  - Q. How is the version control of the AIVM manifest?
  - Q. What is the difference between aivmlib and aivmlib-web?
  - Q. How do I add support for a new model architecture?
  - Q. How should I write the license information?
  - Q. Are there any size limits for image and audio data?
  - Q. Can I edit the metadata manually?

Installation

If you install it with pip, the command line tool aivmlib will also be automatically installed.
Requires Python 3.11 or higher.

pip install aivmlib

I use Poetry during development.

pip install poetry
git clone https://github.com/Aivis-Project/aivmlib.git
cd aivmlib
poetry install --with dev
poetry run aivmlib --help

Usage

Below is how to use the CLI tool itself.

$ aivmlib --help

 Usage: aivmlib [OPTIONS] COMMAND [ARGS]...

 Aivis Voice Model File (.aivm/.aivmx) Utility Library

╭─ Options ─────────────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.           │
│ --show-completion             Show completion for the current shell, to copy it   │
│                               or customize the installation.                      │
│ --help                        Show this message and exit.                         │
╰───────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ────────────────────────────────────────────────────────────────────────╮
│ create-aivm    与えられたアーキテクチャ, 学習済みモデル, ハイパーパラメータ,      │
│                スタイルベクトルから AIVM メタデータを生成した上で、               │
│                それを書き込んだ仮の AIVM ファイルを生成する                       │
│ create-aivmx   与えられたアーキテクチャ, 学習済みモデル, ハイパーパラメータ,      │
│                スタイルベクトルから AIVM メタデータを生成した上で、               │
│                それを書き込んだ仮の AIVMX ファイルを生成する                      │
│ show-metadata  指定されたパスの AIVM / AIVMX ファイル内に記録されている AIVM      │
│                メタデータを見やすく出力する                                       │
╰───────────────────────────────────────────────────────────────────────────────────╯

$ aivmlib show-metadata --help

 Usage: aivmlib show-metadata [OPTIONS] FILE_PATH

 指定されたパスの AIVM / AIVMX ファイル内に記録されている AIVM メタデータを見やすく出力する

╭─ Arguments ───────────────────────────────────────────────────────────────────────╮
│ *    file_path      PATH  Path to the AIVM / AIVMX file [default: None]           │
│                           [required]                                              │
╰───────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ─────────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                       │
╰───────────────────────────────────────────────────────────────────────────────────╯

$ aivmlib create-aivm --help

 Usage: aivmlib create-aivm [OPTIONS]

 与えられたアーキテクチャ, 学習済みモデル, ハイパーパラメータ, スタイルベクトルから
 AIVM メタデータを生成した上で、それを書き込んだ仮の AIVM ファイルを生成する

╭─ Options ─────────────────────────────────────────────────────────────────────────╮
│ *  --output              -o      PATH                    Path to the output AIVM  │
│                                                          file                     │
│                                                          [default: None]          │
│                                                          [required]               │
│ *  --model               -m      PATH                    Path to the Safetensors  │
│                                                          model file               │
│                                                          [default: None]          │
│                                                          [required]               │
│    --hyper-parameters    -h      PATH                    Path to the hyper        │
│                                                          parameters file          │
│                                                          (optional)               │
│                                                          [default: None]          │
│    --style-vectors       -s      PATH                    Path to the style        │
│                                                          vectors file (optional)  │
│                                                          [default: None]          │
│    --model-architecture  -a      [Style-Bert-VITS2 | Styl  Model architecture       │
│                                  e-Bert-VITS2            [default:                │
│                                  (JP-Extra)]             Style-Bert-VITS2         │
│                                                          (JP-Extra)]              │
│    --help                                                Show this message and    │
│                                                          exit.                    │
╰───────────────────────────────────────────────────────────────────────────────────╯

$ aivmlib create-aivmx --help

 Usage: aivmlib create-aivmx [OPTIONS]

 与えられたアーキテクチャ, 学習済みモデル, ハイパーパラメータ, スタイルベクトルから
 AIVM メタデータを生成した上で、それを書き込んだ仮の AIVMX ファイルを生成する

╭─ Options ─────────────────────────────────────────────────────────────────────────╮
│ *  --output              -o      PATH                    Path to the output AIVMX │
│                                                          file                     │
│                                                          [default: None]          │
│                                                          [required]               │
│ *  --model               -m      PATH                    Path to the ONNX model   │
│                                                          file                     │
│                                                          [default: None]          │
│                                                          [required]               │
│    --hyper-parameters    -h      PATH                    Path to the hyper        │
│                                                          parameters file          │
│                                                          (optional)               │
│                                                          [default: None]          │
│    --style-vectors       -s      PATH                    Path to the style        │
│                                                          vectors file (optional)  │
│                                                          [default: None]          │
│    --model-architecture  -a      [Style-Bert-VITS2 | Styl  Model architecture       │
│                                  e-Bert-VITS2            [default:                │
│                                  (JP-Extra)]             Style-Bert-VITS2         │
│                                                          (JP-Extra)]              │
│    --help                                                Show this message and    │
│                                                          exit.                    │
╰───────────────────────────────────────────────────────────────────────────────────╯

Below is an example of executing the command.

 # Safetensors 形式で保存された "Style-Bert-VITS2 (JP-Extra)" モデルアーキテクチャの学習済みモデルから AIVM ファイルを生成
# .safetensors と同じディレクトリに config.json と style_vectors.npy があることが前提
# -a オプションを省略した場合、既定で "Style-Bert-VITS2 (JP-Extra)" の学習済みモデルと判定される
$ aivmlib create-aivm -o ./output.aivm -m ./model.safetensors

# 明示的にハイパーパラメータとスタイルベクトルのパスを指定して生成
$ aivmlib create-aivm -o ./output.aivm -m ./model.safetensors -h ./config.json -s ./style-vectors.npy

# ONNX 形式で保存された "Style-Bert-VITS2" モデルアーキテクチャの学習済みモデルから AIVMX ファイルを生成
# .onnx と同じディレクトリに config.json と style_vectors.npy があることが前提
$ aivmlib create-aivmx -o ./output.aivmx -m ./model.onnx -a " Style-Bert-VITS2 "

# 明示的にハイパーパラメータとスタイルベクトルのパスを指定して生成
$ aivmlib create-aivmx -o ./output.aivmx -m ./model.onnx -a " Style-Bert-VITS2 " -h ./config.json -s ./style-vectors.npy

# AIVM ファイルに格納された AIVM メタデータを確認
$ aivmlib show-metadata ./output.aivm

# AIVMX ファイルに格納された AIVM メタデータを確認
$ aivmlib show-metadata ./output.aivmx

Tip

For usage as a library, see Implementing the CLI tool implemented in __main__.py .

Important

aivmlib/aivmlib-web is a library that only read/write functions for AIVM/AIVMX file formats.
The inference logic for AI speech synthesis models for each model architecture and how data obtained from aivmlib/aivmlib-web is left to the user of the library.

License

MIT License

AIVM Specification

This section defines the following technical specifications included in the "AIVM Specifications":

AIVM File Format Specification
AIVMX File Format Specification
AIVM Manifest Specification (Version 1.0)
FAQ

Overview

The purpose is to combine the trained AI speech synthesis model and the various metadata necessary for its use into a single file, preventing file dissipation and confusion, and to make it easier to use and share the model.

Tip

By combining it into a single file, you can easily operate it by simply downloading the AIVM/AIVMX file and placing it in a specified folder, and using the speech synthesis model immediately with compatible software.
Another advantage is that it is not a compressed file, so there is no need to deploy it.

The AIVM specification does not rely on the model architecture of the speech synthesis model.
It has been designed with future scalability and versatility in mind so that speech synthesis models of different model architectures can be handled in a common file format.

If the underlying trained model is saved in a single Safetensors or ONNX format, in principle, you can add metadata to generate AIVM/AIVMX files, regardless of the model architecture.
When designing, we emphasized compatibility with existing ecosystems so that they could be loaded as regular Safetensors or ONNX files without any conversion processing.

Important

The AIVM specification does not define the inference method for each model architecture. The specifications are defined as "a file that summarizes the metadata of the AI speech synthesis model."
For example, for AIVM files, the stored AI speech synthesis model may be for PyTorch or TensorFlow.
How to infer AI speech synthesis models is left to the implementation of software that supports AIVM/AIVMX files.

AIVM File Format Specification

The specifications for the AIVM file format are shown below.

AIVM ( Ai vis V oice M odel) is an extended Safetensor format specification that stores various information such as speaker metadata (AIVM manifest), hyperparameters, and style vectors as custom metadata in the header area of a trained model stored in the Safetensors (.safetensors) format.

It can also be said to be a "common metadata description specification for AI speech synthesis models saved in Safetensors format."

Compatible with Safetensors format

Because it is an extended specification in the Safetensor format, it can be loaded as a normal Safetensor file as is.

Like Safetensors, the first 8 bytes of unsigned Little-Endian 64bit integer is the header size followed by a UTF-8 JSON string by the length of the header size.
The JSON header of Safetensors stores the offset of tensors, etc., but the __metadata__ key allows you to freely set the map from string to string.

Utilizing this specification, AIVM stores the following string data in the following keys in __metadata__ :

aivm_manifest : AIVM Manifest
- Stored as a JSON string
- Contains most information including manifest version and speaker metadata
aivm_hyper_parameters : Hyperparameters for speech synthesis models
- The storage format is model architecture dependent
- Style-Bert-VITS2 and Style-Bert-VITS2 (JP-Extra) model architecture stores JSON strings
aivm_style_vectors : Base64 Encoded speech synthesis model style vectors (binary)
- Base64 After decoding, the format is model architecture dependent
- In Style-Bert-VITS2 and Style-Bert-VITS2 (JP-Extra) model architecture, a string with Base64 encoded NumPy array (.npy) is stored.
- It may be omitted depending on the model architecture

References

Safetensors
Safetensors Metadata Parsing

AIVMX File Format Specification

Below is the specifications for the AIVMX file format.

AIVMX ( Ai vis V oice M odel for ONNX ) is an extended ONNX format specification that stores various information such as speaker metadata (AIVM manifest), hyperparameter style vectors as custom metadata in the metadata area of a trained model stored in ONNX format.

It can also be said to be a "common metadata description specification for AI speech synthesis models saved in ONNX format."

ONNX format compatibility

Because it is an extended specification in ONNX format, it can be loaded as a normal ONNX file as it is.

ONNX files are defined in the Protocol Buffers format, and are designed to store metadata as a list of StringStringEntryProto in the metadata_props field of the root ModelProto message.

Utilizing this specification, AIVMX stores the following string data in the following keys in metadata_props :

aivm_manifest : AIVM Manifest
- Stored as a JSON string
- Contains most information including manifest version and speaker metadata
aivm_hyper_parameters : Hyperparameters for speech synthesis models
- The storage format is model architecture dependent
- Style-Bert-VITS2 and Style-Bert-VITS2 (JP-Extra) model architecture stores JSON strings
aivm_style_vectors : Base64 Encoded speech synthesis model style vectors (binary)
- Base64 After decoding, the format is model architecture dependent
- In Style-Bert-VITS2 and Style-Bert-VITS2 (JP-Extra) model architecture, a string with Base64 encoded NumPy array (.npy) is stored.
- It may be omitted depending on the model architecture

References

ONNX
Open Neural Network Exchange Intermediate Representation (ONNX IR) Specification
ONNX Metadata
How to populate onnx model with custom meta data map ?

AIVM Manifest Specification (Version 1.0)

Below is the specifications for the AIVM manifest (Version 1.0) included in the AIVM/AIVMX file format.

The AIVM manifest contains various information needed to use the speech synthesis model, such as the manifest version, model architecture, model name, speaker metadata, and style information.

The data format for the AIVM manifest is a UTF-8 string written in JSON format.
Due to the JSON format, images and audio data are stored as Base64 encoded strings.

Note

AIVM (Safetensors) currently defined as the container format for AIVM manifests - The metadata area of AIVMX (ONNX) must be a key-value from a string type to a string type without nesting, so all metadata is serialized into a string and stored.
Binary data such as images and audio are stored as a string after being Base64 encoded.

Supported model architectures

Style-Bert-VITS2
Style-Bert-VITS2 (JP-Extra)

Important

Software that supports AIVM/AIVMX files must be properly validated for AIVM/AIVMX files of model architectures that are not supported by your own software.
For example, software that does not support model architectures other than Style-Bert-VITS2 (JP-Extra) , when asked to install the AIVM/AIVMX file for Style-Bert-VITS2 model architecture, an alert will be displayed saying "This model architecture is not supported" and the installation will be aborted.

Important

Technically, speech synthesis models of model architectures other than those listed above can be stored, but the only model architecture strings mentioned above are officially defined in the AIVM Manifest (Version 1.0) specification.
When defining your own model architecture string, you need to be extremely careful to avoid name conflicts with existing model architectures or misalignment between different software.
It is recommended to send pull requests to this repository as much as possible and officially add support for the new model architecture to the AIVM specification.

Field definitions for AIVM manifests

Below is the field definitions for the AIVM manifest as of the AIVM manifest (Version 1.0) specification (excerpt from aivmlib's Pydantic schema definition).

Important

Fields in the AIVM manifest may be added, expanded or removed when the AIVM specification is updated in the future.
It is also quite possible that new metadata will be added to the AIVM manifest and the AIVM/AIVMX file format itself with support for future version updates and additional model architectures.
The only currently active AIVM manifest version is 1.0 .

 class ModelArchitecture ( StrEnum ):
    StyleBertVITS2 = 'Style-Bert-VITS2'  # 対応言語: "ja", "en-US", "zh-CN"
    StyleBertVITS2JPExtra = 'Style-Bert-VITS2 (JP-Extra)'  # 対応言語: "ja"

class ModelFormat ( StrEnum ):
    Safetensors = 'Safetensors'
    ONNX = 'ONNX'

class AivmManifest ( BaseModel ):
    """ AIVM マニフェストのスキーマ """
    # AIVM マニフェストのバージョン (ex: 1.0)
    # 現在は 1.0 のみサポート
    manifest_version : Literal [ '1.0' ]
    # 音声合成モデルの名前 (最大 80 文字)
    # 音声合成モデル内の話者が 1 名の場合は話者名と同じ値を設定すべき
    name : Annotated [ str , StringConstraints ( min_length = 1 , max_length = 80 )]
    # 音声合成モデルの簡潔な説明 (最大 140 文字 / 省略時は空文字列を設定)
    description : Annotated [ str , StringConstraints ( max_length = 140 )] = ''
    # 音声合成モデルの制作者名のリスト (省略時は空リストを設定)
    # 制作者名には npm package.json の "author", "contributors" に指定できるものと同じ書式を利用できる
    # 例: ["John Doe", "Jane Doe <[email protected]>", "John Doe <[email protected]> (https://example.com)"]
    creators : list [ Annotated [ str , StringConstraints ( min_length = 1 , max_length = 255 )]] = []
    # 音声合成モデルのライセンス情報 (Markdown 形式またはプレーンテキスト / 省略時は None を設定)
    # AIVM 仕様に対応するソフトでライセンス情報を表示できるよう、Markdown 形式またはプレーンテキストでライセンスの全文を設定する想定
    # 社内のみでの利用など、この音声合成モデルの公開・配布を行わない場合は None を設定する
    license : Annotated [ str , StringConstraints ( min_length = 1 )] | None = None
    # 音声合成モデルのアーキテクチャ (音声合成技術の種類)
    model_architecture : ModelArchitecture
    # 音声合成モデルのモデル形式 (Safetensors または ONNX)
    # AIVM ファイル (.aivm) のモデル形式は Safetensors 、AIVMX ファイル (.aivmx) のモデル形式は ONNX である
    model_format : ModelFormat
    # 音声合成モデル学習時のエポック数 (省略時は None を設定)
    training_epochs : Annotated [ int , Field ( ge = 0 )] | None = None
    # 音声合成モデル学習時のステップ数 (省略時は None を設定)
    training_steps : Annotated [ int , Field ( ge = 0 )] | None = None
    # 音声合成モデルを一意に識別する UUID
    uuid : UUID
    # 音声合成モデルのバージョン (SemVer 2.0 準拠 / ex: 1.0.0)
    version : Annotated [ str , StringConstraints ( pattern = r'^(0|[1-9]d*).(0|[1-9]d*).(0|[1-9]d*)(?:-((?:0|[1-9]d*|d*[a-zA-Z-][0-9a-zA-Z-]*)(?:.(?:0|[1-9]d*|d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:+([0-9a-zA-Z-]+(?:.[0-9a-zA-Z-]+)*))?$' )]
    # 音声合成モデルの話者情報 (最低 1 人以上の話者が必要)
    speakers : list [ AivmManifestSpeaker ]

class AivmManifestSpeaker ( BaseModel ):
    """ AIVM マニフェストの話者情報 """
    # 話者の名前 (最大 80 文字)
    # 音声合成モデル内の話者が 1 名の場合は音声合成モデル名と同じ値を設定すべき
    name : Annotated [ str , StringConstraints ( min_length = 1 , max_length = 80 )]
    # 話者のアイコン画像 (Data URL)
    # 画像ファイル形式は 512×512 の JPEG (image/jpeg)・PNG (image/png) のいずれか (JPEG を推奨)
    icon : Annotated [ str , StringConstraints ( pattern = r'^data:image/(jpeg|png);base64,[A-Za-z0-9+/=]+$' )]
    # 話者の対応言語のリスト (BCP 47 言語タグ)
    # 例: 日本語: "ja", アメリカ英語: "en-US", 標準中国語: "zh-CN"
    supported_languages : list [ Annotated [ str , StringConstraints ( pattern = r'^[a-z]{2,3}(?:-[A-Z]{4})?(?:-(?:[A-Z]{2}|d{3}))?(?:-(?:[A-Za-z0-9]{5,8}|d[A-Za-z0-9]{3}))*(?:-[A-Za-z](?:-[A-Za-z0-9]{2,8})+)*(?:-x(?:-[A-Za-z0-9]{1,8})+)?$' )]]
    # 話者を一意に識別する UUID
    uuid : UUID
    # 話者のローカル ID (この音声合成モデル内で話者を識別するための一意なローカル ID で、uuid とは異なる)
    local_id : Annotated [ int , Field ( ge = 0 )]
    # 話者のスタイル情報 (最低 1 つ以上のスタイルが必要)
    styles : list [ AivmManifestSpeakerStyle ]

class AivmManifestSpeakerStyle ( BaseModel ):
    """ AIVM マニフェストの話者スタイル情報 """
    # スタイルの名前 (最大 20 文字)
    name : Annotated [ str , StringConstraints ( min_length = 1 , max_length = 20 )]
    # スタイルのアイコン画像 (Data URL, 省略可能)
    # 省略時は話者のアイコン画像がスタイルのアイコン画像として使われる想定
    # 画像ファイル形式は 512×512 の JPEG (image/jpeg)・PNG (image/png) のいずれか (JPEG を推奨)
    icon : Annotated [ str , StringConstraints ( pattern = r'^data:image/(jpeg|png);base64,[A-Za-z0-9+/=]+$' )] | None = None
    # スタイルの ID (この話者内でスタイルを識別するための一意なローカル ID で、uuid とは異なる)
    local_id : Annotated [ int , Field ( ge = 0 , le = 31 )]  # 最大 32 スタイルまでサポート
    # スタイルごとのボイスサンプル (省略時は空リストを設定)
    voice_samples : list [ AivmManifestVoiceSample ] = []

class AivmManifestVoiceSample ( BaseModel ):
    """ AIVM マニフェストのボイスサンプル情報 """
    # ボイスサンプルの音声ファイル (Data URL)
    # 音声ファイル形式は WAV (audio/wav, Codec: PCM 16bit)・M4A (audio/mp4, Codec: AAC-LC) のいずれか (M4A を推奨)
    audio : Annotated [ str , StringConstraints ( pattern = r'^data:audio/(wav|mp4);base64,[A-Za-z0-9+/=]+$' )]
    # ボイスサンプルの書き起こし文
    # 書き起こし文は音声ファイルでの発話内容と一致している必要がある
    transcript : Annotated [ str , StringConstraints ( min_length = 1 )]

FAQ

Q. Why are there two formats defined: AIVM and AIVMX?

A. To provide two formats optimized for different applications and environments, allowing for more flexible use.

AIVM (.aivm): A format based on the Safetensors format that can be used directly in machine learning frameworks such as PyTorch.
- Suitable for research and development, fine-tuning models, and new voice quality generation through model merging.
- It is generally specialized in high speed inference on NVIDIA GPUs (such as CUDA/TensorRT).
- PyTorch also has the .pth (pickle) format, but due to the nature of pickle, which serializes Python code as is, it is possible to execute arbitrary code. Therefore, there are no plans to support this with the AIVM specification.
AIVMX (.aivmx): A format based on the ONNX format, which allows for fast inference in a variety of environments.
- It is especially suitable for inference on CPUs and for use on edge devices. It can also be inferred using a web browser.
- As of 2024, many typical PC users use PCs that do not have NVIDIA GPUs or NPUs.
  - The ONNX format has excellent inference performance on the CPU, allowing you to perform speech synthesis comfortably without a GPU or NPU.
  - Additionally, the ONNX format supports DirectML inference, and high-speed inference is possible on Windows with AMD Radeon/Intel Arc GPUs.
- AivisSpeech, which is also a reference implementation of AIVM specifications, supports AIVMX files only.
  - This is to eliminate PyTorch dependencies, reduce installation size, and at the same time increase CPU inference.

Q. Can I load AIVM/AIVMX files with existing tools?

A. Yes, it is possible.

AIVM is designed as an extension of Safetensors format, while AIVMX is designed as an extension of ONNX format, so each can be read as a regular Safetensors file or ONNX file.
AIVM metadata is stored in the metadata area defined by the existing model format specifications and does not affect the behavior of existing tools.

Q. How do I convert an existing AI speech synthesis model to AIVM/AIVMX?

A. There are two ways:

AIVM Generator (recommended) : Easily generate and edit AIVM/AIVMX files using the GUI on your browser.
aivmlib : You can use the CLI tool provided by this library to generate AIVM/AIVMX files with minimal metadata from the command line.
- Since only the minimum amount of metadata converted from hyperparameters is set, you will need to edit the metadata separately when actually distributing it.

Note that the model from which you are converting it must be saved in a single Safetensor or ONNX format.

Q. How is the version control of the AIVM manifest?

A. AIVM manifest version control is carried out under the following policies:

Minor version upgrade (ex: 1.0 -> 1.1) : Changes that are backward compatible, such as adding new fields
Major version upgrade (ex: 1.1 -> 2.0) : Incompatible changes such as deleting existing fields or changing structures

Currently, 1.0 is the latest.

Q. What is the difference between aivmlib and aivmlib-web?

A. aivmlib and aivmlib-web are libraries that implement the same AIVM specification for different languages/operating environments.

aivmlib : Python implementation. It is intended for use on the desktop application or server side.
- In cases where you operate on a high-fire server with an NVIDIA GPU, the AIVM (Safetensors) format may be faster than the AIVMX (ONNX) format, depending on the model architecture and inference environment.
- aivmlib is also a reference implementation of aivmlib-web. When implementing the new specifications, first implement them in aivmlib and then port them to aivmlib-web.
aivmlib-web : TypeScript implementation. It is intended for use on a web browser.
- It is designed and developed with the premise that it will be used in AIVM Generator or services that perform speech synthesis on a web browser.
- It can handle both AIVM and AIVMX files (mainly for AIVM Generator).
  - Since the model formats that can be inferred using a web browser are basically limited to the ONNX format, in practice, most cases involve only AIVMX files.
- Although there are differences in implementation depending on the characteristics of the web browser, such as BinaryIO in Python becomes File (Blob) of the JavaScript Web API, the basic API design is the same as aivmlib.
  - There are no plans to support server-side JavaScript environments such as Node.js and Deno.

Tip

At this time, there are no officially maintained AIVM specification libraries besides aivmlib/aivmlib-web.
There is a possibility that third-party libraries for other languages will appear in the future.

Important

When adding support for new model architectures, you must add implementations to both aivmlib and aivmlib-web.
Because AIVM Generator uses aivmlib-web, both libraries must be updated to provide new features to end users.

Q. How do I add support for a new model architecture?

A. The AIVM specification does not specify implementation details for model architectures, making it relatively easy to add new model architectures.

If only the metadata in the AIVM manifest can be supported : Simply send a pull request to add a new type (e.g. GPT-SoVITS2 ) to ModelArchitecture .
- In this case, add support for the new model architecture to generate_aivm_metadata() function at the same time.
If you need to add model architecture-specific metadata : create a specification that will create a new metadata key separate from the AIVM manifest, such as aivm_style_vectors field, and then submit a pull request.
- It is desirable to have specifications that can be supported by both aivmlib (Python) and aivmlib-web (TypeScript Web) as much as possible.
- If the model architecture does not support the ONNX format that can be inferred on the web, then the metadata required only during inference does not need to be compatible with aivmlib-web.
- The metadata added to the AIVM manifest must also be supported by aivmlib-web (see below).

Important

The submitted AIVM manifest specifications must be technically supported by both aivmlib (Python) and aivmlib-web (TypeScript Web).
Aivmlib-web is used inside AIVM Generator.
Once you have added support to aivmlib, add support to aivmlib-web as well.

Note

The AIVM manifest is designed to define only common metadata that is independent of the model architecture.
The implementation-specific hyperparameters should be stored in the aivm_hyper_parameters field.
We also accept addition of Pydantic schema definitions for hyperparameters. Currently, only the hyperparameter schema for Style-Bert-VITS2 architectures is defined.

Note

Of course, the source model to AIVM/AIVMX must be saved in a single Safetensors or ONNX format.
Therefore, model architectures spanning multiple model files are not supported.
Please consider how to combine model files into one or remove unnecessary model files.

Q. How should I write the license information?

A. License information is in Markdown or plain text, and set up a full copy of the license directly embed in an AIVM/AIVMX file.

The reasons for embedding the full license text rather than specifying a URL are as follows:

URL persistence cannot be guaranteed
I don't know the license name if it's just a URL
Custom license regulations are difficult
Software compatible with AIVM specifications must be able to directly display license information

Q. Are there any size limits for image and audio data?

A. Although there is no specific size limit, the model file itself is generally huge, so further file size increases due to metadata should be kept to a minimum.

Image file: 512x512 JPEG or PNG (JPEG is recommended)
Audio file: WAV (PCM 16bit) or M4A (AAC-LC) (M4A is recommended)

Tip

The reference implementation, AIVM Generator, follows these guidelines to ensure proper size optimization.

Q. Can I edit the metadata manually?

A. Manual editing is not recommended as metadata is embedded directly in the binary.
If you are an end user, please use AIVM Generator.

Tip

Developers can write their own applications using aivmlib/aivmlib-web.
The aivmlib CLI only provides the ability to generate AIVM/AIVMX files with minimal metadata and to verify the metadata.

Expand