vector quantize pytorchダウンロード - vector quantize pytorchソースコードのダウンロード

ベクトル量子化-Pytorch

もともとDeepMindのTensorFlowの実装から転写されたベクトル量子化ライブラリは、パッケージに便利に作成されました。指数の移動平均を使用して、辞書を更新します。

VQは、DeepMindとOpenAIによって高品質の画像（VQ-VAE-2）と音楽（Jukebox）に成功裏に使用されています。

インストール

$ pip install vector-quantize-pytorch

使用法

 import torch
from vector_quantize_pytorch import VectorQuantize

vq = VectorQuantize (
    dim = 256 ,
    codebook_size = 512 ,     # codebook size
    decay = 0.8 ,             # the exponential moving average decay, lower means the dictionary will change faster
    commitment_weight = 1.   # the weight on the commitment loss
)

x = torch . randn ( 1 , 1024 , 256 )
quantized , indices , commit_loss = vq ( x ) # (1, 1024, 256), (1, 1024), (1)

残留VQ

このホワイトペーパーでは、複数のベクトルQuantizerを使用して、波形の残差を再帰的に量子化することを提案しています。これをResidualVQクラスと1つの追加の初期化パラメーターで使用できます。

 import torch
from vector_quantize_pytorch import ResidualVQ

residual_vq = ResidualVQ (
    dim = 256 ,
    num_quantizers = 8 ,      # specify number of quantizers
    codebook_size = 1024 ,    # codebook size
)

x = torch . randn ( 1 , 1024 , 256 )

quantized , indices , commit_loss = residual_vq ( x )
print ( quantized . shape , indices . shape , commit_loss . shape )
# (1, 1024, 256), (1, 1024, 8), (1, 8)

# if you need all the codes across the quantization layers, just pass return_all_codes = True

quantized , indices , commit_loss , all_codes = residual_vq ( x , return_all_codes = True )

# (8, 1, 1024, 256)

さらに、このペーパーでは、残留VQを使用してRQ-VAEを構築し、より圧縮コードを使用して高解像度画像を生成します。

彼らは2つの変更を加えます。 1つ目は、すべてのQuantizersでコードブックを共有することです。 2つ目は、常に最も近い試合を行うのではなく、コードをだましてサンプリングすることです。これらの機能の両方を2つの追加のキーワード引数で使用できます。

 import torch
from vector_quantize_pytorch import ResidualVQ

residual_vq = ResidualVQ (
    dim = 256 ,
    num_quantizers = 8 ,
    codebook_size = 1024 ,
    stochastic_sample_codes = True ,
    sample_codebook_temp = 0.1 ,         # temperature for stochastically sampling codes, 0 would be equivalent to non-stochastic
    shared_codebook = True              # whether to share the codebooks for all quantizers or not
)

x = torch . randn ( 1 , 1024 , 256 )
quantized , indices , commit_loss = residual_vq ( x )

# (1, 1024, 256), (1, 1024, 8), (1, 8)

最近の論文では、特徴ディメンションのグループに残留VQを実行することを提案しており、コードブックをはるかに少なく使用しながら、Encodecに同等の結果を示しています。 GroupedResidualVQをインポートすることで使用できます

 import torch
from vector_quantize_pytorch import GroupedResidualVQ

residual_vq = GroupedResidualVQ (
    dim = 256 ,
    num_quantizers = 8 ,      # specify number of quantizers
    groups = 2 ,
    codebook_size = 1024 ,    # codebook size
)

x = torch . randn ( 1 , 1024 , 256 )

quantized , indices , commit_loss = residual_vq ( x )

# (1, 1024, 256), (2, 1, 1024, 8), (2, 1, 8)

初期化

SoundStream Paperは、コードブックを最初のバッチのKmeans Centroidsによって初期化する必要があることを提案しています。 1つのフラグkmeans_init = True ResidualVQ簡単にこの機能をオンにすることができますVectorQuantize

 import torch
from vector_quantize_pytorch import ResidualVQ

residual_vq = ResidualVQ (
    dim = 256 ,
    codebook_size = 256 ,
    num_quantizers = 4 ,
    kmeans_init = True ,   # set to True
    kmeans_iters = 10     # number of kmeans iterations to calculate the centroids for the codebook on init
)

x = torch . randn ( 1 , 1024 , 256 )
quantized , indices , commit_loss = residual_vq ( x )

# (1, 1024, 256), (1, 1024, 4), (1, 4)

勾配計算

VQ-VAEは、従来、ストレートスルー推定器（STE）で訓練されています。後方パス中、勾配はVQレイヤーを通過するのではなく、VQレイヤーの周りに流れます。回転トリックペーパーでは、VQレイヤーを介して勾配を変換することを提案しているため、入力ベクトルと量子化された出力の間の相対角度と大きさが勾配にエンコードされます。 VectorQuantizeクラスでrotation_trick=True/Falseでこの機能を有効または無効にできます。

 from vector_quantize_pytorch import VectorQuantize

vq_layer = VectorQuantize (
    dim = 256 ,
    codebook_size = 256 ,
    rotation_trick = True ,   # Set to False to use the STE gradient estimator or True to use the rotation trick.
)

コードブックの使用の増加

このリポジトリには、さまざまな論文からのいくつかの手法が含まれています。「Dead」コードブックエントリと戦うために、ベクター量子を使用する際の一般的な問題です。

コードブックの寸法が低い

改良されたVQGANペーパーは、コードブックをより低い次元に保持することを提案しています。エンコーダー値は、量子化後に高次元に戻る前に投影されます。これをcodebook_dim hyperparameterで設定できます。

 import torch
from vector_quantize_pytorch import VectorQuantize

vq = VectorQuantize (
    dim = 256 ,
    codebook_size = 256 ,
    codebook_dim = 16      # paper proposes setting this to 32 or as low as 8 to increase codebook usage
)

x = torch . randn ( 1 , 1024 , 256 )
quantized , indices , commit_loss = vq ( x )

# (1, 1024, 256), (1, 1024), (1,)

コサインの類似性

改良されたVQGANペーパーは、L2がコードとエンコードされたベクトルを正規化することも提案しています。彼らは、ベクトルを球体に施行することで、コードの使用とダウンストリーム再構成の改善につながると主張しています。 use_cosine_sim = Trueを設定することで、これをオンにすることができます

 import torch
from vector_quantize_pytorch import VectorQuantize

vq = VectorQuantize (
    dim = 256 ,
    codebook_size = 256 ,
    use_cosine_sim = True   # set this to True
)

x = torch . randn ( 1 , 1024 , 256 )
quantized , indices , commit_loss = vq ( x )

# (1, 1024, 256), (1, 1024), (1,)

古いコードの期限切れ

最後に、SoundStream Paperには、現在のバッチからランダムに選択されたベクトルを使用して、特定のしきい値以下のコードを置き換えるスキームがあります。しきい値をthreshold_ema_dead_codeキーワードで設定できます。

 import torch
from vector_quantize_pytorch import VectorQuantize

vq = VectorQuantize (
    dim = 256 ,
    codebook_size = 512 ,
    threshold_ema_dead_code = 2  # should actively replace any codes that have an exponential moving average cluster size less than 2
)

x = torch . randn ( 1 , 1024 , 256 )
quantized , indices , commit_loss = vq ( x )

# (1, 1024, 256), (1, 1024), (1,)

直交正規化損失

VQ-Vae / VQ-Ganはすぐに人気を博しています。最近の論文では、画像にベクトル量子化を使用すると、コードブックを直交するように施行すると、離散化されたコードの翻訳等語がつながり、下流テキストの画像生成タスクの大幅な改善につながることが提案されています。

この機能は、 orthogonal_reg_weight 0より大きく設定するだけで使用できます。その場合、直交の正則化はモジュールによって出力された補助損失に追加されます。

 import torch
from vector_quantize_pytorch import VectorQuantize

vq = VectorQuantize (
    dim = 256 ,
    codebook_size = 256 ,
    accept_image_fmap = True ,                   # set this true to be able to pass in an image feature map
    orthogonal_reg_weight = 10 ,                 # in paper, they recommended a value of 10
    orthogonal_reg_max_codes = 128 ,             # this would randomly sample from the codebook for the orthogonal regularization loss, for limiting memory usage
    orthogonal_reg_active_codes_only = False    # set this to True if you have a very large codebook, and would only like to enforce the loss on the activated codes per batch
)

img_fmap = torch . randn ( 1 , 256 , 32 , 32 )
quantized , indices , loss = vq ( img_fmap ) # (1, 256, 32, 32), (1, 32, 32), (1,)

# loss now contains the orthogonal regularization loss with the weight as assigned

多目的VQ

多目的アプローチ（機能あたりの複数のコード）を使用して、離散潜在表現のバリエーションを提案する多くの論文があります。私は、同じコードブックが入力ディメンションのheadタイム全体で量子化するために同じコードブックを使用している1つのバリアントを提供することにしました。

NWTペーパーからより実績のあるアプローチ（メンバー）を使用することもできます

 import torch
from vector_quantize_pytorch import VectorQuantize

vq = VectorQuantize (
    dim = 256 ,
    codebook_dim = 32 ,                  # a number of papers have shown smaller codebook dimension to be acceptable
    heads = 8 ,                          # number of heads to vector quantize, codebook shared across all heads
    separate_codebook_per_head = True ,  # whether to have a separate codebook per head. False would mean 1 shared codebook
    codebook_size = 8196 ,
    accept_image_fmap = True
)

img_fmap = torch . randn ( 1 , 256 , 32 , 32 )
quantized , indices , loss = vq ( img_fmap )

# (1, 256, 32, 32), (1, 32, 32, 8), (1,)

ランダム投影量子化器

この論文は、最初にマスクされた音声モデリングにランダム投影量子化器を使用することを提案しました。ここでは、信号がランダムに初期化されたマトリックスで投影され、次にランダムな初期化されたコードブックと一致します。したがって、1つは量子化器を学ぶ必要はありません。この手法は、GoogleのUniversal Speech Modelによって使用され、音声からテキストモデリングのSOTAを実現しました。

USMはさらに、複数のコードブックと、マルチソフトマックス目標を持つマスクされた音声モデリングを使用することを提案しています。 num_codebooks 1より大きく設定することで、これを簡単に行うことができます

 import torch
from vector_quantize_pytorch import RandomProjectionQuantizer

quantizer = RandomProjectionQuantizer (
    dim = 512 ,               # input dimensions
    num_codebooks = 16 ,      # in USM, they used up to 16 for 5% gain
    codebook_dim = 256 ,      # codebook dimension
    codebook_size = 1024     # codebook size
)

x = torch . randn ( 1 , 1024 , 512 )
indices = quantizer ( x )

# (1, 1024, 16)

このリポジトリは、マルチプロセス設定でコードブックを自動的に同期する必要があります。どういうわけかそうでない場合は、問題を開いてください。 sync_codebook = True | Falseを設定して、コードブックを同期するかどうかをオーバーライドできます。 sync_codebook = True | False

SIM VQ

新しいICLR 2025ペーパーでは、コードブックが凍結され、コードが線形投影によって暗黙的に生成されるスキームを提案します。著者は、このセットアップにより、コードブックの崩壊が少なくなり、収束が容易になると主張しています。 Fifty et al。の回転トリックとペアになり、線形投影を小さな1層MLPに拡張すると、これがさらに優れていることがわかりました。そのように実験することができます

更新：混合結果を聞く

 import torch
from vector_quantize_pytorch import SimVQ

sim_vq = SimVQ (
    dim = 512 ,
    codebook_size = 1024 ,
    rotation_trick = True  # use rotation trick from Fifty et al.
)

x = torch . randn ( 1 , 1024 , 512 )
quantized , indices , commit_loss = sim_vq ( x )

assert x . shape == quantized . shape
assert torch . allclose ( quantized , sim_vq . indices_to_codes ( indices ), atol = 1e-6 )

残留フレーバーの場合は、代わりにResidualSimVQをインポートしてください

 import torch
from vector_quantize_pytorch import ResidualSimVQ

residual_sim_vq = ResidualSimVQ (
    dim = 512 ,
    num_quantizers = 4 ,
    codebook_size = 1024 ,
    rotation_trick = True  # use rotation trick from Fifty et al.
)

x = torch . randn ( 1 , 1024 , 512 )
quantized , indices , commit_loss = residual_sim_vq ( x )

assert x . shape == quantized . shape
assert torch . allclose ( quantized , residual_sim_vq . get_output_from_indices ( indices ), atol = 1e-6 )

有限スカラー量子化

	VQ	FSQ
量子化	argmin_c \|\| zc \|\|	ラウンド（f（z））
勾配	ストレートスルー推定（STE）	Ste
補助損失	コミットメント、コードブック、エントロピー損失、...	n/a
トリック	コードブックのEMA、コードブックの分割、投影、...	n/a
パラメーター	コードブック	n/a

Google DeepMindのこの作品は、生成モデリング、コミットメント損失の必要性、コードブックのEMA更新、およびコードブックの崩壊または不十分な使用率の問題に取り組むために、ベクターの量子化が行われる方法を大幅に簡素化することを目指しています。それらは、各スカラーを勾配を介してまっすぐに離散レベルに丸めます。コードはハイパーキューブの均一なポイントになります。

記録的な時期にこの実装を移植してくれた@sekstiniに感謝します！

 import torch
from vector_quantize_pytorch import FSQ

quantizer = FSQ (
    levels = [ 8 , 5 , 5 , 5 ]
)

x = torch . randn ( 1 , 1024 , 4 ) # 4 since there are 4 levels
xhat , indices = quantizer ( x )

# (1, 1024, 4), (1, 1024)

assert torch . all ( xhat == quantizer . indices_to_codes ( indices ))

オーディオエンコーディングを改善しようとする試みのための即興の残留FSQ。

クレジットは、ここにアイデアを元々受け入れてくれたために@sekstiniに送られます

 import torch
from vector_quantize_pytorch import ResidualFSQ

residual_fsq = ResidualFSQ (
    dim = 256 ,
    levels = [ 8 , 5 , 5 , 3 ],
    num_quantizers = 8
)

x = torch . randn ( 1 , 1024 , 256 )

residual_fsq . eval ()

quantized , indices = residual_fsq ( x )

# (1, 1024, 256), (1, 1024, 8)

quantized_out = residual_fsq . get_output_from_indices ( indices )

# (1, 1024, 256)

assert torch . all ( quantized == quantized_out )

ルックアップフリー量子化

Magvitの背後にある研究チームは、生成ビデオモデリングの新しいSOTA結果をリリースしました。 V1とV2の間のコアの変化には、新しいタイプの量子化、ルックアップフリー量子化（LFQ）が含まれ、コードブックを排除し、外観を完全に埋め込むことが含まれます。

このペーパーでは、独立したバイナリ潜在物を使用するという簡単なLFQ量子化装置を紹介します。 LFQのその他の実装が存在します。ただし、チームは、LFQを搭載したMAGVIT-V2がImagenetベンチマークで大幅に改善することを示しています。 LFQと2レベルのFSQの違いには、エントロピーの正則化と維持されたコミットメント損失が含まれます。

コードブックルックアップなしでLFQ量子化のより高度な方法を開発すると、生成モデリングに革命をもたらす可能性があります。

次のように使用できます。 Magvit2 PytorchポートでDogfoodedになります

 import torch
from vector_quantize_pytorch import LFQ

# you can specify either dim or codebook_size
# if both specified, will be validated against each other

quantizer = LFQ (
    codebook_size = 65536 ,      # codebook size, must be a power of 2
    dim = 16 ,                   # this is the input feature dimension, defaults to log2(codebook_size) if not defined
    entropy_loss_weight = 0.1 ,  # how much weight to place on entropy loss
    diversity_gamma = 1.        # within entropy loss, how much weight to give to diversity of codes, taken from https://arxiv.org/abs/1911.05894
)

image_feats = torch . randn ( 1 , 16 , 32 , 32 )

quantized , indices , entropy_aux_loss = quantizer ( image_feats , inv_temperature = 100. )  # you may want to experiment with temperature

# (1, 16, 32, 32), (1, 32, 32), ()

assert ( quantized == quantizer . indices_to_codes ( indices )). all ()

また、ビデオ機能を(batch, feat, time, height, width)またはシーケンスとして渡すこともできます(batch, seq, feat)

 import torch
from vector_quantize_pytorch import LFQ

quantizer = LFQ (
    codebook_size = 65536 ,
    dim = 16 ,
    entropy_loss_weight = 0.1 ,
    diversity_gamma = 1.
)

seq = torch . randn ( 1 , 32 , 16 )
quantized , * _ = quantizer ( seq )

assert seq . shape == quantized . shape

video_feats = torch . randn ( 1 , 16 , 10 , 32 , 32 )
quantized , * _ = quantizer ( video_feats )

assert video_feats . shape == quantized . shape

または複数のコードブックをサポートします

 import torch
from vector_quantize_pytorch import LFQ

quantizer = LFQ (
    codebook_size = 4096 ,
    dim = 16 ,
    num_codebooks = 4  # 4 codebooks, total codebook dimension is log2(4096) * 4
)

image_feats = torch . randn ( 1 , 16 , 32 , 32 )

quantized , indices , entropy_aux_loss = quantizer ( image_feats )

# (1, 16, 32, 32), (1, 32, 32, 4), ()

assert image_feats . shape == quantized . shape
assert ( quantized == quantizer . indices_to_codes ( indices )). all ()

オーディオ圧縮の改善につながるかどうかを確認するために、即興の残留LFQ。

 import torch
from vector_quantize_pytorch import ResidualLFQ

residual_lfq = ResidualLFQ (
    dim = 256 ,
    codebook_size = 256 ,
    num_quantizers = 8
)

x = torch . randn ( 1 , 1024 , 256 )

residual_lfq . eval ()

quantized , indices , commit_loss = residual_lfq ( x )

# (1, 1024, 256), (1, 1024, 8), (8)

quantized_out = residual_lfq . get_output_from_indices ( indices )

# (1, 1024, 256)

assert torch . all ( quantized == quantized_out )

潜在量子化

解釈は、解釈性、一般化、学習の改善、堅牢性を促進するため、表現学習に不可欠です。データの意味のある独立した機能をキャプチャし、さまざまなアプリケーションで学習した表現のより効果的な使用を促進することを目的としています。より良い解体のために、課題は、明示的なグラウンドトゥルース情報なしで、データセットの根底にあるバリエーションを解くことです。この作品は、組織化された潜在空間内でのエンコードとデコードを目的とした重要な帰納的バイアスを導入します。組み込まれた戦略には、各ディメンションごとに個々の学習可能なスカラーコードブックを使用して離散コードベクトルを割り当てることにより、潜在スペースを離散化することが含まれます。この方法論により、モデルは堅牢な以前の方法を効果的に上回ることができます。

この論文の結果には、非常に高い重量減衰を使用する必要があることに注意してください。

 import torch
from vector_quantize_pytorch import LatentQuantize

# you can specify either dim or codebook_size
# if both specified, will be validated against each other

quantizer = LatentQuantize (
    levels = [ 5 , 5 , 8 ],      # number of levels per codebook dimension
    dim = 16 ,                   # input dim
    commitment_loss_weight = 0.1 ,  
    quantization_loss_weight = 0.1 ,
)

image_feats = torch . randn ( 1 , 16 , 32 , 32 )

quantized , indices , loss = quantizer ( image_feats )

# (1, 16, 32, 32), (1, 32, 32), ()

assert image_feats . shape == quantized . shape
assert ( quantized == quantizer . indices_to_codes ( indices )). all ()

また、ビデオ機能を(batch, feat, time, height, width)またはシーケンスとして渡すこともできます(batch, seq, feat)

 import torch
from vector_quantize_pytorch import LatentQuantize

quantizer = LatentQuantize (
    levels = [ 5 , 5 , 8 ],
    dim = 16 ,
    commitment_loss_weight = 0.1 ,  
    quantization_loss_weight = 0.1 ,
)

seq = torch . randn ( 1 , 32 , 16 )
quantized , * _ = quantizer ( seq )

# (1, 32, 16)

video_feats = torch . randn ( 1 , 16 , 10 , 32 , 32 )
quantized , * _ = quantizer ( video_feats )

# (1, 16, 10, 32, 32)

または複数のコードブックをサポートします

 import torch
from vector_quantize_pytorch import LatentQuantize

model = LatentQuantize (
    levels = [ 4 , 8 , 16 ],
    dim = 9 ,
    num_codebooks = 3
)

input_tensor = torch . randn ( 2 , 3 , dim )
output_tensor , indices , loss = model ( input_tensor )

# (2, 3, 9), (2, 3, 3), ()

assert output_tensor . shape == input_tensor . shape
assert indices . shape == ( 2 , 3 , num_codebooks )
assert loss . item () >= 0

引用

 @misc { oord2018neural ,
    title   = { Neural Discrete Representation Learning } ,
    author  = { Aaron van den Oord and Oriol Vinyals and Koray Kavukcuoglu } ,
    year    = { 2018 } ,
    eprint  = { 1711.00937 } ,
    archivePrefix = { arXiv } ,
    primaryClass = { cs.LG }
}

 @misc { zeghidour2021soundstream ,
    title   = { SoundStream: An End-to-End Neural Audio Codec } ,
    author  = { Neil Zeghidour and Alejandro Luebs and Ahmed Omran and Jan Skoglund and Marco Tagliasacchi } ,
    year    = { 2021 } ,
    eprint  = { 2107.03312 } ,
    archivePrefix = { arXiv } ,
    primaryClass = { cs.SD }
}

 @inproceedings { anonymous2022vectorquantized ,
    title   = { Vector-quantized Image Modeling with Improved {VQGAN} } ,
    author  = { Anonymous } ,
    booktitle = { Submitted to The Tenth International Conference on Learning Representations } ,
    year    = { 2022 } ,
    url     = { https://openreview.net/forum?id=pfNyExj7z2 } ,
    note    = { under review }
}

 @inproceedings { lee2022autoregressive ,
    title   = { Autoregressive Image Generation using Residual Quantization } ,
    author  = { Lee, Doyup and Kim, Chiheon and Kim, Saehoon and Cho, Minsu and Han, Wook-Shin } ,
    booktitle = { Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition } ,
    pages   = { 11523--11532 } ,
    year    = { 2022 }
}

 @article { Defossez2022HighFN ,
    title   = { High Fidelity Neural Audio Compression } ,
    author  = { Alexandre D'efossez and Jade Copet and Gabriel Synnaeve and Yossi Adi } ,
    journal = { ArXiv } ,
    year    = { 2022 } ,
    volume  = { abs/2210.13438 }
}

 @inproceedings { Chiu2022SelfsupervisedLW ,
    title   = { Self-supervised Learning with Random-projection Quantizer for Speech Recognition } ,
    author  = { Chung-Cheng Chiu and James Qin and Yu Zhang and Jiahui Yu and Yonghui Wu } ,
    booktitle = { International Conference on Machine Learning } ,
    year    = { 2022 }
}

 @inproceedings { Zhang2023GoogleUS ,
    title   = { Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages } ,
    author  = { Yu Zhang and Wei Han and James Qin and Yongqiang Wang and Ankur Bapna and Zhehuai Chen and Nanxin Chen and Bo Li and Vera Axelrod and Gary Wang and Zhong Meng and Ke Hu and Andrew Rosenberg and Rohit Prabhavalkar and Daniel S. Park and Parisa Haghani and Jason Riesa and Ginger Perng and Hagen Soltau and Trevor Strohman and Bhuvana Ramabhadran and Tara N. Sainath and Pedro J. Moreno and Chung-Cheng Chiu and Johan Schalkwyk and Franccoise Beaufays and Yonghui Wu } ,
    year    = { 2023 }
}

 @inproceedings { Shen2023NaturalSpeech2L ,
    title   = { NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers } ,
    author  = { Kai Shen and Zeqian Ju and Xu Tan and Yanqing Liu and Yichong Leng and Lei He and Tao Qin and Sheng Zhao and Jiang Bian } ,
    year    = { 2023 }
}

 @inproceedings { Yang2023HiFiCodecGV ,
    title   = { HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec } ,
    author  = { Dongchao Yang and Songxiang Liu and Rongjie Huang and Jinchuan Tian and Chao Weng and Yuexian Zou } ,
    year    = { 2023 }
}

 @inproceedings { huh2023improvedvqste ,
    title   = { Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks } ,
    author  = { Huh, Minyoung and Cheung, Brian and Agrawal, Pulkit and Isola, Phillip } ,
    booktitle = { International Conference on Machine Learning } ,
    year    = { 2023 } ,
    organization = { PMLR }
}

 @inproceedings { rogozhnikov2022einops ,
    title   = { Einops: Clear and Reliable Tensor Manipulations with Einstein-like Notation } ,
    author  = { Alex Rogozhnikov } ,
    booktitle = { International Conference on Learning Representations } ,
    year    = { 2022 } ,
    url     = { https://openreview.net/forum?id=oapKSVM2bcj }
}

 @misc { shin2021translationequivariant ,
    title   = { Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation } ,
    author  = { Woncheol Shin and Gyubok Lee and Jiyoung Lee and Joonseok Lee and Edward Choi } ,
    year    = { 2021 } ,
    eprint  = { 2112.00384 } ,
    archivePrefix = { arXiv } ,
    primaryClass = { cs.CV }
}

 @misc { mentzer2023finite ,
    title   = { Finite Scalar Quantization: VQ-VAE Made Simple } ,
    author  = { Fabian Mentzer and David Minnen and Eirikur Agustsson and Michael Tschannen } ,
    year    = { 2023 } ,
    eprint  = { 2309.15505 } ,
    archivePrefix = { arXiv } ,
    primaryClass = { cs.CV }
}

 @misc { yu2023language ,
    title   = { Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation } ,
    author  = { Lijun Yu and José Lezama and Nitesh B. Gundavarapu and Luca Versari and Kihyuk Sohn and David Minnen and Yong Cheng and Agrim Gupta and Xiuye Gu and Alexander G. Hauptmann and Boqing Gong and Ming-Hsuan Yang and Irfan Essa and David A. Ross and Lu Jiang } ,
    year    = { 2023 } ,
    eprint  = { 2310.05737 } ,
    archivePrefix = { arXiv } ,
    primaryClass = { cs.CV }
}

 @inproceedings { Zhao2024ImageAV ,
  title     = { Image and Video Tokenization with Binary Spherical Quantization } ,
  author    = { Yue Zhao and Yuanjun Xiong and Philipp Krahenbuhl } ,
  year      = { 2024 } ,
  url       = { https://api.semanticscholar.org/CorpusID:270380237 }
}

 @misc { hsu2023disentanglement ,
    title   = { Disentanglement via Latent Quantization } , 
    author  = { Kyle Hsu and Will Dorrell and James C. R. Whittington and Jiajun Wu and Chelsea Finn } ,
    year    = { 2023 } ,
    eprint  = { 2305.18378 } ,
    archivePrefix = { arXiv } ,
    primaryClass = { cs.LG }
}

 @inproceedings { Irie2023SelfOrganisingND ,
    title   = { Self-Organising Neural Discrete Representation Learning `a la Kohonen } ,
    author  = { Kazuki Irie and R'obert Csord'as and J{"u}rgen Schmidhuber } ,
    year    = { 2023 } ,
    url     = { https://api.semanticscholar.org/CorpusID:256901024 }
}

 @article { Huijben2024ResidualQW ,
    title   = { Residual Quantization with Implicit Neural Codebooks } ,
    author  = { Iris Huijben and Matthijs Douze and Matthew Muckley and Ruud van Sloun and Jakob Verbeek } ,
    journal = { ArXiv } ,
    year    = { 2024 } ,
    volume  = { abs/2401.14732 } ,
    url     = { https://api.semanticscholar.org/CorpusID:267301189 }
}

 @article { Fifty2024Restructuring ,
    title   = { Restructuring Vector Quantization with the Rotation Trick } ,
    author  = { Christopher Fifty, Ronald G. Junkins, Dennis Duan, Aniketh Iyengar, Jerry W. Liu, Ehsan Amid, Sebastian Thrun, Christopher Ré } ,
    journal = { ArXiv } ,
    year    = { 2024 } ,
    volume  = { abs/2410.06424 } ,
    url     = { https://api.semanticscholar.org/CorpusID:273229218 }
}

 @inproceedings { Zhu2024AddressingRC ,
    title   = { Addressing Representation Collapse in Vector Quantized Models with One Linear Layer } ,
    author  = { Yongxin Zhu and Bocheng Li and Yifei Xin and Linli Xu } ,
    year    = { 2024 } ,
    url     = { https://api.semanticscholar.org/CorpusID:273812459 }
}