xlstm

1.0.0

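xLSTM: Extended Long Short-Term Memory

Paper: https://arxiv.org/abs/2405.04517

About

xLSTM is a new recurrent neural network architecture based on ideas of the original LSTM. Through exponential gating with appropriate normalization and stabilization techniques and a new matrix memory, it overcomes the limitations of the original LSTM and shows promising performance on language modeling when compared to Transformers or State Space Models.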
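To make "exponential gating with normalization and stabilization" more concrete, here is a minimal scalar sketch based on the sLSTM-style recurrence described in the xLSTM paper: a cell state c, a normalizer state n, and a log-space stabilizer m. It is an illustration under simplifying assumptions (one scalar unit, an exponential forget gate although the paper also allows a sigmoid one, no memory mixing, no heads, no mLSTM matrix memory) and is not the code shipped in the xlstm package. Both function names are hypothetical.

```python
import math
from typing import List, Sequence


def run_exp_gated_cell(i_pre: Sequence[float], f_pre: Sequence[float],
                       z: Sequence[float], o: Sequence[float]) -> List[float]:
    """Naive scalar cell with exponential input/forget gates (no stabilization).

    c_t = f_t * c_{t-1} + i_t * z_t   (cell state)
    n_t = f_t * n_{t-1} + i_t         (normalizer state)
    h_t = o_t * c_t / n_t             (hidden state)
    with i_t = exp(i_pre_t) and f_t = exp(f_pre_t).
    """
    c, n, hs = 0.0, 0.0, []
    for it, ft, zt, ot in zip(i_pre, f_pre, z, o):
        i_gate, f_gate = math.exp(it), math.exp(ft)  # raises OverflowError for large inputs
        c = f_gate * c + i_gate * zt
        n = f_gate * n + i_gate
        hs.append(ot * c / n)
    return hs


def run_exp_gated_cell_stabilized(i_pre: Sequence[float], f_pre: Sequence[float],
                                  z: Sequence[float], o: Sequence[float]) -> List[float]:
    """Same recurrence, with gates rescaled by a running log-space maximum m_t.

    m_t  = max(f_pre_t + m_{t-1}, i_pre_t)
    i'_t = exp(i_pre_t - m_t),  f'_t = exp(f_pre_t + m_{t-1} - m_t)
    c and n are both scaled by exp(-m_t), so h_t = o_t * c_t / n_t is unchanged.
    """
    c, n, m, hs = 0.0, 0.0, -math.inf, []
    for it, ft, zt, ot in zip(i_pre, f_pre, z, o):
        m_new = max(ft + m, it)
        i_gate = math.exp(it - m_new)      # always <= 1
        f_gate = math.exp(ft + m - m_new)  # always <= 1; exp(-inf) == 0.0 on the first step
        c = f_gate * c + i_gate * zt
        n = f_gate * n + i_gate
        m = m_new
        hs.append(ot * c / n)
    return hs
```

Both variants return the same hidden states whenever the naive one does not overflow, while the stabilized one also handles very large gate pre-activations such as `i_pre = [800.0, 900.0]`.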

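xLSTM Large 7B

We trained a 7B-parameter xLSTM language model.

We optimized the xLSTM architecture in terms of training throughput and stability. The code for the updated architecture is located in xlstm/xlstm_large.

The model weights are available on Hugging Face at https://huggingface.co/nx-ai/xlstm-7b.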

Minimal Installation

Create a conda environment from the file environment_pt220cu121.yaml. Install only the model code (i.e., the module xlstm) as a package.

Install via pip:

pip install xlstm

Clone from GitHub:

git clone https://github.com/NX-AI/xlstm.git
cd xlstm
pip install -e .

To use the 7B xLSTM model, install mlstm_kernels via:

pip install mlstm_kernels

Requirements

This package is based on PyTorch and was tested with versions >=1.8. The CUDA version of sLSTM requires a GPU with Compute Capability >= 8.0; see https://developer.nvidia.com/cuda-gpus. For a well-tested environment, install environment_pt220cu121.yaml:

conda env create -n xlstm -f environment_pt220cu121.yaml
conda activate xlstm
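The Compute Capability requirement can be checked before choosing the sLSTM `backend` (the configs later in this README use `backend="cuda"`). The sketch below is a hypothetical helper, not part of the xlstm package: `torch.cuda.is_available()` and `torch.cuda.get_device_capability()` are real PyTorch calls, but the fallback name `"vanilla"` is an assumption about the values `sLSTMLayerConfig` accepts, so verify it against your installed version.

```python
from typing import Optional, Tuple

# Minimum Compute Capability stated above for the sLSTM CUDA kernels.
MIN_SLSTM_CUDA_CAPABILITY = (8, 0)


def detect_compute_capability() -> Optional[Tuple[int, int]]:
    """Return (major, minor) of the current CUDA device, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return tuple(torch.cuda.get_device_capability())


def choose_slstm_backend(capability: Optional[Tuple[int, int]]) -> str:
    """Hypothetical helper: 'cuda' if the sLSTM CUDA kernels are supported.

    Assumption: the non-CUDA fallback backend is called "vanilla".
    """
    if capability is not None and tuple(capability) >= MIN_SLSTM_CUDA_CAPABILITY:
        return "cuda"
    return "vanilla"


backend = choose_slstm_backend(detect_compute_capability())
print(f"sLSTM backend: {backend}")
```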

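The xLSTM Large 7B model requires the mlstm_kernels package (TODO: add GitHub link), which provides fast kernels for xLSTM.

Models from the xLSTM Paper

This section explains how to use the models from the xLSTM paper.

Usage

For non-language applications or for integration into other architectures, you can use the xLSTMBlockStack. For language modeling or other token-based applications, you can use the xLSTMLMModel.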

xLSTM Block Stack

The xLSTMBlockStack is meant to be used as an alternative backbone in existing projects. It is similar to a stack of Transformer blocks, but uses xLSTM blocks:

import torch

from xlstm import (
    xLSTMBlockStack,
    xLSTMBlockStackConfig,
    mLSTMBlockConfig,
    mLSTMLayerConfig,
    sLSTMBlockConfig,
    sLSTMLayerConfig,
    FeedForwardConfig,
)

cfg = xLSTMBlockStackConfig(
    mlstm_block=mLSTMBlockConfig(
        mlstm=mLSTMLayerConfig(
            conv1d_kernel_size=4, qkv_proj_blocksize=4, num_heads=4
        )
    ),
    slstm_block=sLSTMBlockConfig(
        slstm=sLSTMLayerConfig(
            backend="cuda",
            num_heads=4,
            conv1d_kernel_size=4,
            bias_init="powerlaw_blockdependent",
        ),
        feedforward=FeedForwardConfig(proj_factor=1.3, act_fn="gelu"),
    ),
    context_length=256,
    num_blocks=7,
    embedding_dim=128,
    slstm_at=[1],
)

xlstm_stack = xLSTMBlockStack(cfg)

# Input of shape (batch, context_length, embedding_dim)
x = torch.randn(4, 256, 128).to("cuda")
xlstm_stack = xlstm_stack.to("cuda")
y = xlstm_stack(x)
assert y.shape == x.shape  # the stack preserves the input shape
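In the config above, `num_blocks=7` and `slstm_at=[1]` decide which positions in the stack are sLSTM blocks. The hypothetical helper below only illustrates that layout, assuming `slstm_at` holds 0-based block indices and every remaining block is an mLSTM block; it is not an xlstm API.

```python
from typing import List, Sequence


def block_layout(num_blocks: int, slstm_at: Sequence[int]) -> List[str]:
    """Hypothetical helper: block type at each position of the stack.

    Assumes slstm_at lists 0-based indices of sLSTM blocks; all others are mLSTM.
    """
    layout = ["mLSTM"] * num_blocks
    for idx in slstm_at:
        if not 0 <= idx < num_blocks:
            raise ValueError(f"slstm_at index {idx} out of range for {num_blocks} blocks")
        layout[idx] = "sLSTM"
    return layout


print(block_layout(7, [1]))
# ['mLSTM', 'sLSTM', 'mLSTM', 'mLSTM', 'mLSTM', 'mLSTM', 'mLSTM']
```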

If you are working with YAML strings or files for configuration, you can also use dacite to create the config dataclasses. The following is equivalent to the snippet above:

import torch
from omegaconf import OmegaConf
from dacite import from_dict
from dacite import Config as DaciteConfig
from xlstm import xLSTMBlockStack, xLSTMBlockStackConfig

xlstm_cfg = """
mlstm_block:
  mlstm:
    conv1d_kernel_size: 4
    qkv_proj_blocksize: 4
    num_heads: 4
slstm_block:
  slstm:
    backend: cuda
    num_heads: 4
    conv1d_kernel_size: 4
    bias_init: powerlaw_blockdependent
  feedforward:
    proj_factor: 1.3
    act_fn: gelu
context_length: 256
num_blocks: 7
embedding_dim: 128
slstm_at: [1]
"""
cfg = OmegaConf.create(xlstm_cfg)
# strict=True makes dacite reject keys that do not exist in the dataclass
cfg = from_dict(
    data_class=xLSTMBlockStackConfig,
    data=OmegaConf.to_container(cfg),
    config=DaciteConfig(strict=True),
)
xlstm_stack = xLSTMBlockStack(cfg)

x = torch.randn(4, 256, 128).to("cuda")
xlstm_stack = xlstm_stack.to("cuda")
y = xlstm_stack(x)
assert y.shape == x.shape

xLSTM Language Model

The xLSTMLMModel is a wrapper around the xLSTMBlockStack that adds the token embedding and the LM head.

import torch
from omegaconf import OmegaConf
from dacite import from_dict
from dacite import Config as DaciteConfig
from xlstm import xLSTMLMModel, xLSTMLMModelConfig

xlstm_cfg = """
vocab_size: 50304
mlstm_block:
  mlstm:
    conv1d_kernel_size: 4
    qkv_proj_blocksize: 4
    num_heads: 4
slstm_block:
  slstm:
    backend: cuda
    num_heads: 4
    conv1d_kernel_size: 4
    bias_init: powerlaw_blockdependent
  feedforward:
    proj_factor: 1.3
    act_fn: gelu
context_length: 256
num_blocks: 7
embedding_dim: 128
slstm_at: [1]
"""
cfg = OmegaConf.create(xlstm_cfg)
cfg = from_dict(
    data_class=xLSTMLMModelConfig,
    data=OmegaConf.to_container(cfg),
    config=DaciteConfig(strict=True),
)
xlstm_lm = xLSTMLMModel(cfg)

# Token ids of shape (batch, context_length), drawn from the vocabulary
x = torch.randint(0, 50304, size=(4, 256)).to("cuda")
xlstm_lm = xlstm_lm.to("cuda")
y = xlstm_lm(x)
# Output logits: (batch, context_length, vocab_size)
assert y.shape[1:] == (256, 50304)

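Experiments

The synthetic experiments that best showcase the benefits of sLSTM over mLSTM, and vice versa, are the Parity task and the Multi-Query Associative Recall task. The Parity task can only be solved with the state-tracking capabilities provided by the memory mixing of sLSTM. The Multi-Query Associative Recall task measures memorization capabilities, where the matrix memory and state expansion of mLSTM are very beneficial. Combined in one model, sLSTM and mLSTM do well on both tasks.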
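The two capabilities can be illustrated with a toy example: parity needs a running state that is updated at every step, while associative recall can be served by a matrix memory built from outer products of values and keys. This is a simplified sketch, not the repository's task generators or models; the function names are hypothetical, the recall part omits mLSTM's gates and normalizer, and one-hot keys are chosen so that retrieval is exact.

```python
import random
from typing import List, Sequence, Tuple


def make_parity_example(length: int, rng: random.Random) -> Tuple[List[int], int]:
    """Random bit string and its parity, computed with one recurrent state bit."""
    bits = [rng.randint(0, 1) for _ in range(length)]
    state = 0
    for b in bits:
        state ^= b  # state tracking: each step depends on the previous state
    return bits, state


def store(keys: Sequence[Sequence[float]],
          values: Sequence[Sequence[float]]) -> List[List[float]]:
    """Matrix memory C = sum_j v_j k_j^T (shape: value_dim x key_dim)."""
    dv, dk = len(values[0]), len(keys[0])
    C = [[0.0] * dk for _ in range(dv)]
    for k, v in zip(keys, values):
        for r in range(dv):
            for c in range(dk):
                C[r][c] += v[r] * k[c]
    return C


def recall(C: Sequence[Sequence[float]], q: Sequence[float]) -> List[float]:
    """Retrieve with a query vector: C q."""
    return [sum(row[c] * q[c] for c in range(len(q))) for row in C]


rng = random.Random(0)
bits, parity = make_parity_example(12, rng)

# One-hot (orthonormal) keys make retrieval exact; values are arbitrary.
keys = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
values = [[2.0, -1.0], [0.5, 3.0], [-4.0, 1.5]]
C = store(keys, values)
print(recall(C, keys[1]))  # [0.5, 3.0]
```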

To run each experiment, run experiments/main.py with the corresponding config file, for example:

python experiments/main.py --config experiments/parity_xLSTM01.yaml   # xLSTM[0:1], sLSTM only
python experiments/main.py --config experiments/parity_xLSTM10.yaml   # xLSTM[1:0], mLSTM only
python experiments/main.py --config experiments/parity_xLSTM11.yaml   # xLSTM[1:1], mLSTM and sLSTM

Note that the training loop does not include early stopping or test evaluation.

Citation

If you use this codebase, or otherwise find our work valuable, please cite the xLSTM paper:

@inproceedings{beck:24xlstm,
      title={xLSTM: Extended Long Short-Term Memory}, 
      author={Maximilian Beck and Korbinian Pöppel and Markus Spanring and Andreas Auer and Oleksandra Prudnikova and Michael Kopp and Günter Klambauer and Johannes Brandstetter and Sepp Hochreiter},
      booktitle = {Thirty-eighth Conference on Neural Information Processing Systems},
      year={2024},
      url={https://arxiv.org/abs/2405.04517}, 
}
