Paper: https://arxiv.org/abs/2405.04517
xLSTM is a new Recurrent Neural Network architecture based on ideas of the original LSTM. Through exponential gating with appropriate normalization and stabilization techniques, together with a new matrix memory, it overcomes the limitations of the original LSTM and shows promising performance on language modeling when compared to Transformers or State Space Models.
We have trained a 7B parameter xLSTM Language Model.
We have optimized the xLSTM architecture for training throughput and stability. The code for the updated architecture is located in xlstm/xlstm_large.
The model weights are available on Hugging Face at https://huggingface.co/nx-ai/xlstm-7b.
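A minimal way to try the released checkpoint is through the Hugging Face transformers auto classes. This is only a sketch and assumes the checkpoint on the hub exposes a standard causal-LM interface (device_map="auto" additionally requires the accelerate package):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the checkpoint linked above is loadable via the standard auto classes.
model = AutoModelForCausalLM.from_pretrained("NX-AI/xLSTM-7b", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("NX-AI/xLSTM-7b")

# Generate a short continuation as a smoke test.
inputs = tokenizer("xLSTM is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))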
Create a conda environment from the file environment_pt220cu121.yaml. Install the model code only (i.e., the module xlstm) as a package:
Install via pip:
pip install xlstm
Or clone from GitHub:
git clone https://github.com/NX-AI/xlstm.git
cd xlstm
pip install -e .
For using the xLSTM 7B model, install the mlstm_kernels package via:
pip install mlstm_kernels
This package is based on PyTorch and was tested with versions >=1.8. For the CUDA version of sLSTM, you need Compute Capability >= 8.0, see https://developer.nvidia.com/cuda-gpus. For a well-tested environment, install environment_pt220cu121.yaml as:
conda env create -n xlstm -f environment_pt220cu121.yaml
conda activate xlstm
For the xLSTM Large 7B model we need the mlstm_kernels package (TODO: add GitHub link), which provides fast kernels for xLSTM.
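As a sketch of how the updated xlstm/xlstm_large code path can be exercised at toy scale (the class names xLSTMLarge / xLSTMLargeConfig and the kernel selector fields below are assumptions about that module, not a verified API; the 7B checkpoint uses much larger dimensions):

import torch
from xlstm.xlstm_large.model import xLSTMLarge, xLSTMLargeConfig  # assumed module layout

# Small toy configuration for a quick forward pass.
config = xLSTMLargeConfig(
    embedding_dim=512,
    num_heads=4,
    num_blocks=6,
    vocab_size=2048,
    mode="inference",
    chunkwise_kernel="chunkwise--triton_xl_chunk",  # fast kernels from mlstm_kernels (assumed names)
    sequence_kernel="native_sequence__triton",
    step_kernel="triton",
)
model = xLSTMLarge(config).to("cuda")

tokens = torch.randint(0, 2048, (3, 256)).to("cuda")
logits = model(tokens)
print(logits.shape)  # expected: (3, 256, 2048)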
This section illustrates how to use the models from the xLSTM paper.
For non-language applications, or for integration into other architectures, you can use the xLSTMBlockStack; for language modeling or other token-based applications, you can use the xLSTMLMModel.
The xLSTMBlockStack is meant for use as an alternative backbone in an existing project. It is similar to a stack of Transformer blocks, but uses xLSTM blocks:
import torch
from xlstm import (
    xLSTMBlockStack,
    xLSTMBlockStackConfig,
    mLSTMBlockConfig,
    mLSTMLayerConfig,
    sLSTMBlockConfig,
    sLSTMLayerConfig,
    FeedForwardConfig,
)

cfg = xLSTMBlockStackConfig(
    mlstm_block=mLSTMBlockConfig(
        mlstm=mLSTMLayerConfig(
            conv1d_kernel_size=4, qkv_proj_blocksize=4, num_heads=4
        )
    ),
    slstm_block=sLSTMBlockConfig(
        slstm=sLSTMLayerConfig(
            backend="cuda",
            num_heads=4,
            conv1d_kernel_size=4,
            bias_init="powerlaw_blockdependent",
        ),
        feedforward=FeedForwardConfig(proj_factor=1.3, act_fn="gelu"),
    ),
    context_length=256,
    num_blocks=7,
    embedding_dim=128,
    slstm_at=[1],
)

xlstm_stack = xLSTMBlockStack(cfg)

x = torch.randn(4, 256, 128).to("cuda")
xlstm_stack = xlstm_stack.to("cuda")
y = xlstm_stack(x)
y.shape == x.shape
If you are working with YAML strings / files for configuration, you can also use dacite to create the config dataclasses. This is the same as the snippet above:
from omegaconf import OmegaConf
from dacite import from_dict
from dacite import Config as DaciteConfig
from xlstm import xLSTMBlockStack, xLSTMBlockStackConfig

xlstm_cfg = """
mlstm_block:
  mlstm:
    conv1d_kernel_size: 4
    qkv_proj_blocksize: 4
    num_heads: 4
slstm_block:
  slstm:
    backend: cuda
    num_heads: 4
    conv1d_kernel_size: 4
    bias_init: powerlaw_blockdependent
  feedforward:
    proj_factor: 1.3
    act_fn: gelu
context_length: 256
num_blocks: 7
embedding_dim: 128
slstm_at: [1]
"""
cfg = OmegaConf.create(xlstm_cfg)
cfg = from_dict(data_class=xLSTMBlockStackConfig, data=OmegaConf.to_container(cfg), config=DaciteConfig(strict=True))
xlstm_stack = xLSTMBlockStack(cfg)

x = torch.randn(4, 256, 128).to("cuda")
xlstm_stack = xlstm_stack.to("cuda")
y = xlstm_stack(x)
y.shape == x.shape
The xLSTMLMModel is a wrapper around the xLSTMBlockStack that adds the token embedding and the LM head.
from omegaconf import OmegaConf
from dacite import from_dict
from dacite import Config as DaciteConfig
from xlstm import xLSTMLMModel, xLSTMLMModelConfig

xlstm_cfg = """
vocab_size: 50304
mlstm_block:
  mlstm:
    conv1d_kernel_size: 4
    qkv_proj_blocksize: 4
    num_heads: 4
slstm_block:
  slstm:
    backend: cuda
    num_heads: 4
    conv1d_kernel_size: 4
    bias_init: powerlaw_blockdependent
  feedforward:
    proj_factor: 1.3
    act_fn: gelu
context_length: 256
num_blocks: 7
embedding_dim: 128
slstm_at: [1]
"""
cfg = OmegaConf.create(xlstm_cfg)
cfg = from_dict(data_class=xLSTMLMModelConfig, data=OmegaConf.to_container(cfg), config=DaciteConfig(strict=True))
xlstm_stack = xLSTMLMModel(cfg)

x = torch.randint(0, 50304, size=(4, 256)).to("cuda")
xlstm_stack = xlstm_stack.to("cuda")
y = xlstm_stack(x)
y.shape[1:] == (256, 50304)
The synthetic experiments contrast sLSTM and mLSTM on the Parity task and the Multi-Query Associative Recall task. The Parity task can only be solved with the state-tracking capabilities provided by the memory mixing of sLSTM. The Multi-Query Associative Recall task measures memorization capabilities, where the matrix memory and state expansion of mLSTM are very beneficial. In combination, they perform well on both tasks.
To run each of them, run main.py in the experiments folder, e.g.:
python experiments/main.py --config experiments/parity_xLSTM01.yaml # xLSTM[0:1], sLSTM only
python experiments/main.py --config experiments/parity_xLSTM10.yaml # xLSTM[1:0], mLSTM only
python experiments/main.py --config experiments/parity_xLSTM11.yaml # xLSTM[1:1], mLSTM and sLSTM
Note that the training loop does not contain early stopping or test evaluation.
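For intuition on what the Parity task asks of a model, here is an illustrative toy data generator (my own sketch, not the dataset code used by experiments/main.py): the label at each position is the running XOR of the inputs, so predicting it requires carrying one bit of state across the whole sequence, which is exactly what the memory mixing of sLSTM provides.

import torch

def make_parity_batch(batch_size: int = 4, seq_len: int = 256):
    # Random binary inputs; the target at position t is the parity (XOR) of inputs[0..t].
    x = torch.randint(0, 2, (batch_size, seq_len))
    y = torch.cumsum(x, dim=1) % 2  # running parity
    return x, y

x, y = make_parity_batch()
print(x[0, :8].tolist(), y[0, :8].tolist())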
If you use this codebase, or otherwise find our work valuable, please cite the xLSTM paper:
@inproceedings{beck:24xlstm,
title={xLSTM: Extended Long Short-Term Memory},
author={Maximilian Beck and Korbinian Pöppel and Markus Spanring and Andreas Auer and Oleksandra Prudnikova and Michael Kopp and Günter Klambauer and Johannes Brandstetter and Sepp Hochreiter},
booktitle = {Thirty-eighth Conference on Neural Information Processing Systems},
year={2024},
url={https://arxiv.org/abs/2405.04517},
}