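Paper: https://arxiv.org/abs/2405.04517
xLSTM is a new recurrent neural network architecture based on ideas from the original LSTM. Through exponential gating with appropriate normalization and stabilization techniques, and a new matrix memory, it overcomes the limitations of the original LSTM and shows promising performance on language modeling when compared to Transformers or State Space Models.
We trained a 7B-parameter xLSTM language model.
We have optimized the xLSTM architecture for training throughput and stability. The code for the updated architecture is located in xlstm/xlstm_large.
The model weights are available on Hugging Face at https://huggingface.co/nx-ai/xlstm-7b.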
Create a conda environment from the file environment_pt220cu121.yaml. To install only the model code (i.e. the module xlstm) as a package:
Install via pip:
pip install xlstm
Or clone from GitHub:
git clone https://github.com/NX-AI/xlstm.git
cd xlstm
pip install -e .
To use the 7B xLSTM model, install mlstm_kernels via:
pip install mlstm_kernels
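This package is based on PyTorch and has been tested with versions >=1.8. For the CUDA version of sLSTM, you need a GPU with compute capability >= 8.0; see https://developer.nvidia.com/cuda-gpus. For a well-tested environment, create the environment from environment_pt220cu121.yaml: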
conda env create -n xlstm -f environment_pt220cu121.yaml
conda activate xlstm
For the xLSTM Large 7B model, we require the mlstm_kernels package (TODO: add GitHub link), which provides fast kernels for xLSTM.
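The compute capability requirement for the CUDA version of sLSTM can be checked before running any model code. The following is a minimal sketch and not part of the xlstm package: `capability_ok` and `check_local_gpu` are hypothetical helper names, and the only PyTorch calls used are `torch.cuda.is_available` and `torch.cuda.get_device_capability`.

```python
REQUIRED_CAPABILITY = (8, 0)  # from the text above: compute capability >= 8.0


def capability_ok(major, minor, required=REQUIRED_CAPABILITY):
    """Return True if the (major, minor) capability meets the requirement."""
    return (major, minor) >= required


def check_local_gpu():
    """Describe whether the first visible GPU meets the sLSTM CUDA requirement."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    major, minor = torch.cuda.get_device_capability(0)
    verdict = "ok" if capability_ok(major, minor) else "below 8.0"
    return f"compute capability {major}.{minor}: {verdict}"
```

Calling `print(check_local_gpu())` inside the activated environment reports the result for device 0.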
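This section explains how to use the models from the xLSTM paper.
For non-language applications, or for integration into other architectures, you can use xLSTMBlockStack; for language modeling or other token-based applications, you can use xLSTMLMModel.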
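Both classes are configured with `num_blocks` and `slstm_at`, as the configurations in this section show. As a rough illustration of how those two fields can be read, the sketch below assumes that `slstm_at` lists the zero-based positions of sLSTM blocks and that every other position holds an mLSTM block. That reading is an assumption drawn from the field names and example values, and `block_layout` is a hypothetical helper, not a function of the xlstm package.

```python
def block_layout(num_blocks, slstm_at):
    """Return one label per block position.

    Assumption: indices in slstm_at are sLSTM blocks, all others are mLSTM.
    """
    layout = ["mLSTM"] * num_blocks
    for idx in slstm_at:
        if not 0 <= idx < num_blocks:
            raise ValueError(f"slstm_at index {idx} is outside 0..{num_blocks - 1}")
        layout[idx] = "sLSTM"
    return layout


# Values taken from the example configurations in this section.
layout = block_layout(num_blocks=7, slstm_at=[1])
print(layout)
# ['mLSTM', 'sLSTM', 'mLSTM', 'mLSTM', 'mLSTM', 'mLSTM', 'mLSTM']
```

Under this reading, the example configuration describes six mLSTM blocks and one sLSTM block at position 1.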
xLSTMBlockStack is intended as an alternative backbone in existing projects. It is similar to a stack of Transformer blocks, but uses xLSTM blocks:
import torch

from xlstm import (
    xLSTMBlockStack,
    xLSTMBlockStackConfig,
    mLSTMBlockConfig,
    mLSTMLayerConfig,
    sLSTMBlockConfig,
    sLSTMLayerConfig,
    FeedForwardConfig,
)

cfg = xLSTMBlockStackConfig(
    mlstm_block=mLSTMBlockConfig(
        mlstm=mLSTMLayerConfig(
            conv1d_kernel_size=4, qkv_proj_blocksize=4, num_heads=4
        )
    ),
    slstm_block=sLSTMBlockConfig(
        slstm=sLSTMLayerConfig(
            backend="cuda",
            num_heads=4,
            conv1d_kernel_size=4,
            bias_init="powerlaw_blockdependent",
        ),
        feedforward=FeedForwardConfig(proj_factor=1.3, act_fn="gelu"),
    ),
    context_length=256,
    num_blocks=7,
    embedding_dim=128,
    slstm_at=[1],
)

xlstm_stack = xLSTMBlockStack(cfg)

# Input shape: (batch, context_length, embedding_dim)
x = torch.randn(4, 256, 128).to("cuda")
xlstm_stack = xlstm_stack.to("cuda")
y = xlstm_stack(x)
# The stack preserves the input shape.
assert y.shape == x.shape

If you are working with YAML strings or files for configuration, you can also use dacite to create the config dataclasses. This is equivalent to the snippet above:
import torch
from omegaconf import OmegaConf
from dacite import from_dict
from dacite import Config as DaciteConfig
from xlstm import xLSTMBlockStack, xLSTMBlockStackConfig

xlstm_cfg = """
mlstm_block:
  mlstm:
    conv1d_kernel_size: 4
    qkv_proj_blocksize: 4
    num_heads: 4
slstm_block:
  slstm:
    backend: cuda
    num_heads: 4
    conv1d_kernel_size: 4
    bias_init: powerlaw_blockdependent
  feedforward:
    proj_factor: 1.3
    act_fn: gelu
context_length: 256
num_blocks: 7
embedding_dim: 128
slstm_at: [1]
"""

# Parse the YAML string, then build the config dataclass from the plain dict.
cfg = OmegaConf.create(xlstm_cfg)
cfg = from_dict(
    data_class=xLSTMBlockStackConfig,
    data=OmegaConf.to_container(cfg),
    config=DaciteConfig(strict=True),
)
xlstm_stack = xLSTMBlockStack(cfg)

# Input shape: (batch, context_length, embedding_dim)
x = torch.randn(4, 256, 128).to("cuda")
xlstm_stack = xlstm_stack.to("cuda")
y = xlstm_stack(x)
assert y.shape == x.shape

xLSTMLMModel is a wrapper around xLSTMBlockStack that adds the token embedding and the LM head.
import torch
from omegaconf import OmegaConf
from dacite import from_dict
from dacite import Config as DaciteConfig
from xlstm import xLSTMLMModel, xLSTMLMModelConfig

xlstm_cfg = """
vocab_size: 50304
mlstm_block:
  mlstm:
    conv1d_kernel_size: 4
    qkv_proj_blocksize: 4
    num_heads: 4
slstm_block:
  slstm:
    backend: cuda
    num_heads: 4
    conv1d_kernel_size: 4
    bias_init: powerlaw_blockdependent
  feedforward:
    proj_factor: 1.3
    act_fn: gelu
context_length: 256
num_blocks: 7
embedding_dim: 128
slstm_at: [1]
"""

# Parse the YAML string, then build the config dataclass from the plain dict.
cfg = OmegaConf.create(xlstm_cfg)
cfg = from_dict(
    data_class=xLSTMLMModelConfig,
    data=OmegaConf.to_container(cfg),
    config=DaciteConfig(strict=True),
)
xlstm_stack = xLSTMLMModel(cfg)

# Input: token ids of shape (batch, context_length)
x = torch.randint(0, 50304, size=(4, 256)).to("cuda")
xlstm_stack = xlstm_stack.to("cuda")
y = xlstm_stack(x)
# Output: one score per vocabulary entry at each position.
assert y.shape[1:] == (256, 50304)

The synthetic experiments that best showcase the benefits of sLSTM over mLSTM, and vice versa, are the Parity task and the Multi-Query Associative Recall task. The Parity task can only be solved with the state-tracking capabilities provided by the memory mixing of sLSTM. The Multi-Query Associative Recall task measures memorization capabilities, where the matrix memory and state expansion of mLSTM are very beneficial. In combination, they do well on both tasks.
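To make the two tasks concrete, here is a small, self-contained sketch of what a single example of each could look like. It does not reproduce the data pipeline in the experiments folder: the function names, the value ranges, and the list-based format are all assumptions chosen for illustration.

```python
import random


def parity_example(length, rng):
    """Random bit sequence labeled with its parity (1 if the number of ones is odd)."""
    bits = [rng.randint(0, 1) for _ in range(length)]
    return bits, sum(bits) % 2


def running_parity(bits):
    """Solve parity with a single bit of state updated at every step.

    This is the kind of step-by-step state tracking the Parity task requires.
    """
    state = 0
    for b in bits:
        state ^= b
    return state


def mqar_example(num_pairs, num_queries, rng):
    """Key-value pairs followed by several queries over the stored keys."""
    keys = rng.sample(range(100), num_pairs)            # distinct keys
    values = [rng.randrange(100, 200) for _ in keys]    # arbitrary values
    context = list(zip(keys, values))
    queried = rng.sample(keys, num_queries)
    lookup = dict(context)
    answers = [lookup[k] for k in queried]              # targets to recall
    return context, queried, answers


rng = random.Random(0)
bits, label = parity_example(16, rng)
assert running_parity(bits) == label

context, queried, answers = mqar_example(num_pairs=4, num_queries=2, rng=rng)
print(context, queried, answers)
```

Parity needs only one bit of state but must update it at every step, whereas the recall task needs enough memory to hold all key-value pairs at once.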
To run each experiment, run main.py in the experiments folder, for example:
python experiments/main.py --config experiments/parity_xLSTM01.yaml # xLSTM[0:1], sLSTM only
python experiments/main.py --config experiments/parity_xLSTM10.yaml # xLSTM[1:0], mLSTM only
python experiments/main.py --config experiments/parity_xLSTM11.yaml # xLSTM[1:1], mLSTM and sLSTM
Note that the training loop does not include early stopping or test evaluation.
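If early stopping is needed, it has to be added by the user. The following is a minimal, framework-independent sketch of a patience-based criterion. `EarlyStopping` is a hypothetical helper that does not exist in this repository, and the loss values are made up for illustration. How to hook it into experiments/main.py is not shown, since that depends on the training loop's structure.

```python
class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` checks."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


# Made-up validation losses, one per evaluation step.
losses = [1.0, 0.8, 0.81, 0.79, 0.80, 0.82, 0.9]
stopper = EarlyStopping(patience=2)
stopped_at = None
for step, loss in enumerate(losses):
    if stopper.should_stop(loss):
        stopped_at = step
        break
print(stopped_at, stopper.best)  # 5 0.79
```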
If you use this codebase, or otherwise find our work valuable, please cite the xLSTM paper:
@inproceedings{beck:24xlstm,
title={xLSTM: Extended Long Short-Term Memory},
author={Maximilian Beck and Korbinian Pöppel and Markus Spanring and Andreas Auer and Oleksandra Prudnikova and Michael Kopp and Günter Klambauer and Johannes Brandstetter and Sepp Hochreiter},
booktitle = {Thirty-eighth Conference on Neural Information Processing Systems},
year={2024},
url={https://arxiv.org/abs/2405.04517},
}