This codebase reimplements LoRA: Low-Rank Adaptation of Large Language Models (ICLR 2022) and is rebuilt on top of loralib.

The implementations of loratorch and loralib differ significantly. We take `nn.Linear` as an example below.
For `loralib`,

$$h = x W_0^\top + \frac{\alpha}{r} x (BA)^\top,$$

where $W_0$ is the pre-trained weight, $B$ and $A$ are the low-rank LoRA matrices, $r$ is the LoRA rank, and $\alpha$ is a scaling hyperparameter.

For `loratorch`,

$$h = x \left(W_0 + \frac{\alpha}{r} BA\right)^\top.$$

`loralib` computes the pre-trained branch and the LoRA branch separately and then sums the results, while `loratorch` first merges the pre-trained weight with the LoRA weight and then calls `nn.Linear.forward()` to compute the result. For a linear layer there is no difference between `loralib` and `loratorch`. However, for some non-linear or more complex layers, we cannot be sure that the layer satisfies $F(x, W_0) + F(x, \frac{\alpha}{r} BA) = F(x, W_0 + \frac{\alpha}{r} BA)$, where $F(x, W)$ denotes the layer's forward computation with weight $W$, so it is hard for `loralib` to extend LoRA to such layers. By contrast, the merge-weights-first idea in `loratorch` is more general and extensible: you simply call `merge_lora_param()` in `loratorch` to merge the weights and then call the original layer's `forward()` to compute the result. With the help of `loratorch`, you can easily apply LoRA to any kind of `torch.nn` layer.
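To make the distinction concrete, here is a minimal pure-PyTorch sketch (independent of both libraries; the shapes and the toy weight-nonlinear operation are illustrative assumptions, not part of either API) showing that the two formulations coincide for a linear layer but not for an operation that is non-linear in its weight.

```python
import torch

torch.manual_seed(0)

# Illustrative shapes: batch k, input dim n, output dim m, LoRA rank r
k, n, m, r, alpha = 4, 8, 6, 2, 16
x = torch.randn(k, n)    # input
W0 = torch.randn(m, n)   # pre-trained weight
B = torch.randn(m, r)    # LoRA matrices
A = torch.randn(r, n)
scaling = alpha / r

# loralib-style: compute the pre-trained branch and the LoRA branch separately, then sum
h_split = x @ W0.T + scaling * (x @ (B @ A).T)
# loratorch-style: merge the weights first, then run the original (linear) forward
h_merged = x @ (W0 + scaling * (B @ A)).T
print(torch.allclose(h_split, h_merged))  # True: a linear layer is linear in its weight

# A toy operation that is NOT linear in its weight: the split form no longer matches
def f(W):  # illustrative weight-nonlinear "layer"
    return x @ (W * W).T

print(torch.allclose(f(W0) + f(scaling * (B @ A)), f(W0 + scaling * (B @ A))))  # False
```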
Supported layers:

|  | loralib | loratorch | Examples |
|---|---|---|---|
| `nn.Linear` | ✓ | ✓ | linear.ipynb |
| `nn.Embedding` | ✓ | ✓ | embedding.ipynb |
| `nn.Conv1d` | ✓ | ✓ | |
| `nn.Conv2d` | ✓ | ✓ | |
| `nn.Conv3d` | ✓ | ✓ | |
| `nn.MultiheadAttention` | ✘ | ✓ | |
| `MergedLinear` | ✓ (buggy) | ✓ | Mergedlinear.ipynb |
|  | hard to extend | easy to extend | |
We compare the results of loralib and loratorch in the example notebooks to verify the correctness of the loratorch implementation.

The usage of loratorch is the same as that of loralib.
Install loratorch.

```bash
pip install git+https://github.com/Baijiong-Lin/LoRA-Torch

# Alternatively, for developers:
# git clone https://github.com/Baijiong-Lin/LoRA-Torch
# cd LoRA-Torch
# pip install -e .
```

Replace the layers where you would like to use LoRA with their loratorch counterparts.
```python
# ===== Before =====
# layer = nn.Linear(in_features, out_features)

# ===== After ======
import loratorch as lora

# Add a pair of low-rank adaptation matrices with rank r=16 and alpha=32
layer = lora.Linear(in_features, out_features, r=16, lora_alpha=32)
```

Before the training loop begins, mark only the LoRA parameters as trainable.
```python
model = Model()
# (!!!) This sets requires_grad to False for all parameters without the string "lora_" in their names
lora.mark_only_lora_as_trainable(model)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Training loop
for batch in dataloader:
    model.train()
    # forward pass
    loss = forward_fun(model, batch)
    # backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # (!!!) Re-register the model parameters to ensure they appear in model.state_dict() and model.parameters().
    # (!!!) Without this line, performance is not affected, but some weights will be missing from model.state_dict() and model.parameters().
    lora.register_model_param_after_backward(model)
```

Save the LoRA model (only the LoRA matrices will be saved).
```python
# ===== Before =====
# torch.save(model.state_dict(), checkpoint_path)

# ===== After =====
torch.save(lora.lora_state_dict(model), checkpoint_path)
```

Load the LoRA model (the pre-trained model needs to be loaded first).
```python
# Load the pre-trained checkpoint first
model.load_state_dict(torch.load('ckpt_pretrained.pt'), strict=False)
# Then load the LoRA checkpoint
model.load_state_dict(torch.load('ckpt_lora.pt'), strict=False)
```

loratorch is developed and maintained by Baijiong Lin.
If you have any questions or suggestions, feel free to contact us by raising an issue or sending an email to [email protected].

loratorch is heavily based on loralib. We thank its authors for their excellent, open-source codebase.

If you find loratorch useful for your research or development, please cite the following:
```bibtex
@inproceedings{hu2022lora,
  title     = {Lo{RA}: Low-Rank Adaptation of Large Language Models},
  author    = {Edward J Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen},
  booktitle = {International Conference on Learning Representations},
  year      = {2022},
}

@software{lin2023loratorch,
  author = {Baijiong Lin},
  title  = {{LoRA-Torch}: {PyTorch} Reimplementation of {LoRA}},
  url    = {https://github.com/Baijiong-Lin/LoRA-Torch},
  year   = {2023},
}
```

loratorch is released under the MIT license.