This repository reimplements [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685) (ICLR 2022) and is rebuilt upon [loralib](https://github.com/microsoft/LoRA).

The implementations of `loratorch` and `loralib` are quite different. We take `nn.Linear` as an example below.
For `loralib`,

$$h = x W_0^\top + \frac{\alpha}{r} x (BA)^\top,$$

where $x \in \mathbb{R}^{k \times n_{in}}$, $W_0 \in \mathbb{R}^{n_{out} \times n_{in}}$, $A \in \mathbb{R}^{r \times n_{in}}$, and $B \in \mathbb{R}^{n_{out} \times r}$. `loralib` computes $x W_0^\top$ and $\frac{\alpha}{r} x (BA)^\top$ separately and then sums the two results.

For `loratorch`,

$$h = x \left( W_0 + \frac{\alpha}{r} BA \right)^\top.$$

`loratorch` first merges the pre-trained weight $W_0$ with $\frac{\alpha}{r} BA$ and then calls the standard `nn.Linear.forward()` to compute the result.

There is no difference between `loralib` and `loratorch` for the linear layer. For some non-linear or complex layers $L$, however, we are not sure whether the layer satisfies $L(x, W_0) + L(x, \frac{\alpha}{r} BA) = L(x, W_0 + \frac{\alpha}{r} BA)$, so it is hard to extend LoRA to such layers with `loralib`. In contrast, the merge-the-weights-first idea behind `loratorch` is more general and extensible: you only need to call `merge_lora_param()` in `loratorch` to merge the weights and then call the original layer's `forward()` to compute the result. With the help of `loratorch`, you can easily apply LoRA to any type of `torch.nn` layer.
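As a quick illustration in plain PyTorch (a self-contained sketch with made-up shapes, not `loratorch` code), the two formulations agree for a linear layer but diverge once the weight enters a non-linear operation such as a softmax:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up shapes for illustration only.
k, n_in, n_out, r, alpha = 4, 8, 6, 2, 16
scaling = alpha / r

x  = torch.randn(k, n_in)
W0 = torch.randn(n_out, n_in)   # pre-trained weight
A  = torch.randn(r, n_in)       # LoRA down-projection
B  = torch.randn(n_out, r)      # LoRA up-projection
dW = scaling * (B @ A)          # low-rank weight update

# Linear layer: merging the outputs (loralib) equals merging the weights (loratorch).
h_merge_outputs = x @ W0.T + x @ dW.T
h_merge_weights = x @ (W0 + dW).T
print(torch.allclose(h_merge_outputs, h_merge_weights, atol=1e-5))  # True

# Non-linear use of the weight: the two strategies no longer agree, which is
# why the merge-the-weights-first approach generalizes more easily.
s_merge_outputs = F.softmax(x @ W0.T, dim=-1) + F.softmax(x @ dW.T, dim=-1)
s_merge_weights = F.softmax(x @ (W0 + dW).T, dim=-1)
print(torch.allclose(s_merge_outputs, s_merge_weights))  # False
```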
| | `loralib` | `loratorch` | Example |
|---|---|---|---|
| `nn.Linear` | ✓ | ✓ | linear.ipynb |
| `nn.Embedding` | ✓ | ✓ | embedding.ipynb |
| `nn.Conv1d` | ✓ | ✓ | |
| `nn.Conv2d` | ✓ | ✓ | |
| `nn.Conv3d` | ✓ | ✓ | |
| `nn.MultiheadAttention` | ✘ | ✓ | |
| `MergedLinear` | ✓ (buggy) | ✓ | mergedlinear.ipynb |
| Extensibility | hard to extend | easy to extend | |
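Since `nn.MultiheadAttention` is only supported on the `loratorch` side, a sketch of what that might look like is shown below. The constructor arguments here are an assumption that mirrors `lora.Linear`; consult the repository's examples for the exact signature.

```python
import torch
import loratorch as lora

# Sketch only: we assume lora.MultiheadAttention is a drop-in replacement for
# nn.MultiheadAttention and accepts r/lora_alpha as lora.Linear does.
attn = lora.MultiheadAttention(embed_dim=64, num_heads=8, r=16, lora_alpha=32)
lora.mark_only_lora_as_trainable(attn)

x = torch.randn(10, 2, 64)  # (sequence, batch, embedding)
out, attn_weights = attn(x, x, x)
print(out.shape)  # torch.Size([10, 2, 64])
```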
We compare the results of `loralib` and `loratorch` in the examples to verify the correctness of the `loratorch` implementation.

The usage of `loratorch` is the same as `loralib`.

Install `loratorch`.
```bash
pip install git+https://github.com/Baijiong-Lin/LoRA-Torch

# Alternatively, for developers:
# git clone https://github.com/Baijiong-Lin/LoRA-Torch
# cd LoRA-Torch
# pip install -e .
```

Replace the layers where you want to use LoRA.
```python
# ===== Before =====
# layer = nn.Linear(in_features, out_features)

# ===== After =====
import loratorch as lora

# Add a pair of low-rank adaptation matrices with rank r=16 and alpha=32
layer = lora.Linear(in_features, out_features, r=16, lora_alpha=32)
```

Before the training loop, mark only the LoRA parameters as trainable.
```python
model = Model()

# (!!!) This sets requires_grad to False for all parameters whose names
# do not contain the string "lora_"
lora.mark_only_lora_as_trainable(model)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Training loop
for batch in dataloader:
    model.train()
    # forward pass
    loss = forward_fun(model, batch)
    # backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # (!!!) Re-register the model parameters to ensure they appear in
    # model.state_dict() and model.parameters(). Without this line,
    # performance is not affected, but you will find that some weights are
    # missing from model.state_dict() and model.parameters().
    lora.register_model_param_after_backward(model)
```
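As a quick sanity check (a sketch, assuming `model` is built with `loratorch` layers as above), you can verify that only the LoRA matrices remain trainable:

```python
# After lora.mark_only_lora_as_trainable(model), only parameters whose
# names contain "lora_" should still require gradients.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # expect only names containing "lora_"
```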
Save the LoRA model (only the LoRA matrices will be saved).

```python
# ===== Before =====
# torch.save(model.state_dict(), checkpoint_path)

# ===== After =====
torch.save(lora.lora_state_dict(model), checkpoint_path)
```
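To see what actually ends up in the checkpoint, you can inspect the dictionary returned by `lora_state_dict()` (a sketch):

```python
# lora_state_dict(model) keeps only the LoRA matrices, so the saved
# checkpoint is much smaller than the full state dict.
lora_sd = lora.lora_state_dict(model)
print(list(lora_sd.keys()))  # expect only keys containing "lora_"
```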
Load the LoRA model (you need to load the pre-trained model first).

```python
# Load the pre-trained checkpoint first. strict=False is needed because
# each checkpoint contains only part of the parameters.
model.load_state_dict(torch.load('ckpt_pretrained.pt'), strict=False)
# Then load the LoRA checkpoint
model.load_state_dict(torch.load('ckpt_lora.pt'), strict=False)
```
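Putting the steps above together, here is a minimal end-to-end sketch with a hypothetical toy model (the model, data, and hyperparameters are made up for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import loratorch as lora

# Hypothetical two-layer model with lora.Linear in place of nn.Linear.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = lora.Linear(16, 32, r=4, lora_alpha=8)
        self.fc2 = lora.Linear(32, 2, r=4, lora_alpha=8)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))

model = ToyModel()
lora.mark_only_lora_as_trainable(model)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One training step on random data.
x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
model.train()
loss = F.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
lora.register_model_param_after_backward(model)

# Save only the LoRA matrices.
torch.save(lora.lora_state_dict(model), 'ckpt_lora.pt')
```

`loratorch` is developed and maintained by Baijiong Lin.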
If you have any questions or suggestions, feel free to contact us by raising an issue or sending an email to [email protected].
`loratorch` is heavily based on `loralib`. We thank its authors for their excellent, open-source codebase.

If you find `loratorch` useful for your research or development, please cite the following:
```bibtex
@inproceedings{hu2022lora,
  title     = {Lo{RA}: Low-Rank Adaptation of Large Language Models},
  author    = {Edward J Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen},
  booktitle = {International Conference on Learning Representations},
  year      = {2022}
}

@software{lin2023loratorch,
  author = {Baijiong Lin},
  title  = {{LoRA-Torch}: {PyTorch} Reimplementation of {LoRA}},
  url    = {https://github.com/Baijiong-Lin/LoRA-Torch},
  year   = {2023}
}
```

`loratorch` is released under the MIT License.