This repository reimplements [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685) (ICLR 2022) and is rebuilt upon [loralib](https://github.com/microsoft/LoRA).

The implementations of `loratorch` and `loralib` are quite different. We take `nn.Linear` as an example below.
For `loralib`,

$$h = x W_0^\top + \frac{\alpha}{r} x (BA)^\top,$$

where $x \in \mathbb{R}^{k \times n_{in}}$, $W_0 \in \mathbb{R}^{n_{out} \times n_{in}}$, $A \in \mathbb{R}^{r \times n_{in}}$, and $B \in \mathbb{R}^{n_{out} \times r}$. `loralib` computes $x W_0^\top$ and $\frac{\alpha}{r} x (BA)^\top$ separately and then sums the two results.

For `loratorch`,

$$h = x \left( W_0 + \frac{\alpha}{r} BA \right)^\top.$$

`loratorch` first merges the pre-trained weight $W_0$ with $\frac{\alpha}{r} BA$ and then calls the standard `nn.Linear.forward()` to compute the result.

There is no difference between `loralib` and `loratorch` for the linear layer. For some non-linear or complex layers $L$, however, we are not sure whether the layer satisfies $L(x, W_0) + L(x, \frac{\alpha}{r} BA) = L(x, W_0 + \frac{\alpha}{r} BA)$, so it is hard to extend LoRA to such layers with `loralib`. In contrast, the merge-the-weights-first idea behind `loratorch` is more general and extensible: you only need to call `merge_lora_param()` in `loratorch` to merge the weights and then call the original layer's `forward()` to compute the result. With the help of `loratorch`, you can easily apply LoRA to any type of `torch.nn` layer.
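As a quick illustration in plain PyTorch (a self-contained sketch with made-up shapes, not `loratorch` code), the two formulations agree for a linear layer but diverge once the weight enters a non-linear operation such as a softmax:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up shapes for illustration only.
k, n_in, n_out, r, alpha = 4, 8, 6, 2, 16
scaling = alpha / r

x  = torch.randn(k, n_in)
W0 = torch.randn(n_out, n_in)   # pre-trained weight
A  = torch.randn(r, n_in)       # LoRA down-projection
B  = torch.randn(n_out, r)      # LoRA up-projection
dW = scaling * (B @ A)          # low-rank weight update

# Linear layer: merging the outputs (loralib) equals merging the weights (loratorch).
h_merge_outputs = x @ W0.T + x @ dW.T
h_merge_weights = x @ (W0 + dW).T
print(torch.allclose(h_merge_outputs, h_merge_weights, atol=1e-5))  # True

# Non-linear use of the weight: the two strategies no longer agree, which is
# why the merge-the-weights-first approach generalizes more easily.
s_merge_outputs = F.softmax(x @ W0.T, dim=-1) + F.softmax(x @ dW.T, dim=-1)
s_merge_weights = F.softmax(x @ (W0 + dW).T, dim=-1)
print(torch.allclose(s_merge_outputs, s_merge_weights))  # False
```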
| | `loralib` | `loratorch` | Example |
|---|---|---|---|
| `nn.Linear` | ✓ | ✓ | linear.ipynb |
| `nn.Embedding` | ✓ | ✓ | embedding.ipynb |
| `nn.Conv1d` | ✓ | ✓ | |
| `nn.Conv2d` | ✓ | ✓ | |
| `nn.Conv3d` | ✓ | ✓ | |
| `nn.MultiheadAttention` | ✘ | ✓ | |
| `MergedLinear` | ✓ (buggy) | ✓ | mergedlinear.ipynb |
| Extensibility | hard to extend | easy to extend | |
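Since `nn.MultiheadAttention` is only supported on the `loratorch` side, a sketch of what that might look like is shown below. The constructor arguments here are an assumption that mirrors `lora.Linear`; consult the repository's examples for the exact signature.

```python
import torch
import loratorch as lora

# Sketch only: we assume lora.MultiheadAttention is a drop-in replacement for
# nn.MultiheadAttention and accepts r/lora_alpha as lora.Linear does.
attn = lora.MultiheadAttention(embed_dim=64, num_heads=8, r=16, lora_alpha=32)
lora.mark_only_lora_as_trainable(attn)

x = torch.randn(10, 2, 64)  # (sequence, batch, embedding)
out, attn_weights = attn(x, x, x)
print(out.shape)  # torch.Size([10, 2, 64])
```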
We compare the results of `loralib` and `loratorch` in the examples to verify the correctness of the `loratorch` implementation.

The usage of `loratorch` is the same as `loralib`.

Install `loratorch`.
```bash
pip install git+https://github.com/Baijiong-Lin/LoRA-Torch

# Alternatively, for developers:
# git clone https://github.com/Baijiong-Lin/LoRA-Torch
# cd LoRA-Torch
# pip install -e .
```

Replace the layers where you want to use LoRA.
```python
# ===== Before =====
# layer = nn.Linear(in_features, out_features)

# ===== After =====
import loratorch as lora

# Add a pair of low-rank adaptation matrices with rank r=16 and alpha=32
layer = lora.Linear(in_features, out_features, r=16, lora_alpha=32)
```

Before the training loop, mark only the LoRA parameters as trainable.
```python
model = Model()

# (!!!) This sets requires_grad to False for all parameters whose names
# do not contain the string "lora_"
lora.mark_only_lora_as_trainable(model)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Training loop
for batch in dataloader:
    model.train()
    # forward pass
    loss = forward_fun(model, batch)
    # backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # (!!!) Re-register the model parameters to ensure they appear in
    # model.state_dict() and model.parameters(). Without this line,
    # performance is not affected, but you will find that some weights are
    # missing from model.state_dict() and model.parameters().
    lora.register_model_param_after_backward(model)
```
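As a quick sanity check (a sketch, assuming `model` is built with `loratorch` layers as above), you can verify that only the LoRA matrices remain trainable:

```python
# After lora.mark_only_lora_as_trainable(model), only parameters whose
# names contain "lora_" should still require gradients.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # expect only names containing "lora_"
```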
Save the LoRA model (only the LoRA matrices will be saved).

```python
# ===== Before =====
# torch.save(model.state_dict(), checkpoint_path)

# ===== After =====
torch.save(lora.lora_state_dict(model), checkpoint_path)
```
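To see what actually ends up in the checkpoint, you can inspect the dictionary returned by `lora_state_dict()` (a sketch):

```python
# lora_state_dict(model) keeps only the LoRA matrices, so the saved
# checkpoint is much smaller than the full state dict.
lora_sd = lora.lora_state_dict(model)
print(list(lora_sd.keys()))  # expect only keys containing "lora_"
```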
Load the LoRA model (you need to load the pre-trained model first).

```python
# Load the pre-trained checkpoint first. strict=False is needed because
# each checkpoint contains only part of the parameters.
model.load_state_dict(torch.load('ckpt_pretrained.pt'), strict=False)
# Then load the LoRA checkpoint
model.load_state_dict(torch.load('ckpt_lora.pt'), strict=False)
```
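Putting the steps above together, here is a minimal end-to-end sketch with a hypothetical toy model (the model, data, and hyperparameters are made up for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import loratorch as lora

# Hypothetical two-layer model with lora.Linear in place of nn.Linear.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = lora.Linear(16, 32, r=4, lora_alpha=8)
        self.fc2 = lora.Linear(32, 2, r=4, lora_alpha=8)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))

model = ToyModel()
lora.mark_only_lora_as_trainable(model)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One training step on random data.
x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
model.train()
loss = F.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
lora.register_model_param_after_backward(model)

# Save only the LoRA matrices.
torch.save(lora.lora_state_dict(model), 'ckpt_lora.pt')
```

`loratorch` is developed and maintained by Baijiong Lin.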
If you have any questions or suggestions, feel free to contact us by raising an issue or sending an email to [email protected].
`loratorch` is heavily based on `loralib`. We thank its authors for their excellent, open-source codebase.

If you find `loratorch` useful for your research or development, please cite the following:
```bibtex
@inproceedings{hu2022lora,
  title     = {Lo{RA}: Low-Rank Adaptation of Large Language Models},
  author    = {Edward J Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen},
  booktitle = {International Conference on Learning Representations},
  year      = {2022}
}

@software{lin2023loratorch,
  author = {Baijiong Lin},
  title  = {{LoRA-Torch}: {PyTorch} Reimplementation of {LoRA}},
  url    = {https://github.com/Baijiong-Lin/LoRA-Torch},
  year   = {2023}
}
```

`loratorch` is released under the MIT License.