
torch-optimizer -- a collection of optimizers for PyTorch, compatible with the optim module. A simple example:
import torch_optimizer as optim

# model = ...
optimizer = optim.DiffGrad(model.parameters(), lr=0.001)
optimizer.step()

Installation is simple, just:

$ pip install torch_optimizer

Documentation: https://pytorch-optimizer.rtfd.io
Please cite the original authors of the optimization algorithms. If you like this package:

@software{Novik_torchoptimizers,
    title   = {{torch-optimizer -- collection of optimization algorithms for PyTorch.}},
    author  = {Novik, Mykola},
    year    = 2020,
    month   = 1,
    version = {1.0.1}
}

Or use the GitHub "cite this repository" button.
| Optimizer | Paper |
|-----------|-------|
| A2GradExp | https://arxiv.org/abs/1810.00553 |
| A2GradInc | https://arxiv.org/abs/1810.00553 |
| A2GradUni | https://arxiv.org/abs/1810.00553 |
| AccSGD | https://arxiv.org/abs/1803.05591 |
| AdaBelief | https://arxiv.org/abs/2010.07468 |
| AdaBound | https://arxiv.org/abs/1902.09843 |
| AdaMod | https://arxiv.org/abs/1910.12249 |
| Adafactor | https://arxiv.org/abs/1804.04235 |
| Adahessian | https://arxiv.org/abs/2006.00719 |
| AdamP | https://arxiv.org/abs/2006.08217 |
| AggMo | https://arxiv.org/abs/1804.00325 |
| Apollo | https://arxiv.org/abs/2009.13586 |
| DiffGrad | https://arxiv.org/abs/1909.11015 |
| Lamb | https://arxiv.org/abs/1904.00962 |
| Lookahead | https://arxiv.org/abs/1907.08610 |
| MADGRAD | https://arxiv.org/abs/2101.11075 |
| NovoGrad | https://arxiv.org/abs/1905.11286 |
| PID | https://www4.comp.polyu.edu.hk/~cslzhang/paper/cvpr18_pid.pdf |
| QHAdam | https://arxiv.org/abs/1810.06801 |
| QHM | https://arxiv.org/abs/1810.06801 |
| RAdam | https://arxiv.org/abs/1908.03265 |
| Ranger | https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d |
| RangerQH | https://arxiv.org/abs/1810.06801 |
| RangerVA | https://arxiv.org/abs/1908.00700v2 |
| SGDP | https://arxiv.org/abs/2006.08217 |
| SGDW | https://arxiv.org/abs/1608.03983 |
| SWATS | https://arxiv.org/abs/1712.07628 |
| Shampoo | https://arxiv.org/abs/1802.09568 |
| Yogi | https://papers.nips.cc/paper/8186-adaptive-methods-for-nonconvex-optimization |
Visualizations help us see how different algorithms deal with simple situations such as saddle points, local minima, valleys, etc., and may provide interesting insights into the inner workings of an algorithm. The Rosenbrock and Rastrigin benchmark functions were selected because:

Rosenbrock (the banana function) is a non-convex function with one global minimum at (1.0, 1.0), which lies inside a long, narrow, parabolic valley.

Rastrigin is a non-convex function with one global minimum at (0.0, 0.0). Finding the minimum of this function is a fairly difficult problem due to its large search space and its large number of local minima.

Each optimizer performs 501 optimization steps. The learning rate is the best one found by a hyperparameter search algorithm; all other tuning parameters are left at their defaults. It is very easy to extend the script and tune the parameters of other optimizers.
python examples/viz_optimizers.py
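To make the benchmark concrete, here is a minimal sketch (not the actual examples/viz_optimizers.py script) that runs a single optimizer from this package on the 2-D Rastrigin function; the choice of DiffGrad, the starting point and the learning rate are illustrative assumptions, not values from the script.

import math
import torch
import torch_optimizer as optim

def rastrigin(xy):
    # 2-D Rastrigin: global minimum 0 at (0, 0), many local minima.
    x, y = xy
    return 20 + (x ** 2 - 10 * torch.cos(2 * math.pi * x)) \
              + (y ** 2 - 10 * torch.cos(2 * math.pi * y))

xy = torch.tensor([-2.0, 3.5], requires_grad=True)  # arbitrary starting point
optimizer = optim.DiffGrad([xy], lr=1e-2)           # lr would normally come from the hyperparameter search

for _ in range(500):
    optimizer.zero_grad()
    loss = rastrigin(xy)
    loss.backward()
    optimizer.step()

print(xy.detach(), rastrigin(xy).item())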
Do not pick an optimizer based on visualizations alone; optimization approaches have unique properties and may be tailored for different purposes, or may require an explicit learning-rate schedule, etc. The best way to find out is to try one on your particular problem and see if it improves your scores.

If you do not know which optimizer to use, start with the built-in SGD/Adam. Once the training logic is ready and a baseline score is established, swap the optimizer and see if anything improves (a minimal sketch of such a swap follows).
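The sketch below shows such a drop-in swap; the model here is a placeholder, not part of this package.

import torch
import torch_optimizer as optim

model = torch.nn.Linear(10, 1)  # placeholder model

# Baseline: built-in Adam.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Drop-in replacement: torch_optimizer optimizers follow the same interface.
optimizer = optim.Yogi(model.parameters(), lr=1e-2)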
import torch_optimizer as optim

# model = ...
optimizer = optim.A2GradExp(
    model.parameters(),
    kappa=1000.0,
    beta=10.0,
    lips=10.0,
    rho=0.5,
)
optimizer.step()

Paper: Optimal Adaptive and Accelerated Stochastic Gradient Descent (2018) [https://arxiv.org/abs/1810.00553]
Reference Code: https://github.com/severilov/a2grad_optimizer
import torch_optimizer as optim

# model = ...
optimizer = optim.A2GradInc(
    model.parameters(),
    kappa=1000.0,
    beta=10.0,
    lips=10.0,
)
optimizer.step()

Paper: Optimal Adaptive and Accelerated Stochastic Gradient Descent (2018) [https://arxiv.org/abs/1810.00553]
Reference Code: https://github.com/severilov/a2grad_optimizer
import torch_optimizer as optim

# model = ...
optimizer = optim.A2GradUni(
    model.parameters(),
    kappa=1000.0,
    beta=10.0,
    lips=10.0,
)
optimizer.step()

Paper: Optimal Adaptive and Accelerated Stochastic Gradient Descent (2018) [https://arxiv.org/abs/1810.00553]
Reference Code: https://github.com/severilov/a2grad_optimizer
import torch_optimizer as optim

# model = ...
optimizer = optim.AccSGD(
    model.parameters(),
    lr=1e-3,
    kappa=1000.0,
    xi=10.0,
    small_const=0.7,
    weight_decay=0,
)
optimizer.step()

Paper: On the insufficiency of existing momentum schemes for Stochastic Optimization (2019) [https://arxiv.org/abs/1803.05591]
Reference Code: https://github.com/rahulkidambi/accsgd
import torch_optimizer as optim

# model = ...
optimizer = optim.AdaBelief(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-3,
    weight_decay=0,
    amsgrad=False,
    weight_decouple=False,
    fixed_decay=False,
    rectify=False,
)
optimizer.step()

Paper: AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients (2020) [https://arxiv.org/abs/2010.07468]
Reference Code: https://github.com/juntang-zhuang/adabelief-optimizer
import torch_optimizer as optim

# model = ...
optimizer = optim.AdaBound(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    final_lr=0.1,
    gamma=1e-3,
    eps=1e-8,
    weight_decay=0,
    amsbound=False,
)
optimizer.step()

Paper: Adaptive Gradient Methods with Dynamic Bound of Learning Rate (2019) [https://arxiv.org/abs/1902.09843]
Reference Code: https://github.com/luolc/adabound
AdaMod restricts the adaptive learning rates with adaptive and momental upper bounds. The dynamic learning-rate bound is based on an exponential moving average of the adaptive learning rates themselves, which smooths out unexpectedly large learning rates and stabilizes the training of deep neural networks.
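The bounding step can be sketched roughly as follows; this is a conceptual illustration with placeholder tensors, not the package's internal implementation.

import torch

lr, eps, beta3 = 1e-3, 1e-8, 0.999
exp_avg = torch.randn(4)              # first-moment estimate (placeholder values)
exp_avg_sq = torch.full((4,), 0.01)   # second-moment estimate (placeholder values)
ema_step = torch.zeros(4)             # EMA of per-element step sizes
param = torch.randn(4)

step_size = lr / (exp_avg_sq.sqrt() + eps)             # Adam-style per-element step size
ema_step = beta3 * ema_step + (1 - beta3) * step_size  # smooth it with an EMA controlled by beta3
step_size = torch.min(step_size, ema_step)             # clip by the moving average
param = param - step_size * exp_avg                    # bounded update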
import torch_optimizer as optim

# model = ...
optimizer = optim.AdaMod(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    beta3=0.999,
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()

Paper: An Adaptive and Momental Bound Method for Stochastic Learning (2019) [https://arxiv.org/abs/1910.12249]
Reference Code: https://github.com/lancopku/adamod
import torch_optimizer as optim

# model = ...
optimizer = optim.Adafactor(
    m.parameters(),
    lr=1e-3,
    eps2=(1e-30, 1e-3),
    clip_threshold=1.0,
    decay_rate=-0.8,
    beta1=None,
    weight_decay=0.0,
    scale_parameter=True,
    relative_step=True,
    warmup_init=False,
)
optimizer.step()

Paper: Adafactor: Adaptive Learning Rates with Sublinear Memory Cost (2018) [https://arxiv.org/abs/1804.04235]
Reference Code: https://github.com/pytorch/fairseq/blob/master/fairseq/optim/adafactor.py
import torch_optimizer as optim

# model = ...
optimizer = optim.Adahessian(
    m.parameters(),
    lr=1.0,
    betas=(0.9, 0.999),
    eps=1e-4,
    weight_decay=0.0,
    hessian_power=1.0,
)
loss_fn(m(input), target).backward(create_graph=True)  # create_graph=True is necessary for Hessian calculation
optimizer.step()

Paper: ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning (2020) [https://arxiv.org/abs/2006.00719]
Reference Code: https://github.com/amirgholami/adahessian
AdamP proposes a simple and effective solution: at every iteration of the Adam optimizer applied to scale-invariant weights (e.g., Conv weights preceding a BN layer), AdamP removes the radial component (i.e., the component parallel to the weight vector) from the update vector. Intuitively, this operation prevents unnecessary updates along the radial direction that only increase the weight norm without contributing to loss minimization.
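The projection can be sketched as follows; this is a conceptual illustration with placeholder tensors, not the package's internal implementation.

import torch

w = torch.randn(64)        # a scale-invariant weight, e.g. a conv weight followed by BN (placeholder values)
update = torch.randn(64)   # the raw Adam update for this weight (placeholder values)

radial = (update @ w) / (w @ w) * w   # component of the update parallel to w
update = update - radial              # only the tangential component is applied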
import torch_optimizer as optim

# model = ...
optimizer = optim.AdamP(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
    delta=0.1,
    wd_ratio=0.1,
)
optimizer.step()

Paper: Slowing Down the Weight Norm Increase in Momentum-based Optimizers (2020) [https://arxiv.org/abs/2006.08217]
Reference Code: https://github.com/clovaai/adamp
import torch_optimizer as optim

# model = ...
optimizer = optim.AggMo(
    m.parameters(),
    lr=1e-3,
    betas=(0.0, 0.9, 0.99),
    weight_decay=0,
)
optimizer.step()

Paper: Aggregated Momentum: Stability Through Passive Damping (2019) [https://arxiv.org/abs/1804.00325]
Reference Code: https://github.com/athemathmo/aggmo
import torch_optimizer as optim

# model = ...
optimizer = optim.Apollo(
    m.parameters(),
    lr=1e-2,
    beta=0.9,
    eps=1e-4,
    warmup=0,
    init_lr=0.01,
    weight_decay=0,
)
optimizer.step()

Paper: Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization (2020) [https://arxiv.org/abs/2009.13586]
Reference Code: https://github.com/xuezhemax/apollo
The optimizer adjusts the step size of each parameter based on the difference between the current and the previous gradient, so that parameters with a rapidly changing gradient take larger steps while parameters whose gradient changes little take smaller steps.
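Conceptually, the Adam step is scaled by a "friction" term derived from the gradient change; below is a rough sketch with placeholder tensors, not the package's internal implementation.

import torch

lr, eps = 1e-3, 1e-8
grad, prev_grad = torch.randn(4), torch.randn(4)   # current and previous gradients (placeholders)
exp_avg = torch.randn(4)                           # first-moment estimate (placeholder values)
exp_avg_sq = torch.full((4,), 0.01)                # second-moment estimate (placeholder values)
param = torch.randn(4)

friction = torch.sigmoid((prev_grad - grad).abs())                   # near 1 when the gradient changes a lot, near 0.5 otherwise
param = param - lr * friction * exp_avg / (exp_avg_sq.sqrt() + eps)  # damped Adam-style step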
import torch_optimizer as optim

# model = ...
optimizer = optim.DiffGrad(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()

Paper: diffGrad: An Optimization Method for Convolutional Neural Networks (2019) [https://arxiv.org/abs/1909.11015]
Reference Code: https://github.com/shivram1987/diffgrad
import torch_optimizer as optim

# model = ...
optimizer = optim.Lamb(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()

Paper: Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (2019) [https://arxiv.org/abs/1904.00962]
Reference Code: https://github.com/cybertronai/pytorch-lamb
import torch_optimizer as optim

# model = ...
# base optimizer, any other optimizer can be used, e.g. Adam or DiffGrad
yogi = optim.Yogi(
    m.parameters(),
    lr=1e-2,
    betas=(0.9, 0.999),
    eps=1e-3,
    initial_accumulator=1e-6,
    weight_decay=0,
)
optimizer = optim.Lookahead(yogi, k=5, alpha=0.5)
optimizer.step()

Paper: Lookahead Optimizer: k steps forward, 1 step back (2019) [https://arxiv.org/abs/1907.08610]
Reference Code: https://github.com/alphadl/lookahead.pytorch
import torch_optimizer as optim

# model = ...
optimizer = optim.MADGRAD(
    m.parameters(),
    lr=1e-2,
    momentum=0.9,
    weight_decay=0,
    eps=1e-6,
)
optimizer.step()

Paper: Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization (2021) [https://arxiv.org/abs/2101.11075]
Reference Code: https://github.com/facebookresearch/madgrad
import torch_optimizer as optim

# model = ...
optimizer = optim.NovoGrad(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
    grad_averaging=False,
    amsgrad=False,
)
optimizer.step()

Paper: Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks (2019) [https://arxiv.org/abs/1905.11286]
Reference Code: https://github.com/NVIDIA/DeepLearningExamples/
import torch_optimizer as optim

# model = ...
optimizer = optim.PID(
    m.parameters(),
    lr=1e-3,
    momentum=0,
    dampening=0,
    weight_decay=1e-2,
    integral=5.0,
    derivative=10.0,
)
optimizer.step()

Paper: A PID Controller Approach for Stochastic Optimization of Deep Networks (2018) [http://www4.comp.polyu.edu.hk/~cslzhang/paper/cvpr18_pid.pdf]
Reference Code: https://github.com/tensorboy/pidoptimizer
import torch_optimizer as optim

# model = ...
optimizer = optim.QHAdam(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    nus=(1.0, 1.0),
    weight_decay=0,
    decouple_weight_decay=False,
    eps=1e-8,
)
optimizer.step()

Paper: Quasi-hyperbolic momentum and Adam for deep learning (2019) [https://arxiv.org/abs/1810.06801]
Reference Code: https://github.com/facebookresearch/qhoptim
import torch_optimizer as optim

# model = ...
optimizer = optim.QHM(
    m.parameters(),
    lr=1e-3,
    momentum=0,
    nu=0.7,
    weight_decay=1e-2,
    weight_decay_type='grad',
)
optimizer.step()

Paper: Quasi-hyperbolic momentum and Adam for deep learning (2019) [https://arxiv.org/abs/1810.06801]
Reference Code: https://github.com/facebookresearch/qhoptim
Deprecated: please use the version provided by PyTorch itself.
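A minimal sketch using the built-in implementation instead (torch.optim.RAdam, available in recent PyTorch releases); the model here is a placeholder. The torch_optimizer version remains available and is shown below for reference.

import torch

model = torch.nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.RAdam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
optimizer.step()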
import torch_optimizer as optim

# model = ...
optimizer = optim.RAdam(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()

Paper: On the Variance of the Adaptive Learning Rate and Beyond (2019) [https://arxiv.org/abs/1908.03265]
Reference Code: https://github.com/liyuanlucasliu/radam
import torch_optimizer as optim

# model = ...
optimizer = optim.Ranger(
    m.parameters(),
    lr=1e-3,
    alpha=0.5,
    k=6,
    N_sma_threshhold=5,
    betas=(0.95, 0.999),
    eps=1e-5,
    weight_decay=0,
)
optimizer.step()

Paper: New Deep Learning Optimizer, Ranger: Synergistic combination of RAdam + LookAhead (2019)
Reference Code: https://github.com/lessw2020/ranger-deep-learning-optimizer
import torch_optimizer as optim

# model = ...
optimizer = optim.RangerQH(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    nus=(0.7, 1.0),
    weight_decay=0.0,
    k=6,
    alpha=0.5,
    decouple_weight_decay=False,
    eps=1e-8,
)
optimizer.step()

Paper: Quasi-hyperbolic momentum and Adam for deep learning (2018) [https://arxiv.org/abs/1810.06801]
Reference Code: https://github.com/lessw2020/ranger-deep-learning-optimizer
import torch_optimizer as optim

# model = ...
optimizer = optim.RangerVA(
    m.parameters(),
    lr=1e-3,
    alpha=0.5,
    k=6,
    n_sma_threshhold=5,
    betas=(0.95, 0.999),
    eps=1e-5,
    weight_decay=0,
    amsgrad=True,
    transformer='softplus',
    smooth=50,
    grad_transformer='square',
)
optimizer.step()

Paper: Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM (2019) [https://arxiv.org/abs/1908.00700v2]
Reference Code: https://github.com/lessw2020/ranger-deep-learning-optimizer
import torch_optimizer as optim

# model = ...
optimizer = optim.SGDP(
    m.parameters(),
    lr=1e-3,
    momentum=0,
    dampening=0,
    weight_decay=1e-2,
    nesterov=False,
    delta=0.1,
    wd_ratio=0.1,
)
optimizer.step()

Paper: Slowing Down the Weight Norm Increase in Momentum-based Optimizers (2020) [https://arxiv.org/abs/2006.08217]
Reference Code: https://github.com/clovaai/adamp
import torch_optimizer as optim

# model = ...
optimizer = optim.SGDW(
    m.parameters(),
    lr=1e-3,
    momentum=0,
    dampening=0,
    weight_decay=1e-2,
    nesterov=False,
)
optimizer.step()

Paper: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) [https://arxiv.org/abs/1608.03983]
Reference Code: pytorch/pytorch#22466
import torch_optimizer as optim

# model = ...
optimizer = optim.SWATS(
    model.parameters(),
    lr=1e-1,
    betas=(0.9, 0.999),
    eps=1e-3,
    weight_decay=0.0,
    amsgrad=False,
    nesterov=False,
)
optimizer.step()

Paper: Improving Generalization Performance by Switching from Adam to SGD (2017) [https://arxiv.org/abs/1712.07628]
Reference Code: https://github.com/mrpatekful/swats
import torch_optimizer as optim

# model = ...
optimizer = optim.Shampoo(
    m.parameters(),
    lr=1e-1,
    momentum=0.0,
    weight_decay=0.0,
    epsilon=1e-4,
    update_freq=1,
)
optimizer.step()

Paper: Shampoo: Preconditioned Stochastic Tensor Optimization (2018) [https://arxiv.org/abs/1802.09568]
Reference Code: https://github.com/moskomule/shampoo.pytorch
Yogi is an optimization algorithm based on Adam, with finer-grained control of the effective learning rate, and it has similar theoretical guarantees on convergence to Adam.
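Yogi's key change is the second-moment update; a rough sketch with placeholder tensors (not the package's internal implementation) follows.

import torch

beta2 = 0.999
grad = torch.randn(4)        # current gradient (placeholder values)
v = torch.full((4,), 1e-6)   # second-moment estimate, starting at initial_accumulator

g2 = grad ** 2
# Adam would use:  v = beta2 * v + (1 - beta2) * g2
v = v - (1 - beta2) * torch.sign(v - g2) * g2   # Yogi: additive, sign-controlled update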
import torch_optimizer as optim

# model = ...
optimizer = optim.Yogi(
    m.parameters(),
    lr=1e-2,
    betas=(0.9, 0.999),
    eps=1e-3,
    initial_accumulator=1e-6,
    weight_decay=0,
)
optimizer.step()

Paper: Adaptive Methods for Nonconvex Optimization (2018) [https://papers.nips.cc/paper/8186-adaptive-methods-for-nonconvex-optimization]
Reference Code: https://github.com/4rtemi5/yogi-optimizer_keras