"目前在深度學習領域分類兩個派別,一派為學院派,研究強大、複雜的模型網絡和實驗方法,為了追求更高的性能;另一派為工程派,旨在將算法更穩定、高效的落地在硬件平台上,效率是其追求的目標。複雜的模型固然具有更好的性能,但是高額的存儲空間、計算資源消耗是使其難以有效的應用在各硬件平台上的重要原因。所以,深度神經網絡日益增長的規模為深度學習在移動端的部署帶來了巨大的挑戰,深度學習模型壓縮與部署成為了學術界和工業界都重點關注的研究領域之一"
micronet, a model compression and deployment library.

```
micronet
├── __init__.py
├── base_module
│   ├── __init__.py
│   └── op.py
├── compression
│   ├── README.md
│   ├── __init__.py
│   ├── pruning
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── gc_prune.py
│   │   ├── main.py
│   │   ├── models_save
│   │   │   └── models_save.txt
│   │   └── normal_regular_prune.py
│   └── quantization
│       ├── README.md
│       ├── __init__.py
│       ├── wbwtab
│       │   ├── __init__.py
│       │   ├── bn_fuse
│       │   │   ├── bn_fuse.py
│       │   │   ├── bn_fused_model_test.py
│       │   │   └── models_save
│       │   │       └── models_save.txt
│       │   ├── main.py
│       │   ├── models_save
│       │   │   └── models_save.txt
│       │   └── quantize.py
│       └── wqaq
│           ├── __init__.py
│           ├── dorefa
│           │   ├── __init__.py
│           │   ├── main.py
│           │   ├── models_save
│           │   │   └── models_save.txt
│           │   ├── quant_model_test
│           │   │   ├── models_save
│           │   │   │   └── models_save.txt
│           │   │   ├── quant_model_para.py
│           │   │   └── quant_model_test.py
│           │   └── quantize.py
│           └── iao
│               ├── __init__.py
│               ├── bn_fuse
│               │   ├── bn_fuse.py
│               │   ├── bn_fused_model_test.py
│               │   └── models_save
│               │       └── models_save.txt
│               ├── main.py
│               ├── models_save
│               │   └── models_save.txt
│               └── quantize.py
├── data
│   └── data.txt
├── deploy
│   ├── README.md
│   ├── __init__.py
│   └── tensorrt
│       ├── README.md
│       ├── __init__.py
│       ├── calibrator.py
│       ├── eval_trt.py
│       ├── models
│       │   ├── __init__.py
│       │   └── models_trt.py
│       ├── models_save
│       │   └── calibration_seg.cache
│       ├── test_trt.py
│       └── util_trt.py
├── models
│   ├── __init__.py
│   ├── nin.py
│   ├── nin_gc.py
│   └── resnet.py
└── readme_imgs
    ├── code_structure.jpg
    └── micronet.xmind
```
### Install

#### From PyPI

```bash
pip install micronet -i https://pypi.org/simple
```

#### From GitHub

```bash
git clone https://github.com/666DZY666/micronet.git
cd micronet
python setup.py install
```

#### Verify

```bash
python -c "import micronet; print(micronet.__version__)"
```
### Quantization

#### wbwtab

- `--refine`: load pretrained floating-point model parameters and quantize on top of them
- `--W`, `--A`: number of quantized values for weights W and activations A (see the sketch after the commands below)

```bash
cd micronet/compression/quantization/wbwtab
python main.py --W 2 --A 2
python main.py --W 2 --A 32
python main.py --W 3 --A 2
python main.py --W 3 --A 32
```
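For intuition, here is a minimal sketch, not the library's actual `quantize.py`, of the kind of binary weight quantizer that `--W 2` selects: the forward pass keeps only the sign of each weight, and a straight-through estimator (STE) lets gradients flow back.

```python
import torch


class BinaryWeight(torch.autograd.Function):
    """Sign-quantize weights in forward; straight-through estimator in backward."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)  # weights become {-1, +1}

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # STE: pass the gradient through, clipped where |w| > 1
        return grad_output * (w.abs() <= 1).float()


w = torch.randn(4, requires_grad=True)
w_q = BinaryWeight.apply(w)
w_q.sum().backward()
print(w_q, w.grad)
```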
#### wqaq/dorefa

- `--w_bits`, `--a_bits`: quantization bit widths for weights W and activations A

```bash
cd micronet/compression/quantization/wqaq/dorefa
python main.py --w_bits 16 --a_bits 16
python main.py --w_bits 8 --a_bits 8
python main.py --w_bits 4 --a_bits 4
```

#### wqaq/iao

```bash
cd micronet/compression/quantization/wqaq/iao
```

The bit-width options are the same as for dorefa. The commands above run on a single GPU.
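Both wqaq variants map values onto a uniform k-bit grid. A minimal DoReFa-style sketch of what `--w_bits`/`--a_bits` control (a simplification for illustration, not the library's exact `quantize.py`):

```python
import torch


def uniform_quantize(x, k):
    """Quantize x in [0, 1] to a uniform k-bit grid, as in DoReFa-Net."""
    n = float(2**k - 1)
    x_q = torch.round(x * n) / n  # forward: snap to the k-bit grid
    return x + (x_q - x).detach()  # backward: straight-through estimator


# activations: clamp to [0, 1] first, then quantize
a = torch.rand(2, 3)
a_q = uniform_quantize(torch.clamp(a, 0, 1), k=8)

# weights: squash with tanh into [0, 1], quantize, then rescale to [-1, 1]
w = torch.randn(2, 3)
w01 = torch.tanh(w) / (2 * torch.tanh(w).abs().max()) + 0.5
w_q = 2 * uniform_quantize(w01, k=8) - 1
```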
Pipeline: QAT/PTQ -> QAFT

Note: QAFT must be run after QAT/PTQ!

- `--q_type`: quantization type (0: symmetric, 1: asymmetric; see the sketch after this list)
- `--q_level`: weight quantization granularity (0: per-channel, 1: per-layer)
- `--weight_observer`: weight observer (0: MinMaxObserver, 1: MovingAverageMinMaxObserver)
- `--bn_fuse`: BN-fusion flag for quantization
- `--bn_fuse_calib`: calibration flag for BN fusion during quantization
- `--pretrained_model`: pretrained floating-point model
- `--qaft`: QAFT flag
- `--ptq`: PTQ observer flag
- `--ptq_control`: PTQ control flag
- `--ptq_batch`: number of batches used for PTQ calibration
- `--percentile`: percentile used for PTQ calibration
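To make `--q_type` and `--q_level` concrete, here is a simplified sketch of how a MinMax-style observer could derive scale and zero-point; the names and exact handling in the library's `quantize.py` may differ:

```python
import torch


def qparams(x_min, x_max, bits=8, symmetric=True):
    """Derive (scale, zero_point) from an observed range, MinMax-observer style."""
    if symmetric:  # --q_type 0: signed range, zero_point fixed at 0
        qmax = 2 ** (bits - 1) - 1
        scale = torch.max(x_min.abs(), x_max.abs()) / qmax
        zero_point = torch.zeros_like(scale)
    else:  # --q_type 1: unsigned range, shifted by a zero_point
        qmax = 2**bits - 1
        scale = (x_max - x_min) / qmax
        zero_point = torch.round(-x_min / scale)
    return scale, zero_point


w = torch.randn(16, 8, 3, 3)
# --q_level 0 (per-channel): one range per output channel
per_ch = w.flatten(1)
s, zp = qparams(per_ch.min(dim=1).values, per_ch.max(dim=1).values, symmetric=True)
# --q_level 1 (per-layer): a single range for the whole tensor
s, zp = qparams(w.min(), w.max(), symmetric=True)
```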
##### QAT

```bash
python main.py --q_type 0 --q_level 0 --weight_observer 0
python main.py --q_type 0 --q_level 0 --weight_observer 1
python main.py --q_type 0 --q_level 1
python main.py --q_type 1 --q_level 0
python main.py --q_type 1 --q_level 1
python main.py --q_type 0 --q_level 0 --bn_fuse
python main.py --q_type 0 --q_level 1 --bn_fuse
python main.py --q_type 1 --q_level 0 --bn_fuse
python main.py --q_type 1 --q_level 1 --bn_fuse
python main.py --q_type 0 --q_level 0 --bn_fuse --bn_fuse_calib
```

##### PTQ

A pretrained floating-point model must be loaded; in this project it can be obtained from the pruning step with normal (non-sparse) training.
```bash
python main.py --refine ../../../pruning/models_save/nin_gc.pth --q_level 0 --bn_fuse --pretrained_model --ptq_control --ptq --batch_size 32 --ptq_batch 200 --percentile 0.999999
```
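The `--percentile 0.999999` flag clips the calibration range at a high quantile of the observed activations instead of the raw maximum, which makes PTQ robust to outliers. A hedged sketch of the idea (`percentile_range` is illustrative, not a library function):

```python
import torch


def percentile_range(activations, percentile=0.999999):
    """Pick the calibration max as a high quantile of |activation|, not the raw max."""
    flat = torch.cat([a.abs().flatten() for a in activations])
    k = max(1, int(percentile * flat.numel()))  # rank of the percentile element
    return torch.kthvalue(flat, k).values  # robust to a few extreme outliers


# e.g. collected over --ptq_batch batches during calibration
batches = [torch.randn(32, 64, 8, 8) for _ in range(4)]
clip_max = percentile_range(batches)
scale = clip_max / (2**7 - 1)  # symmetric int8 scale
```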
##### QAFT

Note: QAFT must be run after QAT/PTQ!

QAT -> QAFT:

```bash
python main.py --resume models_save/nin_gc_bn_fused.pth --q_type 0 --q_level 0 --bn_fuse --qaft --lr 0.00001
```

PTQ -> QAFT:

```bash
python main.py --resume models_save/nin_gc_bn_fused.pth --q_level 0 --bn_fuse --qaft --lr 0.00001 --ptq
```

### Pruning

Sparse training -> pruning -> fine-tuning
#### Sparse training

```bash
cd micronet/compression/pruning
```

- `-sr`: sparse-training flag (see the sketch below)
- `--s`: sparsity rate (tune it for the specific dataset and model)
- `--model_type`: model type (0: nin, 1: nin_gc)

```bash
python main.py -sr --s 0.0001 --model_type 0
python main.py -sr --s 0.001 --model_type 1
```
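Sparse training here follows the network-slimming idea: an L1 penalty on the BatchNorm scale factors pushes unimportant channels toward zero; `-sr` enables it and `--s` sets its strength. A minimal sketch of the extra subgradient step, assuming it is applied between `loss.backward()` and `optimizer.step()`:

```python
import torch.nn as nn


def update_bn_grad(model, s=0.0001):
    """Add the subgradient of s * ||gamma||_1 to every BN scale factor."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.grad.add_(s * m.weight.data.sign())
```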
#### Pruning

- `--percent`: pruning rate
- `--normal_regular`: flag for normal/regular pruning and the regular-pruning base (if set to N, every layer of the pruned model has a filter count that is a multiple of N)
- `--model`: path of the sparsely trained model
- `--save`: path to save the pruned model (defaults are provided; change them as needed)
```bash
python normal_regular_prune.py --percent 0.5 --model models_save/nin_sparse.pth --save models_save/nin_prune.pth
python normal_regular_prune.py --percent 0.5 --normal_regular 8 --model models_save/nin_sparse.pth --save models_save/nin_prune.pth
# or
python normal_regular_prune.py --percent 0.5 --normal_regular 16 --model models_save/nin_sparse.pth --save models_save/nin_prune.pth
python gc_prune.py --percent 0.4 --model models_save/nin_gc_sparse.pth
```
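`--percent` determines the pruning threshold: BN scale factors are gathered across the network, and channels whose factor falls below the chosen percentile are dropped. A simplified sketch of the channel selection (the real scripts also rebuild the conv layers around the surviving channels; `channel_mask` is illustrative):

```python
import torch
import torch.nn as nn


def channel_mask(model, percent=0.5):
    """Keep channels whose BN gamma exceeds a global percentile threshold."""
    gammas = torch.cat(
        [m.weight.data.abs().flatten()
         for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
    )
    threshold = torch.sort(gammas).values[int(len(gammas) * percent)]
    return {
        name: (m.weight.data.abs() > threshold)
        for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)
    }
```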
#### Fine-tuning

- `--prune_refine`: path of the pruned model (fine-tuning starts from it)

```bash
python main.py --model_type 0 --prune_refine models_save/nin_prune.pth
```

The cfg of the new model obtained from pruning must be passed in, e.g.

```bash
python main.py --model_type 1 --gc_prune_refine 154 162 144 304 320 320 608 584
```

### Pruning + quantization

Load the pruned floating-point model, then quantize it.
#### wqaq/dorefa

```bash
cd micronet/compression/quantization/wqaq/dorefa
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_quant ../../../pruning/models_save/nin_finetune.pth
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_quant ../../../pruning/models_save/nin_gc_retrain.pth
```

#### wqaq/iao

```bash
cd micronet/compression/quantization/wqaq/iao
```

Pipeline: QAT/PTQ -> QAFT

Note: QAFT must be run after QAT/PTQ!
##### QAT

Without BN fusion:

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_quant ../../../pruning/models_save/nin_finetune.pth --lr 0.001
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_quant ../../../pruning/models_save/nin_gc_retrain.pth --lr 0.001
```

With BN fusion:

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_quant ../../../pruning/models_save/nin_finetune.pth --bn_fuse --pretrained_model --lr 0.001
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_quant ../../../pruning/models_save/nin_gc_retrain.pth --bn_fuse --pretrained_model --lr 0.001
```

##### PTQ

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_quant ../../../pruning/models_save/nin_finetune.pth --bn_fuse --pretrained_model --ptq_control --ptq --batch_size 32 --ptq_batch 200 --percentile 0.999999
```
##### QAFT

Note: QAFT must be run after QAT/PTQ!

QAT -> QAFT, without BN fusion:

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_qaft models_save/nin.pth --qaft --lr 0.00001
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_qaft models_save/nin_gc.pth --qaft --lr 0.00001
```

QAT -> QAFT, with BN fusion:

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_qaft models_save/nin_bn_fused.pth --bn_fuse --qaft --lr 0.00001
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_qaft models_save/nin_gc_bn_fused.pth --bn_fuse --qaft --lr 0.00001
```

PTQ -> QAFT, without BN fusion:

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_qaft models_save/nin.pth --qaft --lr 0.00001 --ptq
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_qaft models_save/nin_gc.pth --qaft --lr 0.00001 --ptq
```

PTQ -> QAFT, with BN fusion:

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_qaft models_save/nin_bn_fused.pth --bn_fuse --qaft --lr 0.00001 --ptq
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_qaft models_save/nin_gc_bn_fused.pth --bn_fuse --qaft --lr 0.00001 --ptq
```

#### wbwtab

```bash
cd micronet/compression/quantization/wbwtab
python main.py --W 2 --A 2 --model_type 0 --prune_quant ../../pruning/models_save/nin_finetune.pth
python main.py --W 2 --A 2 --model_type 1 --prune_quant ../../pruning/models_save/nin_gc_retrain.pth
```

### BN fusion and quantized-inference simulation test

#### wbwtab

```bash
cd micronet/compression/quantization/wbwtab/bn_fuse
```

- `--model_type`: 1: nin_gc (with grouped convolutions); 0: nin (normal convolutions)
- `--prune_quant`: pruned-and-quantized model flag
- `--W`: number of quantized weight values

All of these must match the quantization-training settings; the defaults can be used directly.

```bash
python bn_fuse.py --model_type 1 --W 2
python bn_fuse.py --model_type 1 --prune_quant --W 2
python bn_fuse.py --model_type 1 --W 3
python bn_fuse.py --model_type 0 --W 2
python bn_fused_model_test.py
```

#### wqaq/dorefa

```bash
cd micronet/compression/quantization/wqaq/dorefa/quant_model_test
```

- `--model_type`: 1: nin_gc (with grouped convolutions); 0: nin (normal convolutions)
- `--prune_quant`: pruned-and-quantized model flag
- `--w_bits`: weight quantization bit width; `--a_bits`: activation quantization bit width

All of these must match the quantization-training settings; the defaults can be used directly.

```bash
python quant_model_para.py --model_type 1 --w_bits 8 --a_bits 8
python quant_model_para.py --model_type 1 --prune_quant --w_bits 8 --a_bits 8
python quant_model_para.py --model_type 0 --w_bits 8 --a_bits 8
python quant_model_test.py
```

#### wqaq/iao

Note: `--bn_fuse` must be set to True during quantization training.
```bash
cd micronet/compression/quantization/wqaq/iao/bn_fuse
```

- `--model_type`: 1: nin_gc (with grouped convolutions); 0: nin (normal convolutions)
- `--prune_quant`: pruned-and-quantized model flag
- `--w_bits`: weight quantization bit width; `--a_bits`: activation quantization bit width
- `--q_type`: 0: symmetric; 1: asymmetric
- `--q_level`: 0: per-channel; 1: per-layer

All of these must match the quantization-training settings; the defaults can be used directly.

```bash
python bn_fuse.py --model_type 1 --w_bits 8 --a_bits 8
python bn_fuse.py --model_type 1 --prune_quant --w_bits 8 --a_bits 8
python bn_fuse.py --model_type 0 --w_bits 8 --a_bits 8
python bn_fuse.py --model_type 0 --w_bits 8 --a_bits 8 --q_type 1 --q_level 1
python bn_fused_model_test.py
```
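`bn_fuse.py` folds each BatchNorm into the preceding convolution so that inference runs a single conv per layer. With BN parameters gamma, beta and running statistics mu, var, the fused layer is W' = W * gamma / sqrt(var + eps) and b' = beta + (b - mu) * gamma / sqrt(var + eps). A minimal self-contained sketch, not the repository's exact script:

```python
import torch
import torch.nn as nn


def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold BN(gamma, beta, mu, var) into the preceding conv's weight and bias."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation, conv.groups, bias=True)
    std = torch.sqrt(bn.running_var + bn.eps)
    t = (bn.weight / std).reshape(-1, 1, 1, 1)  # per-output-channel scale
    fused.weight.data = conv.weight.data * t
    bias = conv.bias.data if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.data = bn.bias.data + (bias - bn.running_mean) * bn.weight.data / std
    return fused
```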
### Device

CPU and GPU (single- and multi-card) are now supported.

- `--cpu`: run on the CPU; `--gpu_id`: select the GPU(s) to use

```bash
python main.py --cpu
python main.py --gpu_id 0
# or
python main.py --gpu_id 1
python main.py --gpu_id 0,1
# or
python main.py --gpu_id 0,1,2
```

By default, all GPUs on the server are used.
### Deployment (TensorRT)

Currently only the relevant core module code is provided; a complete runnable demo will be added later.
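For orientation, here is a hedged sketch of the kind of INT8 entropy calibrator that `deploy/tensorrt/calibrator.py` implements; the class and argument names here are illustrative, assuming the standard TensorRT Python API plus pycuda:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt


class Calibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batches, cache_file="calibration.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = iter(batches)  # iterable of NCHW float32 arrays
        self.cache_file = cache_file
        first = next(self.batches)
        self.batch_size = first.shape[0]
        self.device_input = cuda.mem_alloc(first.nbytes)
        self.current = first

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        if self.current is None:
            return None  # no more data -> calibration is done
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(self.current))
        self.current = next(self.batches, None)
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()  # reuse a previous calibration run
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```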
### Using micronet in your own code

#### Manual op replacement

A model can be quantized (high-bit (>2b), or low-bit (≤2b)/ternary and binary) by simply replacing each op with the corresponding quant_op:
```python
import torch.nn as nn
import torch.nn.functional as F

# some base ops, such as ``Add``, ``Concat``
from micronet.base_module.op import *

# ``quantize`` is the quant module; ``QuantConv2d``, ``QuantLinear``,
# ``QuantMaxPool2d`` and ``QuantReLU`` are quant ops
from micronet.compression.quantization.wbwtab.quantize import (
    QuantConv2d as quant_conv_wbwtab,
)
from micronet.compression.quantization.wbwtab.quantize import (
    ActivationQuantizer as quant_relu_wbwtab,
)
from micronet.compression.quantization.wqaq.dorefa.quantize import (
    QuantConv2d as quant_conv_dorefa,
)
from micronet.compression.quantization.wqaq.dorefa.quantize import (
    QuantLinear as quant_linear_dorefa,
)
from micronet.compression.quantization.wqaq.iao.quantize import (
    QuantConv2d as quant_conv_iao,
)
from micronet.compression.quantization.wqaq.iao.quantize import (
    QuantLinear as quant_linear_iao,
)
from micronet.compression.quantization.wqaq.iao.quantize import (
    QuantMaxPool2d as quant_max_pool_iao,
)
from micronet.compression.quantization.wqaq.iao.quantize import (
    QuantReLU as quant_relu_iao,
)


class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        self.max_pool = nn.MaxPool2d(kernel_size=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.max_pool(self.conv1(x)))
        x = self.relu(self.max_pool(self.conv2(x)))
        x = x.view(-1, 320)
        x = self.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


class QuantLeNetWbWtAb(nn.Module):
    def __init__(self):
        super(QuantLeNetWbWtAb, self).__init__()
        self.conv1 = quant_conv_wbwtab(1, 10, kernel_size=5)
        self.conv2 = quant_conv_wbwtab(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        self.max_pool = nn.MaxPool2d(kernel_size=2)
        self.relu = quant_relu_wbwtab()

    def forward(self, x):
        x = self.relu(self.max_pool(self.conv1(x)))
        x = self.relu(self.max_pool(self.conv2(x)))
        x = x.view(-1, 320)
        x = self.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


class QuantLeNetDoReFa(nn.Module):
    def __init__(self):
        super(QuantLeNetDoReFa, self).__init__()
        self.conv1 = quant_conv_dorefa(1, 10, kernel_size=5)
        self.conv2 = quant_conv_dorefa(10, 20, kernel_size=5)
        self.fc1 = quant_linear_dorefa(320, 50)
        self.fc2 = quant_linear_dorefa(50, 10)
        self.max_pool = nn.MaxPool2d(kernel_size=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.max_pool(self.conv1(x)))
        x = self.relu(self.max_pool(self.conv2(x)))
        x = x.view(-1, 320)
        x = self.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


class QuantLeNetIAO(nn.Module):
    def __init__(self):
        super(QuantLeNetIAO, self).__init__()
        self.conv1 = quant_conv_iao(1, 10, kernel_size=5)
        self.conv2 = quant_conv_iao(10, 20, kernel_size=5)
        self.fc1 = quant_linear_iao(320, 50)
        self.fc2 = quant_linear_iao(50, 10)
        self.max_pool = quant_max_pool_iao(kernel_size=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.max_pool(self.conv1(x)))
        x = self.relu(self.max_pool(self.conv2(x)))
        x = x.view(-1, 320)
        x = self.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


lenet = LeNet()
quant_lenet_wbwtab = QuantLeNetWbWtAb()
quant_lenet_dorefa = QuantLeNetDoReFa()
quant_lenet_iao = QuantLeNetIAO()

print("***ori_model***\n", lenet)
print("\n***quant_model_wbwtab***\n", quant_lenet_wbwtab)
print("\n***quant_model_dorefa***\n", quant_lenet_dorefa)
print("\n***quant_model_iao***\n", quant_lenet_iao)

print("\nquant_model is ready")
print("micronet is ready")
```

#### Automatic replacement

A model can be quantized (high-bit (>2b), or low-bit (≤2b)/ternary and binary) by simply using `micronet.compression.quantization.quantize.prepare(model)`:
```python
import torch.nn as nn
import torch.nn.functional as F

# some base ops, such as ``Add``, ``Concat``
from micronet.base_module.op import *

import micronet.compression.quantization.wqaq.dorefa.quantize as quant_dorefa
import micronet.compression.quantization.wqaq.iao.quantize as quant_iao


class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        self.max_pool = nn.MaxPool2d(kernel_size=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.max_pool(self.conv1(x)))
        x = self.relu(self.max_pool(self.conv2(x)))
        x = x.view(-1, 320)
        x = self.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


"""
--w_bits --a_bits, quantization bit widths for weights W and activations A
--q_type, quantization type (0: symmetric, 1: asymmetric)
--q_level, weight quantization granularity (0: per-channel, 1: per-layer)
--weight_observer, weight observer (0: MinMaxObserver, 1: MovingAverageMinMaxObserver)
--bn_fuse, BN-fusion flag
--bn_fuse_calib, BN-fusion calibration flag
--pretrained_model, pretrained floating-point model
--qaft, QAFT flag
--ptq, PTQ flag
--percentile, percentile for PTQ calibration
"""
lenet = LeNet()
quant_lenet_dorefa = quant_dorefa.prepare(lenet, inplace=False, a_bits=8, w_bits=8)
quant_lenet_iao = quant_iao.prepare(
    lenet,
    inplace=False,
    a_bits=8,
    w_bits=8,
    q_type=0,
    q_level=0,
    weight_observer=0,
    bn_fuse=False,
    bn_fuse_calib=False,
    pretrained_model=False,
    qaft=False,
    ptq=False,
    percentile=0.9999,
)
# if ptq == False, do qat/qaft, which needs training
# if ptq == True, do ptq, which needs no training
# you can refer to micronet/compression/quantization/wqaq/iao/main.py

print("***ori_model***\n", lenet)
print("\n***quant_model_dorefa***\n", quant_lenet_dorefa)
print("\n***quant_model_iao***\n", quant_lenet_iao)

print("\nquant_model is ready")
print("micronet is ready")
```

#### Quick test

```bash
python -c "import micronet; micronet.quant_test_manual()"
python -c "import micronet; micronet.quant_test_auto()"
```

When "quant_model is ready" is printed, micronet is ready.
See the *BN fusion and quantized-inference simulation test* section above.
### Results

The results below are for CIFAR-10; other combinations of compression methods can be tried on more redundant models and larger datasets.

| Type | W (bits) | A (bits) | Acc | GFLOPs | Params (M) | Size (MB) | Compression | Acc loss |
|---|---|---|---|---|---|---|---|---|
| Original model (nin) | FP32 | FP32 | 91.01% | 0.15 | 0.67 | 2.68 | *** | *** |
| Grouped-convolution structure (nin_gc) | FP32 | FP32 | 91.04% | 0.15 | 0.58 | 2.32 | 13.43% | -0.03% |
| Pruning | FP32 | FP32 | 90.26% | 0.09 | 0.32 | 1.28 | 52.24% | 0.75% |
| Quantization | 1 | FP32 | 90.93% | *** | 0.58 | 0.204 | 92.39% | 0.08% |
| Quantization | 1.5 | FP32 | 91.00% | *** | 0.58 | 0.272 | 89.85% | 0.01% |
| Quantization | 1 | 1 | 86.23% | *** | 0.58 | 0.204 | 92.39% | 4.78% |
| Quantization | 1.5 | 1 | 86.48% | *** | 0.58 | 0.272 | 89.85% | 4.53% |
| Quantization (DoReFa) | 8 | 8 | 91.03% | *** | 0.58 | 0.596 | 77.76% | -0.02% |
| Quantization (IAO, fully quantized, symmetric/per-channel/bn_fuse) | 8 | 8 | 90.99% | *** | 0.58 | 0.596 | 77.76% | 0.02% |
| Grouping + pruning + quantization | 1.5 | 1 | 86.13% | *** | 0.32 | 0.19 | 92.91% | 4.88% |
`--train_batch_size 256`, single GPU.
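As a sanity check on the table, the FP32 Size column is just the parameter count times 4 bytes, and the compression rate compares sizes against the 2.68 MB baseline:

```python
params_m = {"nin": 0.67, "nin_gc": 0.58, "pruned": 0.32}  # from the table
size_mb = {k: round(v * 4, 2) for k, v in params_m.items()}  # FP32 = 4 bytes/param
print(size_mb)  # {'nin': 2.68, 'nin_gc': 2.32, 'pruned': 1.28} -- matches Size (MB)

baseline = size_mb["nin"]
print(f"{(baseline - size_mb['nin_gc']) / baseline:.2%}")  # 13.43%, matches the table
print(f"{(baseline - size_mb['pruned']) / baseline:.2%}")  # 52.24%
```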
### References

- Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or -1
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
- An Empirical Study of Binary Neural Networks' Optimisation
- A Review of Binarized Neural Networks