PPQ is a scalable, high-performance neural network quantization tool for industrial applications.
Neural network quantization has been a widely adopted acceleration technique since 2016. Compared with network pruning and architecture search, quantization is more general and of higher practical value to industry. On edge chips in particular, where both die area and power budget are limited, we want to convert as many floating-point operations as possible into fixed-point ones. The value of quantization comes from the fact that floating-point arithmetic is expensive: it requires complex arithmetic circuitry and high memory-fetch bandwidth. If lower-bit-width fixed-point operations can approximate the floating-point results within an acceptable error, we gain significant advantages in circuit design, power consumption, latency and throughput.
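The arithmetic behind this approximation is simple. As a plain illustration (standard uniform quantization, not a PPQ-specific API), the snippet below maps a float tensor to int8 with a single scale and reconstructs an approximate value:

```python
# Standard symmetric uniform quantization, shown only to illustrate the idea.
import torch

def quantize(x: torch.Tensor, scale: float) -> torch.Tensor:
    # q = clip(round(x / scale), -128, 127), stored as int8
    return torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)

def dequantize(q: torch.Tensor, scale: float) -> torch.Tensor:
    # x ≈ q * scale
    return q.float() * scale

x = torch.randn(8)
scale = x.abs().max().item() / 127           # simple min-max calibration
x_hat = dequantize(quantize(x, scale), scale)
print((x - x_hat).abs().max())               # error is bounded by about scale / 2
```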
Neural-network-based artificial intelligence is developing rapidly, and technologies such as image recognition, image super-resolution, content generation and model reconstruction are changing our lives. What comes with them is an ever-changing variety of model structures, which is the first obstacle on the way to quantization and deployment. To deal with complex structures, we designed a complete computational-graph representation together with graph-scheduling logic. These allow PPQ to parse and modify complex model structures, automatically determine the quantized and non-quantized regions of the network, and let users control the scheduling manually.
Quantizing and optimizing a network is a serious engineering problem, and we want users to take part in quantization, deployment and performance tuning themselves. To this end, we provide deployment-related learning materials on GitHub and deliberately emphasize interface flexibility in the software design. Through continuous experimentation we abstracted the notion of a quantizer, which is responsible for initializing the quantization policy for a given hardware platform, and we let users customize the bit width, quantization granularity and calibration algorithm of every operator and every tensor in the network. We reorganized the quantization logic into 27 independent optimization passes; PPQ users can combine these passes freely to build highly flexible quantization pipelines, and can add or modify passes to explore new boundaries of quantization technology.
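As a sketch of how a user drives these choices, the snippet below builds a default quantization setting and toggles a few of the optimization passes; the attribute names (`equalization`, `lsq_optimization`, `dispatcher`, `dispatching_table`) reflect common PPQ usage but are assumptions here and may differ between versions, so check the documentation of your release.

```python
# Sketch only: customizing a PPQ quantization setting. Attribute names are
# assumptions based on typical PPQ usage and may vary across versions.
from ppq import QuantizationSettingFactory, TargetPlatform

setting = QuantizationSettingFactory.default_setting()

# Choose which optimization passes take part in this quantization pipeline.
setting.equalization     = True    # layerwise weight equalization
setting.lsq_optimization = True    # learned-step-size network fine-tuning

# Choose the dispatcher that decides which regions of the graph are quantized.
setting.dispatcher = 'conservative'

# Manually keep one operator in floating point ('Conv_0' is a hypothetical name;
# dispatching_table usage is an assumption).
setting.dispatching_table.append(operation='Conv_0', platform=TargetPlatform.FP32)
```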
PPQ is a framework built for complex quantization tasks: its execution engine is designed specifically for quantization. As of version 0.6.6, PPQ ships with built-in execution logic for 99 common ONNX operators and natively supports quantization simulation during execution, so it can run inference on and quantize ONNX models without ONNX Runtime. As part of the architecture, users can register new operator implementations in Python + PyTorch or C++ / CUDA, and the new logic can also replace the built-in implementations. PPQ allows the same operator to have different execution logic on different platforms, which makes it possible to simulate the behaviour of different hardware. With the customized execution engine and the high-performance PPQ CUDA kernels, PPQ has a significant performance advantage and often completes quantization tasks with remarkable efficiency.
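The sketch below shows what registering a Python + PyTorch operator implementation for the executor might look like; `register_operation_handler` and the handler signature follow the PPQ API documentation as we understand it, but treat them as assumptions and verify against your PPQ version.

```python
# Sketch only: giving PPQ's executor a PyTorch implementation for an ONNX operator.
# The registration hook and handler signature are assumptions; check your version.
import torch
from ppq import TargetPlatform
from ppq.api import register_operation_handler

def mish_forward(op, values, ctx=None, **kwargs) -> torch.Tensor:
    # 'values' holds the operator's input tensors in ONNX argument order.
    [x] = values
    return x * torch.tanh(torch.nn.functional.softplus(x))

# Bind the implementation to the 'Mish' operator type on the FP32 platform;
# the same mechanism can replace PPQ's built-in implementations.
register_operation_handler(
    handler=mish_forward, operation_type='Mish', platform=TargetPlatform.FP32)
```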
PPQ is developed in close connection with inference frameworks, which lets us understand the details of hardware inference and tightly control hardware-simulation error. With the joint efforts of open-source contributors at home and abroad, PPQ currently works with inference frameworks such as TensorRT, OpenPPL, OpenVINO, ncnn, MNN, ONNX Runtime, Tengine, SNPE, GraphCore and Metax, and ships with corresponding prefabricated quantizers and export logic. PPQ is a highly extensible quantization framework: with the utilities in ppq.lib, you can extend its quantization capability to other hardware and inference libraries. We look forward to working with you to bring artificial intelligence into thousands of households.
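To make the end-to-end workflow concrete, here is a minimal post-training quantization sketch that targets TensorRT and then exports the result; the entry points `quantize_onnx_model` and `export_ppq_graph` follow PPQ's published examples, while the file paths, input shape and calibration data are placeholders.

```python
# Minimal PTQ sketch in the spirit of PPQ's examples; paths, shapes and the
# random calibration data are placeholders, not part of the original document.
import torch
from ppq import QuantizationSettingFactory, TargetPlatform
from ppq.api import quantize_onnx_model, export_ppq_graph

calib_dataloader = [torch.randn(1, 3, 224, 224) for _ in range(32)]  # replace with real data
setting = QuantizationSettingFactory.default_setting()

quantized = quantize_onnx_model(
    onnx_import_file='model.onnx',        # placeholder path
    calib_dataloader=calib_dataloader,
    calib_steps=32,
    input_shape=[1, 3, 224, 224],
    setting=setting,
    platform=TargetPlatform.TRT_INT8,     # pick the target backend here
    device='cuda')

# Export the quantized graph together with its quantization parameters.
export_ppq_graph(
    graph=quantized,
    platform=TargetPlatform.TRT_INT8,
    graph_save_to='quantized.onnx',
    config_save_to='quantized.json')
```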
Install CUDA from CUDA Toolkit
Install Compiler
```bash
apt-get install ninja-build # for debian/ubuntu user
yum install ninja-build     # for redhat/centos user
```

For Windows User:
(1) Download ninja.exe from https://github.com/ninja-build/ninja/releases, add it to Windows PATH.
(2) Install Visual Studio 2019 from https://visualstudio.microsoft.com.
(3) Add your C++ compiler to the Windows PATH environment variable. If you are using Visual Studio, it should look like "C:\Program Files\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.16.27023\bin\Hostx86\x86"
(4) Update PyTorch version to 1.10+.
Install PPQ from source:

```bash
git clone https://github.com/openppl-public/ppq.git
cd ppq
pip install -r requirements.txt
python setup.py install
```

Install PPQ using docker:

```bash
docker pull stephen222/ppq:ubuntu18.04_cuda11.4_cudnn8.4_trt8.4.1.5
docker run -it --rm --ipc=host --gpus all --mount type=bind,source=your custom path,target=/workspace stephen222/ppq:ubuntu18.04_cuda11.4_cudnn8.4_trt8.4.1.5 /bin/bash
git clone https://github.com/openppl-public/ppq.git
cd ppq
export PYTHONPATH=${PWD}:${PYTHONPATH}
```

Install PPQ from PyPI:

```bash
python3 -m pip install ppq
```

| | Description | Link |
|---|---|---|
| 01 | Model quantization | onnx, caffe, pytorch |
| 02 | Executor | executor |
| 03 | Error analysis | analyser |
| 04 | Calibrator | Calibration |
| 05 | Network fine-tuning | finetune |
| 06 | Network Scheduling | dispatch |
| 07 | Best Practices | Best Practice |
| 08 | Target platform | platform |
| 09 | Optimization passes | Optim |
| 10 | Graph fusion | Fusion |
| | Description | Link |
|---|---|---|
| 01 | QuantSimplifyPass (general quantization simplification pass) | doc |
| 02 | QuantFusionPass (general quantization fusion pass) | doc |
| 03 | QuantAlignmentPass (general quantization alignment pass) | doc |
| 04 | RuntimeCalibrationPass (parameter calibration pass) | doc |
| 05 | BiasCorrectionPass (bias correction pass) | doc |
| 06 | QuantSimplifyPass (general quantization simplification pass) | doc |
| 07 | LayerwiseEqualizationPass (layerwise weight equalization pass) | doc |
| 08 | LayerSpilitPass (operator splitting pass) | doc |
| 09 | LearnedStepSizePass (network fine-tuning pass) | doc |
| 10 | Other | Refer to |
| | Description | Link |
|---|---|---|
| 01 | Basics of Computer Architecture | link |
| 02 | Network performance analysis | link |
| 03 | Quantization arithmetic principles | part1, part2 |
| 04 | Graph optimization and quantization simulation | link |
| 05 | Graph Scheduling and Pattern Matching | link |
| 06 | Neural Network Deployment | link |
| 07 | Quantization parameter selection | link |
| 08 | Quantization Error Propagation Analysis | link |
| Example | Deployment platform | Input model format | Link | Video |
|---|---|---|---|---|
| TensorRT | | | | |
| Use Torch2trt to speed up your network | pytorch | pytorch | link | link |
| TensorRT quantization-aware training | TensorRT | pytorch | link | link |
| TensorRT post-training quantization (PPQ) | TensorRT | onnx | 1. Quant with TensorRT OnnxParser 2. Quant with TensorRT API | |
| TensorRT fp32 deployment | TensorRT | onnx | link | link |
| TensorRT Performance Comparison | TensorRT | pytorch | link | link |
| TensorRT Profiler | TensorRT | pytorch | link | link |
| onnxruntime | | | | |
| Use onnxruntime to speed up your network | onnxruntime | onnx | link | link |
| onnx post-training quantization (PPQ) | onnxruntime | onnx | link | link |
| onnxruntime performance comparison | onnxruntime | pytorch | link | link |
| openvino | | | | |
| Use openvino to speed up your network | openvino | onnx | link | |
| openvino quantization-aware training | openvino | pytorch | link | |
| openvino post-training quantization (PPQ) | openvino | onnx | link | |
| openvino performance comparison | openvino | pytorch | link | |
| snpe | | | | |
| snpe post-training quantization (PPQ) | snpe | caffe | link | |
| ncnn | | | | |
| ncnn post-training quantization (PPQ) | ncnn | onnx | link | |
| OpenPPL | | | | |
| ppl cuda post-training quantization (PPQ) | ppl cuda | onnx | link | |
| | Description | Link |
|---|---|---|
| 01 | PPQ Quantization Execution Flow | link |
| 02 | PPQ Network Analysis | link |
| 03 | PPQ Quantization Graph Scheduling | link |
| 04 | PPQ Target Platform and TQC | link |
| 05 | PPQ Quantizer | link |
| 06 | PPQ Quantization Optimization Passes | link |
| 07 | PPQ Quantization Functions | link |
| WeChat Official Account | QQ Group |
|---|---|
| OpenPPL | 627853444 |
Email: [email protected]
We appreciate all contributions. If you are planning to contribute back bug fixes, please do so without any further discussion.
If you plan to contribute new features, utility functions, or extensions to the core, please first open an issue and discuss the feature with us. Sending a PR without discussion might end up resulting in a rejected PR because we might be taking the core in a different direction than you might be aware of.
PPQ is tested with models from mmlab-classification, mmlab-detection, mmlab-segmentation and mmlab-editing; here we list part of our testing results.
| Model | Type | Calibration | Dispatcher | Metric | PPQ(sim) | PPLCUDA | FP32 |
|---|---|---|---|---|---|---|---|
| Resnet-18 | Classification | 512 imgs | Conservative | Acc-Top-1 | 69.50% | 69.42% | 69.88% |
| ResNeXt-101 | Classification | 512 imgs | Conservative | Acc-Top-1 | 78.46% | 78.37% | 78.66% |
| SE-ResNet-50 | Classification | 512 imgs | Conservative | Acc-Top-1 | 77.24% | 77.26% | 77.76% |
| ShuffleNetV2 | Classification | 512 imgs | Conservative | Acc-Top-1 | 69.13% | 68.85% | 69.55% |
| MobileNetV2 | Classification | 512 imgs | Conservative | Acc-Top-1 | 70.99% | 71.1% | 71.88% |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| retinanet | Detection | 32 imgs | pplnn | bbox_mAP | 36.1% | 36.1% | 36.4% |
| faster_rcnn | Detection | 32 imgs | pplnn | bbox_mAP | 36.6% | 36.7% | 37.0% |
| fsaf | Detection | 32 imgs | pplnn | bbox_mAP | 36.5% | 36.6% | 37.4% |
| mask_rcnn | Detection | 32 imgs | pplnn | bbox_mAP | 37.7% | 37.6% | 37.9% |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| deeplabv3 | Segmentation | 32 imgs | Conservative | aAcc / mIoU | 96.13% / 78.81% | 96.14% / 78.89% | 96.17% / 79.12% |
| deeplabv3plus | Segmentation | 32 imgs | Conservative | aAcc / mIoU | 96.27% / 79.39% | 96.26% / 79.29% | 96.29% / 79.60% |
| fcn | Segmentation | 32 imgs | Conservative | aAcc / mIoU | 95.75% / 74.56% | 95.62% / 73.96% | 95.68% / 72.35% |
| pspnet | Segmentation | 32 imgs | Conservative | aAcc / mIoU | 95.79% / 77.40% | 95.79% / 77.41% | 95.83% / 77.74% |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| srcnn | Editing | 32 imgs | Conservative | PSNR / SSIM | 27.88% / 79.70% | 27.88% / 79.07% | 28.41% / 81.06% |
| esrgan | Editing | 32 imgs | Conservative | PSNR / SSIM | 27.84% / 75.20% | 27.49% / 72.90% | 27.51% / 72.84% |

This project is distributed under the Apache License, Version 2.0.