PPQ is a scalable, high-performance neural network quantization tool for industrial applications.
Neural network quantization has been a widely adopted acceleration technique since 2016. Compared with network pruning and architecture search, quantization is more general and of higher practical value to industry. On edge chips in particular, where both die area and power budget are limited, we want to convert as many floating-point operations as possible into fixed-point ones. The value of quantization comes from the fact that floating-point arithmetic is expensive: it requires complex arithmetic circuitry and high memory-fetch bandwidth. If lower-bit-width fixed-point operations can approximate the floating-point results within an acceptable error, we gain significant advantages in circuit design, power consumption, latency and throughput.
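The arithmetic behind this approximation is simple. As a plain illustration (standard uniform quantization, not a PPQ-specific API), the snippet below maps a float tensor to int8 with a single scale and reconstructs an approximate value:

```python
# Standard symmetric uniform quantization, shown only to illustrate the idea.
import torch

def quantize(x: torch.Tensor, scale: float) -> torch.Tensor:
    # q = clip(round(x / scale), -128, 127), stored as int8
    return torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)

def dequantize(q: torch.Tensor, scale: float) -> torch.Tensor:
    # x ≈ q * scale
    return q.float() * scale

x = torch.randn(8)
scale = x.abs().max().item() / 127           # simple min-max calibration
x_hat = dequantize(quantize(x, scale), scale)
print((x - x_hat).abs().max())               # error is bounded by about scale / 2
```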
Neural-network-based artificial intelligence is developing rapidly, and technologies such as image recognition, image super-resolution, content generation and model reconstruction are changing our lives. What comes with them is an ever-changing variety of model structures, which is the first obstacle on the way to quantization and deployment. To deal with complex structures, we designed a complete computational-graph representation together with graph-scheduling logic. These allow PPQ to parse and modify complex model structures, automatically determine the quantized and non-quantized regions of the network, and let users control the scheduling manually.
Quantizing and optimizing a network is a serious engineering problem, and we want users to take part in quantization, deployment and performance tuning themselves. To this end, we provide deployment-related learning materials on GitHub and deliberately emphasize interface flexibility in the software design. Through continuous experimentation we abstracted the notion of a quantizer, which is responsible for initializing the quantization policy for a given hardware platform, and we let users customize the bit width, quantization granularity and calibration algorithm of every operator and every tensor in the network. We reorganized the quantization logic into 27 independent optimization passes; PPQ users can combine these passes freely to build highly flexible quantization pipelines, and can add or modify passes to explore new boundaries of quantization technology.
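As a sketch of how a user drives these choices, the snippet below builds a default quantization setting and toggles a few of the optimization passes; the attribute names (`equalization`, `lsq_optimization`, `dispatcher`, `dispatching_table`) reflect common PPQ usage but are assumptions here and may differ between versions, so check the documentation of your release.

```python
# Sketch only: customizing a PPQ quantization setting. Attribute names are
# assumptions based on typical PPQ usage and may vary across versions.
from ppq import QuantizationSettingFactory, TargetPlatform

setting = QuantizationSettingFactory.default_setting()

# Choose which optimization passes take part in this quantization pipeline.
setting.equalization     = True    # layerwise weight equalization
setting.lsq_optimization = True    # learned-step-size network fine-tuning

# Choose the dispatcher that decides which regions of the graph are quantized.
setting.dispatcher = 'conservative'

# Manually keep one operator in floating point ('Conv_0' is a hypothetical name;
# dispatching_table usage is an assumption).
setting.dispatching_table.append(operation='Conv_0', platform=TargetPlatform.FP32)
```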
PPQ is a framework built for complex quantization tasks: its execution engine is designed specifically for quantization. As of version 0.6.6, PPQ ships with built-in execution logic for 99 common ONNX operators and natively supports quantization simulation during execution, so it can run inference on and quantize ONNX models without ONNX Runtime. As part of the architecture, users can register new operator implementations in Python + PyTorch or C++ / CUDA, and the new logic can also replace the built-in implementations. PPQ allows the same operator to have different execution logic on different platforms, which makes it possible to simulate the behaviour of different hardware. With the customized execution engine and the high-performance PPQ CUDA kernels, PPQ has a significant performance advantage and often completes quantization tasks with remarkable efficiency.
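The sketch below shows what registering a Python + PyTorch operator implementation for the executor might look like; `register_operation_handler` and the handler signature follow the PPQ API documentation as we understand it, but treat them as assumptions and verify against your PPQ version.

```python
# Sketch only: giving PPQ's executor a PyTorch implementation for an ONNX operator.
# The registration hook and handler signature are assumptions; check your version.
import torch
from ppq import TargetPlatform
from ppq.api import register_operation_handler

def mish_forward(op, values, ctx=None, **kwargs) -> torch.Tensor:
    # 'values' holds the operator's input tensors in ONNX argument order.
    [x] = values
    return x * torch.tanh(torch.nn.functional.softplus(x))

# Bind the implementation to the 'Mish' operator type on the FP32 platform;
# the same mechanism can replace PPQ's built-in implementations.
register_operation_handler(
    handler=mish_forward, operation_type='Mish', platform=TargetPlatform.FP32)
```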
PPQ is developed in close connection with inference frameworks, which lets us understand the details of hardware inference and tightly control hardware-simulation error. With the joint efforts of open-source contributors at home and abroad, PPQ currently works with inference frameworks such as TensorRT, OpenPPL, OpenVINO, ncnn, MNN, ONNX Runtime, Tengine, SNPE, GraphCore and Metax, and ships with corresponding prefabricated quantizers and export logic. PPQ is a highly extensible quantization framework: with the utilities in ppq.lib, you can extend its quantization capability to other hardware and inference libraries. We look forward to working with you to bring artificial intelligence into thousands of households.
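To make the end-to-end workflow concrete, here is a minimal post-training quantization sketch that targets TensorRT and then exports the result; the entry points `quantize_onnx_model` and `export_ppq_graph` follow PPQ's published examples, while the file paths, input shape and calibration data are placeholders.

```python
# Minimal PTQ sketch in the spirit of PPQ's examples; paths, shapes and the
# random calibration data are placeholders, not part of the original document.
import torch
from ppq import QuantizationSettingFactory, TargetPlatform
from ppq.api import quantize_onnx_model, export_ppq_graph

calib_dataloader = [torch.randn(1, 3, 224, 224) for _ in range(32)]  # replace with real data
setting = QuantizationSettingFactory.default_setting()

quantized = quantize_onnx_model(
    onnx_import_file='model.onnx',        # placeholder path
    calib_dataloader=calib_dataloader,
    calib_steps=32,
    input_shape=[1, 3, 224, 224],
    setting=setting,
    platform=TargetPlatform.TRT_INT8,     # pick the target backend here
    device='cuda')

# Export the quantized graph together with its quantization parameters.
export_ppq_graph(
    graph=quantized,
    platform=TargetPlatform.TRT_INT8,
    graph_save_to='quantized.onnx',
    config_save_to='quantized.json')
```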
Install CUDA from CUDA Toolkit
Install Compiler
```bash
apt-get install ninja-build # for debian/ubuntu user
yum install ninja-build     # for redhat/centos user
```

For Windows User:
(1) Download ninja.exe from https://github.com/ninja-build/ninja/releases, add it to Windows PATH.
(2) Install Visual Studio 2019 from https://visualstudio.microsoft.com.
(3) Add your C++ compiler to the Windows PATH environment variable. If you are using Visual Studio, it should look like "C:\Program Files\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.16.27023\bin\Hostx86\x86"
(4) Update PyTorch version to 1.10+.
Install PPQ from source:

```bash
git clone https://github.com/openppl-public/ppq.git
cd ppq
pip install -r requirements.txt
python setup.py install
```

Install PPQ using docker:

```bash
docker pull stephen222/ppq:ubuntu18.04_cuda11.4_cudnn8.4_trt8.4.1.5
docker run -it --rm --ipc=host --gpus all --mount type=bind,source=your custom path,target=/workspace stephen222/ppq:ubuntu18.04_cuda11.4_cudnn8.4_trt8.4.1.5 /bin/bash
git clone https://github.com/openppl-public/ppq.git
cd ppq
export PYTHONPATH=${PWD}:${PYTHONPATH}
```

Install PPQ from PyPI:

```bash
python3 -m pip install ppq
```

| | Description | Link |
|---|---|---|
| 01 | Model quantization | onnx, caffe, pytorch |
| 02 | Executor | executor |
| 03 | Error analysis | analyser |
| 04 | Calibrator | Calibration |
| 05 | Network fine-tuning | finetune |
| 06 | Network Scheduling | dispatch |
| 07 | Best Practices | Best Practice |
| 08 | Target platform | platform |
| 09 | Optimization passes | Optim |
| 10 | Graph fusion | Fusion |
| | Description | Link |
|---|---|---|
| 01 | QuantSimplifyPass (general quantization simplification pass) | doc |
| 02 | QuantFusionPass (general quantization fusion pass) | doc |
| 03 | QuantAlignmentPass (general quantization alignment pass) | doc |
| 04 | RuntimeCalibrationPass (parameter calibration pass) | doc |
| 05 | BiasCorrectionPass (bias correction pass) | doc |
| 06 | QuantSimplifyPass (general quantization simplification pass) | doc |
| 07 | LayerwiseEqualizationPass (layerwise weight equalization pass) | doc |
| 08 | LayerSpilitPass (operator splitting pass) | doc |
| 09 | LearnedStepSizePass (network fine-tuning pass) | doc |
| 10 | Other | Refer to |
| | Description | Link |
|---|---|---|
| 01 | Basics of Computer Architecture | link |
| 02 | Network performance analysis | link |
| 03 | Quantization arithmetic principles | part1, part2 |
| 04 | Graph optimization and quantization simulation | link |
| 05 | Graph Scheduling and Pattern Matching | link |
| 06 | Neural Network Deployment | link |
| 07 | Quantization parameter selection | link |
| 08 | Quantization Error Propagation Analysis | link |
| Example | Deployment platform | Input model format | Link | Video |
|---|---|---|---|---|
| TensorRT | | | | |
| Use Torch2trt to speed up your network | pytorch | pytorch | link | link |
| TensorRT quantization-aware training | TensorRT | pytorch | link | link |
| TensorRT post-training quantization (PPQ) | TensorRT | onnx | 1. Quant with TensorRT OnnxParser 2. Quant with TensorRT API | |
| TensorRT fp32 deployment | TensorRT | onnx | link | link |
| TensorRT Performance Comparison | TensorRT | pytorch | link | link |
| TensorRT Profiler | TensorRT | pytorch | link | link |
| onnxruntime | | | | |
| Use onnxruntime to speed up your network | onnxruntime | onnx | link | link |
| onnx post-training quantization (PPQ) | onnxruntime | onnx | link | link |
| onnxruntime performance comparison | onnxruntime | pytorch | link | link |
| openvino | | | | |
| Use openvino to speed up your network | openvino | onnx | link | |
| openvino quantization-aware training | openvino | pytorch | link | |
| openvino post-training quantization (PPQ) | openvino | onnx | link | |
| openvino performance comparison | openvino | pytorch | link | |
| snpe | | | | |
| snpe post-training quantization (PPQ) | snpe | caffe | link | |
| ncnn | | | | |
| ncnn post-training quantization (PPQ) | ncnn | onnx | link | |
| OpenPPL | | | | |
| ppl cuda post-training quantization (PPQ) | ppl cuda | onnx | link | |
| | Description | Link |
|---|---|---|
| 01 | PPQ Quantization Execution Flow | link |
| 02 | PPQ Network Analysis | link |
| 03 | PPQ Quantization Graph Scheduling | link |
| 04 | PPQ Target Platform and TQC | link |
| 05 | PPQ Quantizer | link |
| 06 | PPQ Quantization Optimization Passes | link |
| 07 | PPQ Quantization Functions | link |
| WeChat Official Account | QQ Group |
|---|---|
| OpenPPL | 627853444 |
Email: [email protected]
We appreciate all contributions. If you are planning to contribute back bug fixes, please do so without any further discussion.
If you plan to contribute new features, utility functions, or extensions to the core, please first open an issue and discuss the feature with us. Sending a PR without discussion might end up resulting in a rejected PR because we might be taking the core in a different direction than you might be aware of.
PPQ is tested with models from mmlab-classification, mmlab-detection, mmlab-segmentation and mmlab-editing; here we list part of our testing results.
| Model | Type | Calibration | Dispatcher | Metric | PPQ(sim) | PPLCUDA | FP32 |
|---|---|---|---|---|---|---|---|
| Resnet-18 | Classification | 512 imgs | Conservative | Acc-Top-1 | 69.50% | 69.42% | 69.88% |
| ResNeXt-101 | Classification | 512 imgs | Conservative | Acc-Top-1 | 78.46% | 78.37% | 78.66% |
| SE-ResNet-50 | Classification | 512 imgs | Conservative | Acc-Top-1 | 77.24% | 77.26% | 77.76% |
| ShuffleNetV2 | Classification | 512 imgs | Conservative | Acc-Top-1 | 69.13% | 68.85% | 69.55% |
| MobileNetV2 | Classification | 512 imgs | Conservative | Acc-Top-1 | 70.99% | 71.1% | 71.88% |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| retinanet | Detection | 32 imgs | pplnn | bbox_mAP | 36.1% | 36.1% | 36.4% |
| faster_rcnn | Detection | 32 imgs | pplnn | bbox_mAP | 36.6% | 36.7% | 37.0% |
| fsaf | Detection | 32 imgs | pplnn | bbox_mAP | 36.5% | 36.6% | 37.4% |
| mask_rcnn | Detection | 32 imgs | pplnn | bbox_mAP | 37.7% | 37.6% | 37.9% |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| deeplabv3 | Segmentation | 32 imgs | Conservative | aAcc / mIoU | 96.13% / 78.81% | 96.14% / 78.89% | 96.17% / 79.12% |
| deeplabv3plus | Segmentation | 32 imgs | Conservative | aAcc / mIoU | 96.27% / 79.39% | 96.26% / 79.29% | 96.29% / 79.60% |
| fcn | Segmentation | 32 imgs | Conservative | aAcc / mIoU | 95.75% / 74.56% | 95.62% / 73.96% | 95.68% / 72.35% |
| pspnet | Segmentation | 32 imgs | Conservative | aAcc / mIoU | 95.79% / 77.40% | 95.79% / 77.41% | 95.83% / 77.74% |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| srcnn | Editing | 32 imgs | Conservative | PSNR / SSIM | 27.88% / 79.70% | 27.88% / 79.07% | 28.41% / 81.06% |
| esrgan | Editing | 32 imgs | Conservative | PSNR / SSIM | 27.84% / 75.20% | 27.49% / 72.90% | 27.51% / 72.84% |

This project is distributed under the Apache License, Version 2.0.