Hello, friends! I am the author of KuiperInfer. As an open source course, KuiperInfer has earned 2.5k stars on GitHub so far. Building on the original course, we have now launched a new course, "Hand-made Large Model Inference Framework". The new course supports the Llama series of models (including the latest Llama 3.2) and the Qwen2.5 series, along with CUDA acceleration and Int8 quantization, and it has been well received since its launch.
https://l0kzvikuq0w.feishu.cn/docx/ZF2hd0xfAoaXqaxcpn2c5oHAnBc
If you are interested in large model inference, want to understand and master the relevant technologies in depth, and want to stand out in campus and autumn recruitment interviews, then the "Hand-made Large Model Inference Framework" course is not to be missed. Come and join us and start your learning journey! Interested students are welcome to scan the QR code below the course or add WeChat lyrry1997 to join the course.

This project guides you through building a deep learning inference framework with your own hands. Follow my Bilibili space to get the latest video updates.
By following this project and building your own deep learning inference framework from scratch, you will gain the following:
Video course link: https://space.bilibili.com/1822828582
The second course is a remake of the first, with fuller and more polished content. See the section below for the outline of the first course.
| Lecture | Status | Course link |
|---|---|---|
| Lecture 1: Project Preview and Environment Configuration | Done | https://www.bilibili.com/video/BV118411f7yM |
| Lecture 2: Design and Implementation of Tensors | Done | https://www.bilibili.com/video/BV1hN411k7q7 |
| Lecture 3: Definition of the Computational Graph | Done | https://www.bilibili.com/video/BV1vc411M7Yp |
| Lecture 4: Building Computational Graph Relationships and Execution Order | Done | https://www.bilibili.com/video/BV19s4y1r7az |
| Lecture 5: Operators and the Registration Factory in KuiperInfer | Done | https://www.bilibili.com/video/BV1gx4y1o7pj |
| Lecture 6: Implementation of Convolution and Pooling Operators | Done | https://www.bilibili.com/video/BV1hx4y197dS |
| Lecture 7: Lexical and Syntactic Analysis in the Expression Layer and Its Operator Implementation | Done | https://www.bilibili.com/video/BV1j8411o7ao |
| Lecture 8: Supporting ResNet Inference with the Homemade Inference Framework | Done | https://www.bilibili.com/video/BV1o84y1o7ni |
| Lecture 9: Supporting YoloV5 Inference with the Homemade Inference Framework | Done | https://www.bilibili.com/video/BV1Qk4y1A7XL |
KuiperInfer currently supports Unet network inference, using pre-trained weights from the Carvana dataset.
To reproduce the inference, see the instructions for running the KuiperInfer demos near the end of this document.
The demo directly uses the pre-trained yolov5-s weights (COCO dataset) and runs inference with KuiperInfer.

I have a teaching course on Bilibili, which currently consists of the first 13 lessons. The course outline is below, and the homepage is https://space.bilibili.com/1822828582. Everyone is welcome to follow and support it. To join the study group, use the QR code in the picture above.
| Lesson | Main content | Status | Course link |
|---|---|---|---|
| Lesson 1 | Overview of the overall framework and development environment configuration | Done | https://www.bilibili.com/video/BV1HV4y1A7H8/ |
| Lesson 2 | Analysis of the Tensor class and the memory layout of input data | Done | https://www.bilibili.com/video/BV1Ed4y1v7Gb/ |
| Lesson 3 | Initializing a Tensor instance from a CSV file | Done | https://www.bilibili.com/video/BV1Pg411J7V5/ |
| Lesson 4 | Hand-writing the first operator, Relu, and completing the operator registration factory class | Done | https://www.bilibili.com/video/BV1bG4y1J7sQ/ |
| Lesson 5 | The principle of Im2col and the implementation of the convolution operator | Done | https://www.bilibili.com/video/BV1F841137Ct |
| Lesson 6 | Completing the MaxPooling operator by following the same pattern | Done | https://www.bilibili.com/video/BV1m3411S7yy |
| Lesson 7 | Explanation of the graph structure (PNNX) and a first look at the computational graph | Done | https://www.bilibili.com/video/BV1VW4y1V7vp |
| Lesson 8 | Reading PNNX and building your own computational graph | Done | https://www.bilibili.com/video/BV1HY4y1Z7S3 |
| Lesson 9 | Implementation of the convolution operator and the principle of im2col-accelerated computation | Done | https://www.bilibili.com/video/BV1F841137Ct |
| Lesson 10 | Revisiting the Tensor class, building the computational graph relationships, and pre-allocating operator inputs and outputs | Done | https://www.bilibili.com/video/BV1M54y1K7AG |
| Lesson 11 | The operator execution process | Done | https://www.bilibili.com/video/BV1wY411C7Kv |
| Lesson 12 | Using the homemade inference framework to run ResNet inference and classify images | Done | https://www.bilibili.com/video/BV1jD4y1M772 |
| Lesson 13 | Supporting Yolov5 model inference with the homemade inference framework | Done | https://www.bilibili.com/video/BV1xs4y1J7t2 |
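Lesson 4 mentions a Relu operator and an operator registration factory class. As a rough illustration of that idea, and not KuiperInfer's actual classes, here is a minimal sketch. The names `Operator`, `ReluOperator`, and `OperatorRegistry` are hypothetical, and tensors are simplified to `std::vector<float>`.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical operator interface; KuiperInfer's real layer classes differ.
class Operator {
 public:
  virtual ~Operator() = default;
  virtual std::vector<float> Forward(const std::vector<float>& input) const = 0;
};

// ReLU: y = max(x, 0), applied element-wise.
class ReluOperator : public Operator {
 public:
  std::vector<float> Forward(const std::vector<float>& input) const override {
    std::vector<float> output(input.size());
    std::transform(input.begin(), input.end(), output.begin(),
                   [](float x) { return std::max(x, 0.0f); });
    return output;
  }
};

// Registry that maps an operator type name to a creator function.
class OperatorRegistry {
 public:
  using Creator = std::function<std::unique_ptr<Operator>()>;

  static void Register(const std::string& type, Creator creator) {
    assert(creator != nullptr);  // an empty creator could never be invoked
    Table()[type] = std::move(creator);
  }

  // Returns nullptr when no operator is registered under `type`.
  static std::unique_ptr<Operator> Create(const std::string& type) {
    const auto it = Table().find(type);
    if (it == Table().end()) {
      return nullptr;
    }
    return it->second();
  }

 private:
  // A function-local static avoids static initialization order problems.
  static std::map<std::string, Creator>& Table() {
    static std::map<std::string, Creator> table;
    return table;
  }
};
```

With this sketch, a graph builder would call something like `OperatorRegistry::Register("nn.ReLU", ...)` once at startup and later `OperatorRegistry::Create("nn.ReLU")` for each matching node; the type string here is only an example.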
Thanks to the following students for their contributions to KuiperInfer
This project serves as the upstream, exploratory project for the course.
Every feature here may become a topic in the video course, whether it was developed by me or improved by other students.
Tips:
$DEVELOPMENT or specify -DDEVELOPMENT=ON in the cmake file.
apt install cmake libopenblas-dev liblapack-dev libarpack-dev libsuperlu-dev
Tips:
After compilation, copy the absolute or relative path of the test.png image in the tmp/unet/demo folder, then run the inference program from build/demos in the following format:
./unet_test test.png unet_demo.pnnx.param unet_demo.pnnx.bin
The download address of the pnnx model: https://cowtransfer.com/s/09c7f337bab443
If the inference succeeds, you will find the segmentation result of the original image, unet_output.jpg, in the folder.
Please modify the following code in the yolo_test.cpp file under the demos folder:
const std::string& image_path = "imgs/car.jpg";
const std::string& param_path = "tmp/yolo/demo/yolov5s_batch8.pnnx.param";
const std::string& bin_path = "tmp/yolo/demo/yolov5s_batch8.pnnx.bin";
image_path specifies the input image, param_path is the model's parameter file, and bin_path is the model's weight file. Please replace them with your local paths.
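Because these three paths are hard-coded, a wrong path is an easy mistake to make. The following is a minimal sketch, not part of yolo_test.cpp, of how the paths could be checked before running inference. It assumes C++17 (`std::filesystem`), and the helper name `MissingFiles` is hypothetical.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Returns the entries of `paths` that do not point to an existing regular file.
// Relative paths are resolved against the current working directory.
std::vector<std::string> MissingFiles(const std::vector<std::string>& paths) {
  std::vector<std::string> missing;
  for (const std::string& path : paths) {
    std::error_code ec;  // selects the non-throwing overload
    if (!std::filesystem::is_regular_file(path, ec)) {
      missing.push_back(path);
    }
  }
  return missing;
}
```

Calling `MissingFiles({image_path, param_path, bin_path})` and printing any returned entries before loading the model gives a clearer error than a failed load. Relative paths such as imgs/car.jpg only resolve if the program is started from the directory they are relative to.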
The model definition and weight download address are as follows: https://cowtransfer.com/s/9bc43e0905cb40
After compilation is completed, run ./build/demos/yolo_test from the project directory.
Overall concept: Gradually optimize existing operators; develop unimplemented operators when needed
source is the source code directory
test is the unit test directory; it covers essentially all public methods with unit tests
benchmark contains Google Benchmark performance tests for MobilenetV3, Resnet18 and yolov5s.
15 cores of an AMD EPYC 7543 32-Core Processor (Docker container; the host has 32 cores in total)
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Each timing is the average of five consecutive runs.
| Input size | Model | Compute device | Time per image |
|---|---|---|---|
| 224×224 batch = 8 | MobileNetV3Small | CPU(armadillo + openblas) | 6.76ms/image |
| 224×224 batch = 8 | ResNet18 | CPU(armadillo + openblas) | 23.53ms/image |
| 224×224 batch = 16 | ResNet18 | CPU(armadillo + openblas) | 13.52ms/image |
| 640×640 batch = 8 | Yolov5nano | CPU(armadillo + openblas) | 78.37ms/image |
| 640×640 batch = 8 | Yolov5s | CPU(armadillo + openblas) | 177.54ms/image |
| 640×640 batch = 16 | Yolov5s | CPU(armadillo + openblas) | 134.57ms/image |
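The numbers above are averaged over five consecutive runs and reported per image. The repository's benchmarks use Google Benchmark; the sketch below instead uses plain `std::chrono` to show the arithmetic. The function name `MeanMsPerImage` and the division of each batch time by the batch size are assumptions, not taken from the benchmark code.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `run_batch` `repeats` times and returns the mean wall-clock time per
// image in milliseconds. `run_batch` stands in for one forward pass over a
// batch of `batch_size` images (hypothetical callback, not a KuiperInfer API).
double MeanMsPerImage(const std::function<void()>& run_batch, int batch_size,
                      int repeats = 5) {
  assert(batch_size > 0 && repeats > 0);
  using Clock = std::chrono::steady_clock;  // monotonic, suited to intervals
  double total_ms = 0.0;
  for (int i = 0; i < repeats; ++i) {
    const auto start = Clock::now();
    run_batch();
    const auto stop = Clock::now();
    total_ms += std::chrono::duration<double, std::milli>(stop - start).count();
  }
  // Average over the runs, then divide by the batch size for a per-image figure.
  return total_ms / repeats / batch_size;
}
```

For example, `MeanMsPerImage([&] { /* run one batch of 8 images */ }, 8)` would time five batch passes and return the average milliseconds per image.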
The inference framework NCNN; the NCNN BSD license has been retained in the code that references it: https://github.com/Tencent/ncnn
Excellent math library Openblas: https://github.com/xianyi/OpenBLAS
Excellent math library Armadillo: https://arma.sourceforge.net/docs.html
The Caffe framework, which inspired me: https://github.com/BVLC/caffe
fmath framework: https://github.com/herumi/fmath/