KuiperInfer

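A new course on building your own large-model inference framework has been released, a good way to prepare for autumn campus recruiting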

News: new course released, "Hands-On: Build Your Own Large Model Inference Framework", with fully handwritten CUDA operators. The course framework supports Llama 2, Llama 3.x, and Qwen2.5 models

Hello, friends! I am the author of KuiperInfer. As an open-source course, KuiperInfer has earned 2.5k stars on GitHub so far. Building on the original course, we have now launched "Hands-On: Build Your Own Large Model Inference Framework". The new course supports the Llama family of models (including the latest Llama 3.2) and the Qwen2.5 family, along with CUDA acceleration and Int8 quantization, and it has been well received since launch.

Course catalog for "Hands-On: Build Your Own Large Model Inference Framework":

https://l0kzvikuq0w.feishu.cn/docx/ZF2hd0xfAoaXqaxcpn2c5oHAnBc

Advantages of the "Hands-On: Build Your Own Large Model Inference Framework" course

The code is written to the C++20 standard, with a consistent, clean code style and solid error handling;
The project is managed with CMake and Git, matching the workflow used at large companies;
The course teaches how to design a modern C++ project and how to use unit tests and benchmarks to test and verify it;
Both a CPU backend and a CUDA backend are implemented, with good support for recent large models (the Llama 3 and Qwen series).

If you are interested in large-model inference, want to understand and master the relevant techniques in depth, and want to stand out in campus and autumn recruiting interviews, then the course "Hands-On: Build Your Own Large Model Inference Framework" is worth a look. Interested students are welcome to scan the QR code below the course or add WeChat lyrry1997 to join the course.

This project guides you through building a deep learning inference framework with your own hands. Follow my Bilibili space to get the latest video updates.

By following this project and building your own deep learning inference framework from scratch, you will gain the following:

Learn the knowledge behind a deep learning framework, and pick up the coding practices, debugging skills, and engineering experience of modern C++ projects;
Learn how to design and write a computational graph;
Implement common operators, such as convolution, pooling, and fully connected operators;
Building on those operators, learn common optimization methods to speed up operator execution;
Finally, you will have your own inference framework that can run ResNet, UNet, YOLOv5, MobileNet, and other models, which is a real benefit for interviews and for deepening your knowledge.
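
The first course outline further down mentions an "operator registration factory class". As a rough illustration of that idea, here is a minimal sketch of a registry that maps an operator type name to a creator function. The names Layer, ReluLayer, and LayerRegistry, the Forward signature, and the use of plain std::vector<float> are all assumptions made for this example; the real KuiperInfer classes may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layer interface, invented for this sketch.
struct Layer {
  virtual ~Layer() = default;
  virtual void Forward(const std::vector<float>& input,
                       std::vector<float>& output) const = 0;
};

// Element-wise ReLU: negative values become zero.
struct ReluLayer : Layer {
  void Forward(const std::vector<float>& input,
               std::vector<float>& output) const override {
    output.resize(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
      output[i] = input[i] > 0.0f ? input[i] : 0.0f;
    }
  }
};

// Maps an operator type name to a function that creates the layer.
class LayerRegistry {
 public:
  using Creator = std::function<std::unique_ptr<Layer>()>;

  static void Register(const std::string& type, Creator creator) {
    assert(creator != nullptr);
    Table()[type] = std::move(creator);
  }

  // Returns nullptr when nothing is registered under `type`.
  static std::unique_ptr<Layer> Create(const std::string& type) {
    const auto it = Table().find(type);
    if (it == Table().end()) {
      return nullptr;
    }
    return it->second();
  }

 private:
  // A function-local static sidesteps static-initialization-order issues.
  static std::map<std::string, Creator>& Table() {
    static std::map<std::string, Creator> table;
    return table;
  }
};
```

With a registry like this, graph-building code only needs the type string stored in the model file to obtain a layer, for example LayerRegistry::Create("nn.ReLU") after a matching Register call; the string "nn.ReLU" is only an illustrative key.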

Video course link: https://space.bilibili.com/1822828582

Course outline

The second course is a remake of the first course, with fuller and more polished content. The outline of the first course appears in a later section.

Lecture	Status	Course link
Lecture 1: Project preview and environment configuration	Completed	https://www.bilibili.com/video/BV118411f7yM
Lecture 2: Design and implementation of tensors	Completed	https://www.bilibili.com/video/BV1hN411k7q7
Lecture 3: Definition of the computational graph	Completed	https://www.bilibili.com/video/BV1vc411M7Yp
Lecture 4: Building computational graph relationships and execution order	Completed	https://www.bilibili.com/video/BV19s4y1r7az
Lecture 5: Operators and the operator registration factory in KuiperInfer	Completed	https://www.bilibili.com/video/BV1gx4y1o7pj
Lecture 6: Implementation of convolution and pooling operators	Completed	https://www.bilibili.com/video/BV1hx4y197dS
Lecture 7: Lexical and syntactic analysis, and operator implementation, in the expression layer	Completed	https://www.bilibili.com/video/BV1j8411o7ao
Lecture 8: Supporting ResNet inference in the homemade inference framework	Completed	https://www.bilibili.com/video/BV1o84y1o7ni
Lecture 9: Supporting YOLOv5 inference in the homemade inference framework	Completed	https://www.bilibili.com/video/BV1Qk4y1A7XL
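
Lecture 4 deals with the execution order of a computational graph. A common way to derive such an order is a topological sort, so that every operator runs only after the operators producing its inputs. The sketch below uses Kahn's algorithm on a plain adjacency list. The function name and the graph representation are assumptions made for illustration, and KuiperInfer itself may compute the order differently.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// successors[i] lists the nodes that consume the output of node i.
// Returns an order in which every producer precedes its consumers,
// or an empty vector if the graph contains a cycle.
std::vector<std::size_t> TopologicalOrder(
    const std::vector<std::vector<std::size_t>>& successors) {
  const std::size_t n = successors.size();

  // Count how many inputs each node is still waiting for.
  std::vector<std::size_t> in_degree(n, 0);
  for (const auto& outs : successors) {
    for (std::size_t next : outs) {
      assert(next < n && "successor index out of range");
      ++in_degree[next];
    }
  }

  // Nodes without pending inputs can run immediately.
  std::queue<std::size_t> ready;
  for (std::size_t i = 0; i < n; ++i) {
    if (in_degree[i] == 0) ready.push(i);
  }

  std::vector<std::size_t> order;
  order.reserve(n);
  while (!ready.empty()) {
    const std::size_t node = ready.front();
    ready.pop();
    order.push_back(node);
    // Finishing a node may unblock its consumers.
    for (std::size_t next : successors[node]) {
      if (--in_degree[next] == 0) ready.push(next);
    }
  }

  if (order.size() != n) order.clear();  // a cycle left nodes unscheduled
  return order;
}
```

For a diamond-shaped graph where node 0 feeds nodes 1 and 2, and both feed node 3, the function returns the order 0, 1, 2, 3.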

Demo effect

Unet semantic segmentation

KuiperInfer currently supports UNet inference and uses weights pre-trained on the Carvana dataset

To reproduce the inference, see the section on running the KuiperInfer demos near the end of this document

YOLOv5 object detection

The demo uses the pre-trained yolov5-s weights (COCO dataset) directly and runs inference with KuiperInfer

First course outline

I teach a course on Bilibili, and the first course currently has 13 lessons. The course outline is below, and my homepage is https://space.bilibili.com/1822828582. Follows and support are welcome. To join the study group, scan the QR code in the picture above.

Lesson	Main content	Status	Course link
Lesson 1	Overview of the overall framework and development environment configuration	Completed	https://www.bilibili.com/video/BV1HV4y1A7H8/
Lesson 2	Analysis of the Tensor class and the memory layout of input data	Completed	https://www.bilibili.com/video/BV1Ed4y1v7Gb/
Lesson 3	Initialize a Tensor instance from a CSV file	Completed	https://www.bilibili.com/video/BV1Pg411J7V5/
Lesson 4	Hand-write the first operator, ReLU, and complete the operator registration factory class	Completed	https://www.bilibili.com/video/BV1bG4y1J7sQ/
Lesson 5	The principle of im2col and the implementation of the convolution operator	Completed	https://www.bilibili.com/video/BV1F841137Ct
Lesson 6	Implement the MaxPooling operator by following the same pattern	Completed	https://www.bilibili.com/video/BV1m3411S7yy
Lesson 7	Explanation of the graph structure (PNNX) and a first look at the computational graph	Completed	https://www.bilibili.com/video/BV1VW4y1V7vp
Lesson 8	Read PNNX and build your own computational graph	Completed	https://www.bilibili.com/video/BV1HY4y1Z7S3
Lesson 9	Implementation of the convolution operator and how im2col accelerates the computation	Completed	https://www.bilibili.com/video/BV1F841137Ct
Lesson 10	Revisit the Tensor class, build the graph relationships of the computational graph, and pre-allocate operator inputs and outputs	Completed	https://www.bilibili.com/video/BV1M54y1K7AG
Lesson 11	Operator execution process	Completed	https://www.bilibili.com/video/BV1wY411C7Kv
Lesson 12	Use the homemade inference framework to run ResNet image classification	Completed	https://www.bilibili.com/video/BV1jD4y1M772
Lesson 13	Support YOLOv5 inference with the homemade inference framework	Completed	https://www.bilibili.com/video/BV1xs4y1J7t2
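
Lessons 5 and 9 cover im2col, which turns a convolution into a matrix product by unfolding each sliding window of the input into one row of a matrix. The following is a deliberately simplified sketch of that idea: a single channel, stride 1, no padding, row-major storage, and plain loops in place of a BLAS call. These simplifications and the function names are assumptions for illustration. The project itself builds on Armadillo and handles multiple channels, so its implementation will differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfolds every kh x kw window of an h x w input into one row.
// The result has (out_h * out_w) rows and (kh * kw) columns, row-major.
std::vector<float> Im2Col(const std::vector<float>& input, std::size_t h,
                          std::size_t w, std::size_t kh, std::size_t kw) {
  assert(input.size() == h * w);
  assert(kh >= 1 && kw >= 1 && kh <= h && kw <= w);
  const std::size_t out_h = h - kh + 1;
  const std::size_t out_w = w - kw + 1;
  std::vector<float> cols(out_h * out_w * kh * kw);
  std::size_t idx = 0;
  for (std::size_t oy = 0; oy < out_h; ++oy) {
    for (std::size_t ox = 0; ox < out_w; ++ox) {
      for (std::size_t ky = 0; ky < kh; ++ky) {
        for (std::size_t kx = 0; kx < kw; ++kx) {
          cols[idx++] = input[(oy + ky) * w + (ox + kx)];
        }
      }
    }
  }
  return cols;
}

// Convolution (cross-correlation, as is usual in deep learning) expressed as
// a matrix-vector product between the unfolded windows and the flat kernel.
std::vector<float> Conv2dIm2Col(const std::vector<float>& input, std::size_t h,
                                std::size_t w, const std::vector<float>& kernel,
                                std::size_t kh, std::size_t kw) {
  assert(kernel.size() == kh * kw);
  const std::vector<float> cols = Im2Col(input, h, w, kh, kw);
  const std::size_t window = kh * kw;
  const std::size_t rows = cols.size() / window;
  std::vector<float> output(rows, 0.0f);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < window; ++c) {
      output[r] += cols[r * window + c] * kernel[c];
    }
  }
  return output;
}
```

The point of the unfolding is that the inner loops become a dense matrix product, which an optimized BLAS library can execute much faster than a naive sliding-window loop, at the cost of extra memory for the unfolded matrix.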

Project Contribution

Contributor List

Thanks to the following contributors for their work on KuiperInfer

zjhellofss
liuxubit
Azusachan
wfs2010
mlmz
Tigerrr07
zyt1024
zpye
cmcamdy
superCB
sanbuphy
TypeFloat
Jasmine-up
PerrySkywalker
Delve-wang
z-learner
Meihongtao

How to participate in project contributions?

Submit code to add new features or fix bugs;
Make particularly useful suggestions;
Improve documentation or add unit tests.

The relationship between this project and video course

This project serves as the upstream, or pre-research, project for the course
Every feature here may become a topic in the video course, whether it was developed by me or contributed by other students.

Technology and development environment used

Development language: C++17
Mathematics library: Armadillo + OpenBLAS (or the faster Intel MKL)
Acceleration library: OpenMP
Unit Test: Google Test
Performance Test: Google Benchmark

Installation process (using Docker)

docker pull registry.cn-hangzhou.aliyuncs.com/hellofss/kuiperinfer:latest
sudo docker run -t -i registry.cn-hangzhou.aliyuncs.com/hellofss/kuiperinfer:latest /bin/bash
cd code
git clone --recursive https://github.com/zjhellofss/KuiperInfer.git
cd KuiperInfer
git checkout -b <your-new-branch> study_version_0.02 (if you want the code that accompanies the course, use this step to switch to the study tag)
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release -DDEVELOPMENT=OFF ..
make -j$(nproc)

Tips:

If you want to develop KuiperInfer, use git clone --recursive https://github.com/zjhellofss/KuiperInfer.git so that the tmp subfolder is downloaded as well, and enable development mode either by setting DEVELOPMENT in the CMake file or by passing -DDEVELOPMENT=ON to cmake.
If GitHub is slow from within mainland China, use git clone https://gitee.com/fssssss/KuiperInferGitee.git
If you want faster execution, recompile OpenBLAS on your own machine or run apt install intel-mkl

Installation process (building Docker image)

docker build -t kuiperinfer:latest .
docker run --name kuiperinfer -it kuiperinfer:latest /bin/bash
cd /app
For the remaining steps, follow the Docker installation process above from the git clone step onward

Installation process (without Docker)

git clone --recursive https://github.com/zjhellofss/KuiperInfer.git
cd KuiperInfer
git checkout -b <your-new-branch> study_version_0.01 (if you want the code that accompanies the course, use this step to switch to the study tag)
Install the required dependencies (compiling and installing OpenBLAS from source is recommended for faster execution; alternatively, use apt install intel-mkl instead of OpenBLAS)

 apt install cmake libopenblas-dev liblapack-dev libarpack-dev libsuperlu-dev

Download and compile Armadillo: https://arma.sourceforge.net/download.html
Compile and install glog, Google Test, and Google Benchmark
The remaining steps are the same as above

Tips:

If building Google Benchmark fails with an error about gtest being missing, turn off the gtest option in Google Benchmark's CMake configuration (for example, by passing -DBENCHMARK_ENABLE_GTEST_TESTS=OFF to cmake)

Run the KuiperInfer demos

Run UNet inference

After compilation, locate the test.png image in the tmp/unet/demo folder and note its absolute or relative path, then run the inference program from build/demos in the following format

./unet_test test.png unet_demo.pnnx.param unet_demo.pnnx.bin

The download address of the pnnx model: https://cowtransfer.com/s/09c7f337bab443

If inference succeeds, the segmentation result for the original image is written to unet_output.jpg.

Run YOLOv5 inference

Modify the following code in the yolo_test.cpp file under the demos folder

const std::string& image_path = "imgs/car.jpg";
const std::string& param_path = "tmp/yolo/demo/yolov5s_batch8.pnnx.param";
const std::string& bin_path = "tmp/yolo/demo/yolov5s_batch8.pnnx.bin";

image_path is the path to the input image, param_path is the model's parameter file, and bin_path is the model's weight file. Replace them with your local paths.
The model definition and weights can be downloaded from: https://cowtransfer.com/s/9bc43e0905cb40
After compilation, run ./build/demos/yolo_test from the project directory
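
As an alternative to editing the hard-coded paths and recompiling, the demo could accept the paths on the command line, much like the UNet demo does. The helper below is a hypothetical sketch and is not part of yolo_test.cpp: it returns a command-line argument when one is given and falls back to a default path otherwise.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns argv[index] when it was supplied on the
// command line, otherwise the fallback path.
std::string ArgOrDefault(int argc, const char* const* argv, int index,
                         const std::string& fallback) {
  assert(index >= 1);
  if (index < argc && argv[index] != nullptr) {
    return std::string(argv[index]);
  }
  return fallback;
}

// Possible use inside main(int argc, char** argv):
//   const std::string image_path = ArgOrDefault(argc, argv, 1, "imgs/car.jpg");
//   const std::string param_path =
//       ArgOrDefault(argc, argv, 2, "tmp/yolo/demo/yolov5s_batch8.pnnx.param");
//   const std::string bin_path =
//       ArgOrDefault(argc, argv, 3, "tmp/yolo/demo/yolov5s_batch8.pnnx.bin");
```

With this change the defaults keep working as before, while a different image or model can be tried without rebuilding.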

Supported operators

Overall approach: gradually optimize existing operators, and develop unimplemented operators when they are needed

Convolution
AdaptivePooling
MaxPooling
Expression (abstract syntax tree)
Flatten, View (flattening and reshaping of dimensions)
Sigmoid
HardSigmoid
HardSwish
ReLU
Linear (matrix multiplication)
Softmax
BatchNorm
Upsample
SiLU
Concat
ConvTranspose
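
Several of the operators above are simple element-wise functions. For reference, here is a minimal scalar sketch of the activation functions in the list, plus a numerically stable Softmax. The HardSigmoid and HardSwish formulas follow the PyTorch definitions, which is an assumption based on the models being exported through PNNX. These free functions are illustrative only and do not mirror KuiperInfer's layer classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

inline float ReLU(float x) { return std::max(0.0f, x); }

inline float Sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// SiLU(x) = x * Sigmoid(x)
inline float SiLU(float x) { return x * Sigmoid(x); }

// PyTorch-style HardSigmoid(x) = clamp(x + 3, 0, 6) / 6
inline float HardSigmoid(float x) {
  return std::min(std::max(x + 3.0f, 0.0f), 6.0f) / 6.0f;
}

// PyTorch-style HardSwish(x) = x * HardSigmoid(x)
inline float HardSwish(float x) { return x * HardSigmoid(x); }

// Softmax over one vector. Subtracting the maximum first keeps exp() from
// overflowing for large logits without changing the result.
std::vector<float> Softmax(const std::vector<float>& logits) {
  assert(!logits.empty());
  const float max_logit = *std::max_element(logits.begin(), logits.end());
  std::vector<float> probs(logits.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < logits.size(); ++i) {
    probs[i] = std::exp(logits[i] - max_logit);
    sum += probs[i];
  }
  for (float& p : probs) p /= sum;
  return probs;
}
```

In a real operator these functions would be applied across a whole tensor, typically inside a loop that can be parallelized with OpenMP.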

Performance Testing

Test equipment

15 cores of an AMD EPYC 7543 32-Core Processor (Docker container; the host has 32 cores in total)

Compilation environment

gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

Performance results

Each timing is the average over five consecutive runs

Input size	Model name	Compute device	Latency
224×224, batch = 8	MobileNetV3Small	CPU (Armadillo + OpenBLAS)	6.76 ms/image
224×224, batch = 8	ResNet18	CPU (Armadillo + OpenBLAS)	23.53 ms/image
224×224, batch = 16	ResNet18	CPU (Armadillo + OpenBLAS)	13.52 ms/image
640×640, batch = 8	Yolov5nano	CPU (Armadillo + OpenBLAS)	78.37 ms/image
640×640, batch = 8	Yolov5s	CPU (Armadillo + OpenBLAS)	177.54 ms/image
640×640, batch = 16	Yolov5s	CPU (Armadillo + OpenBLAS)	134.57 ms/image
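
The exact measurement code is not shown here. As a hedged sketch of how a per-image figure like those above could be obtained, the function below times a batch inference several times with a monotonic clock, averages the runs, and divides by the batch size. The function name and the forward callback are placeholders, and dividing the batch time by the batch size is an assumption about how the ms/image values were derived.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Runs `forward` (one inference over a whole batch) `runs` times and returns
// the mean latency per image in milliseconds.
double AverageMsPerImage(const std::function<void()>& forward,
                         std::size_t runs, std::size_t batch) {
  assert(runs > 0 && batch > 0);
  using Clock = std::chrono::steady_clock;  // monotonic, suited to intervals
  double total_ms = 0.0;
  for (std::size_t i = 0; i < runs; ++i) {
    const auto start = Clock::now();
    forward();
    const auto stop = Clock::now();
    total_ms +=
        std::chrono::duration<double, std::milli>(stop - start).count();
  }
  return total_ms / static_cast<double>(runs) / static_cast<double>(batch);
}
```

Calling AverageMsPerImage(run_batch, 5, 8), where run_batch is whatever callable executes one forward pass, would correspond to the five-run averaging and batch size of 8 used in the table.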