Stable Diffusion implemented with the ncnn framework in C++, supporting txt2img and img2img!
Zhihu: https://zhuanlan.zhihu.com/p/582552276
Video: https://www.bilibili.com/video/BV15g411x7Hc
txt2img performance (time per iteration and RAM usage)

| mode | i7-12700 (512x512) | i7-12700 (256x256) | Snapdragon 865 (256x256) |
|---|---|---|---|
| slow | 4.85s/5.24G(7.07G) | 1.05s/3.58G(4.02G) | 1.6s/2.2G(2.6G) |
| fast | 2.85s/9.47G(11.29G) | 0.65s/5.76G(6.20G) | - |
- 2023-03-11: added img2img on Android and released a new APK
- 2023-03-10: added img2img on x86
- 2023-01-19: sped up and reduced RAM usage on x86, added dynamic shape support on x86
- 2023-01-12: updated to the latest ncnn code and the optimized model, updated Android, added a memory monitor
- 2023-01-05: added the 256x256 model to the x86 project
- 2023-01-04: merged and finished the MHA op on x86, enabled fast GELU

All models and the exe file can be downloaded from 百度网盘 (Baidu Netdisk), Google Drive, or the Release page.
If you only need the ncnn models, you can also find them in 硬件模型库-设备专用模型 (hardware model zoo, device-specific models); it is faster and free.
x86 Windows:
1. Put AutoencoderKL-fp16.bin, FrozenCLIPEmbedder-fp16.bin, UNetModel-MHA-fp16.bin, and AutoencoderKL-encoder-512-512-fp16.bin into the assets folder.
2. Edit magic.txt (one setting per line).
3. Run stable-diffusion.exe.
Note: Please comply with the requirements of the SD model and do not use it for illegal purposes
x86 Linux:
1. Put AutoencoderKL-fp16.bin, FrozenCLIPEmbedder-fp16.bin, UNetModel-MHA-fp16.bin, and AutoencoderKL-encoder-512-512-fp16.bin into the assets folder.
2. Build:
   ```bash
   cd x86/linux
   mkdir -p build && cd build
   cmake ..
   make -j$(nproc)
   ```
3. Put AutoencoderKL-fp16.bin, FrozenCLIPEmbedder-fp16.bin, and UNetModel-MHA-fp16.bin into the build/assets folder.
4. Run ./stable-diffusion-ncnn.

Android:
Put AutoencoderKL-fp16.bin, FrozenCLIPEmbedder-fp16.bin, and UNetModel-MHA-fp16.bin into the assets folder.

I've uploaded the three onnx models used by Stable Diffusion so that you can do some interesting work.
You can find them from the link above.
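The input/output names listed below can be checked against the downloaded files with onnxruntime. A minimal sketch (the file name here is only an example; use whichever onnx file you downloaded):

```python
# Print the input/output names of a downloaded onnx model so they can be
# matched against the descriptions below (the file name is just an example).
import onnxruntime as ort

sess = ort.InferenceSession("FrozenCLIPEmbedder-fp16.onnx")
print([i.name for i in sess.get_inputs()])    # expected: ['onnx::Reshape_0']
print([o.name for o in sess.get_outputs()])   # expected: ['2271']
```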
ncnn (input & output): token, multiplier, cond, conds
onnx (input & output): onnx::Reshape_0, 2271
z = onnx(onnx::Reshape_0=token)
origin_mean = z.mean()
z *= multiplier
new_mean = z.mean()
z *= origin_mean / new_mean
conds = torch.concat([cond,z], dim=-2)ncnn (input & output): in0, in1, in2, c_in, c_out, outout
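For reference, a minimal Python sketch of the same post-processing with onnxruntime (the file name, token dtype, and multiplier shape are assumptions, not the project's actual code):

```python
# Minimal sketch, not the project's code: run the onnx text encoder with
# onnxruntime and apply the rescaling described above. The file name and the
# token dtype (int32 vs int64, depending on the export) are assumptions.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("FrozenCLIPEmbedder-fp16.onnx")

def encode_prompt(token_ids, multiplier, cond=None):
    """z = onnx(token); rescale by multiplier keeping the original mean;
    concatenate onto any previous cond along the token axis (dim=-2)."""
    token = np.asarray(token_ids, dtype=np.int32).reshape(1, -1)
    z = sess.run(["2271"], {"onnx::Reshape_0": token})[0]
    origin_mean = z.mean()
    z = z * multiplier                  # prompt-weighting multiplier
    z = z * (origin_mean / z.mean())    # restore the original mean
    return z if cond is None else np.concatenate([cond, z], axis=-2)
```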
onnx (input & output): x, t, cc, out
outout = in0 + onnx(x=in0 * c_in, t=in1, cc=in2) * c_out
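Similarly, a hedged Python sketch of the UNet wrapper with onnxruntime (the file name, shapes, and timestep dtype are assumptions; only the c_in/c_out scaling comes from the formula above):

```python
# Minimal sketch, not the project's code: wrap the onnx UNet with the c_in/c_out
# scaling shown above. File name, shapes, and the timestep dtype are assumptions.
import numpy as np
import onnxruntime as ort

unet = ort.InferenceSession("UNetModel-MHA-fp16.onnx")

def denoise_step(in0, t, cond, c_in, c_out):
    """outout = in0 + onnx(x=in0 * c_in, t=in1, cc=in2) * c_out"""
    out = unet.run(["out"], {
        "x": (in0 * c_in).astype(np.float32),    # scaled noisy latent, e.g. (1, 4, 64, 64)
        "t": np.asarray([t], dtype=np.float32),  # timestep
        "cc": cond.astype(np.float32),           # text conditioning from FrozenCLIPEmbedder
    })[0]
    return in0 + out * c_out
```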