
Inference of Stable Diffusion and Flux in pure C/C++
Plain C/C++ implementation based on ggml, working in the same way as llama.cpp
Super lightweight and without external dependencies
SD1.x, SD2.x, SDXL and SD3/SD3.5 support
Flux-dev/Flux-schnell support
SD-Turbo and SDXL-Turbo support
PhotoMaker support
16-bit and 32-bit float support
2-bit, 3-bit, 4-bit, 5-bit and 8-bit integer quantization support
Accelerated memory-efficient CPU inference
AVX, AVX2 and AVX512 support for x86 architectures
Full CUDA, Metal, Vulkan and SYCL backends for GPU acceleration
Can load ckpt, safetensors and diffusers models/checkpoints, as well as standalone VAE models. No need to convert to .ggml or .gguf anymore!
Flash Attention for memory usage optimization
Original txt2img and img2img mode
Negative prompt
stable-diffusion-webui style tokenizer (not all the features, only token weighting for now)
LoRA support, same as stable-diffusion-webui
Latent Consistency Models support (LCM/LCM-LoRA)
Faster and memory-efficient latent decoding with TAESD
Upscale generated images with ESRGAN
VAE tiling processing to reduce memory usage
Control Net support with SD 1.5
Sampling methods: Euler A, Euler, Heun, DPM2, DPM++ 2M, DPM++ 2M v2, DPM++ 2S a, LCM
Cross-platform reproducibility (--rng cuda, consistent with the stable-diffusion-webui GPU RNG)
Embeds generation parameters into the PNG output as a webui-compatible text string
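As a concrete illustration of the sampler and RNG features above, a minimal txt2img invocation could look like this (the model path is only an example; point it at your own weights):

```shell
# Pick an explicit sampler, fix the seed, and use the CUDA-style RNG
# so the same seed reproduces the same image as stable-diffusion-webui.
./bin/sd -m ../models/sd-v1-4.ckpt \
  -p "a lovely cat" \
  --sampling-method dpm++2m --steps 20 \
  --rng cuda -s 42
```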
Supported platforms
Most users can simply download a prebuilt executable from the latest release. If the prebuilt binaries do not meet your requirements, you can build it manually as follows.
git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp
To update an existing checkout to the latest code:
cd stable-diffusion.cpp
git pull origin master
git submodule init
git submodule update
Download the original model weights (.ckpt or .safetensors). For example:
curl -L -O https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt
# curl -L -O https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors
# curl -L -O https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/v2-1_768-nonema-pruned.safetensors
# curl -L -O https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips_t5xxlfp16.safetensors
mkdir build
cd build
cmake ..
cmake --build . --config Release

Using OpenBLAS:
cmake .. -DGGML_OPENBLAS=ON
cmake --build . --config Release
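Once the build finishes, the binary should be in build/bin (assuming the default CMake layout); a quick smoke test:

```shell
# Print the usage text to confirm the binary was built and runs.
./bin/sd -h
```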
This provides BLAS acceleration using the CUDA cores of your Nvidia GPU. Make sure you have the CUDA toolkit installed. You can get it from your Linux distro's package manager (e.g. apt install nvidia-cuda-toolkit) or download it here: CUDA Toolkit. It is recommended to have at least 4 GB of VRAM.
cmake .. -DSD_CUBLAS=ON
cmake --build . --config Release
This provides BLAS acceleration using the ROCm cores of your AMD GPU. Make sure to have the ROCm toolkit installed.
Windows users: refer to docs/hipBLAS_on_Windows.md for a comprehensive guide.
cmake .. -G "Ninja" -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DSD_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release -DAMDGPU_TARGETS=gfx1100
cmake --build . --config Release
Using Metal makes the computation run on the GPU. Currently, Metal has some issues when operating on very large matrices, which makes it quite inefficient for now. Performance improvements are expected in the near future.
cmake .. -DSD_METAL=ON
cmake --build . --config Release
Install the Vulkan SDK from https://www.lunarg.com/vulkan-sdk/.
cmake .. -DSD_VULKAN=ON
cmake --build . --config Release
Using SYCL makes the computation run on an Intel GPU. Make sure you have installed the related drivers and the Intel® oneAPI Base Toolkit before starting. For more details and steps, refer to the llama.cpp SYCL backend documentation.
# Export relevant ENV variables
source /opt/intel/oneapi/setvars.sh
# Option 1: Use FP32 (recommended for better performance in most cases)
cmake .. -DSD_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
# Option 2: Use FP16
cmake .. -DSD_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON
cmake --build . --config Release
Example of text2img using the SYCL backend:
Download the stable-diffusion model weights (see the section on downloading weights above).
run ./bin/sd -m ../models/sd3_medium_incl_clips_t5xxlfp16.safetensors --cfg-scale 5 --steps 30 --sampling-method euler -H 1024 -W 1024 --seed 42 -p "fantasy medieval village world inside a glass sphere , high detail, fantasy, realistic, light effect, hyper detail, volumetric lighting, cinematic, macro, depth of field, blur, red light and clouds from the back, highly detailed epic cinematic concept art cg render made in maya, blender and photoshop, octane render, excellent composition, dynamic dramatic cinematic lighting, aesthetic, very inspirational, world inside a glass sphere by james gurney by artgerm with james jean, joe fenton and tristan eaton by ross tran, fine details, 4k resolution"

Enabling flash attention in the diffusion model reduces memory usage by varying amounts of MB depending on the model.
For most backends it slows generation down, but for CUDA it generally speeds it up as well. At the moment it is only supported for some models and some backends (such as cpu, CUDA/ROCm and Metal).
Run with --diffusion-fa added to the arguments and watch for:
[INFO ] stable-diffusion.cpp:312 - Using flash attention in the diffusion model
and for the compute buffer shrinking in the debug log:
[DEBUG] ggml_extend.hpp:1004 - flux compute buffer size: 650.00 MB(VRAM)
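Putting this together, a generation run with flash attention enabled might look like this (the model path is just an example):

```shell
# --diffusion-fa enables flash attention in the diffusion model;
# -v prints the INFO/DEBUG lines shown above so the effect is visible.
./bin/sd -m ../models/sd-v1-4.ckpt -p "a lovely cat" --diffusion-fa -v
```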
usage: ./bin/sd [arguments]
arguments:
-h, --help show this help message and exit
-M, --mode [MODE] run mode (txt2img or img2img or convert, default: txt2img)
-t, --threads N number of threads to use during computation (default: -1)
If threads <= 0, then threads will be set to the number of CPU physical cores
-m, --model [MODEL] path to full model
--diffusion-model path to the standalone diffusion model
--clip_l path to the clip-l text encoder
--clip_g path to the clip-g text encoder
--t5xxl path to the t5xxl text encoder
--vae [VAE] path to vae
--taesd [TAESD_PATH] path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)
--control-net [CONTROL_PATH] path to control net model
--embd-dir [EMBEDDING_PATH] path to embeddings
--stacked-id-embd-dir [DIR] path to PHOTOMAKER stacked id embeddings
--input-id-images-dir [DIR] path to PHOTOMAKER input id images dir
--normalize-input normalize PHOTOMAKER input id images
--upscale-model [ESRGAN_PATH] path to esrgan model. Upscale images after generate, just RealESRGAN_x4plus_anime_6B supported by now
--upscale-repeats Run the ESRGAN upscaler this many times (default 1)
--type [TYPE] weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0, q2_k, q3_k, q4_k)
If not specified, the default is the type of the weight file
--lora-model-dir [DIR] lora model directory
-i, --init-img [IMAGE] path to the input image, required by img2img
--control-image [IMAGE] path to image condition, control net
-o, --output OUTPUT path to write result image to (default: ./output.png)
-p, --prompt [PROMPT] the prompt to render
-n, --negative-prompt PROMPT the negative prompt (default: "")
--cfg-scale SCALE unconditional guidance scale: (default: 7.0)
--strength STRENGTH strength for noising/unnoising (default: 0.75)
--style-ratio STYLE-RATIO strength for keeping input identity (default: 20%)
--control-strength STRENGTH strength to apply Control Net (default: 0.9)
1.0 corresponds to full destruction of information in init image
-H, --height H image height, in pixel space (default: 512)
-W, --width W image width, in pixel space (default: 512)
--sampling-method {euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m, dpm++2mv2, ipndm, ipndm_v, lcm}
sampling method (default: "euler_a")
--steps STEPS number of sample steps (default: 20)
--rng {std_default, cuda} RNG (default: cuda)
-s SEED, --seed SEED RNG seed (default: 42, use random seed for < 0)
-b, --batch-count COUNT number of images to generate
--schedule {discrete, karras, exponential, ays, gits} Denoiser sigma schedule (default: discrete)
--clip-skip N ignore last layers of CLIP network; 1 ignores none, 2 ignores one layer (default: -1)
<= 0 represents unspecified, will be 1 for SD1.x, 2 for SD2.x
--vae-tiling process vae in tiles to reduce memory usage
--vae-on-cpu keep vae in cpu (for low vram)
--clip-on-cpu keep clip in cpu (for low vram)
--diffusion-fa use flash attention in the diffusion model (for low vram)
Might lower quality, since it implies converting k and v to f16.
This might crash if it is not supported by the backend.
--control-net-cpu keep controlnet in cpu (for low vram)
--canny apply canny preprocessor (edge detection)
--color Colors the logging tags according to level
-v, --verbose print extra info
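The convert mode listed under --mode can be combined with --type to quantize a checkpoint offline; a sketch, under the assumption that -o names the converted output file:

```shell
# Convert a full-precision safetensors checkpoint to a q8_0 gguf,
# which loads with far less memory at inference time.
./bin/sd -M convert -m ../models/v1-5-pruned-emaonly.safetensors \
  -o ../models/v1-5-pruned-emaonly.q8_0.gguf --type q8_0 -v
```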
./bin/sd -m ../models/sd-v1-4.ckpt -p " a lovely cat "
# ./bin/sd -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat"
# ./bin/sd -m ../models/sd_xl_base_1.0.safetensors --vae ../models/sdxl_vae-fp16-fix.safetensors -H 1024 -W 1024 -p "a lovely cat" -v
# ./bin/sd -m ../models/sd3_medium_incl_clips_t5xxlfp16.safetensors -H 1024 -W 1024 -p 'a lovely cat holding a sign says "Stable Diffusion CPP"' --cfg-scale 4.5 --sampling-method euler -v
# ./bin/sd --diffusion-model ../models/flux1-dev-q3_k.gguf --vae ../models/ae.sft --clip_l ../models/clip_l.safetensors --t5xxl ../models/t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v
# ./bin/sd -m ../models/sd3.5_large.safetensors --clip_l ../models/clip_l.safetensors --clip_g ../models/clip_g.safetensors --t5xxl ../models/t5xxl_fp16.safetensors -H 1024 -W 1024 -p 'a lovely cat holding a sign says "Stable diffusion 3.5 Large"' --cfg-scale 4.5 --sampling-method euler -v
Using weight formats of different precision will produce results of different quality.
(comparison images for F32, F16, Q8_0, Q5_0, Q5_1, Q4_0 and Q4_1 omitted)
./output.png is the image generated from the above txt2img pipeline. It can be fed back in as the img2img input:
./bin/sd --mode img2img -m ../models/sd-v1-4.ckpt -p "cat with blue eyes" -i ./output.png -o ./img2img_output.png --strength 0.4
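The ESRGAN upscaler mentioned in the options above can be chained onto generation; a sketch, assuming the RealESRGAN_x4plus_anime_6B weights have been downloaded to ../models:

```shell
# Generate an image, then upscale the result with the ESRGAN model
# (only RealESRGAN_x4plus_anime_6B is supported at the moment).
./bin/sd -m ../models/sd-v1-4.ckpt -p "a lovely cat" \
  --upscale-model ../models/RealESRGAN_x4plus_anime_6B.pth
```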

These projects wrap stable-diffusion.cpp for easier use from other languages/frameworks.
These projects use stable-diffusion.cpp as a backend for their image generation.
Thank you to everyone who has already contributed to stable-diffusion.cpp!