
Inference of Stable Diffusion and Flux in pure C/C++
Plain C/C++ implementation based on ggml, working in the same way as llama.cpp
Super lightweight and without external dependencies
SD1.x, SD2.x, SDXL and SD3/SD3.5 support
Flux-dev/Flux-schnell support
SD-Turbo and SDXL-Turbo support
PhotoMaker support
16-bit and 32-bit float support
2-bit, 3-bit, 4-bit, 5-bit and 8-bit integer quantization support
Accelerated memory-efficient CPU inference
AVX, AVX2 and AVX512 support for x86 architectures
Full CUDA, Metal, Vulkan and SYCL backends for GPU acceleration
Can load ckpt, safetensors and diffusers models/checkpoints. Standalone VAE models
No need to convert to .ggml or .gguf anymore!
Flash Attention for memory usage optimization
Original txt2img and img2img mode
Negative prompt
stable-diffusion-webui style tokenizer (not all the features, only token weighting for now)
LoRA support, same as stable-diffusion-webui
Latent Consistency Models support (LCM/LCM-LoRA)
Faster and memory-efficient latent decoding with TAESD
Upscale generated images with ESRGAN
VAE tiling processing to reduce memory usage
Control Net support with SD 1.5
Sampling methods: Euler A, Euler, Heun, DPM2, DPM++ 2M, DPM++ 2M v2, DPM++ 2S a, LCM
Cross-platform reproducibility (--rng cuda, consistent with the stable-diffusion-webui GPU RNG)
Embeds generation parameters into the PNG output as a webui-compatible text string
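To give a feel for what the integer quantization support above buys you, the sketch below estimates weight-file sizes from ggml's quantization block layouts (each q-type block stores 32 weights plus scale/min bytes). The ~860M parameter count used for the SD 1.x UNet is an approximation for illustration, not a value from this README:

```python
# Rough file-size estimate per quantization type, from ggml block layouts.
# e.g. a q4_0 block is 18 bytes (2-byte scale + 16 packed bytes) per 32 weights.
BITS_PER_WEIGHT = {
    "f32": 32.0,
    "f16": 16.0,
    "q8_0": 34 * 8 / 32,  # 2-byte scale + 32 int8 weights per block
    "q5_1": 24 * 8 / 32,
    "q5_0": 22 * 8 / 32,
    "q4_1": 20 * 8 / 32,
    "q4_0": 18 * 8 / 32,  # 2-byte scale + 16 packed bytes per block
}

def size_mb(n_params: float, qtype: str) -> float:
    """Approximate weight size in MB for a model with n_params parameters."""
    return n_params * BITS_PER_WEIGHT[qtype] / 8 / 1e6

n = 860e6  # ~860M parameters in the SD 1.x UNet (approximate)
for qtype, bpw in BITS_PER_WEIGHT.items():
    print(f"{qtype:>5}: {bpw:5.2f} bits/weight -> {size_mb(n, qtype):7.1f} MB")
```

So q4_0 needs roughly 4.5 bits per weight versus 16 for f16, which is where the memory savings come from.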
Supported platforms
For most users, you can download the built executable program from the latest release. If the prebuilt binary does not meet your requirements, you can build it manually.
git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp
If you have already cloned the repository, you can use the following commands to update it to the latest code:
cd stable-diffusion.cpp
git pull origin master
git submodule init
git submodule update
Download the original model weights (.ckpt or .safetensors). For example:
curl -L -O https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt
# curl -L -O https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors
# curl -L -O https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/v2-1_768-nonema-pruned.safetensors
# curl -L -O https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips_t5xxlfp16.safetensors
mkdir build
cd build
cmake ..
cmake --build . --config Release
With OpenBLAS:
cmake .. -DGGML_OPENBLAS=ON
cmake --build . --config Release
This provides BLAS acceleration using the CUDA cores of your NVIDIA GPU. Make sure you have the CUDA toolkit installed. You can download it from your Linux distribution's package manager (e.g. apt install nvidia-cuda-toolkit) or from here: CUDA Toolkit. It is recommended to have at least 4 GB of VRAM.
cmake .. -DSD_CUBLAS=ON
cmake --build . --config Release
This provides BLAS acceleration using the ROCm cores of your AMD GPU. Make sure you have the ROCm toolkit installed.
Windows users should refer to docs/hipblas_on_windows.md for a comprehensive guide.
cmake .. -G "Ninja" -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DSD_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release -DAMDGPU_TARGETS=gfx1100
cmake --build . --config Release
Using Metal makes the computation run on the GPU. Currently, there are some issues with Metal when operating on very large matrices, making it highly inefficient at the moment. Performance improvements are expected in the near future.
cmake .. -DSD_METAL=ON
cmake --build . --config Release
Install the Vulkan SDK from https://www.lunarg.com/vulkan-sdk/.
cmake .. -DSD_VULKAN=ON
cmake --build . --config Release
Using SYCL makes the computation run on Intel GPUs. Please make sure you have installed the related driver and the Intel® oneAPI Base Toolkit before starting. More details and steps can be found in the llama.cpp SYCL backend documentation.
# Export relevant ENV variables
source /opt/intel/oneapi/setvars.sh
# Option 1: Use FP32 (recommended for better performance in most cases)
cmake .. -DSD_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
# Option 2: Use FP16
cmake .. -DSD_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON
cmake --build . --config Release
Example of text2img by using the SYCL backend:
Download the stable-diffusion model weights; refer to Download weights.
run ./bin/sd -m ../models/sd3_medium_incl_clips_t5xxlfp16.safetensors --cfg-scale 5 --steps 30 --sampling-method euler -H 1024 -W 1024 --seed 42 -p "fantasy medieval village world inside a glass sphere , high detail, fantasy, realistic, light effect, hyper detail, volumetric lighting, cinematic, macro, depth of field, blur, red light and clouds from the back, highly detailed epic cinematic concept art cg render made in maya, blender and photoshop, octane render, excellent composition, dynamic dramatic cinematic lighting, aesthetic, very inspirational, world inside a glass sphere by james gurney by artgerm with james jean, joe fenton and tristan eaton by ross tran, fine details, 4k resolution"

Enabling flash attention in the diffusion model cuts memory usage by varying amounts of MB.
It slows things down for most backends, but for CUDA it generally also speeds things up. At the moment, it is only supported for some models and some backends (such as CPU, CUDA/ROCm, Metal).
Run by adding --diffusion-fa to the arguments and watch for:
[INFO ] stable-diffusion.cpp:312 - Using flash attention in the diffusion model
and the compute buffer shrink in the debug log:
[DEBUG] ggml_extend.hpp:1004 - flux compute buffer size: 650.00 MB(VRAM)
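To see why flash attention shrinks the compute buffer, compare the memory a naive self-attention layer needs to materialize its full N×N score matrix against flash attention's tiled approach, which never stores that matrix. This is back-of-the-envelope arithmetic, not the actual ggml allocation logic, and it assumes (purely for illustration) attention running directly over a 64×64 SD 1.x latent:

```python
# Naive self-attention materializes an N x N score matrix per head;
# flash attention processes it in tiles, so that buffer never exists.
def naive_scores_mb(n_tokens: int, n_heads: int, bytes_per_el: int = 4) -> float:
    """Memory (MB) for the full f32 attention-score tensor."""
    return n_heads * n_tokens * n_tokens * bytes_per_el / 1e6

# A 512x512 image has a 64x64 latent (the VAE downscales by 8) = 4096 tokens.
tokens = 64 * 64
print(f"8 heads, 4096 tokens: {naive_scores_mb(tokens, 8):8.1f} MB")
# The cost is quadratic in token count, so larger images blow up quickly:
print(f"8 heads, 8192 tokens: {naive_scores_mb(2 * tokens, 8):8.1f} MB")
```

Real UNets only run self-attention at reduced resolutions, but the quadratic scaling is why the savings grow with image size.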
usage: ./bin/sd [arguments]
arguments:
-h, --help show this help message and exit
-M, --mode [MODE] run mode (txt2img or img2img or convert, default: txt2img)
-t, --threads N number of threads to use during computation (default: -1)
If threads <= 0, then threads will be set to the number of CPU physical cores
-m, --model [MODEL] path to full model
--diffusion-model path to the standalone diffusion model
--clip_l path to the clip-l text encoder
--clip_g path to the clip-g text encoder
--t5xxl path to the t5xxl text encoder
--vae [VAE] path to vae
--taesd [TAESD_PATH] path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)
--control-net [CONTROL_PATH] path to control net model
--embd-dir [EMBEDDING_PATH] path to embeddings
--stacked-id-embd-dir [DIR] path to PHOTOMAKER stacked id embeddings
--input-id-images-dir [DIR] path to PHOTOMAKER input id images dir
--normalize-input normalize PHOTOMAKER input id images
--upscale-model [ESRGAN_PATH] path to esrgan model. Upscale images after generate, just RealESRGAN_x4plus_anime_6B supported by now
--upscale-repeats Run the ESRGAN upscaler this many times (default 1)
--type [TYPE] weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0, q2_k, q3_k, q4_k)
If not specified, the default is the type of the weight file
--lora-model-dir [DIR] lora model directory
-i, --init-img [IMAGE] path to the input image, required by img2img
--control-image [IMAGE] path to image condition, control net
-o, --output OUTPUT path to write result image to (default: ./output.png)
-p, --prompt [PROMPT] the prompt to render
-n, --negative-prompt PROMPT the negative prompt (default: "")
--cfg-scale SCALE unconditional guidance scale: (default: 7.0)
--strength STRENGTH strength for noising/unnoising (default: 0.75)
--style-ratio STYLE-RATIO strength for keeping input identity (default: 20%)
--control-strength STRENGTH strength to apply Control Net (default: 0.9)
1.0 corresponds to full destruction of information in init image
-H, --height H image height, in pixel space (default: 512)
-W, --width W image width, in pixel space (default: 512)
--sampling-method {euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m, dpm++2mv2, ipndm, ipndm_v, lcm}
sampling method (default: "euler_a")
--steps STEPS number of sample steps (default: 20)
--rng {std_default, cuda} RNG (default: cuda)
-s SEED, --seed SEED RNG seed (default: 42, use random seed for < 0)
-b, --batch-count COUNT number of images to generate
--schedule {discrete, karras, exponential, ays, gits} Denoiser sigma schedule (default: discrete)
--clip-skip N ignore last layers of CLIP network; 1 ignores none, 2 ignores one layer (default: -1)
<= 0 represents unspecified, will be 1 for SD1.x, 2 for SD2.x
--vae-tiling process vae in tiles to reduce memory usage
--vae-on-cpu keep vae in cpu (for low vram)
--clip-on-cpu keep clip in cpu (for low vram)
--diffusion-fa use flash attention in the diffusion model (for low vram)
Might lower quality, since it implies converting k and v to f16.
This might crash if it is not supported by the backend.
--control-net-cpu keep controlnet in cpu (for low vram)
--canny apply canny preprocessor (edge detection)
--color Colors the logging tags according to level
-v, --verbose print extra info
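When driving the sd CLI from a script, prompts containing quotes are easy to mangle through the shell. One way around this (a sketch; the binary and model paths are placeholders, and only flags documented above are used) is to assemble the argument vector as a list and hand it to subprocess without a shell:

```python
import subprocess

def build_sd_cmd(binary, model, prompt, width=512, height=512,
                 steps=20, seed=42, out="./output.png"):
    """Assemble an argument vector for the sd CLI using the flags above."""
    return [
        binary,
        "-m", model,
        "-p", prompt,
        "-W", str(width),
        "-H", str(height),
        "--steps", str(steps),
        "-s", str(seed),
        "-o", out,
    ]

cmd = build_sd_cmd("./bin/sd", "../models/sd-v1-4.ckpt", "a lovely cat")
print(cmd)
# Passing a list (no shell) sidesteps quoting issues entirely:
# subprocess.run(cmd, check=True)
```

The actual run call is commented out since it requires a built binary and downloaded weights.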
./bin/sd -m ../models/sd-v1-4.ckpt -p " a lovely cat "
# ./bin/sd -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat"
# ./bin/sd -m ../models/sd_xl_base_1.0.safetensors --vae ../models/sdxl_vae-fp16-fix.safetensors -H 1024 -W 1024 -p "a lovely cat" -v
# ./bin/sd -m ../models/sd3_medium_incl_clips_t5xxlfp16.safetensors -H 1024 -W 1024 -p 'a lovely cat holding a sign says "Stable Diffusion CPP"' --cfg-scale 4.5 --sampling-method euler -v
# ./bin/sd --diffusion-model ../models/flux1-dev-q3_k.gguf --vae ../models/ae.sft --clip_l ../models/clip_l.safetensors --t5xxl ../models/t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v
# ./bin/sd -m ../models/sd3.5_large.safetensors --clip_l ../models/clip_l.safetensors --clip_g ../models/clip_g.safetensors --t5xxl ../models/t5xxl_fp16.safetensors -H 1024 -W 1024 -p 'a lovely cat holding a sign says "Stable diffusion 3.5 Large"' --cfg-scale 4.5 --sampling-method euler -v
Using weights of different precisions will produce results of different quality.
| F32 | F16 | Q8_0 | Q5_0 | Q5_1 | Q4_0 | Q4_1 |
|---|---|---|---|---|---|---|
| (image) | (image) | (image) | (image) | (image) | (image) | (image) |
./output.png is the image generated by the txt2img pipeline above.
./bin/sd --mode img2img -m ../models/sd-v1-4.ckpt -p "cat with blue eyes" -i ./output.png -o ./img2img_output.png --strength 0.4
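The --strength flag controls how far the init image is pushed back toward noise before denoising: per the help text, 1.0 corresponds to full destruction of the information in the init image. A common way to reason about it (a sketch of the usual img2img convention, not the exact internal schedule of stable-diffusion.cpp) is that only the last strength-fraction of the sampling steps actually run:

```python
def effective_steps(steps: int, strength: float) -> int:
    """Denoising steps actually run in img2img under the usual convention:
    start from a noise level proportional to `strength`, then denoise."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return round(steps * strength)

# With the defaults (20 steps, strength 0.75) roughly 15 steps run;
# the example's --strength 0.4 runs fewer and keeps more of the input image.
print(effective_steps(20, 0.75))  # 15
print(effective_steps(20, 0.4))   # 8
```

Lower strength therefore means both faster generation and closer adherence to the init image.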

These projects wrap stable-diffusion.cpp for easier use from other languages/frameworks.
These projects use stable-diffusion.cpp as a backend for their image generation.
Thank you to everyone who has already contributed to stable-diffusion.cpp!