A tiny, inference-only reference implementation of SD3.5 and SD3: everything you need for simple inference, excluding the weights files.
Contains code for the text encoders (OpenAI CLIP-L/14, OpenCLIP bigG, Google T5-XXL; these models are all public), the VAE decoder (similar to previous SD models, but with 16 channels and no postquantconv step), and the core MM-DiT (entirely new).
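The 16-channel VAE means latents are shaped differently from SD1/SDXL (which use 4 channels). A minimal sketch of the expected latent shape, assuming the usual 8x spatial downsampling of SD-family VAEs (`latent_shape` is an illustrative helper, not part of this repo):

```python
# Sketch: latent shape for the 16-channel SD3/SD3.5 VAE.
# Assumes the usual 8x spatial downsampling of SD-family VAEs.

def latent_shape(batch: int, height: int, width: int,
                 channels: int = 16, downsample: int = 8) -> tuple:
    """Return the (B, C, H/8, W/8) latent shape for an RGB image."""
    assert height % downsample == 0 and width % downsample == 0
    return (batch, channels, height // downsample, width // downsample)

# A 1024x1024 image encodes to a 16x128x128 latent:
print(latent_shape(1, 1024, 1024))  # (1, 16, 128, 128)
```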
Note: this repo is a reference library meant to assist partner organizations in implementing SD3.5/SD3. For alternate inference, use Comfy.
Download the following models from HuggingFace into the `models` directory (see the file list below):
This code also works for Stability AI SD3 Medium.
```sh
# Note: on Windows use "python" instead of "python3"
python3 -s -m venv .sd3.5
source .sd3.5/bin/activate
# or on Windows: .sd3.5\Scripts\activate
python3 -s -m pip install -r requirements.txt
```

```sh
# Generate a cat using the SD3.5 Large model (at models/sd3.5_large.safetensors) with its default settings
python3 sd3_infer.py --prompt "cute wallpaper art of a cat"
```
```sh
# Or use a text file with a list of prompts, using SD3.5 Large
python3 sd3_infer.py --prompt path/to/my_prompts.txt --model models/sd3.5_large.safetensors

# Generate from a prompt file using SD3.5 Large Turbo with its default settings
python3 sd3_infer.py --prompt path/to/my_prompts.txt --model models/sd3.5_large_turbo.safetensors
```
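When `--prompt` points at a text file, each line is treated as a separate prompt (an assumption here; check `sd3_infer.py` for the exact parsing). A minimal sketch of that behavior:

```python
# Sketch: load a prompt list from a text file, one prompt per line
# (assumed format; blank lines are skipped).
from pathlib import Path

def load_prompts(path: str) -> list[str]:
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]

# Example:
Path("my_prompts.txt").write_text("a cat\n\na dog\n")
print(load_prompts("my_prompts.txt"))  # ['a cat', 'a dog']
```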
```sh
# Generate from a prompt file using SD3.5 Medium with its default settings, at 2k resolution
python3 sd3_infer.py --prompt path/to/my_prompts.txt --model models/sd3.5_medium.safetensors --width 1920 --height 1080

# Generate from a prompt file using SD3 Medium with its default settings
python3 sd3_infer.py --prompt path/to/my_prompts.txt --model models/sd3_medium.safetensors
```

Images will be output to `outputs/<MODEL>/<PROMPT>_<DATETIME>_<POSTFIX>` by default.
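The output directory layout can be reproduced in a few lines; a sketch of the naming scheme (helper and slug rules here are illustrative, not the script's exact internals):

```python
# Sketch: build an output directory name like
# outputs/<MODEL>/<PROMPT>_<DATETIME>_<POSTFIX> (illustrative only).
import os
import re
from datetime import datetime

def output_dir(model_path: str, prompt: str, postfix: str = "") -> str:
    model = os.path.splitext(os.path.basename(model_path))[0]
    # Keep the prompt filesystem-safe and reasonably short.
    slug = re.sub(r"[^a-zA-Z0-9]+", "_", prompt).strip("_")[:40]
    stamp = datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
    return os.path.join("outputs", model, f"{slug}_{stamp}_{postfix}")

print(output_dir("models/sd3.5_large.safetensors",
                 "cute wallpaper art of a cat", "steps100"))
```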
To add a postfix to the output directory, add `--postfix <my_postfix>`. For example:

```sh
python3 sd3_infer.py --prompt path/to/my_prompts.txt --postfix "steps100" --steps 100
```

To change the resolution of the generated image, add `--width <WIDTH> --height <HEIGHT>`.
Optionally, use Skip Layer Guidance for potentially better structure and anatomy coherence with SD3.5 Medium.
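Conceptually, skip-layer guidance adds a second correction term on top of standard classifier-free guidance: the model is run a third time with certain transformer layers skipped, and the difference from the full conditional output steers the result. A rough sketch of the combination, in scalar form for clarity (names and scale values are illustrative; see `sd3_impls.py` for the real tensor version and schedule):

```python
# Sketch: how skip-layer guidance (SLG) combines model outputs.
# cond / uncond are the usual CFG pair; skip_out is the model run with
# certain layers skipped. Scale defaults here are illustrative.

def slg_combine(cond: float, uncond: float, skip_out: float,
                cfg_scale: float = 4.5, slg_scale: float = 2.5) -> float:
    cfg = uncond + (cond - uncond) * cfg_scale   # standard CFG
    return cfg + (cond - skip_out) * slg_scale   # extra SLG correction

# With slg_scale = 0 this reduces to plain CFG:
print(slg_combine(1.0, 0.0, 0.5, cfg_scale=4.5, slg_scale=0.0))  # 4.5
```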
```sh
python3 sd3_infer.py --prompt path/to/my_prompts.txt --model models/sd3.5_medium.safetensors --skip_layer_cfg True
```

The main files:

- `sd3_infer.py` - entry point; review this for basic usage of the diffusion model
- `sd3_impls.py` - contains the wrapper around the MMDiT-X and the VAE
- `other_impls.py` - contains the CLIP models, the T5 model, and some utilities
- `mmditx.py` - contains the core of the MMDiT-X itself
- `models/` - directory with the following files (download separately):
  - `clip_l.safetensors` (OpenAI CLIP-L, same as SDXL/SD3, can grab a public copy)
  - `clip_g.safetensors` (OpenCLIP bigG, same as SDXL/SD3, can grab a public copy)
  - `t5xxl.safetensors` (Google T5-v1.1-XXL, can grab a public copy)
  - `sd3.5_large.safetensors`, `sd3.5_large_turbo.safetensors`, or `sd3.5_medium.safetensors` (or `sd3_medium.safetensors`)

The code included here originates from:
Check the LICENSE-CODE file.
Some code in `other_impls` originates from HuggingFace and is subject to the HuggingFace Transformers Apache 2.0 license.