Unlike the automatic1111 webui, this project is aimed at developers and supports use cases such as virtual idol training.
Shown here is the effect of a LoRA trained on a small number of Dilraba Dilmurat images: a European/American mixed-style Reba.
pip install -r requirements.txt
git lfs install
# BLIP model
wget https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_caption_capfilt_large.pth -P ./pretrained_models
# bert-base-uncased
cd pretrained_models
git clone https://huggingface.co/bert-base-uncased
# diffusion base model
# I use chilloutmix_NiPrunedFp32Fix
git clone https://huggingface.co/naonovn/chilloutmix_NiPrunedFp32Fix
# safetensors model conversion
cd ..
python process/convert_original_stable_diffusion_to_diffusers.py \
--checkpoint_path ./pretrained_models/chilloutmix_NiPrunedFp32Fix/chilloutmix_NiPrunedFp32Fix.safetensors \
--dump_path ./pretrained_models/chilloutmixNiPruned_Tw1O --from_safetensors
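To sanity-check the conversion, the dumped folder should load as a regular diffusers pipeline. A minimal sketch, assuming standard diffusers API and the --dump_path above:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the folder written by --dump_path; this validates the conversion.
pipe = StableDiffusionPipeline.from_pretrained(
    "./pretrained_models/chilloutmixNiPruned_Tw1O",
    torch_dtype=torch.float16,
    safety_checker=None,  # the converted dump may not ship a safety checker
).to("cuda")

image = pipe("a portrait photo, ((masterpiece))", num_inference_steps=25).images[0]
image.save("sanity.png")
```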
# download data
mkdir -p dataset
cd dataset
git clone https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions/
# generate image captions
python process/run_caption.py --img_base ./dataset/custom
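run_caption.py drives the BLIP checkpoint downloaded above to produce one caption per image. As a rough standalone equivalent you could use the transformers port of BLIP; this is an illustrative assumption, the repo itself loads the .pth checkpoint directly:

```python
from pathlib import Path
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

for img_path in Path("dataset/custom").glob("*.png"):
    inputs = processor(Image.open(img_path).convert("RGB"), return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    caption = processor.decode(out[0], skip_special_tokens=True)
    img_path.with_suffix(".txt").write_text(caption)  # one caption file per image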
# replace "a woman" with <dlrb>
python process/change_txt.py --img_base ./dataset/custom --ori_txt 'a woman' --new_txt "<dlrb>"
Parameter adjustment: set self.custom to True to use your own data, or to False to use the huggingface data (see the sketch after this list).
--train_text_encoder # enable text_encoder LoRA training
--dist # disable DDP multi-machine multi-GPU training mode
--batch_size 1 # set the batch size
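Schematically, the data switch mentioned above looks like the following; the class and file names here are assumptions for illustration, so locate where self.custom is actually defined in the repo before editing:

```python
class DataConfig:  # hypothetical name; find the repo's class that defines self.custom
    def __init__(self):
        # True: train on your own captioned images under dataset/custom
        # False: train on the huggingface pokemon-blip-captions dataset
        self.custom = True
```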
# training script
python train.py --batch_size 1 --dist --train_text_encoder
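What --train_text_encoder turns on is the usual LoRA trick applied to the text encoder as well as the UNet: freeze the pretrained weight W and learn a low-rank update BA on top of it. A minimal sketch of that idea; the rank and alpha values are illustrative, not the repo's defaults:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a trainable low-rank update: y = Wx + (alpha/r) * B(Ax)."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the LoRA factors are trained
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no update at step 0
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```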
python inference.py \
--mode 'lora' \
--lora_path checkpoint/Lora/000-00000600.pth \
--prompt "<dlrb>,solo, long hair, black hair, choker, breasts, earrings, blue eyes, jewelry, lipstick, makeup, dark, bare shoulders, mountain, night, upper body, dress, large breasts, ((masterpiece))" \
--outpath results/1.png \
--num_images_per_prompt 2
The fewer training images you use, the fewer iterations the checkpoint you select should have. For example, for single-image training pick a checkpoint around 1000 iterations; for training on about 10 images, pick one around 2500 iterations.
Added ControlNet conversion; refer to Here.
python process/tool_transfer_control.py \
--path_input pretrained_models/chilloutmix_NiPrunedFp32Fix/chilloutmix_NiPrunedFp32Fix.safetensors \
--path_output pretrained_models/chilloutmix_control.pth
python process/convert_controlnet_to_diffusers.py \
--checkpoint_path pretrained_models/chilloutmix_control.pth \
--original_config_file model/third/cldm_v15.yaml \
--dump_path pretrained_models/chilloutmix_control --device cuda
python inference.py \
--mode 'control' \
--lora_path checkpoint/Lora/000-00000600.pth \
--control_path pretrained_models/chilloutmix_control \
--pose_img assets/pose.png \
--prompt "<dlrb>,solo, long hair, black hair, choker, breasts, earrings, blue eyes, jewelry, lipstick, makeup, dark, bare shoulders, mountain, night, upper body, dress, large breasts, ((masterpiece))" \
--outpath results/1.png \
--num_images_per_prompt 2
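Assuming the conversion above produced a diffusers-format ControlNet folder, the same weights can also be driven through the stock diffusers ControlNet pipeline; a sketch under that assumption, with paths mirroring the command above:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

controlnet = ControlNetModel.from_pretrained(
    "pretrained_models/chilloutmix_control", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "./pretrained_models/chilloutmixNiPruned_Tw1O",  # the converted base model from earlier
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

pose = Image.open("assets/pose.png")
image = pipe("<dlrb>, solo, long hair, ((masterpiece))", image=pose).images[0]
image.save("control_sketch.png")
```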
cd pretrained_models
git clone https://huggingface.co/runwayml/stable-diffusion-inpainting
# download the parsing model
wget https://github.com/LeslieZhoa/LVT/releases/download/v0.0/face_parsing.pt -P pretrained_models
python inference.py \
--mode 'inpait' \
--inpait_path pretrained_models/stable-diffusion-inpainting \
--mask_area all \
--ref_img assets/ref.png \
--prompt "green hair,short hair,curly hair, green hair,beach,seaside" \
--outpath results/1.png \
--num_images_per_prompt 2
The inpainting results are smoother.
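inference.py wires the parsing model into mask generation; if you just want to poke at the downloaded inpainting weights directly, the standard diffusers inpaint pipeline works as a rough stand-in. The mask path here is an assumption for illustration:

```python
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "pretrained_models/stable-diffusion-inpainting"
).to("cuda")

image = Image.open("assets/ref.png").convert("RGB").resize((512, 512))
mask = Image.open("assets/mask.png").convert("L").resize((512, 512))  # hypothetical mask; white = repaint

result = pipe(
    prompt="green hair, short hair, curly hair, beach, seaside",
    image=image,
    mask_image=mask,
).images[0]
result.save("inpaint_sketch.png")
```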
wget https://huggingface.co/TencentARC/T2I-Adapter/resolve/main/models/t2iadapter_seg_sd14v1.pth -P pretrained_models
python inference.py \
--mode 't2iinpait' \
--ref_img assets/t2i-input.png \
--mask assets/t2i-mask.png \
--adapter_mask assets/t2i-adapter.png \
--prompt "green hair,curly hair, green hair,beach,seaside" \
--outpath results/1.png \
--num_images_per_prompt 2
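diffusers also ships a native adapter pipeline; a sketch of the same segmentation-guided generation through that route. The diffusers-format hub repo name is an assumption, and the repo's own inference.py instead consumes the raw .pth downloaded above:

```python
import torch
from diffusers import StableDiffusionAdapterPipeline, T2IAdapter
from PIL import Image

# Diffusers-format copy of the seg adapter; repo name assumed from the TencentARC hub org.
adapter = T2IAdapter.from_pretrained("TencentARC/t2iadapter_seg_sd14v1", torch_dtype=torch.float16)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", adapter=adapter, torch_dtype=torch.float16
).to("cuda")

seg = Image.open("assets/t2i-adapter.png")  # the segmentation map used above
image = pipe("green hair, curly hair, beach, seaside", image=seg).images[0]
image.save("t2i_sketch.png")
```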
cd pretrained_models
git clone https://huggingface.co/timbrooks/instruct-pix2pix
python inference.py \
--mode 'instruct' \
--ref_img assets/t2i-input.png \
--prompt "turn her face to comic style" \
--neg_prompt None \
--image_guidance_scale 1 \
--outpath results/1.png \
--num_images_per_prompt 1
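The 'instruct' mode wraps timbrooks/instruct-pix2pix; a roughly equivalent stock diffusers call, with paths mirroring the command above, looks like this:

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "pretrained_models/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

img = Image.open("assets/t2i-input.png").convert("RGB")
out = pipe(
    "turn her face to comic style",
    image=img,
    image_guidance_scale=1,  # lower values let the edit stray further from the input
).images[0]
out.save("instruct_sketch.png")
```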
The model is mainly derived from FaceVid2Vid, with a 512-resolution high-definition upgrade added.
wget https://github.com/LeslieZhoa/Simple-Lora/releases/download/v0.0/script.zip
unzip script.zip && rm -rf script.zip
python script/run.py --input assets/6.png
ffmpeg -r 25 -f image2 -i results/%06d.png -vcodec libx264 11.mp4
https://github.com/huggingface/diffusers
https://github.com/AUTOMATIC1111/stable-diffusion-webui
https://github.com/salesforce/BLIP
https://github.com/haofanwang/Lora-for-Diffusers
https://github.com/lllyasviel/ControlNet
https://github.com/haofanwang/ControlNet-for-Diffusers
https://github.com/haofanwang/T2I-Adapter-for-Diffusers
https://github.com/TencentARC/T2I-Adapter
https://github.com/HimariO/diffusers-t2i-adapter
https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis