EZIGen

EZIGen: Enhancing Zero-Shot Subject-Driven Image Generation with Precise Subject Encoding and Decoupled Guidance

Zicheng Duan ¹; Yuxuan Ding ²; Chenhui Gou ³; Ziqin Zhou ¹; Ethan Smith ⁴; Lingqiao Liu ¹,*

¹ AIML, University of Adelaide | ² Xidian University | ³ Monash University | ⁴ Leonardo.ai

Dataset

TODO list

Demo page
Inference code and checkpoints
Training code and data loader
Hugging Face demo

Installation

Prepare the conda environment

 conda create -n ezigen python=3.10 -y && conda activate ezigen

Install PyTorch

 pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118

Install diffusers

 wget https://github.com/huggingface/diffusers/archive/refs/tags/v0.30.1.zip
unzip v0.30.1.zip
cd diffusers-0.30.1
pip install -e ".[torch]" && cd .. && rm v0.30.1.zip

Install the remaining dependencies

 pip install -r requirements.txt
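After installation, a quick sanity check can confirm that the pinned versions were actually picked up. This is a minimal sketch, not part of the repository: it only assumes the version pins from the commands above (torch 2.0.1, torchvision 0.15.2, torchaudio 2.0.2, diffusers 0.30.1) and uses the standard-library `importlib.metadata`. The helper name `find_mismatches` is hypothetical.

```python
from importlib import metadata
from typing import Callable, Dict, List

# Version pins taken from the installation commands above.
EXPECTED: Dict[str, str] = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "torchaudio": "2.0.2",
    "diffusers": "0.30.1",
}


def find_mismatches(
    expected: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Return human-readable problems; an empty list means every pin matches."""
    problems = []
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed (expected {wanted})")
            continue
        # CUDA wheels carry a local label such as "2.0.1+cu118"; compare only the public part.
        public = installed.split("+", 1)[0]
        if public != wanted:
            problems.append(f"{package}: found {installed}, expected {wanted}")
    return problems


if __name__ == "__main__":
    issues = find_mismatches(EXPECTED)
    print("\n".join(issues) if issues else "All pinned packages match.")
```

The `get_version` parameter exists only so the check can be exercised without the real packages installed; in normal use it is left at its default.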

Checkpoint

Download the checkpoint (checkpoint-200000.zip, ~6.5 GB) from Google Drive and unzip it into a local folder.
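The archive can also be unpacked from Python. This is a minimal sketch using only the standard library; it assumes the archive holds a single top-level folder (such as checkpoint-200000/), which is the folder the config should then point to. The helper name `extract_checkpoint` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_checkpoint(zip_path: str, dest_dir: str = ".") -> Path:
    """Unzip the checkpoint archive and return the folder to reference in the config."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Collect the first path component of every entry to find the top-level folder.
        top_levels = {Path(name).parts[0] for name in archive.namelist() if name.strip("/")}
        archive.extractall(dest)
    if len(top_levels) == 1:
        candidate = dest / next(iter(top_levels))
        if candidate.is_dir():
            return candidate
    # Assumption violated (several top-level entries): fall back to the destination itself.
    return dest


if __name__ == "__main__":
    if Path("checkpoint-200000.zip").exists():
        print(extract_checkpoint("checkpoint-200000.zip"))
```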

Inference

We provide inference code for both subject-driven generation and subject-driven image editing. Example results can be found in the outputs folder.

First, open configs/infer_config.yaml and set the correct checkpoint folder path (e.g., checkpoint-200000/).

1. Subject-driven generation and human content generation

Scripts for subject-driven generation and human content generation are provided in infer_generation.sh.

 # infer_generation.sh
python infer.py \
    --config configs/infer_config.yaml \
    --guidance_scale 7.5 \
    --seed 3154 \
    --split_ratio 0.4 \
    --infer_steps 50 \
    --sim_threshold 0.99 \
    --tar_prompt "a dog in police outfit" \
    --sub_prompt "a dog" \
    --sub_img_path "example_images/subjects/dog6.png" \
    --output_root "outputs/" \
    # --num_interations 6
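To run several variations without editing the shell script, the same command can be assembled from Python. This is a minimal sketch, not part of the repository: the option names and values are copied from infer_generation.sh above, while `build_infer_command` and `GENERATION_ARGS` are hypothetical helpers and the second prompt in the sweep is only an illustrative example.

```python
import shlex
from typing import Dict, List, Optional, Union

# Option values copied from infer_generation.sh above.
GENERATION_ARGS: Dict[str, Union[str, int, float]] = {
    "config": "configs/infer_config.yaml",
    "guidance_scale": 7.5,
    "seed": 3154,
    "split_ratio": 0.4,
    "infer_steps": 50,
    "sim_threshold": 0.99,
    "tar_prompt": "a dog in police outfit",
    "sub_prompt": "a dog",
    "sub_img_path": "example_images/subjects/dog6.png",
    "output_root": "outputs/",
}


def build_infer_command(
    args: Dict[str, Union[str, int, float]],
    switches: Optional[List[str]] = None,
    num_iterations: Optional[int] = None,
) -> List[str]:
    """Assemble an argv list for infer.py (hypothetical helper)."""
    cmd = ["python", "infer.py"]
    for name, value in args.items():
        cmd += [f"--{name}", str(value)]
    # Value-less switches such as do_editing.
    for switch in switches or []:
        cmd.append(f"--{switch}")
    if num_iterations is not None:
        # Spelled "num_interations" in the commands above; omitting it keeps the default (-1, auto-stop).
        cmd += ["--num_interations", str(num_iterations)]
    return cmd


if __name__ == "__main__":
    for prompt in ["a dog in police outfit", "a dog wearing a chef hat"]:
        cmd = build_infer_command({**GENERATION_ARGS, "tar_prompt": prompt})
        print(shlex.join(cmd))  # copy-pasteable shell form of the argv list
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root; passing a list rather than a shell string avoids quoting problems with prompts that contain spaces. For the editing variant, add the extra path options to the dictionary and pass `switches=["do_editing"]`.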

Some notes on the arguments:

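split_ratio=0.4 means the last 40% of the timesteps are reserved for appearance transfer, while the first 60% are used for layout generation. The value ranges from 0 to 1; a larger value means more appearance transfer.
sim_threshold is the CLIP similarity threshold used for auto-stop.
sub_prompt serves as a placeholder, but we always recommend entering the correct class name of the subject image for the best subject feature extraction.
--num_interations defaults to -1, which enables the automatic stopping mechanism (at least 3 and at most 10 iterations). To use a fixed number of iterations instead, uncomment the # --num_interations 6 line and set the desired value.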
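The arithmetic behind split_ratio and the iteration limits can be made concrete with a small sketch. It is an illustration under stated assumptions, not the repository's implementation: the rounding of split_ratio into whole steps, the function names, and the `run_iteration` callback (which stands in for one generation pass plus whatever CLIP comparison the code actually performs) are all hypothetical. With the values used above (infer_steps 50, split_ratio 0.4), 30 steps go to layout generation and the final 20 to appearance transfer.

```python
from typing import Callable, Tuple


def split_steps(infer_steps: int, split_ratio: float) -> Tuple[int, int]:
    """Return (layout_steps, appearance_steps) for a given split_ratio.

    Assumption: the appearance share is rounded to the nearest whole step.
    """
    if not 0.0 <= split_ratio <= 1.0:
        raise ValueError("split_ratio must be between 0 and 1")
    appearance_steps = round(infer_steps * split_ratio)  # the last part of the schedule
    return infer_steps - appearance_steps, appearance_steps


def count_iterations(
    run_iteration: Callable[[int], float],
    sim_threshold: float = 0.99,
    num_iterations: int = -1,
    min_iters: int = 3,
    max_iters: int = 10,
) -> int:
    """Run refinement iterations and return how many were executed.

    run_iteration(i) is a hypothetical callback that performs iteration i and
    returns the CLIP similarity score that auto-stop compares to sim_threshold.
    """
    if num_iterations != -1:  # fixed count requested by the user
        for i in range(num_iterations):
            run_iteration(i)
        return num_iterations
    for i in range(max_iters):  # auto-stop mode
        score = run_iteration(i)
        if i + 1 >= min_iters and score >= sim_threshold:
            return i + 1
    return max_iters


if __name__ == "__main__":
    print(split_steps(50, 0.4))  # (30, 20): 30 layout steps, then 20 appearance steps
```

Note that the minimum of 3 iterations is enforced even when the similarity threshold is already reached earlier, and the loop never exceeds 10 iterations in auto-stop mode.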

Some example subjects are provided in example_images/subjects.

2. Subject-driven editing

 # infer_editing.sh
python infer.py \
    --config configs/infer_config.yaml \
    --guidance_scale 7.5 \
    --seed 3154 \
    --split_ratio 0.4 \
    --infer_steps 50 \
    --sim_threshold 0.99 \
    --tar_prompt "a woman" \
    --sub_prompt "a woman" \
    --sub_img_path "example_images/subjects/lifeifei.png" \
    --output_root "outputs/" \
    --foreground_mask_path example_images/source_images_with_masks/woman_mask.png \
    --source_image_path example_images/source_images_with_masks/woman.png \
    --do_editing \
    # --num_interations 6

Some notes on the arguments:

Like --sub_prompt, --tar_prompt is currently a placeholder, because the editing process does not require a text prompt.
source_image_path: path to the source RGB image to be edited.
foreground_mask_path: path to a 3-channel mask in which the foreground is (255, 255, 255) and the background is (0, 0, 0).
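If a mask comes from a segmentation tool as a single-channel or boolean image, it can be converted to the expected 3-channel black-and-white format before editing. A minimal NumPy sketch, assuming an 8-bit grayscale or boolean input and a brightness threshold of 127 (both assumptions, not taken from the repository); the function names are hypothetical:

```python
import numpy as np


def to_three_channel_mask(mask: np.ndarray, threshold: int = 127) -> np.ndarray:
    """Convert an HxW (or HxWxC) mask into an HxWx3 uint8 mask.

    Pixels brighter than `threshold` become foreground (255, 255, 255);
    everything else becomes background (0, 0, 0).
    """
    mask = np.asarray(mask)
    if mask.ndim == 3:
        mask = mask[..., :3].max(axis=-1)  # collapse colour channels, dropping any alpha
    if mask.ndim != 2:
        raise ValueError(f"expected a 2-D or 3-D mask, got shape {mask.shape}")
    foreground = mask if mask.dtype == bool else mask > threshold
    out = np.zeros((*foreground.shape, 3), dtype=np.uint8)
    out[foreground] = 255  # sets all three channels of foreground pixels
    return out


def is_valid_mask(mask: np.ndarray) -> bool:
    """True if the array is HxWx3 and every pixel is pure white or pure black."""
    if mask.ndim != 3 or mask.shape[-1] != 3:
        return False
    white = np.all(mask == 255, axis=-1)
    black = np.all(mask == 0, axis=-1)
    return bool(np.all(white | black))
```

To write the result to disk, Pillow's `Image.fromarray(mask).save(...)` can be used; an existing mask can be checked with `is_valid_mask(np.array(Image.open(path).convert("RGB")))`.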

Some input examples are provided in example_images/source_images_with_masks.

Acknowledgements

This project references some code from AnyDoor; shout-out to this great work!

Citation

If you find this codebase useful for your research, please cite it as follows:

 @article{duan2024ezigen,
  title={EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance},
  author={Duan, Zicheng and Ding, Yuxuan and Gou, Chenhui and Zhou, Ziqin and Smith, Ethan and Liu, Lingqiao},
  journal={arXiv preprint arXiv:2409.08091},
  year={2024}
}
