
Contrastive Unpaired Translation (CUT)

Video (1m) | Video (10m) | Website | Paper

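We provide our PyTorch implementation of unpaired image-to-image translation based on patchwise contrastive learning and adversarial learning. No hand-crafted loss or inverse network is used. Compared to CycleGAN, our model trains faster and uses less memory. In addition, our method can be extended to single-image training, where each "domain" is only a single image.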

Contrastive Learning for Unpaired Image-to-Image Translation
Taesung Park, Alexei A. Efros, Richard Zhang, Jun-Yan Zhu
UC Berkeley and Adobe Research
In ECCV 2020

Pseudocode

import torch
cross_entropy_loss = torch.nn.CrossEntropyLoss()

# Input: f_q (BxCxS) are sampled features from H(G_enc(x))
# Input: f_k (BxCxS) are sampled features from H(G_enc(G(x)))
# Input: tau is the temperature used in PatchNCE loss.
# Output: PatchNCE loss
def PatchNCELoss(f_q, f_k, tau=0.07):
    # batch size, channel size, and number of sample locations
    B, C, S = f_q.shape

    # calculate v * v+: BxSx1
    l_pos = (f_k * f_q).sum(dim=1)[:, :, None]

    # calculate v * v-: BxSxS
    l_neg = torch.bmm(f_q.transpose(1, 2), f_k)

    # The diagonal entries are not negatives. Remove them.
    # masked_fill_ needs a boolean mask on the same device as l_neg.
    identity_matrix = torch.eye(S, dtype=torch.bool, device=f_q.device)[None, :, :]
    l_neg.masked_fill_(identity_matrix, -float('inf'))

    # calculate logits: (B)x(S)x(S+1)
    logits = torch.cat((l_pos, l_neg), dim=2) / tau

    # return PatchNCE loss; the positive sits at index 0 of every row
    predictions = logits.flatten(0, 1)
    targets = torch.zeros(B * S, dtype=torch.long, device=f_q.device)
    return cross_entropy_loss(predictions, targets)
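
The pseudocode above puts the positive logit in column 0 and masks the diagonal of the negatives. The same loss can also be read as an S-way classification in which the correct "class" for query location i is key location i. The sketch below writes it that way and runs it on random inputs. The helper name patch_nce_as_classification, the tensor sizes, and the L2 normalization of the features are assumptions made for illustration, not part of the repository.

```python
import torch
import torch.nn.functional as F


def patch_nce_as_classification(f_q, f_k, tau=0.07):
    """Hypothetical helper: PatchNCE written as an S-way classification."""
    B, C, S = f_q.shape
    # Similarity of every query location to every key location: BxSxS
    sim = torch.bmm(f_q.transpose(1, 2), f_k) / tau
    # For query location i, the correct "class" is key location i
    targets = torch.arange(S, device=f_q.device).repeat(B)
    return F.cross_entropy(sim.flatten(0, 1), targets)


torch.manual_seed(0)
# Assumption: features are L2-normalized along the channel dimension
f_q = F.normalize(torch.randn(2, 16, 8), dim=1)
f_k = F.normalize(torch.randn(2, 16, 8), dim=1)

loss_random = patch_nce_as_classification(f_q, f_k)   # unrelated features
loss_matched = patch_nce_as_classification(f_q, f_q)  # keys equal queries
print(float(loss_random), float(loss_matched))
```

When the keys match the queries the loss should be much smaller than for unrelated features, which is the behavior the contrastive objective rewards.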

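Example results

Unpaired image-to-image translation

Single-image unpaired translation

Russian Blue cat to Grumpy Cat

Parisian street to Burano's painted houses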

Prerequisites

Linux or macOS
Python 3
CPU or NVIDIA GPU + CUDA cuDNN

Update log

9/12/2020: Added single-image translation.

Getting started

Clone this repo:

git clone https://github.com/taesungp/contrastive-unpaired-translation CUT
cd CUT

Install PyTorch 1.1 and other dependencies (e.g., torchvision, visdom, dominate, gputil).
For pip users, run pip install -r requirements.txt.
For Conda users, you can create a new Conda environment with conda env create -f environment.yml.
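
After installing, a quick check like the following can confirm that the dependencies are importable. This is a minimal sketch: the helper find_missing is hypothetical, and the import names (in particular GPUtil for gputil) are assumptions about how each package is imported.

```python
import importlib.util

# Import names assumed for the dependencies listed above
REQUIRED_MODULES = ["torch", "torchvision", "visdom", "dominate", "GPUtil"]


def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
print("missing:", missing if missing else "none")
```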

CUT and FastCUT training and test

Download the grumpifycat dataset (Fig. 8 of the paper, Russian Blue -> Grumpy Cat):

bash ./datasets/download_cut_dataset.sh grumpifycat

The dataset is downloaded and unzipped at ./datasets/grumpifycat/.

To view training results and loss plots, run python -m visdom.server and open the URL http://localhost:8097.
Train the CUT model:

python train.py --dataroot ./datasets/grumpifycat --name grumpycat_CUT --CUT_mode CUT

Or train the FastCUT model:

python train.py --dataroot ./datasets/grumpifycat --name grumpycat_FastCUT --CUT_mode FastCUT

The checkpoints will be stored at ./checkpoints/grumpycat_*/web.

Test the CUT model:

python test.py --dataroot ./datasets/grumpifycat --name grumpycat_CUT --CUT_mode CUT --phase train

The test results will be saved to an HTML file here: ./results/grumpifycat/latest_train/index.html.

CUT, FastCUT, and CycleGAN

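CUT is trained with the identity preservation loss and with lambda_NCE=1, while FastCUT is trained without the identity loss but with a higher lambda_NCE=10.0. Compared to CycleGAN, CUT learns to perform more powerful distribution matching, while FastCUT is designed as a lighter (half the GPU memory, so it can fit larger images) and faster (quicker to train) alternative to CycleGAN. Please refer to the paper for more details.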
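
The difference between the two modes can be summarized as a small settings table. The sketch below is illustrative only: ModeSettings, MODES, and generator_objective are hypothetical names, and weighting the identity term by the same lambda_NCE is an assumption that may not match how the repository combines its loss terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeSettings:
    lambda_nce: float     # weight on the PatchNCE loss
    use_identity: bool    # whether the identity-preservation term is used


# Values taken from the description above
MODES = {
    "CUT": ModeSettings(lambda_nce=1.0, use_identity=True),
    "FastCUT": ModeSettings(lambda_nce=10.0, use_identity=False),
}


def generator_objective(loss_gan, loss_nce, loss_identity, mode):
    """Hypothetical weighting: GAN + lambda * NCE (+ lambda * identity)."""
    s = MODES[mode]
    total = loss_gan + s.lambda_nce * loss_nce
    if s.use_identity:
        # Assumption: the identity term shares the same weight
        total = total + s.lambda_nce * loss_identity
    return total


print(generator_objective(1.0, 0.5, 0.25, "CUT"))      # 1.75
print(generator_objective(1.0, 0.5, 0.25, "FastCUT"))  # 6.0
```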

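In the figure above, we measure the percentage of pixels belonging to the horse/zebra bodies, using a pre-trained semantic segmentation model. We find a distribution mismatch between the sizes of horses and zebras in the images: zebras usually appear larger (36.8% vs. 17.9%). Our full method, CUT, has the flexibility to enlarge the horses, as a means of matching the training statistics better than CycleGAN does. FastCUT behaves more conservatively, like CycleGAN.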
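
The pixel-percentage measurement described above can be sketched as follows, assuming the segmentation model's output has already been reduced to a boolean foreground mask. The function name foreground_percentage and the toy mask are hypothetical; the segmentation step itself is not shown.

```python
import numpy as np


def foreground_percentage(mask):
    """Percentage of pixels marked True in a boolean segmentation mask."""
    mask = np.asarray(mask, dtype=bool)
    return 100.0 * mask.sum() / mask.size


# Toy 2x4 mask with 2 foreground pixels out of 8
toy_mask = np.array([[0, 1, 1, 0],
                     [0, 0, 0, 0]], dtype=bool)
print(foreground_percentage(toy_mask))  # 25.0
```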

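Training using our launcher scripts

Please see experiments/grumpifycat_launcher.py, which generates the above command-line arguments. The launcher scripts are useful for configuring the rather complicated command-line arguments of training and testing.

Using the launcher, the commands below generate the training commands for CUT and FastCUT.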

python -m experiments grumpifycat train 0   # CUT
python -m experiments grumpifycat train 1   # FastCUT

To test using the launcher,

python -m experiments grumpifycat test 0   # CUT
python -m experiments grumpifycat test 1   # FastCUT

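Possible commands are run, run_test, launch, close, and so on. Please see experiments/__main__.py for all commands. A launcher is easy to define and use. For example, the grumpifycat launcher is defined in a few lines: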

from .tmux_launcher import Options, TmuxLauncher


class Launcher(TmuxLauncher):
    def common_options(self):
        return [
            Options(    # Command 0
                dataroot="./datasets/grumpifycat",
                name="grumpifycat_CUT",
                CUT_mode="CUT"
            ),

            Options(    # Command 1
                dataroot="./datasets/grumpifycat",
                name="grumpifycat_FastCUT",
                CUT_mode="FastCUT",
            )
        ]

    def commands(self):
        return ["python train.py " + str(opt) for opt in self.common_options()]

    def test_commands(self):
        # Russian Blue -> Grumpy Cats dataset does not have test split.
        # Therefore, let's set the test split to be the "train" set.
        return ["python test.py " + str(opt.set(phase='train')) for opt in self.common_options()]
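
To make the launcher easier to follow, here is a minimal sketch of how an Options-like object could turn keyword arguments into a command-line string. It is a hypothetical stand-in written for illustration, not the class defined in experiments/tmux_launcher.py, whose behavior may differ.

```python
class Options:
    """Hypothetical stand-in for the launcher's Options class."""

    def __init__(self, **kwargs):
        self.kwargs = dict(kwargs)

    def set(self, **kwargs):
        # Add or override options, returning self so calls can be chained
        self.kwargs.update(kwargs)
        return self

    def __str__(self):
        # Render each keyword argument as "--key value"
        return " ".join(f"--{k} {v}" for k, v in self.kwargs.items())


opt = Options(dataroot="./datasets/grumpifycat", name="grumpifycat_CUT", CUT_mode="CUT")
train_cmd = "python train.py " + str(opt)
test_cmd = "python test.py " + str(opt.set(phase="train"))
print(train_cmd)
print(test_cmd)
```

Under these assumptions the printed training command has the same shape as the train.py invocation shown earlier, and the test command additionally carries --phase train.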

Apply a pre-trained CUT model and evaluate FID

To run the pretrained models, run the following.

# Download and unzip the pretrained models. The weights should be located at
# checkpoints/horse2zebra_cut_pretrained/latest_net_G.pth, for example.
wget http://efrosgans.eecs.berkeley.edu/CUT/pretrained_models.tar
tar -xf pretrained_models.tar

# Generate outputs. The dataset paths might need to be adjusted.
# To do this, modify the lines of experiments/pretrained_launcher.py
# [id] corresponds to the respective commands defined in pretrained_launcher.py
# 0 - CUT on Cityscapes
# 1 - FastCUT on Cityscapes
# 2 - CUT on Horse2Zebra
# 3 - FastCUT on Horse2Zebra
# 4 - CUT on Cat2Dog
# 5 - FastCUT on Cat2Dog
python -m experiments pretrained run_test [id]

# Evaluate FID. To do this, first install pytorch-fid of https://github.com/mseitzer/pytorch-fid
# pip install pytorch-fid
# For example, to evaluate horse2zebra FID of CUT,
# python -m pytorch_fid ./datasets/horse2zebra/testB/ results/horse2zebra_cut_pretrained/test_latest/images/fake_B/
# To evaluate Cityscapes FID of FastCUT,
# python -m pytorch_fid ./datasets/cityscapes/valA/ ~/projects/contrastive-unpaired-translation/results/cityscapes_fastcut_pretrained/test_latest/images/fake_B/
# Note that a special dataset needs to be used for the Cityscapes model. Please read below. 
python -m pytorch_fid [path to real test images] [path to generated images]

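Note: the Cityscapes pretrained model was trained and evaluated on a resized and JPEG-compressed version of the original Cityscapes dataset. To perform evaluation, please download this validation set and run the evaluation on it.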
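
For readers unfamiliar with the metric, FID is the Fréchet distance between two Gaussians fitted to image features. The sketch below shows only that final distance formula on given means and covariances. pytorch-fid computes these statistics from Inception features, which is not reproduced here, and the function name frechet_distance is hypothetical.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    diff = mu1 - mu2
    # Tr(sqrt(sigma1 @ sigma2)) equals the sum of square roots of the
    # eigenvalues of sigma1 @ sigma2, which are real and non-negative
    # for positive semi-definite covariances (up to numerical noise).
    eigvals = np.linalg.eigvals(sigma1 @ sigma2).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)


# With identical identity covariances only the squared mean gap remains
print(frechet_distance([0, 0, 0], np.eye(3), [1, 2, 2], np.eye(3)))
```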

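SinCUT single-image unpaired training

To train SinCUT (single-image translation, shown in Fig. 9, 13 and 14 of the paper), you need to

set the --model option as --model sincut, which uses the code at ./models/sincut_model.py, and
specify the dataset directory of one image in each domain, such as the example dataset included in this repo at ./datasets/single_image_monet_etretat/.

For example, to train a model for the Etretat cliffs (first image of Fig. 13), use the following command.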

python train.py --model sincut --name singleimage_monet_etretat --dataroot ./datasets/single_image_monet_etretat

or use the experiment launcher script,

python -m experiments singleimage run 0

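For single-image translation, we adopt network architectural components of StyleGAN2, as well as the pixel identity preservation loss used in DTN and CycleGAN. In particular, we adopted the code of rosinality, which exists at models/stylegan_networks.py.

The training takes several hours. To generate the final image using the checkpoint,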

python test.py --model sincut --name singleimage_monet_etretat --dataroot ./datasets/single_image_monet_etretat

or simply

python -m experiments singleimage run_test 0

Datasets

Download the CUT/CycleGAN/pix2pix datasets. For example,

bash datasets/download_cut_datasets.sh horse2zebra

The Cat2Dog dataset is prepared from the AFHQ dataset. Please visit https://github.com/clovaai/stargan-v2 and download it with bash download.sh afhq-dataset. Then reorganize the directories as follows.

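mkdir datasets/cat2dog
# ln -s takes the existing target first and the link name second.
# Use an absolute path for [path_to_afhq] so the links resolve correctly.
ln -s [path_to_afhq]/train/cat datasets/cat2dog/trainA
ln -s [path_to_afhq]/train/dog datasets/cat2dog/trainB
ln -s [path_to_afhq]/test/cat datasets/cat2dog/testA
ln -s [path_to_afhq]/test/dog datasets/cat2dog/testB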
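
The same reorganization can be scripted. The sketch below assumes the intent is for datasets/cat2dog/trainA, trainB, testA, and testB to be symbolic links pointing at the corresponding AFHQ folders. The function link_cat2dog and the SPLITS mapping are hypothetical names introduced for illustration.

```python
import os
from pathlib import Path

# Mapping from the expected split folders to AFHQ subfolders
SPLITS = {"trainA": "train/cat", "trainB": "train/dog",
          "testA": "test/cat", "testB": "test/dog"}


def link_cat2dog(afhq_root, out_dir="datasets/cat2dog"):
    """Create out_dir/<split> symlinks pointing at the AFHQ folders."""
    afhq_root = Path(afhq_root).resolve()   # absolute targets stay valid
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for split, sub in SPLITS.items():
        link = out_dir / split
        # Skip links that already exist so the function can be re-run
        if not link.exists() and not link.is_symlink():
            os.symlink(afhq_root / sub, link, target_is_directory=True)
    return out_dir
```

Calling link_cat2dog("/path/to/afhq") from the repository root would then create the four links under datasets/cat2dog.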

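The Cityscapes dataset can be downloaded from https://cityscapes-dataset.com. After that, use the script ./datasets/prepare_cityscapes_dataset.py to prepare the dataset.

Preprocessing of input images

The preprocessing of the input images, such as resizing or random cropping, is controlled by the options --preprocess, --load_size, and --crop_size. The usage follows the CycleGAN/pix2pix repo.

For example, the default setting --preprocess resize_and_crop --load_size 286 --crop_size 256 resizes the input image to 286x286, and then takes a random crop of size 256x256 as a form of data augmentation. Other preprocessing options are available, and they are defined in base_dataset.py. Below are some example options.

--preprocess none: does not perform any preprocessing. Note that the image size is still scaled to the closest multiple of 4, because the convolutional generator cannot maintain the same image size otherwise.
--preprocess scale_width --load_size 768: scales the width of the image to 768.
--preprocess scale_shortside_and_crop: scales the image, preserving the aspect ratio, so that the short side is load_size, and then takes a random crop of window size crop_size.

More preprocessing options can be added by modifying get_transform() of base_dataset.py.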
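
As an illustration of the default resize_and_crop behavior and of the rounding to a multiple of 4, here is a minimal sketch using Pillow. It is not the code in base_dataset.py: the function names are hypothetical, and the bicubic interpolation and the exact rounding rule are assumptions.

```python
import random
from PIL import Image


def resize_and_crop(img, load_size=286, crop_size=256):
    """Resize to load_size x load_size, then take a random crop_size crop."""
    img = img.resize((load_size, load_size), Image.BICUBIC)
    left = random.randint(0, load_size - crop_size)
    top = random.randint(0, load_size - crop_size)
    return img.crop((left, top, left + crop_size, top + crop_size))


def nearest_multiple(size, base=4):
    """Round a side length to the nearest multiple of base."""
    return int(round(size / base) * base)


img = Image.new("RGB", (500, 300))
print(resize_and_crop(img).size)                       # (256, 256)
print(nearest_multiple(257), nearest_multiple(259))    # 256 260
```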

Citation

If you use this code for your research, please cite our paper.

@inproceedings{park2020cut,
  title={Contrastive Learning for Unpaired Image-to-Image Translation},
  author={Taesung Park and Alexei A. Efros and Richard Zhang and Jun-Yan Zhu},
  booktitle={European Conference on Computer Vision},
  year={2020}
}

If you use the original pix2pix and CycleGAN models included in this repo, please cite the following papers.

@inproceedings{CycleGAN2017,
  title={Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks},
  author={Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
  booktitle={IEEE International Conference on Computer Vision (ICCV)},
  year={2017}
}


@inproceedings{isola2017image,
  title={Image-to-Image Translation with Conditional Adversarial Networks},
  author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2017}
}

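Acknowledgments

We thank Allan Jabri and Phillip Isola for helpful discussion and feedback. Our code is developed based on pytorch-CycleGAN-and-pix2pix. We also thank pytorch-fid for FID computation, drn for mIoU computation, and stylegan2-pytorch for the PyTorch implementation of StyleGAN2 used in our single-image translation setting.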
