| Name | Link |
|---|---|
| FashionCLIP Feature Extraction and Classification | |
| Tutorial - FashionCLIP Evaluation | |
UPDATE (10/03/23): We have updated the model! We found that the laion/CLIP-ViT-B-32-laion2B-s34B-b79K checkpoint (thanks Bin!) works better on fashion data than the original OpenAI CLIP. We have therefore released an updated version of FashionCLIP (henceforth FashionCLIP 2.0), keeping the architecture the same. We postulate that the performance gains afforded by laion/CLIP-ViT-B-32-laion2B-s34B-b79K are due to the increased training data (5x the OpenAI CLIP data). Our thesis, however, remains the same: fine-tuning laion/CLIP on our fashion dataset improves zero-shot performance across our benchmarks. See the table below comparing weighted macro F1 scores across models.
| Model | FMNIST | KAGL | DEEP |
|---|---|---|---|
| OpenAI CLIP | 0.66 | 0.63 | 0.45 |
| FashionCLIP | 0.74 | 0.67 | 0.48 |
| Laion CLIP | 0.78 | 0.71 | 0.58 |
| FashionCLIP 2.0 | 0.83 | 0.73 | 0.62 |
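For reference, the scores above are weighted macro F1. Below is a minimal sketch of how such a score could be computed with scikit-learn; the label arrays are placeholders and this is not the repository's actual evaluation code.

```python
# Hypothetical evaluation snippet: weighted macro F1 from ground-truth labels
# and zero-shot predictions (placeholder data, for illustration only).
from sklearn.metrics import f1_score

y_true = ["dress", "shoe", "shoe", "bag"]   # placeholder ground-truth labels
y_pred = ["dress", "shoe", "bag", "bag"]    # placeholder zero-shot predictions

# average="weighted" weights each class's F1 by its support,
# matching the weighted macro F1 reported in the table above.
score = f1_score(y_true, y_pred, average="weighted")
print(f"weighted macro F1: {score:.2f}")
```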
We are now on Hugging Face! The model is available here.

We are now in Nature Scientific Reports!
```bibtex
@Article{Chia2022,
  title="Contrastive language and vision learning of general fashion concepts",
  author="Chia, Patrick John
    and Attanasio, Giuseppe
    and Bianchi, Federico
    and Terragni, Silvia
    and Magalh{\~a}es, Ana Rita
    and Goncalves, Diogo
    and Greco, Ciro
    and Tagliabue, Jacopo",
  journal="Scientific Reports",
  year="2022",
  month="Nov",
  day="08",
  volume="12",
  number="1",
  pages="18958",
  abstract="The steady rise of online shopping goes hand in hand with the development of increasingly complex ML and NLP models. While most use cases are cast as specialized supervised learning problems, we argue that practitioners would greatly benefit from general and transferable representations of products. In this work, we build on recent developments in contrastive learning to train FashionCLIP, a CLIP-like model adapted for the fashion industry. We demonstrate the effectiveness of the representations learned by FashionCLIP with extensive tests across a variety of tasks, datasets and generalization probes. We argue that adaptations of large pre-trained models such as CLIP offer new perspectives in terms of scalability and sustainability for certain types of players in the industry. Finally, we detail the costs and environmental impact of training, and release the model weights and code as open source contribution to the community.",
  issn="2045-2322",
  doi="10.1038/s41598-022-23052-9",
  url="https://doi.org/10.1038/s41598-022-23052-9"
}
```
We are awaiting the official release of the Farfetch dataset, at which point the fine-tuned model weights and the pre-processed image and text vectors will be made public. In the meantime, we use the Hugging Face implementation of CLIP, and models can be loaded following the standard Hugging Face naming convention (i.e. `fclip = FashionCLIP('<username>/<repo_name>', ... )`). We also support private repositories (i.e. `fclip = FashionCLIP('<username>/<repo_name>', auth_token=<AUTH_TOKEN>, ... )`).
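For illustration, here is a minimal sketch of the two loading patterns just described. The `<username>/<repo_name>` and `<AUTH_TOKEN>` values are placeholders, and any other constructor arguments are omitted.

```python
from fashion_clip.fashion_clip import FashionCLIP

# Public Hugging Face repository, standard naming convention
# (additional constructor arguments omitted in this sketch).
fclip = FashionCLIP('<username>/<repo_name>')

# Private repository: also pass an access token.
fclip_private = FashionCLIP('<username>/<repo_name>', auth_token='<AUTH_TOKEN>')
```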
See below for further details!

FashionCLIP is a CLIP-like model fine-tuned for the fashion industry. We fine-tune CLIP (Radford et al., 2021) on over 700K <image, text> pairs from the Farfetch dataset[^1].

We evaluate FashionCLIP by applying it to open problems in the industry such as retrieval, classification, and fashion parsing. Our results demonstrate that fine-tuning helps capture domain-specific concepts and generalize them in zero-shot scenarios. We also supplement the quantitative tests with qualitative analyses, and offer preliminary insights into how concepts grounded in a visual space unlock linguistic generalization. Please see our paper for more details.

In this repository, you will find an API for interacting with FashionCLIP and an interactive demo built with Streamlit (coming soon!) that showcases the capabilities of FashionCLIP.
Need a quick way to generate embeddings? Want to test retrieval performance?

First, you should be able to install the package quickly with pip:

```
$ pip install fashion-clip
```

If you have a list of texts and image paths, it is easy to generate embeddings:
```python
import numpy as np
from fashion_clip.fashion_clip import FashionCLIP

fclip = FashionCLIP('fashion-clip')

# images is a list of image paths and texts a list of strings, as described above.
# We create image embeddings and text embeddings.
image_embeddings = fclip.encode_images(images, batch_size=32)
text_embeddings = fclip.encode_text(texts, batch_size=32)

# We normalize the embeddings to unit norm (so that we can use dot product
# instead of cosine similarity to do comparisons).
image_embeddings = image_embeddings / np.linalg.norm(image_embeddings, ord=2, axis=-1, keepdims=True)
text_embeddings = text_embeddings / np.linalg.norm(text_embeddings, ord=2, axis=-1, keepdims=True)
```

Check out our Colab notebook to see more functionality.
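Because the embeddings above are unit-normalized, cosine similarity reduces to a plain dot product. As a small illustrative follow-up (not part of the snippet above), this is one way to retrieve the closest image for each text query:

```python
# With unit-norm embeddings, the dot product equals cosine similarity:
# similarity[i, j] is the score between text i and image j.
similarity = text_embeddings @ image_embeddings.T

# Index of the best-matching image for each text query.
best_image_per_text = similarity.argmax(axis=1)
print(best_image_per_text)
```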
You can also use the model directly through the Hugging Face `transformers` library:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("patrickjohncyh/fashion-clip")
processor = CLIPProcessor.from_pretrained("patrickjohncyh/fashion-clip")

image = Image.open("images/image1.jpg")
inputs = processor(text=["a photo of a red shoe", "a photo of a black shoe"],
                   images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores
probs = logits_per_image.softmax(dim=1)
print(probs)

image.resize((224, 224))  # note: resize() returns a new image rather than modifying in place
```

From the project root, you can also install `fashion-clip` from source with:
```
$ pip install -e .
```
There are two main abstractions that make it easy to use FashionCLIP.

First, the `FCLIPDataset` class encapsulates information related to a given catalog and exposes information critical for FashionCLIP. Additionally, it provides helper functions for quick exploration and visualization of the data. The main initialization parameters are:
```
name: str -> Name of dataset
image_source_path: str -> absolute path to images (can be local or s3)
image_source_type: str -> type of source (i.e. local or s3)
catalog: List[dict] = None -> list of dicts containing at minimum the keys ['id', 'image', 'caption']
```
For ease of use, the API also provides access to the dataset used to train FashionCLIP (once it is officially released) by simply specifying the corresponding catalog name.
Pre-included dataset:

```python
from fashion_clip import FCLIPDataset

dataset = FCLIPDataset(name='FF',
                       image_source_path='path/to/images',
                       image_source_type='local')
```
Custom dataset:

```python
from fashion_clip import FCLIPDataset

my_catalog = [{'id': 1, 'image': 'x.jpg', 'caption': 'image x'}]
dataset = FCLIPDataset(name='my_dataset',
                       image_source_path='path/to/images',
                       image_source_type='local',
                       catalog=my_catalog)
```
The second abstraction is the `FashionCLIP` class, which takes a Hugging Face CLIP model name and an `FCLIPDataset`, and provides convenient functions for tasks such as multi-modal retrieval, zero-shot classification, and localization. The initialization parameters for `FashionCLIP` are as follows:
```
model_name: str -> Name of model OR path to local model
dataset: FCLIPDataset -> Dataset
normalize: bool -> option to convert embeddings to unit norm
approx: bool -> option to use approximate nearest neighbors
```
Similar to the `FCLIPDataset` abstraction, we include the pre-trained FashionCLIP model from the paper, hosted here. If an unknown dataset and model combination is received, the image and caption vectors are generated at object instantiation; otherwise, pre-computed vectors/embeddings are pulled from S3.
```python
from fashion_clip import FCLIPDataset, FashionCLIP

dataset = FCLIPDataset(name='FF',
                       image_source_path='path/to/images',
                       image_source_type='local')
fclip = FashionCLIP('fashion-clip', dataset)
```
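As a rough illustration of the idea behind zero-shot classification, here is a manual sketch built only on the `encode_images` / `encode_text` methods shown earlier, rather than on the class's built-in task helpers; the image paths and candidate labels are placeholders.

```python
import numpy as np

# Placeholder inputs: product image paths and candidate class descriptions.
images = ['path/to/images/x.jpg']
candidate_labels = ['a photo of a dress', 'a photo of a shoe', 'a photo of a bag']

image_embeddings = fclip.encode_images(images, batch_size=32)
text_embeddings = fclip.encode_text(candidate_labels, batch_size=32)

# Normalize to unit norm, then score each image against every candidate label.
image_embeddings = image_embeddings / np.linalg.norm(image_embeddings, ord=2, axis=-1, keepdims=True)
text_embeddings = text_embeddings / np.linalg.norm(text_embeddings, ord=2, axis=-1, keepdims=True)
predictions = (image_embeddings @ text_embeddings.T).argmax(axis=1)

for image, idx in zip(images, predictions):
    print(image, '->', candidate_labels[idx])
```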
For further details on how to use the package, please see the accompanying notebooks!
[^1]: Pending official release.