| Name | Link |
|---|---|
| FashionCLIP Feature Extraction and Classification | |
| Tutorial - FashionCLIP Evaluation | |
UPDATE (10/03/23): We have updated the model! We found that the laion/CLIP-ViT-B-32-laion2B-s34B-b79K checkpoint (thanks Bin!) works better on fashion data than the original OpenAI CLIP. We have therefore released an updated version of FashionCLIP (henceforth FashionCLIP 2.0), keeping the architecture the same. We postulate that the gains achieved by laion/CLIP-ViT-B-32-laion2B-s34B-b79K are due to its larger training set (5x the OpenAI CLIP data). Our thesis, however, remains unchanged: fine-tuning laion/CLIP on our fashion dataset improves zero-shot performance across our benchmarks. See the table below comparing weighted macro F1 scores across models.
| Model | FMNIST | KAGL | DEEP |
|---|---|---|---|
| OpenAI CLIP | 0.66 | 0.63 | 0.45 |
| FashionCLIP | 0.74 | 0.67 | 0.48 |
| Laion CLIP | 0.78 | 0.71 | 0.58 |
| FashionCLIP 2.0 | 0.83 | 0.73 | 0.62 |
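For reference, a weighted macro F1 score of this kind can be computed with scikit-learn's `f1_score`; the snippet below is only an illustrative sketch with made-up labels, not the evaluation script used for the table above.

```python
# Illustrative only: made-up predictions, not the benchmark data above.
from sklearn.metrics import f1_score

y_true = ["shirt", "dress", "shoe", "shirt", "dress"]
y_pred = ["shirt", "dress", "shirt", "shirt", "shoe"]

# 'weighted' averages the per-class F1 scores, weighted by class support
print(f1_score(y_true, y_pred, average="weighted"))
```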
We are now on Hugging Face! The model is available here.
We are now published in Nature Scientific Reports!
```bibtex
@Article{Chia2022,
title="Contrastive language and vision learning of general fashion concepts",
author="Chia, Patrick John
and Attanasio, Giuseppe
and Bianchi, Federico
and Terragni, Silvia
and Magalh{\~a}es, Ana Rita
and Goncalves, Diogo
and Greco, Ciro
and Tagliabue, Jacopo",
journal="Scientific Reports",
year="2022",
month="Nov",
day="08",
volume="12",
number="1",
pages="18958",
abstract="The steady rise of online shopping goes hand in hand with the development of increasingly complex ML and NLP models. While most use cases are cast as specialized supervised learning problems, we argue that practitioners would greatly benefit from general and transferable representations of products. In this work, we build on recent developments in contrastive learning to train FashionCLIP, a CLIP-like model adapted for the fashion industry. We demonstrate the effectiveness of the representations learned by FashionCLIP with extensive tests across a variety of tasks, datasets and generalization probes. We argue that adaptations of large pre-trained models such as CLIP offer new perspectives in terms of scalability and sustainability for certain types of players in the industry. Finally, we detail the costs and environmental impact of training, and release the model weights and code as open source contribution to the community.",
issn="2045-2322",
doi="10.1038/s41598-022-23052-9",
url="https://doi.org/10.1038/s41598-022-23052-9"
}
```
We are awaiting the official release of the Farfetch dataset, at which point we will make the fine-tuned model weights and the pre-processed image and text vectors public. In the meantime, we use the Hugging Face implementation of CLIP under the hood, and models can be loaded following the standard Hugging Face naming convention (i.e. `fclip = FashionCLIP('<username>/<repo_name>', ... )`). We also support private repositories (i.e. `fclip = FashionCLIP('<username>/<repo_name>', auth_token=<AUTH_TOKEN>, ... )`).
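As a minimal sketch of the naming convention above (the repository name and token below are placeholders, not real values; the built-in `'fashion-clip'` alias is the one used elsewhere in this README):

```python
from fashion_clip.fashion_clip import FashionCLIP

# built-in alias used elsewhere in this README
fclip = FashionCLIP('fashion-clip')

# any Hugging Face repository, following <username>/<repo_name>
# fclip_hf = FashionCLIP('patrickjohncyh/fashion-clip')

# private repository (placeholders; supply your own repo name and token)
# fclip_private = FashionCLIP('<username>/<repo_name>', auth_token='<AUTH_TOKEN>')
```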
See below for more details!
FashionCLIP is a CLIP-like model fine-tuned for the fashion industry. We fine-tuned CLIP (Radford et al., 2021) on over 700K <image, text> pairs from the Farfetch dataset[^1].
We evaluate FashionCLIP by applying it to open problems in the industry such as retrieval, classification, and fashion parsing. Our results demonstrate that fine-tuning helps capture domain-specific concepts and generalize them in zero-shot scenarios. We also supplement the quantitative tests with a qualitative analysis, and offer preliminary insights into how concepts grounded in a visual space unlock linguistic generalization. Please see our paper for more details.
In this repository you will find an API for interacting with FashionCLIP and an interactive demo built with Streamlit (coming soon!) which showcases the capabilities of FashionCLIP.
Need a quick way to generate embeddings? Want to test retrieval performance?
First, you should be able to install it quickly with pip:
```
$ pip install fashion-clip
```
If you have a list of texts and a list of image paths, generating the embeddings is straightforward:
```python
import numpy as np
from fashion_clip.fashion_clip import FashionCLIP

fclip = FashionCLIP('fashion-clip')

# we create image embeddings and text embeddings
# (`images` is a list of image paths, `texts` a list of strings)
image_embeddings = fclip.encode_images(images, batch_size=32)
text_embeddings = fclip.encode_text(texts, batch_size=32)

# we normalize the embeddings to unit norm (so that we can use dot product instead of cosine similarity to do comparisons)
image_embeddings = image_embeddings / np.linalg.norm(image_embeddings, ord=2, axis=-1, keepdims=True)
text_embeddings = text_embeddings / np.linalg.norm(text_embeddings, ord=2, axis=-1, keepdims=True)
```

Check out our Colab notebook for more functionality.
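Because the embeddings above are unit-normalized, a plain dot product gives the cosine similarity directly. The snippet below is a small illustrative sketch of text-to-image retrieval; it simply continues from the `image_embeddings`, `text_embeddings`, `images` and `texts` variables assumed above.

```python
# Illustrative sketch: rank images for each text query by cosine similarity.
similarity = text_embeddings @ image_embeddings.T   # shape: (n_texts, n_images)

# for each text, indices of images sorted from most to least similar
ranking = similarity.argsort(axis=-1)[:, ::-1]

for query, idxs in zip(texts, ranking):
    print(query, "->", [images[i] for i in idxs[:3]])  # top-3 matches
```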
Since the model is also hosted on Hugging Face (see above), it can be used directly with the `transformers` library:

```python
from PIL import Image
import requests
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("patrickjohncyh/fashion-clip")
processor = CLIPProcessor.from_pretrained("patrickjohncyh/fashion-clip")

image = Image.open("images/image1.jpg")

inputs = processor(text=["a photo of a red shoe", "a photo of a black shoe"],
                   images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1)
print(probs)

# in a notebook, the last expression displays the resized image
image.resize((224, 224))
```

From the project root, the `fashion-clip` package can instead be installed locally with
```
$ pip install -e .
```
There are two main abstractions that facilitate the use of FashionCLIP.
First, the `FCLIPDataset` class encapsulates the information related to a given catalog and exposes the information essential to FashionCLIP. It also provides helper functions for quickly exploring and visualizing the data. Its main initialization parameters are:
```
name: str -> Name of dataset
image_source_path: str -> absolute path to images (can be local or s3)
image_source_type: str -> type of source (i.e. local or s3)
catalog: List[dict] = None -> list of dicts containing at minimum the keys ['id', 'image', 'caption']
```
For ease of use, the API also provides access to the dataset used to train FashionCLIP (once it is officially released) by simply specifying the corresponding catalog name:
```python
from fashion_clip import FCLIPDataset

dataset = FCLIPDataset(name='FF',
                       image_source_path='path/to/images',
                       image_source_type='local')
```
Alternatively, a custom catalog can be passed in directly:

```python
from fashion_clip import FCLIPDataset

my_catalog = [{'id': 1, 'image': 'x.jpg', 'caption': 'image x'}]

dataset = FCLIPDataset(name='my_dataset',
                       image_source_path='path/to/images',
                       image_source_type='local',
                       catalog=my_catalog)
```
The second abstraction is the `FashionCLIP` class, which takes a Hugging Face CLIP model name and an `FCLIPDataset`, and provides convenient functions for tasks such as multi-modal retrieval, zero-shot classification and localization. The initialization parameters of `FashionCLIP` are as follows:
```
model_name: str -> Name of model OR path to local model
dataset: FCLIPDataset -> dataset to associate with the model
normalize: bool -> option to convert embeddings to unit norm
approx: bool -> option to use approximate nearest neighbors
```
Similar to the `FCLIPDataset` abstraction, we include the pre-trained FashionCLIP model from our paper. If an unknown dataset and model combination is received, the image and caption vectors are generated upon object instantiation; otherwise, pre-computed vectors/embeddings are pulled from S3.
```python
from fashion_clip import FCLIPDataset, FashionCLIP

dataset = FCLIPDataset(name='FF',
                       image_source_path='path/to/images',
                       image_source_type='local')

fclip = FashionCLIP('fashion-clip', dataset)
```
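As a rough illustration of the zero-shot classification task mentioned above, built only from the `encode_images` / `encode_text` calls shown earlier (the image paths and label prompts here are made-up placeholders, not part of the package API):

```python
import numpy as np

# Hypothetical inputs for illustration only
image_paths = ['path/to/images/x.jpg', 'path/to/images/y.jpg']
labels = ['a photo of a red shoe', 'a photo of a black shoe', 'a photo of a dress']

img_emb = fclip.encode_images(image_paths, batch_size=32)
txt_emb = fclip.encode_text(labels, batch_size=32)

# normalize, then pick the label with the highest cosine similarity per image
img_emb = img_emb / np.linalg.norm(img_emb, ord=2, axis=-1, keepdims=True)
txt_emb = txt_emb / np.linalg.norm(txt_emb, ord=2, axis=-1, keepdims=True)
predictions = [labels[i] for i in (img_emb @ txt_emb.T).argmax(axis=-1)]
print(predictions)
```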
For further details on how to use the package, please refer to the accompanying notebooks!
[^1]: Awaiting official release.