FashionCLIP

Quick Start

Name	Link
FashionCLIP Feature Extraction and Classification
Tutorial - FashionCLIP Evaluation with RecList

UPDATE (10/03/23): We have updated the model! We found that the laion/CLIP-ViT-B-32-laion2B-s34B-b79K checkpoint (thanks Bin!) worked better than the original OpenAI CLIP on fashion. We therefore fine-tuned a newer (and better!) version of FashionCLIP (henceforth FashionCLIP 2.0), while keeping the architecture the same. We postulate that the performance gains afforded by laion/CLIP-ViT-B-32-laion2B-s34B-b79K are due to the increased training data (5x the OpenAI CLIP data). Our thesis, however, remains the same: fine-tuning laion/CLIP on our fashion dataset improved zero-shot performance across our benchmarks. The table below compares weighted macro F1 scores across models.

Model	FMNIST	KAGL	DEEP
OpenAI CLIP	0.66	0.63	0.45
FashionCLIP	0.74	0.67	0.48
Laion CLIP	0.78	0.71	0.58
FashionCLIP 2.0	0.83	0.73	0.62

We are now on Hugging Face! The model is available here.

Our paper is now published in Nature Scientific Reports!

Citation

 @Article{Chia2022,
    title="Contrastive language and vision learning of general fashion concepts",
    author="Chia, Patrick John
            and Attanasio, Giuseppe
            and Bianchi, Federico
            and Terragni, Silvia
            and Magalh{\~a}es, Ana Rita
            and Goncalves, Diogo
            and Greco, Ciro
            and Tagliabue, Jacopo",
    journal="Scientific Reports",
    year="2022",
    month="Nov",
    day="08",
    volume="12",
    number="1",
    pages="18958",
    abstract="The steady rise of online shopping goes hand in hand with the development of increasingly complex ML and NLP models. While most use cases are cast as specialized supervised learning problems, we argue that practitioners would greatly benefit from general and transferable representations of products. In this work, we build on recent developments in contrastive learning to train FashionCLIP, a CLIP-like model adapted for the fashion industry. We demonstrate the effectiveness of the representations learned by FashionCLIP with extensive tests across a variety of tasks, datasets and generalization probes. We argue that adaptations of large pre-trained models such as CLIP offer new perspectives in terms of scalability and sustainability for certain types of players in the industry. Finally, we detail the costs and environmental impact of training, and release the model weights and code as open source contribution to the community.",
    issn="2045-2322",
    doi="10.1038/s41598-022-23052-9",
    url="https://doi.org/10.1038/s41598-022-23052-9"
}

Data

We are awaiting the official release of the Farfetch dataset, upon which the fine-tuned model weights and the pre-processed image and text vectors will be made public. In the meantime, we use the Hugging Face implementation of CLIP, and you can load the model weights from OpenAI by following the standard Hugging Face naming convention (i.e. fclip = FashionCLIP('<username>/<repo_name>', ... ) ). We also support private repositories (i.e. fclip = FashionCLIP('<username>/<repo_name>', auth_token=<AUTH_TOKEN>, ... ) ).

See below for further details!

Overview

FashionCLIP is a CLIP-like model fine-tuned for the fashion industry. We fine-tune CLIP (Radford et al., 2021) on over 700K <image, text> pairs from the Farfetch dataset ¹

We evaluate FashionCLIP by applying it to open problems in industry such as retrieval, classification and fashion parsing. Our results demonstrate that fine-tuning helps capture domain-specific concepts and generalizes them in zero-shot scenarios. We also supplement quantitative tests with qualitative analyses, and offer preliminary insights into how concepts grounded in a visual space unlock linguistic generalization. Please see our paper for more details.

In this repository, you will find an API for interacting with FashionCLIP and an interactive demo built using Streamlit (coming soon!) which showcases the capabilities of FashionCLIP.

API & Demo

Quick How To

Need a quick way to generate embeddings? Want to test retrieval performance?

First, install the package using pip.

 $ pip install fashion-clip

If you have a list of texts and a list of image paths, generating embeddings is straightforward:

import numpy as np
from fashion_clip.fashion_clip import FashionCLIP

fclip = FashionCLIP('fashion-clip')

# `images` is your list of image paths and `texts` your list of strings
# we create image embeddings and text embeddings
image_embeddings = fclip.encode_images(images, batch_size=32)
text_embeddings = fclip.encode_text(texts, batch_size=32)

# we normalize the embeddings to unit norm (so that we can use dot product instead of cosine similarity to do comparisons)
image_embeddings = image_embeddings / np.linalg.norm(image_embeddings, ord=2, axis=-1, keepdims=True)
text_embeddings = text_embeddings / np.linalg.norm(text_embeddings, ord=2, axis=-1, keepdims=True)
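The normalization step above matters because, on unit-norm vectors, the dot product equals the cosine similarity. The following dependency-free sketch shows this with toy 3-dimensional vectors standing in for real embeddings. The helper names (`l2_normalize`, `dot`, `cosine`) and all values are illustrative assumptions and are not part of the fashion-clip package.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for one text embedding and three image embeddings.
text_vec = [1.0, 2.0, 2.0]
image_vecs = [[2.0, 4.0, 4.0], [2.0, -1.0, 0.0], [0.0, 3.0, 4.0]]

query = l2_normalize(text_vec)
unit_images = [l2_normalize(v) for v in image_vecs]

# On unit vectors the dot product is the cosine similarity.
scores = [dot(query, v) for v in unit_images]

# Rank images from most to least similar to the text query.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

With real embeddings the same ranking can be computed for all texts at once as a single matrix product between the normalized text and image arrays.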

Use our Colab notebook to see more functionality.

HF API

from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("patrickjohncyh/fashion-clip")
processor = CLIPProcessor.from_pretrained("patrickjohncyh/fashion-clip")

image = Image.open("images/image1.jpg")

inputs = processor(text=["a photo of a red shoe", "a photo of a black shoe"],
                   images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1)  # softmax over the text prompts gives per-label probabilities
print(probs)
image.resize((224, 224))  # returns a resized copy; in a notebook this displays the image
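The snippet above turns image-text similarity scores into probabilities with a softmax and leaves the choice of label to the reader. Below is a small plain-Python sketch of that last step: it applies a softmax to made-up logits for a single image and picks the most probable prompt. The logit values are invented for illustration; real ones come from `outputs.logits_per_image`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["a photo of a red shoe", "a photo of a black shoe"]

# Made-up logits for one image, one score per text prompt.
logits = [24.0, 21.0]

probs = softmax(logits)

# The zero-shot prediction is the prompt with the highest probability.
best = max(range(len(labels)), key=lambda i: probs[i])
print(labels[best], round(probs[best], 3))
```

Because the softmax is monotonic, the predicted label is the same whether you take the argmax of the logits or of the probabilities; the probabilities are mainly useful for thresholding or reporting confidence.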

More on the Internal FashionCLIP API

Installation

From the project root, install the fashion-clip package locally with

 $ pip install -e .

There are two main abstractions to facilitate easy use of FashionCLIP.

First, the FCLIPDataset class, which encapsulates information related to a given catalog and exposes the information FashionCLIP needs. It also provides helper functions for quick exploration and visualization of the data. The main initialization parameters are

 name: str -> Name of dataset
image_source_path: str -> absolute path to images (can be local or s3) 
image_source_type: str -> type of source (i.e. local or s3)
catalog: List[dict] = None -> list of dicts containing at minimum the keys ['id', 'image', 'caption']
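Since each catalog entry must contain at least the keys 'id', 'image' and 'caption', it can help to check a custom catalog before handing it to FCLIPDataset. The sketch below is one way to do that in plain Python. `find_invalid_entries` is a hypothetical helper written for this example, not a function provided by the fashion-clip package.

```python
from typing import Dict, List

# Keys every catalog entry is expected to contain, per the parameter list above.
REQUIRED_KEYS = {"id", "image", "caption"}


def find_invalid_entries(catalog: List[Dict]) -> List[int]:
    """Return the positions of catalog entries missing any required key."""
    return [i for i, item in enumerate(catalog)
            if not REQUIRED_KEYS.issubset(item)]


# Illustrative catalog: the second entry lacks a caption.
catalog = [
    {"id": 1, "image": "x.jpg", "caption": "image x"},
    {"id": 2, "image": "y.jpg"},
]

bad = find_invalid_entries(catalog)
print(bad)
```

Extra keys are allowed by this check, which matches the "at minimum" wording of the parameter description.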

For ease of use, the API also provides access to the dataset used in the paper for training FashionCLIP (once it is officially released), by simply specifying the corresponding catalog name.

Pre-Included Dataset

 from fashion_clip import FCLIPDataset
dataset = FCLIPDataset(name='FF', 
                       image_source_path='path/to/images', 
                       image_source_type='local')

Custom Dataset

 from fashion_clip import FCLIPDataset
my_catalog = [{'id': 1, 'image': 'x.jpg', 'caption': 'image x'}]
dataset = FCLIPDataset(name='my_dataset', 
                       image_source_path='path/to/images', 
                       image_source_type='local',
                       catalog=my_catalog)

The second abstraction is the FashionCLIP class, which takes a Hugging Face CLIP model name and an FCLIPDataset, and provides convenient functions for tasks such as multi-modal retrieval, zero-shot classification and localization. The initialization parameters for FashionCLIP are as follows:

 model_name: str -> Name of model OR path to local model
dataset: FCLIPDataset -> Dataset
normalize: bool -> option to convert embeddings to unit norm  
approx: bool -> option to use approximate nearest neighbors

As with the FCLIPDataset abstraction, we have included a pre-trained FashionCLIP model from the paper, hosted here. If an unknown dataset and model combination is received, the image and caption vectors are generated when the object is instantiated; otherwise, pre-computed vectors/embeddings are pulled from S3.

 from fashion_clip import FCLIPDataset, FashionCLIP
dataset = FCLIPDataset(name='FF', 
                       image_source_path='path/to/images', 
                       image_source_type='local')
fclip = FashionCLIP('fashion-clip', dataset)

For further details on how to use the package, refer to the accompanying notebook!