ดาวน์โหลด clip as service - clip as service Source Download

clip as service

ซอร์สโค้ดอื่น ๆ

v0.8.3

ดาวน์โหลด

โลโก้ Clip-as-Service: โครงสร้างข้อมูลสำหรับข้อมูลที่ไม่มีโครงสร้าง

Clip-as-Service เป็นบริการความสามารถในการลดความสามารถสูงต่ำสำหรับการฝังภาพและข้อความ มันสามารถรวมเข้าด้วยกันเป็น microservice ในโซลูชั่นการค้นหาประสาทได้อย่างง่ายดาย

⚡ Fast : เสิร์ฟรุ่นคลิปด้วย Tensorrt, Onnx Runtime และ Pytorch w/o jit ด้วย 800qps ^[*] การสตรีมมิ่งดูเพล็กซ์ที่ไม่ปิดกั้นตามคำขอและการตอบสนองที่ออกแบบมาสำหรับข้อมูลขนาดใหญ่และงานที่ดำเนินมายาวนาน

- ยืดหยุ่น : ปรับขนาดคลิปหลายรุ่นในแนวนอนบน GPU เดี่ยวพร้อมการปรับสมดุลโหลดอัตโนมัติ

- ใช้งานง่าย : ไม่มีเส้นโค้งการเรียนรู้การออกแบบที่เรียบง่ายบนไคลเอนต์และเซิร์ฟเวอร์ API ที่ใช้งานง่ายและสอดคล้องกันสำหรับการฝังภาพและประโยค

- ทันสมัย : การสนับสนุนลูกค้า Async สลับระหว่าง GRPC, HTTP, โปรโตคอล WebSocket กับ TLS และการบีบอัดได้อย่างง่ายดาย

- การบูรณาการ : การบูรณาการที่ราบรื่นกับระบบนิเวศการค้นหาประสาทรวมถึง Jina และ Docarray สร้างโซลูชั่นข้ามรูปแบบและหลายรูปแบบในเวลาไม่นาน

^{[*] ด้วยการกำหนดค่าเริ่มต้น (แบบจำลองเดี่ยว, pytorch no jit) บน geforce RTX 3090}

การฝังข้อความและรูปภาพ

ผ่าน https?

ผ่าน grpc? ⚡⚡

curl 
-X POST https:// < your-inference-address > -http.wolf.jina.ai/post 
-H ' Content-Type: application/json ' 
-H ' Authorization: <your access token> ' 
-d ' {"data":[{"text": "First do it"}, 
    {"text": "then do it right"}, 
    {"text": "then do it better"}, 
    {"uri": "https://picsum.photos/200"}], 
    "execEndpoint":"/"} '

 # pip install clip-client
from clip_client import Client

c = Client (
    'grpcs://<your-inference-address>-grpc.wolf.jina.ai' ,
    credential = { 'Authorization' : '<your access token>' },
)

r = c . encode (
    [
        'First do it' ,
        'then do it right' ,
        'then do it better' ,
        'https://picsum.photos/200' ,
    ]
)
print ( r )

การใช้เหตุผลทางสายตา

มีทักษะการให้เหตุผลด้านการมองเห็นขั้นพื้นฐานสี่ประการ: การจดจำวัตถุการนับวัตถุการจดจำสีและความเข้าใจเชิงพื้นที่ มาลองกันบ้าง:

คุณต้องติดตั้ง jq (โปรเซสเซอร์ JSON) เพื่อให้ผลลัพธ์ล่วงหน้า

ภาพ	ผ่าน https?
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/1/300/300", "matches": [{"text": "there is a woman in the photo"}, {"text": "there is a man in the photo"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " ให้: `"there is a woman in the photo" 0.626907229423523 "there is a man in the photo" 0.37309277057647705`
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/133/300/300", "matches": [ {"text": "the blue car is on the left, the red car is on the right"}, {"text": "the blue car is on the right, the red car is on the left"}, {"text": "the blue car is on top of the red car"}, {"text": "the blue car is below the red car"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " ให้: `"the blue car is on the left, the red car is on the right" 0.5232442617416382 "the blue car is on the right, the red car is on the left" 0.32878655195236206 "the blue car is below the red car" 0.11064132302999496 "the blue car is on top of the red car" 0.03732786327600479`
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/102/300/300", "matches": [{"text": "this is a photo of one berry"}, {"text": "this is a photo of two berries"}, {"text": "this is a photo of three berries"}, {"text": "this is a photo of four berries"}, {"text": "this is a photo of five berries"}, {"text": "this is a photo of six berries"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " ให้: `"this is a photo of three berries" 0.48507222533226013 "this is a photo of four berries" 0.2377079576253891 "this is a photo of one berry" 0.11304923892021179 "this is a photo of five berries" 0.0731358453631401 "this is a photo of two berries" 0.05045759305357933 "this is a photo of six berries" 0.04057715833187103`

เอกสาร

ติดตั้ง

Clip-as-Service ประกอบด้วยแพ็คเกจ Python สองชุด clip-server และ clip-client ที่สามารถติดตั้งได้ อย่างอิสระ ทั้งสองต้องการ Python 3.7+

ติดตั้งเซิร์ฟเวอร์

Pytorch Runtime ⚡	รันไทม์ onnx ⚡⚡	รันไทม์ Tensorrt ⚡⚡⚡
pip install clip-server	pip install " clip-server[onnx] "	pip install nvidia-pyindex pip install " clip-server[tensorrt] "

นอกจากนี้คุณยังสามารถโฮสต์เซิร์ฟเวอร์บน Google Colab ใช้ประโยชน์จาก GPU/TPU ฟรี

ติดตั้งไคลเอนต์

pip install clip-client

ตรวจสอบอย่างรวดเร็ว

คุณสามารถเรียกใช้การตรวจสอบการเชื่อมต่ออย่างง่ายหลังจากติดตั้ง

C/S	สั่งการ	คาดว่าผลผลิต
เซิร์ฟเวอร์	python -m clip_server
ลูกค้า	from clip_client import Client c = Client ( 'grpc://0.0.0.0:23456' ) c . profile ()

คุณสามารถเปลี่ยน 0.0.0.0 เป็นอินทราเน็ตหรือที่อยู่ IP สาธารณะเพื่อทดสอบการเชื่อมต่อผ่านเครือข่ายส่วนตัวและเครือข่ายสาธารณะ

เริ่มต้นใช้งาน

การใช้งานขั้นพื้นฐาน

เริ่มต้นเซิร์ฟเวอร์: python -m clip_server จำที่อยู่และพอร์ต

สร้างไคลเอนต์:

 from clip_client import Client

 c = Client ( 'grpc://0.0.0.0:51000' )

เพื่อรับการฝังประโยค:

 r = c . encode ([ 'First do it' , 'then do it right' , 'then do it better' ])

print ( r . shape )  # [3, 512]

เพื่อรับภาพฝังภาพ:

 r = c . encode ([ 'apple.png' ,  # local image 
              'https://clip-as-service.jina.ai/_static/favicon.png' ,  # remote image
              'data:image/gif;base64,R0lGODlhEAAQAMQAAORHHOVSKudfOulrSOp3WOyDZu6QdvCchPGolfO0o/XBs/fNwfjZ0frl3/zy7////wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH5BAkAABAALAAAAAAQABAAAAVVICSOZGlCQAosJ6mu7fiyZeKqNKToQGDsM8hBADgUXoGAiqhSvp5QAnQKGIgUhwFUYLCVDFCrKUE1lBavAViFIDlTImbKC5Gm2hB0SlBCBMQiB0UjIQA7' ])  # in image URI

print ( r . shape )  # [3, 512]

คำแนะนำเซิร์ฟเวอร์และไคลเอนต์ที่ครอบคลุมมากขึ้นสามารถพบได้ในเอกสาร

การค้นหาข้ามรูปแบบข้อความเป็นภาพใน 10 บรรทัด

มาสร้างการค้นหาแบบข้อความเป็นภาพโดยใช้คลิปขณะที่บริการ กล่าวคือผู้ใช้สามารถป้อนประโยคและโปรแกรมส่งคืนภาพที่ตรงกัน เราจะใช้ชุดข้อมูลและแพ็คเกจ Docarray โดยสิ้นเชิง โปรดทราบว่า docarray รวมอยู่ใน clip-client เป็นการพึ่งพาต้นน้ำดังนั้นคุณไม่จำเป็นต้องติดตั้งแยกต่างหาก

โหลดภาพ

ก่อนอื่นเราโหลดภาพ คุณสามารถดึงพวกเขาออกจาก Jina Cloud:

 from docarray import DocumentArray

da = DocumentArray . pull ( 'ttl-original' , show_progress = True , local_cache = True )

หรือดาวน์โหลดชุดข้อมูล TTL, unzip, โหลดด้วยตนเอง

หรือคุณสามารถไปที่เว็บไซต์อย่างเป็นทางการโดยสิ้นเชิง unzip และ load images:

 from docarray import DocumentArray

da = DocumentArray . from_files ([ 'left/*.jpg' , 'right/*.jpg' ])

ชุดข้อมูลมีภาพ 12,032 ภาพดังนั้นอาจใช้เวลาสักครู่ในการดึง เมื่อเสร็จแล้วคุณสามารถเห็นภาพและได้รับรสชาติแรกของภาพเหล่านั้น:

 da . plot_image_sprites ()

การสร้างภาพของภาพสไปรต์ของชุดข้อมูลทั้งหมด

เข้ารหัสภาพ

เริ่มต้นเซิร์ฟเวอร์ด้วย python -m clip_server สมมติว่ามันอยู่ที่ 0.0.0.0:51000 ด้วยโปรโตคอล GRPC (คุณจะได้รับข้อมูลนี้หลังจากเรียกใช้เซิร์ฟเวอร์)

สร้างสคริปต์ไคลเอนต์ Python:

 from clip_client import Client

c = Client ( server = 'grpc://0.0.0.0:51000' )

da = c . encode ( da , show_progress = True )

ขึ้นอยู่กับเครือข่าย GPU และเซิร์ฟเวอร์ไคลเอนต์ของคุณอาจใช้เวลาสักครู่ในการฝังภาพ 12K ในกรณีของฉันใช้เวลาประมาณสองนาที

ดาวน์โหลดชุดข้อมูลที่เข้ารหัสล่วงหน้า

หากคุณเป็นคนใจร้อนหรือไม่มี GPU การรออาจเป็นนรก ในกรณีนี้คุณสามารถดึงชุดข้อมูลรูปภาพที่เข้ารหัสล่วงหน้าของเรา:

 from docarray import DocumentArray

da = DocumentArray . pull ( 'ttl-embedding' , show_progress = True , local_cache = True )

ค้นหาผ่านประโยค

มาสร้างพรอมต์ง่ายๆเพื่อให้ผู้ใช้พิมพ์ประโยค:

 while True :
    vec = c . encode ([ input ( 'sentence> ' )])
    r = da . find ( query = vec , limit = 9 )
    r [ 0 ]. plot_image_sprites ()

การแสดง

ตอนนี้คุณสามารถป้อนประโยคภาษาอังกฤษโดยพลการและดูภาพที่ตรงกัน 9 อันดับแรก การค้นหานั้นรวดเร็วและสัญชาตญาณ มาสนุกกันเถอะ:

"มันฝรั่งมีความสุข"	"AI Super Evil"	"ผู้ชายคนหนึ่งเพลิดเพลินกับเบอร์เกอร์ของเขา"

"ศาสตราจารย์แมวจริงจังมาก"	"วิศวกรอัตตาอาศัยอยู่กับผู้ปกครอง"	"จะไม่มีวันพรุ่งนี้ดังนั้นให้กินไม่ดีต่อสุขภาพ"

มาบันทึกผลลัพธ์การฝังสำหรับตัวอย่างต่อไปของเรากันเถอะ:

 da . save_binary ( 'ttl-image' )

การค้นหาแบบ cross-modal image-to-text ใน 10 บรรทัด

นอกจากนี้เรายังสามารถสลับอินพุตและเอาต์พุตของโปรแกรมสุดท้ายเพื่อให้ได้การค้นหาแบบรูปภาพเป็นข้อความ ได้อย่างแม่นยำเมื่อได้รับภาพแบบสอบถามค้นหาประโยคที่อธิบายภาพได้ดีที่สุด

มาใช้ประโยคทั้งหมดจากหนังสือ "Pride and Prejudice"

 from docarray import Document , DocumentArray

d = Document ( uri = 'https://www.gutenberg.org/files/1342/1342-0.txt' ). load_uri_to_text ()
da = DocumentArray (
    Document ( text = s . strip ()) for s in d . text . replace ( ' r n ' , '' ). split ( '.' ) if s . strip ()
)

มาดูสิ่งที่เราได้รับ:

 da . summary ()

            Documents Summary            
                                         
  Length                 6403            
  Homogenous Documents   True            
  Common Attributes      ('id', 'text')  
                                         
                     Attributes Summary                     
                                                            
  Attribute   Data type   #Unique values   Has empty value  
 ────────────────────────────────────────────────────────── 
  id          ('str',)    6403             False            
  text        ('str',)    6030             False

เข้ารหัสประโยค

ตอนนี้เข้ารหัสประโยค 6,403 ประโยคเหล่านี้อาจใช้เวลา 10 วินาทีหรือน้อยกว่าขึ้นอยู่กับ GPU และเครือข่ายของคุณ:

 from clip_client import Client

c = Client ( 'grpc://0.0.0.0:51000' )

r = c . encode ( da , show_progress = True )

ดาวน์โหลดชุดข้อมูลที่เข้ารหัสล่วงหน้า

อีกครั้งสำหรับคนที่ใจร้อนหรือไม่มี GPU เราได้เตรียมชุดข้อมูลข้อความที่เข้ารหัสล่วงหน้า:

 from docarray import DocumentArray

da = DocumentArray . pull ( 'ttl-textual' , show_progress = True , local_cache = True )

ค้นหาผ่านภาพ

มาโหลดการฝังอิมเมจที่เก็บไว้ก่อนหน้านี้ของเราตัวอย่างเอกสารภาพ 10 แบบสุ่มจากนั้นค้นหาเพื่อนบ้านที่ใกล้ที่สุดที่ใกล้ที่สุดของแต่ละคน

 from docarray import DocumentArray

img_da = DocumentArray . load_binary ( 'ttl-image' )

for d in img_da . sample ( 10 ):
    print ( da . find ( d . embedding , limit = 1 )[ 0 ]. text )

การแสดง

สนุก! หมายเหตุซึ่งแตกต่างจากตัวอย่างก่อนหน้านี้อินพุตเป็นภาพและประโยคคือเอาต์พุต ประโยคทั้งหมดมาจากหนังสือ "Pride and Prejudice"


นอกจากนี้ยังมีความจริงในรูปลักษณ์ของเขา	การ์ดิเนอร์ยิ้ม	เขาชื่ออะไร	อย่างไรก็ตามตามเวลาน้ำชาปริมาณก็เพียงพอแล้วและนาย	คุณดูไม่ดี


“ นักเล่นเกม!” เธอร้องไห้	หากคุณพูดถึงชื่อของฉันที่ระฆังคุณจะเข้าร่วม	ไม่เป็นไรผมของ Miss Lizzy	เอลิซาเบ ธ จะเป็นภรรยาของนายในไม่ช้า	ฉันเห็นพวกเขาเมื่อคืนก่อน

อันดับข้อความภาพที่ตรงกันผ่าน Clip Model

จาก 0.3.0 Clip-as Service เพิ่มจุดสิ้นสุด /rank ใหม่ที่จัดอันดับการแข่งขันข้ามรูปแบบใหม่ตามโอกาสร่วมกันในรูปแบบคลิป ตัวอย่างเช่นได้รับเอกสารรูปภาพที่มีการจับคู่ประโยคที่กำหนดไว้ล่วงหน้าบางอย่างด้านล่าง:

 from clip_client import Client
from docarray import Document

c = Client ( server = 'grpc://0.0.0.0:51000' )
r = c . rank (
    [
        Document (
            uri = '.github/README-img/rerank.png' ,
            matches = [
                Document ( text = f'a photo of a { p } ' )
                for p in (
                    'control room' ,
                    'lecture room' ,
                    'conference room' ,
                    'podium indoor' ,
                    'television studio' ,
                )
            ],
        )
    ]
)

print ( r [ '@m' , [ 'text' , 'scores__clip_score__value' ]])

 [['a photo of a television studio', 'a photo of a conference room', 'a photo of a lecture room', 'a photo of a control room', 'a photo of a podium indoor'], 
[0.9920725226402283, 0.006038925610482693, 0.0009973491542041302, 0.00078492151806131, 0.00010626466246321797]]

ตอนนี้สามารถเห็น a photo of a television studio ได้รับการจัดอันดับให้อยู่ในอันดับต้น ๆ ด้วยคะแนน clip_score ที่ 0.992 ในทางปฏิบัติเราสามารถใช้จุดสิ้นสุดนี้เพื่อจัดอันดับผลลัพธ์การจับคู่อีกครั้งจากระบบการค้นหาอื่นเพื่อปรับปรุงคุณภาพการค้นหาข้ามรูปแบบ

จัดอันดับภาพลักษณ์ข้อความที่ตรงกันผ่านคลิปโมเดล

ในโครงการ Dall · E Flow คลิปเรียกว่าจัดอันดับผลลัพธ์ที่สร้างขึ้นจาก Dall · E มันมีผู้บริหารที่ห่อหุ้ม clip-client ซึ่งเรียก .arank() - เวอร์ชัน async ของ .rank() :

 from clip_client import Client
from jina import Executor , requests , DocumentArray


class ReRank ( Executor ):
    def __init__ ( self , clip_server : str , ** kwargs ):
        super (). __init__ ( ** kwargs )
        self . _client = Client ( server = clip_server )

    @ requests ( on = '/' )
    async def rerank ( self , docs : DocumentArray , ** kwargs ):
        return await self . _client . arank ( docs )

คลิปขณะที่ใช้ในการไหลของ Dalle

สนใจ? นั่นเป็นเพียงรอยขีดข่วนพื้นผิวของสิ่งที่คลิปขณะบริการมีความสามารถ อ่านเอกสารของเราเพื่อเรียนรู้เพิ่มเติม

สนับสนุน

เข้าร่วมชุมชน Discord ของเราและแชทกับสมาชิกชุมชนคนอื่น ๆ เกี่ยวกับแนวคิด
ดูวิศวกรรมทั้งหมดของเราเพื่อเรียนรู้คุณสมบัติใหม่ของ Jina และติดตามความทันสมัยด้วยเทคนิค AI ล่าสุด
สมัครรับบทเรียนวิดีโอล่าสุดในช่อง YouTube ของเรา

เข้าร่วมกับเรา

Clip-as-Service ได้รับการสนับสนุนโดย Jina AI และได้รับใบอนุญาตภายใต้ Apache-2.0 เรากำลังจ้างวิศวกร AI วิศวกรโซลูชันเพื่อสร้างระบบนิเวศการค้นหาประสาทต่อไปในโอเพนซอร์ซ

ขยาย

ข้อมูลเพิ่มเติม

เวอร์ชัน v0.8.3
ประเภท ซอร์สโค้ดอื่น ๆ
เวลาอัปเดต 2025-03-02
ขนาด 11.55MB
มาจาก Github

แอปที่เกี่ยวข้อง

Inf CLIP

2024-11-03
เวอร์ชันภาษาจีนบริการเต็มรูปแบบ

2023-10-20
เป็นหลุม

2023-05-29
เป็นซอฟต์แวร์รู

2023-05-29
รหัส AS ส่วนหัว

2022-07-24
คลิปบัคเก็ต

2011-05-24

แนะนำสำหรับคุณ

chat.petals.dev

ซอร์สโค้ดอื่น ๆ

1.0.0
GPT Prompt Templates

ซอร์สโค้ดอื่น ๆ

1.0.0
GPTyped

ซอร์สโค้ดอื่น ๆ

GPTyped 1.0.5
Google Dorks

ซอร์สโค้ดอื่น ๆ

1.0
shepherd

ซอร์สโค้ดอื่น ๆ

v6.1.6-react-shepherd: Prepare Release (#3063)
hidusbf

ซอร์สโค้ดอื่น ๆ

1.0.0
Google Dorks

ซอร์สโค้ดอื่น ๆ

1.0
shepherd

ซอร์สโค้ดอื่น ๆ

v6.1.6-react-shepherd: Prepare Release (#3063)
hidusbf

ซอร์สโค้ดอื่น ๆ

1.0.0

ข้อมูลที่เกี่ยวข้อง ทั้งหมด

ภาพ	ผ่าน https?
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/1/300/300", "matches": [{"text": "there is a woman in the photo"}, {"text": "there is a man in the photo"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " ให้: `"there is a woman in the photo" 0.626907229423523 "there is a man in the photo" 0.37309277057647705`
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/133/300/300", "matches": [ {"text": "the blue car is on the left, the red car is on the right"}, {"text": "the blue car is on the right, the red car is on the left"}, {"text": "the blue car is on top of the red car"}, {"text": "the blue car is below the red car"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " ให้: `"the blue car is on the left, the red car is on the right" 0.5232442617416382 "the blue car is on the right, the red car is on the left" 0.32878655195236206 "the blue car is below the red car" 0.11064132302999496 "the blue car is on top of the red car" 0.03732786327600479`
	curl -X POST https:// < your-inference-address > -http.wolf.jina.ai/post -H ' Content-Type: application/json ' -H ' Authorization: <your access token> ' -d ' {"data":[{"uri": "https://picsum.photos/id/102/300/300", "matches": [{"text": "this is a photo of one berry"}, {"text": "this is a photo of two berries"}, {"text": "this is a photo of three berries"}, {"text": "this is a photo of four berries"}, {"text": "this is a photo of five berries"}, {"text": "this is a photo of six berries"}]}], "execEndpoint":"/rank"} ' \| jq " .data[].matches[] \| (.text, .scores.clip_score.value) " ให้: `"this is a photo of three berries" 0.48507222533226013 "this is a photo of four berries" 0.2377079576253891 "this is a photo of one berry" 0.11304923892021179 "this is a photo of five berries" 0.0731358453631401 "this is a photo of two berries" 0.05045759305357933 "this is a photo of six berries" 0.04057715833187103`