UForm

Pocket-Sized Multimodal AI
For Content Understanding and Generation


Multimodal embeddings from 64 to 768 dimensions • 1B-parameter chat
Short texts • Images • Video clips • Long documents
ONNX • CoreML • PyTorch
Python • JavaScript • Swift


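Welcome to UForm, a multimodal AI library that is as versatile as it is efficient. The UForm tiny embedding models help you understand and search visual and textual content across many languages. The UForm small generative models, in turn, support conversational and chat use cases and are also well suited to fast image captioning and Visual Question Answering (VQA). Because the custom pre-trained transformer models are compact, they can run anywhere from a server farm down to a smartphone.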

Features

Tiny Embeddings: 64-dimensional Matryoshka-style embeddings for extremely fast search.
Throughput: Thanks to the small size, inference is 2-4x faster than competitors.
Portable: Models come with native ONNX support, making them easy to deploy on any platform.
Quantization Aware: Down-cast embeddings from f32 to i8 without losing much recall.
Multilingual: Trained on a balanced dataset, so recall is strong across more than 20 languages.

Models

For accuracy and speed benchmarks refer to the evaluation page.

Embedding Models

Model	Parameters	Languages	Architecture
`uform3-image-text-english-large`	365M	1	12-layer BERT, ViT-L/14
`uform3-image-text-english-base`	143M	1	4-layer BERT, ViT-B/16
`uform3-image-text-english-small`	79M	1	4-layer BERT, ViT-S/16
`uform3-image-text-multilingual-base`	206M	21	12-layer BERT, ViT-B/16

Generative Models

Model	Parameters	Purpose	Architecture
`uform-gen2-dpo`	1.2B	Chat, Image Captioning, VQA	Qwen1.5-0.5B, ViT-H/14
`uform-gen2-qwen-500m`	1.2B	Chat, Image Captioning, VQA	Qwen1.5-0.5B, ViT-H/14
`uform-gen`	1.5B	Image Captioning, VQA	Llama-1.3B, ViT-B/16

Quick Start Examples

Embedding Models

First, run pip install uform. Then load the model:

from uform import get_model, Modality

processors, models = get_model('unum-cloud/uform3-image-text-english-small')

model_text = models[Modality.TEXT_ENCODER]
model_image = models[Modality.IMAGE_ENCODER]
processor_text = processors[Modality.TEXT_ENCODER]
processor_image = processors[Modality.IMAGE_ENCODER]

Embed images:

import requests
from io import BytesIO
from PIL import Image

image_url = 'https://media-cdn.tripadvisor.com/media/photo-s/1b/28/6b/53/lovely-armenia.jpg'
image = Image.open(BytesIO(requests.get(image_url).content))
image_data = processor_image(image)
image_features, image_embedding = model_image.encode(image_data, return_features=True)

Embed queries:

text = 'a cityscape bathed in the warm glow of the sun, with varied architecture and a towering, snow-capped mountain rising majestically in the background'
text_data = processor_text(text)
text_features, text_embedding = model_text.encode(text_data, return_features=True)
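Once an image and a query are embedded, they are usually compared with cosine similarity. The following is a minimal NumPy sketch of that comparison, not part of the UForm API. The vectors are random stand-ins, and the helper name `cosine_similarity` is hypothetical; in practice you would pass `image_embedding` and `text_embedding`, converted to NumPy arrays if your backend returns another tensor type.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors, flattened to 1-D."""
    a = np.asarray(a, dtype=np.float32).ravel()
    b = np.asarray(b, dtype=np.float32).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors (assumption): replace with image_embedding / text_embedding.
rng = np.random.default_rng(0)
image_vec = rng.standard_normal(256)
text_vec = image_vec + 0.1 * rng.standard_normal(256)  # slightly perturbed copy

score = cosine_similarity(image_vec, text_vec)
print(f'similarity: {score:.3f}')
```

Values close to 1 indicate that the image and the query point in nearly the same direction in the shared embedding space.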

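For more details, check out:

Python docs on embedding models in python/README.md
JavaScript docs on embedding models in javascript/README.md
Swift docs on embedding models in swift/README.md

Generative Models

The generative models are natively compatible with the transformers library: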

import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model = AutoModel.from_pretrained('unum-cloud/uform-gen2-dpo', trust_remote_code=True)
processor = AutoProcessor.from_pretrained('unum-cloud/uform-gen2-dpo', trust_remote_code=True)

prompt = 'Question or Instruction'
image = Image.open('image.jpg')

inputs = processor(text=[prompt], images=[image], return_tensors='pt')

with torch.inference_mode():
    output = model.generate(
        **inputs,
        do_sample=False,
        use_cache=True,
        max_new_tokens=256,
        eos_token_id=151645,
        pad_token_id=processor.tokenizer.pad_token_id
    )
# Keep only the newly generated tokens, dropping the echoed prompt
prompt_len = inputs['input_ids'].shape[1]
decoded_text = processor.batch_decode(output[:, prompt_len:])[0]

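For more details, check out:

Python docs on generative models in python/README.md
JavaScript docs on generative models
Swift docs on generative models

Technical Details

Down-casting, Quantization, Matryoshka, and Slicing

Depending on the application, the embeddings can be down-cast to smaller numeric representations without losing much recall. Switching from f32 to f16 is recommended in almost all cases, unless you are running on very old hardware without half-precision support. Switching to i8 with linear scaling is also possible, but the loss becomes noticeable in recall on larger collections with millions of searchable entries. Similarly, for higher-dimensional embeddings (512 or 768), a common strategy is to quantize them into single-bit representations for faster search.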

import numpy as np

f32_embedding: np.ndarray = model.encode_text(text_data, return_features=False)
f16_embedding: np.ndarray = f32_embedding.astype(np.float16)
# Linear scaling to i8; assumes the embedding values lie in [-1, 1]
i8_embedding: np.ndarray = (f32_embedding * 127).astype(np.int8)
# One bit per dimension (the sign), packed eight dimensions per byte
b1_embedding: np.ndarray = np.packbits((f32_embedding > 0).astype(np.uint8))
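To get a feel for the trade-off, the sketch below applies the same down-casts to a batch of random, L2-normalized stand-in vectors and measures how far the cosine similarities drift from the f32 baseline. It is an illustration under stated assumptions, not a benchmark of UForm: the data is synthetic, and the helpers `bytes_per_vector` and `cos_to_first` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in batch of embeddings (assumption): L2-normalized, so values lie in [-1, 1].
f32 = rng.standard_normal((1000, 256)).astype(np.float32)
f32 /= np.linalg.norm(f32, axis=1, keepdims=True)

f16 = f32.astype(np.float16)
i8 = np.round(f32 * 127).astype(np.int8)               # linear scaling
b1 = np.packbits((f32 > 0).astype(np.uint8), axis=1)   # one sign bit per dimension

def bytes_per_vector(x: np.ndarray) -> int:
    """Storage needed for a single row of the batch."""
    return x.nbytes // x.shape[0]

def cos_to_first(x: np.ndarray) -> np.ndarray:
    """Cosine similarity of every row to row 0, computed in f32."""
    x = x.astype(np.float32)  # astype returns a copy, so the input is untouched
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x[1:] @ x[0]

baseline = cos_to_first(f32)
err_f16 = float(np.abs(cos_to_first(f16) - baseline).max())
err_i8 = float(np.abs(cos_to_first(i8) - baseline).max())

for name, arr in [('f32', f32), ('f16', f16), ('i8', i8), ('b1', b1)]:
    print(f'{name}: {bytes_per_vector(arr)} bytes per vector')
print(f'max cosine drift: f16={err_f16:.2e}, i8={err_i8:.2e}')
```

The single-bit vectors are left out of the cosine comparison because they are normally compared with Hamming distance instead.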

An alternative to quantization is to use Matryoshka embeddings, where the embeddings are sliced into smaller parts and the search is performed hierarchically.

import numpy as np

large_embedding: np.ndarray = model.encode_text(text_data, return_features=False)
small_embedding: np.ndarray = large_embedding[:, :256]
tiny_embedding: np.ndarray = large_embedding[:, :64]
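One way to read "hierarchical" here is a two-stage search: shortlist candidates using only the leading dimensions, then rerank the shortlist with the full vectors. The sketch below shows that idea in plain NumPy with brute-force scoring. The function `matryoshka_search`, its parameters, and the random corpus are all assumptions made for illustration; real Matryoshka embeddings are trained so that the leading dimensions carry the most information, which random data only imitates.

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def matryoshka_search(query, corpus, coarse_dims=64, shortlist=50, top_k=5):
    """Two-stage search: cheap coarse pass, then exact rerank of the shortlist."""
    # Stage 1: score the whole corpus using only the leading dimensions.
    coarse_scores = normalize(corpus[:, :coarse_dims]) @ normalize(query[:coarse_dims])
    candidates = np.argsort(-coarse_scores)[:shortlist]
    # Stage 2: rescore only the shortlisted rows with the full vectors.
    fine_scores = normalize(corpus[candidates]) @ normalize(query)
    order = np.argsort(-fine_scores)[:top_k]
    return candidates[order]

# Synthetic corpus (assumption); the query is a noisy copy of row 123.
rng = np.random.default_rng(7)
corpus = rng.standard_normal((10_000, 256)).astype(np.float32)
query = corpus[123] + 0.05 * rng.standard_normal(256).astype(np.float32)

hits = matryoshka_search(query, corpus)
print(hits)
```

The coarse pass touches a quarter of each vector, and the expensive full-width comparison runs on only 50 rows instead of 10,000.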

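Both approaches are natively supported by the USearch vector-search engine and the SimSIMD numerics library. When dealing with small collections (up to millions of entries) and looking for low-latency cosine distance calculations, you can achieve a 5x-2500x performance improvement over Torch, NumPy, SciPy, and vanilla Python by using SimSIMD.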

from simsimd import cosine, hamming

distance: float = cosine(f32_embedding, f32_embedding)  # 32x SciPy performance on Apple M2 CPU
distance: float = cosine(f16_embedding, f16_embedding)  # 79x SciPy performance on Apple M2 CPU
distance: float = cosine(i8_embedding, i8_embedding)  # 133x SciPy performance on Apple M2 CPU
distance: float = hamming(b1_embedding, b1_embedding)  # 17x SciPy performance on Apple M2 CPU
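For readers unfamiliar with Hamming distance on bit-packed vectors, here is a plain NumPy reference of what such a distance computes: the number of bit positions where two vectors differ. It is only a readable baseline, not SimSIMD's implementation, and the helper name `hamming_packed` and the two small vectors are made up for the example.

```python
import numpy as np

def hamming_packed(a: np.ndarray, b: np.ndarray) -> int:
    """Count differing bits between two uint8 bit-packed vectors."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

# Two small stand-in embeddings (assumption), binarized by sign.
x = np.array([0.3, -0.1, 0.7, -0.5, 0.2, 0.9, -0.4, 0.1], dtype=np.float32)
y = np.array([0.2, 0.4, 0.6, -0.3, -0.2, 0.8, -0.1, 0.5], dtype=np.float32)

bx = np.packbits((x > 0).astype(np.uint8))  # bits: 1 0 1 0 1 1 0 1
by = np.packbits((y > 0).astype(np.uint8))  # bits: 1 1 1 0 0 1 0 1

d = hamming_packed(bx, by)
print(d)  # 2: the signs differ at positions 1 and 4
```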

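Similarly, when dealing with large collections (up to billions of entries per server) and looking for high-throughput search, you can achieve a 100x performance improvement over FAISS and other vector-search solutions by using USearch. Here are a few examples: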

from usearch.index import Index

f32_index = Index(ndim=64, metric='cos', dtype='f32')  # for Matryoshka embeddings
f16_index = Index(ndim=64, metric='cos', dtype='f16')  # for Matryoshka embeddings
i8_index = Index(ndim=256, metric='cos', dtype='i8')  # for quantized embeddings
b1_index = Index(ndim=768, metric='hamming', dtype='b1')  # for binary embeddings
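A quick back-of-the-envelope calculation shows why these configurations matter at scale. The sketch below computes the raw vector storage for each of the four settings above. The helper `index_vector_bytes` is hypothetical, and the figures cover only the vectors themselves, not any graph or metadata overhead an index adds.

```python
def index_vector_bytes(ndim: int, dtype: str) -> int:
    """Raw bytes needed to store one vector of the given width and scalar type."""
    bits = {'f32': 32, 'f16': 16, 'i8': 8, 'b1': 1}[dtype]
    return (ndim * bits + 7) // 8  # round up to whole bytes

configs = [('f32', 64), ('f16', 64), ('i8', 256), ('b1', 768)]
sizes = {(dtype, ndim): index_vector_bytes(ndim, dtype) for dtype, ndim in configs}

for (dtype, ndim), per_vector in sizes.items():
    per_million_mib = per_vector * 1_000_000 / 2**20
    print(f'{dtype:>3} x {ndim:>3} dims: {per_vector:>3} B/vector, '
          f'{per_million_mib:.0f} MiB per million vectors')
```

Note that a 768-dimensional binary vector (96 bytes) is smaller than a 64-dimensional f16 vector (128 bytes), which is why single-bit quantization is attractive for the wider embeddings.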

Compact Packaging

PyTorch is a heavy dependency to carry, especially if you run on Edge or IoT devices. Using the vanilla ONNX runtime, you can significantly reduce memory consumption and deployment latency.

$ conda create -n uform_torch python=3.10 -y
$ conda create -n uform_onnx python=3.10 -y
$ conda activate uform_torch && pip install -e ".[torch]" && conda deactivate
$ conda activate uform_onnx && pip install -e ".[onnx]" && conda deactivate
$ du -sh $(conda info --envs | grep 'uform_torch' | awk '{print $2}')
> 5.2G    ~/conda/envs/uform_torch
$ du -sh $(conda info --envs | grep 'uform_onnx' | awk '{print $2}')
> 461M    ~/conda/envs/uform_onnx

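Most of that weight can be reduced further, down to 100 MB for the model and the runtime combined. You can pick one of the many supported ONNX execution providers, including XNNPACK, as well as CUDA and TensorRT for NVIDIA GPUs.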
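Picking an execution provider usually comes down to intersecting a preference list with what the installed ONNX Runtime build reports. The sketch below shows one way to do that. The helper `pick_providers` and the preference order are assumptions for illustration, not something UForm prescribes; `get_available_providers()` is the ONNX Runtime call that lists the providers compiled into your build, and the code falls back to CPU if the package is not installed.

```python
from typing import List, Sequence

# Preference order is an assumption: fastest accelerator first, CPU last.
PREFERRED = (
    'TensorrtExecutionProvider',
    'CUDAExecutionProvider',
    'XnnpackExecutionProvider',
    'CPUExecutionProvider',
)

def pick_providers(available: Sequence[str],
                   preferred: Sequence[str] = PREFERRED) -> List[str]:
    """Return the preferred providers that are actually available, in order."""
    chosen = [p for p in preferred if p in available]
    return chosen or ['CPUExecutionProvider']

try:
    import onnxruntime as ort
    available = ort.get_available_providers()
except ImportError:
    # ONNX Runtime is not installed; assume a CPU-only environment.
    available = ['CPUExecutionProvider']

providers = pick_providers(available)
print(providers)
```

The resulting list can be passed as the `providers` argument when creating an `onnxruntime.InferenceSession`, which tries the providers in the given order.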

Multimodal Chat in CLI

The generative models can be used for chat-like experiences on the command line. For that, you can use the uform-chat CLI tool, which ships with the UForm package.

$ pip install uform
$ uform-chat --model unum-cloud/uform-gen2-dpo --image=zebra.jpg
$ uform-chat --model unum-cloud/uform-gen2-dpo \
>     --image="https://bit.ly/3tIVg9M" \
>     --device="cuda:0" \
>     --fp16
