We share with the community AIMv2 pre-trained checkpoints of varying capacities and pre-training resolutions:

Installation

Please install PyTorch using the official installation instructions. Afterwards, install the package as:

 pip install 'git+https://github.com/apple/ml-aim.git#subdirectory=aim-v1'
pip install 'git+https://github.com/apple/ml-aim.git#subdirectory=aim-v2'

We also offer an MLX backend for research and experimentation on Apple silicon. To enable MLX support, simply run:

 pip install mlx
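
As a quick sanity check (an illustrative snippet, not part of the original instructions), you can confirm that MLX is installed and sees the Apple-silicon device:

import mlx.core as mx

# MLX should report the Apple-silicon GPU as its default device
print(mx.default_device())  # e.g. Device(gpu, 0)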

Examples

Using PyTorch

from PIL import Image

from aim.v2.utils import load_pretrained
from aim.v1.torch.data import val_transforms

img = Image.open(...)
# load the pre-trained AIMv2 model with the PyTorch backend
model = load_pretrained("aimv2-large-patch14-336", backend="torch")
transform = val_transforms(img_size=336)

# preprocess the image and add a batch dimension
inp = transform(img).unsqueeze(0)
features = model(inp)

Using MLX

from PIL import Image
import mlx.core as mx

from aim.v2.utils import load_pretrained
from aim.v1.torch.data import val_transforms

img = Image.open(...)
# load the pre-trained AIMv2 model with the MLX backend
model = load_pretrained("aimv2-large-patch14-336", backend="mlx")
transform = val_transforms(img_size=336)

inp = transform(img).unsqueeze(0)
# convert the torch tensor produced by the transform to an MLX array
inp = mx.array(inp.numpy())
features = model(inp)

Using JAX

from PIL import Image
import jax.numpy as jnp

from aim.v2.utils import load_pretrained
from aim.v1.torch.data import val_transforms

img = Image.open(...)
# load the pre-trained AIMv2 model with the JAX backend
model, params = load_pretrained("aimv2-large-patch14-336", backend="jax")
transform = val_transforms(img_size=336)

inp = transform(img).unsqueeze(0)
# convert the torch tensor produced by the transform to a JAX array
inp = jnp.array(inp)
features = model.apply({"params": params}, inp)

Pre-trained Checkpoints

The pre-trained models can be accessed via the HuggingFace Hub:

from PIL import Image
from transformers import AutoImageProcessor, AutoModel

image = Image.open(...)
processor = AutoImageProcessor.from_pretrained("apple/aimv2-large-patch14-336")
# the checkpoint ships custom modeling code, hence trust_remote_code=True
model = AutoModel.from_pretrained("apple/aimv2-large-patch14-336", trust_remote_code=True)

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
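
Continuing the example above (a hedged sketch: the exact output fields depend on the remote-code model class), the patch-level features can usually be read from the standard transformers output object:

# assumption: the remote-code model returns a standard transformers output
# object; the field name below is the conventional one, not confirmed here
patch_features = outputs.last_hidden_state  # (batch, num_patches, hidden_dim)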

AIMv2 with 224px

model_id                 #params  IN-1k  HF Link  Backbone
aimv2-large-patch14-224  0.3B     86.6   link     link
aimv2-huge-patch14-224   0.6B     87.5   link     link
aimv2-1B-patch14-224     1.2B     88.1   link     link
aimv2-3B-patch14-224     2.7B     88.5   link     link

AIMv2 with 336px

model_id                 #params  IN-1k  HF Link  Backbone
aimv2-large-patch14-336  0.3B     87.6   link     link
aimv2-huge-patch14-336   0.6B     88.2   link     link
aimv2-1B-patch14-336     1.2B     88.7   link     link
aimv2-3B-patch14-336     2.7B     89.2   link     link

AIMv2 with 448px

model_id                 #params  IN-1k  HF Link  Backbone
aimv2-large-patch14-448  0.3B     87.9   link     link
aimv2-huge-patch14-448   0.6B     88.6   link     link
aimv2-1B-patch14-448     1.2B     89.0   link     link
aimv2-3B-patch14-448     2.7B     89.5   link     link

AIMv2 with Native Resolution

We additionally provide an AIMv2-L checkpoint that is designed to process images of varying resolutions and aspect ratios. The image is patchified (patch_size=14) regardless of its aspect ratio, and 2D sincos positional embeddings are added to the linearly projected input patches. This checkpoint supports a number of patches in the range [112, 4096].
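
As an illustration of that constraint (a minimal sketch, not code from the repository), the patch count for a given resolution can be computed and checked against the supported [112, 4096] range:

def num_patches(height: int, width: int, patch_size: int = 14) -> int:
    """Number of patch_size x patch_size patches, regardless of aspect ratio."""
    return (height // patch_size) * (width // patch_size)

# examples: a wide 224x448 image and a square 896x896 image
for h, w in [(224, 448), (896, 896)]:
    n = num_patches(h, w)
    assert 112 <= n <= 4096, f"{n} patches is outside the supported range"
    print(f"{h}x{w} -> {n} patches")  # 512 and 4096, both supported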

model_id                    #params  IN-1k  HF Link  Backbone
aimv2-large-patch14-native  0.3B     87.3   link     link

AIMv2 distilled ViT-Large

We provide an AIMv2-L checkpoint distilled from AIMv2-3B that delivers strong performance on multimodal understanding benchmarks.

model              VQAv2  GQA   OKVQA  TextVQA  DocVQA  InfoVQA  ChartQA  SQA   MME-P
AIMv2-L            80.2   72.6  60.9   53.9     26.8    22.4     20.3     74.5  1457
AIMv2-L distilled  81.1   73.0  61.4   53.5     29.2    23.3     24.0     76.3  1627
model_id                           #params  res.   HF Link  Backbone
aimv2-large-patch14-224-distilled  0.3B     224px  link     link
aimv2-large-patch14-336-distilled  0.3B     336px  link     link

Zero-shot Adapted AIMv2

After LiT tuning, we provide the AIMv2-L vision and text encoders that enable zero-shot recognition; a minimal sketch of how they would typically be used follows the table below.

model    #params  IN-1k zero-shot  Backbone
AIMv2-L  0.3B     77.0             link
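
The snippet below is a hypothetical sketch of CLIP-style zero-shot classification with separate vision and text encoders; the stand-in embeddings and prompt-per-class setup are assumptions, since the loading API is not shown in this text:

import torch
import torch.nn.functional as F

def zero_shot_classify(image_emb: torch.Tensor, class_embs: torch.Tensor) -> torch.Tensor:
    """CLIP-style zero-shot classification: cosine similarity between the
    image embedding and one text embedding per class, softmax over classes."""
    image_emb = F.normalize(image_emb, dim=-1)
    class_embs = F.normalize(class_embs, dim=-1)
    return (image_emb @ class_embs.T).softmax(dim=-1)

# stand-in embeddings; in practice these would come from the LiT-tuned
# AIMv2-L vision and text encoders
image_emb = torch.randn(1, 768)      # one encoded image
class_embs = torch.randn(1000, 768)  # one encoded prompt per IN-1k class
probs = zero_shot_classify(image_emb, class_embs)
print(probs.shape)  # torch.Size([1, 1000])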

Citation

If you find our work useful, please consider citing us as:

AIMv2 BibTeX

@misc{fini2024multimodal,
    title = {Multimodal Autoregressive Pre-training of Large Vision Encoders},
    author = {Enrico Fini and Mustafa Shukor and Xiujun Li and Philipp Dufter and Michal Klein and David Haldimann and Sai Aitharaju and Victor Guilherme Turrisi da Costa and Louis Béthune and Zhe Gan and Alexander T Toshev and Marcin Eichner and Moin Nabi and Yinfei Yang and Joshua M. Susskind and Alaaeldin El-Nouby},
    year = {2024},
    eprint = {2411.14402},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

AIMv1 BibTeX

@InProceedings{pmlr-v235-el-nouby24a,
  title     = {Scalable Pre-training of Large Autoregressive Image Models},
  author    = {El-Nouby, Alaaeldin and Klein, Michal and Zhai, Shuangfei and Bautista, Miguel \'{A}ngel and Shankar, Vaishaal and Toshev, Alexander T and Susskind, Joshua M. and Joulin, Armand},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {12371--12384},
  year      = {2024},
}

License

Please review the repository's LICENSE before using the provided code and models.
