
We share with the community AIMv2 pre-trained checkpoints of varying capacities and pre-training resolutions:

Installation

Please install PyTorch using the official installation instructions. Afterwards, install the package as:

pip install 'git+https://github.com/apple/ml-aim.git#subdirectory=aim-v1'
pip install 'git+https://github.com/apple/ml-aim.git#subdirectory=aim-v2'

We also offer an MLX backend to support research and experimentation on Apple silicon. To enable MLX support, simply run:

pip install mlx

Examples

Using PyTorch

from PIL import Image

from aim.v2.utils import load_pretrained
from aim.v1.torch.data import val_transforms

img = Image.open(...)
model = load_pretrained("aimv2-large-patch14-336", backend="torch")
transform = val_transforms(img_size=336)

inp = transform(img).unsqueeze(0)
features = model(inp)
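The call above returns per-patch features. For image-level tasks these are often pooled into a single vector per image, e.g. by mean pooling over the patch axis. A minimal NumPy sketch of that pooling step, with a random array standing in for the model output; the 576-patch count follows from a 336px input with a 14-pixel patch size ((336/14)² = 24² = 576), and the feature dimension here is illustrative:

```python
import numpy as np

# Illustrative feature tensor standing in for the model output:
# (batch, num_patches, dim); 576 = (336 // 14) ** 2 patches at 336px.
features = np.random.rand(1, 576, 1024).astype(np.float32)

# Mean-pool over the patch axis to get one embedding per image.
pooled = features.mean(axis=1)

print(pooled.shape)  # (1, 1024)
```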

Using MLX

from PIL import Image
import mlx.core as mx

from aim.v2.utils import load_pretrained
from aim.v1.torch.data import val_transforms

img = Image.open(...)
model = load_pretrained("aimv2-large-patch14-336", backend="mlx")
transform = val_transforms(img_size=336)

inp = transform(img).unsqueeze(0)
inp = mx.array(inp.numpy())
features = model(inp)

Using JAX

from PIL import Image
import jax.numpy as jnp

from aim.v2.utils import load_pretrained
from aim.v1.torch.data import val_transforms

img = Image.open(...)
model, params = load_pretrained("aimv2-large-patch14-336", backend="jax")
transform = val_transforms(img_size=336)

inp = transform(img).unsqueeze(0)
inp = jnp.array(inp)
features = model.apply({"params": params}, inp)
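Note that unlike the PyTorch and MLX backends, the JAX backend returns the model and its parameters separately, and the parameters are passed explicitly at call time. A tiny pure-Python sketch of this functional pattern, with a hypothetical `linear_apply` standing in for the model:

```python
# Functional-style model call: the "model" is a pure function and all
# parameters travel in an explicit dict, as in JAX/Flax.
def linear_apply(params, x):
    w, b = params["w"], params["b"]
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

params = {"w": [[1.0, 0.0], [0.0, 2.0]], "b": [0.5, -0.5]}
out = linear_apply(params, [3.0, 4.0])
print(out)  # [3.5, 7.5]
```

Keeping parameters outside the model object is what lets JAX transform the call (jit, grad, vmap) as a pure function.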

Pre-trained Checkpoints

The pre-trained models can be accessed via the HuggingFace Hub:

from PIL import Image
from transformers import AutoImageProcessor, AutoModel

image = Image.open(...)
processor = AutoImageProcessor.from_pretrained("apple/aimv2-large-patch14-336")
model = AutoModel.from_pretrained("apple/aimv2-large-patch14-336", trust_remote_code=True)

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)

AIMv2 at 224px

model_id                 #params  IN-1k  HF link  Backbone
aimv2-large-patch14-224  0.3B     86.6   link     link
aimv2-huge-patch14-224   0.6B     87.5   link     link
aimv2-1B-patch14-224     1.2B     88.1   link     link
aimv2-3B-patch14-224     2.7B     88.5   link     link

AIMv2 at 336px

model_id                 #params  IN-1k  HF link  Backbone
aimv2-large-patch14-336  0.3B     87.6   link     link
aimv2-huge-patch14-336   0.6B     88.2   link     link
aimv2-1B-patch14-336     1.2B     88.7   link     link
aimv2-3B-patch14-336     2.7B     89.2   link     link

AIMv2 at 448px

model_id                 #params  IN-1k  HF link  Backbone
aimv2-large-patch14-448  0.3B     87.9   link     link
aimv2-huge-patch14-448   0.6B     88.6   link     link
aimv2-1B-patch14-448     1.2B     89.0   link     link
aimv2-3B-patch14-448     2.7B     89.5   link     link

AIMv2 with Native Resolution

We also provide an AIMv2-L checkpoint trained to process images of varying resolutions and aspect ratios. Regardless of the aspect ratio, the image is patchified (patch_size=14) and 2D sinusoidal positional embeddings are added to the linearly projected input patches. This checkpoint supports numbers of patches in the range [112, 4096].
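As a rough sketch of that constraint (the helper names below are hypothetical, not part of the package): the patch count is the product of patches per side, and an input resolution is supported when that count falls within [112, 4096]:

```python
PATCH_SIZE = 14  # patch size used by the checkpoint, per the text above

def num_patches(height, width, patch_size=PATCH_SIZE):
    # The image is patchified irrespective of aspect ratio.
    return (height // patch_size) * (width // patch_size)

def in_supported_range(height, width):
    # The native-resolution checkpoint supports [112, 4096] patches.
    return 112 <= num_patches(height, width) <= 4096

print(num_patches(336, 336))         # 576
print(in_supported_range(336, 336))  # True
```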

model_id                    #params  IN-1k  HF link  Backbone
aimv2-large-patch14-native  0.3B     87.3   link     link

AIMv2 Distilled ViT-Large

We provide an AIMv2-L checkpoint distilled from AIMv2-3B that delivers excellent performance on multimodal understanding benchmarks.

Model              VQAv2  GQA   OKVQA  TextVQA  DocVQA  InfoVQA  ChartQA  SciQA  MME-P
AIMv2-L            80.2   72.6  60.9   53.9     26.8    22.4     20.3     74.5   1457
AIMv2-L distilled  81.1   73.0  61.4   53.5     29.2    23.3     24.0     76.3   1627

model_id                           #params  res.   HF link  Backbone
aimv2-large-patch14-224-distilled  0.3B     224px  link     link
aimv2-large-patch14-336-distilled  0.3B     336px  link     link

Zero-shot Adaptation of AIMv2

We provide the AIMv2-L vision and text encoders after LiT tuning, enabling zero-shot recognition.

Model    #params  IN-1k zero-shot  Backbone
AIMv2-L  0.3B     77.0             link
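With paired vision and text encoders, zero-shot classification typically works by comparing the image embedding against one text embedding per class prompt and taking a softmax over the cosine similarities. A minimal NumPy sketch of that scoring step (a simplification: real setups usually add a learned temperature), with toy embeddings standing in for encoder outputs:

```python
import numpy as np

def zero_shot_probs(img_emb, text_embs):
    # L2-normalize both sides, then softmax over cosine similarities.
    img = img_emb / np.linalg.norm(img_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Toy image embedding aligned with the first of two class prompts.
probs = zero_shot_probs(np.array([1.0, 0.0]),
                        np.array([[1.0, 0.0], [0.0, 1.0]]))
print(probs.argmax())  # 0
```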

Citation

If you find our work useful, please consider citing us as:

AIMv2 BibTeX

@misc{fini2024multimodal,
    title = {Multimodal Autoregressive Pre-training of Large Vision Encoders},
    author = {Enrico Fini and Mustafa Shukor and Xiujun Li and Philipp Dufter and Michal Klein and David Haldimann and Sai Aitharaju and Victor Guilherme Turrisi da Costa and Louis Béthune and Zhe Gan and Alexander T Toshev and Marcin Eichner and Moin Nabi and Yinfei Yang and Joshua M. Susskind and Alaaeldin El-Nouby},
    year = {2024},
    eprint = {2411.14402},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

AIMv1 BibTeX

@InProceedings{pmlr-v235-el-nouby24a,
  title     = {Scalable Pre-training of Large Autoregressive Image Models},
  author    = {El-Nouby, Alaaeldin and Klein, Michal and Zhai, Shuangfei and Bautista, Miguel {\'A}ngel and Shankar, Vaishaal and Toshev, Alexander T and Susskind, Joshua M. and Joulin, Armand},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {12371--12384},
  year      = {2024},
}

License

Please check the repository LICENSE before using the provided code and models.
