XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

New VOS project: Putting the Object Back into Video Object Segmentation: https://github.com/hkchengrex/cutie

New project: Open-World Video Segmentation with XMem: https://github.com/hkchengrex/tracking-anything-with-deva

Ho Kei Cheng, Alexander Schwing

University of Illinois Urbana-Champaign

[arXiv] [PDF] [Project Page]

Demo

Handling long-term occlusion:

cans_crf20.mp4

Very long video; masked layer insertion:

breakdance_soft_crf20.mp4

Source: https://www.youtube.com/watch?v=q5xr0f4a0iu

Out-of-domain case:

Fujiwara_chika.mp4

Source: Kaguya-sama: Love Is War, Ep. 3; A-1 Pictures

[Failure Cases]

Features

Handles very long videos with limited GPU memory usage.
Quite fast. Expect ~20 FPS even with long videos (hardware dependent).
Comes with a GUI (modified from MiVOS).

Table of Contents

Introduction
Results
Interactive GUI demo
Training/inference
Citation

Introduction

We frame Video Object Segmentation (VOS), first and foremost, as a memory problem. Prior works mostly use a single type of feature memory. This can be in the form of network weights (i.e., online learning), last-frame segmentation (e.g., MaskTrack), spatial hidden representation (e.g., Conv-RNN-based methods), spatial-temporal features (e.g., STM, STCN, AOT), or some sort of long-term compact features (e.g., AFB-URR).

Methods with a short memory span are not robust to changes, while those with a large memory bank are subject to a catastrophic increase in computation and GPU memory usage. Attempts at long-term attentional VOS like AFB-URR compress features eagerly as soon as they are generated, leading to a loss of feature resolution.

Our method is inspired by the Atkinson-Shiffrin human memory model, which has a sensory memory, a working memory, and a long-term memory. These memory stores have different temporal scales and complement each other in our memory reading mechanism. It performs well on both short-term and long-term video datasets, handling videos with more than 10,000 frames with ease.
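To make the three-store idea concrete, here is a minimal toy sketch in plain Python. It is not the XMem implementation: the class name `ToyThreeStoreMemory`, the capacities, and the mean-based consolidation step are all assumptions chosen only to illustrate how stores with different time scales can keep total memory bounded on a long video.

```python
from collections import deque


class ToyThreeStoreMemory:
    """Illustrative three-store memory (hypothetical; not the XMem code)."""

    def __init__(self, working_capacity=4, long_term_capacity=8):
        self.working_capacity = working_capacity
        self.sensory = None                             # overwritten every frame
        self.working = deque()                          # recent full-detail features
        self.long_term = deque(maxlen=long_term_capacity)  # compact summaries

    def add_frame(self, feature):
        """Store one per-frame feature vector (a list of floats)."""
        self.sensory = feature
        self.working.append(feature)
        if len(self.working) > self.working_capacity:
            self._consolidate()

    def _consolidate(self):
        # Assumption: summarize the oldest half of working memory by its
        # element-wise mean. This is a stand-in for a real consolidation rule.
        n_old = len(self.working) // 2
        old = [self.working.popleft() for _ in range(n_old)]
        summary = [sum(vals) / n_old for vals in zip(*old)]
        self.long_term.append(summary)  # oldest summary is dropped when full

    def size(self):
        """Total number of stored feature vectors across all three stores."""
        return len(self.working) + len(self.long_term) + int(self.sensory is not None)


memory = ToyThreeStoreMemory(working_capacity=4, long_term_capacity=8)
for t in range(10_000):
    memory.add_frame([float(t), 1.0])
print("stored vectors after 10,000 frames:", memory.size())
```

In this sketch the stored size never exceeds `working_capacity + long_term_capacity + 1`, no matter how many frames are processed, which is the property that lets a bounded memory handle very long videos.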

Training/inference

First, install the required Python packages and datasets following GETTING_STARTED.md.

For training, see TRAINING.md.

For inference, see INFERENCE.md.

Related projects/extensions:

Track Anything
DEVA
AutoTrackAnything

Citation

Please cite our paper if you find this repo useful!

@inproceedings { cheng2022xmem ,
  title = { {XMem}: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model } ,
  author = { Cheng, Ho Kei and Schwing, Alexander G. } ,
  booktitle = { ECCV } ,
  year = { 2022 }
}

Related projects that this paper is developed upon:

@inproceedings { cheng2021stcn ,
  title = { Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation } ,
  author = { Cheng, Ho Kei and Tai, Yu-Wing and Tang, Chi-Keung } ,
  booktitle = { NeurIPS } ,
  year = { 2021 }
}

@inproceedings { cheng2021mivos ,
  title = { Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion } ,
  author = { Cheng, Ho Kei and Tai, Yu-Wing and Tang, Chi-Keung } ,
  booktitle = { CVPR } ,
  year = { 2021 }
}

We use f-BRS in the interactive demo: https://github.com/saic-vul/fbrs_interactive_segmentation

And if you want to cite the datasets:

@inproceedings { shi2015hierarchicalECSSD ,
  title = { Hierarchical image saliency detection on extended CSSD } ,
  author = { Shi, Jianping and Yan, Qiong and Xu, Li and Jia, Jiaya } ,
  booktitle = { TPAMI } ,
  year = { 2015 } ,
}

@inproceedings { wang2017DUTS ,
  title = { Learning to Detect Salient Objects with Image-level Supervision } ,
  author = { Wang, Lijun and Lu, Huchuan and Wang, Yifan and Feng, Mengyang and Wang, Dong and Yin, Baocai and Ruan, Xiang } ,
  booktitle = { CVPR } ,
  year = { 2017 }
}

@inproceedings { FSS1000 ,
  title = { FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation } ,
  author = { Li, Xiang and Wei, Tianhan and Chen, Yau Pun and Tai, Yu-Wing and Tang, Chi-Keung } ,
  booktitle = { CVPR } ,
  year = { 2020 }
}

@inproceedings { zeng2019towardsHRSOD ,
  title = { Towards High-Resolution Salient Object Detection } ,
  author = { Zeng, Yi and Zhang, Pingping and Zhang, Jianming and Lin, Zhe and Lu, Huchuan } ,
  booktitle = { ICCV } ,
  year = { 2019 }
}

@inproceedings { cheng2020cascadepsp ,
  title = { {CascadePSP}: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement } ,
  author = { Cheng, Ho Kei and Chung, Jihoon and Tai, Yu-Wing and Tang, Chi-Keung } ,
  booktitle = { CVPR } ,
  year = { 2020 }
}

@inproceedings { xu2018youtubeVOS ,
  title = { Youtube-vos: A large-scale video object segmentation benchmark } ,
  author = { Xu, Ning and Yang, Linjie and Fan, Yuchen and Yue, Dingcheng and Liang, Yuchen and Yang, Jianchao and Huang, Thomas } ,
  booktitle = { ECCV } ,
  year = { 2018 }
}

@inproceedings { perazzi2016benchmark ,
  title = { A benchmark dataset and evaluation methodology for video object segmentation } ,
  author = { Perazzi, Federico and Pont-Tuset, Jordi and McWilliams, Brian and Van Gool, Luc and Gross, Markus and Sorkine-Hornung, Alexander } ,
  booktitle = { CVPR } ,
  year = { 2016 }
}

@inproceedings { denninger2019blenderproc ,
  title = { BlenderProc } ,
  author = { Denninger, Maximilian and Sundermeyer, Martin and Winkelbauer, Dominik and Zidan, Youssef and Olefir, Dmitry and Elbadrawy, Mohamad and Lodhi, Ahsan and Katam, Harinandan } ,
  booktitle = { arXiv:1911.01911 } ,
  year = { 2019 }
}

@inproceedings { shapenet2015 ,
  title       = { {ShapeNet: An Information-Rich 3D Model Repository} } ,
  author      = { Chang, Angel Xuan and Funkhouser, Thomas and Guibas, Leonidas and Hanrahan, Pat and Huang, Qixing and Li, Zimo and Savarese, Silvio and Savva, Manolis and Song, Shuran and Su, Hao and Xiao, Jianxiong and Yi, Li and Yu, Fisher } ,
  booktitle   = { arXiv:1512.03012 } ,
  year        = { 2015 }
}