semantic segmentation

Python

v0.2.6

Semantic Segmentation

Easy-to-use and customizable SOTA semantic segmentation models with abundant datasets in PyTorch

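Major rework! Stay tuned...

A lot has changed since 2022, and today there are even open-world segmentation models (such as Segment Anything). However, traditional segmentation models are still in demand for high accuracy and custom use cases. This repository will be updated for new PyTorch versions, with updated models and documentation on how to use it with custom datasets, among other things.

Expected release date -> May 2024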

Planned features:

Rework of the whole training pipeline
Baseline pretrained models
New and updated ideas
Easy integration with SOTA backbone models (with tutorials)
Tutorial for custom datasets
Distributed training

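Current features to be dropped:

The number of provided datasets will be reduced. Instead, representative ones will be kept, along with a custom dataset tutorial.
The number of provided models will be reduced. Instead, valuable tricks and modules will remain and can be easily integrated with any model.
Augmentations will be replaced with the official torchvision v2 transforms.
Conversion to and inference with other frameworks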

Features

Applicable to the following tasks:
- Scene parsing
- Human parsing
- Face parsing
- Medical image segmentation (coming soon)
20+ datasets
15+ SOTA backbones
10+ SOTA semantic segmentation models
PyTorch, ONNX, TFLite, OpenVINO export & inference

Model Zoo

Supported backbones:

ResNet (CVPR 2016)
ResNetD (ArXiv 2018)
MobileNetV2 (CVPR 2018)
MobileNetV3 (ICCV 2019)
MiT (NeurIPS 2021)
ResT (NeurIPS 2021)
MicroNet (ICCV 2021)
ResNet+ (ArXiv 2021)
PVTv2 (CVMJ 2022)
PoolFormer (CVPR 2022)
ConvNeXt (CVPR 2022)
UniFormer (ArXiv 2022)
VAN (ArXiv 2022)
DaViT (ArXiv 2022)

Supported heads/methods:

FCN (CVPR 2015)
UPerNet (ECCV 2018)
BiSeNetv1 (ECCV 2018)
FPN (CVPR 2019)
SFNet (ECCV 2020)
SegFormer (NeurIPS 2021)
FaPN (ICCV 2021)
CondNet (IEEE SPL 2021)
Light-Ham (ICLR 2021)
Lawin (ArXiv 2022)
TopFormer (CVPR 2022)

Supported standalone models:

BiSeNetv2 (IJCV 2021)
DDRNet (ArXiv 2021)

Supported modules:

PPM (CVPR 2017)
PSA (ArXiv 2021)

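Refer to MODELS for benchmarks and available pretrained models.

Check BACKBONES for the supported backbones.

Note: Most of the methods do not have pretrained models. It is very difficult to combine different models with pretrained weights in one repository, and the resources for retraining them myself are limited.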

Supported Datasets

Scene parsing:

ADE20K
Cityscapes
COCO-Stuff
CamVid
PASCAL-Context
Mapillary Vistas
SUN RGB-D

Human parsing:

MHPv2
MHPv1
LIP
CCIHP
CIHP
ATR

Face parsing:

HELEN
LaPa
iBugMask
CelebAMaskHQ
FaceSynthetics

Others:

SUIM

Refer to DATASETS for more details and dataset preparation.

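Available Augmentations

Check out the notebook here to test the augmentation effects.

Pixel-level transforms:

ColorJitter (brightness, contrast, saturation, hue)
Gamma, Sharpness, AutoContrast, Equalize, Posterize
GaussianBlur, Grayscale

Spatial-level transforms:

Affine, RandomRotation
HorizontalFlip, VerticalFlip
CenterCrop, RandomCrop
Pad, ResizePad, Resize
RandomResizedCrop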
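
The two groups above behave differently in a segmentation pipeline: a spatial-level transform has to be applied identically to the image and its label mask, while a pixel-level transform changes only the image. The following is a minimal NumPy sketch of that distinction. The helper names `hflip_pair` and `adjust_brightness` are hypothetical and are not this repository's API.

```python
import numpy as np


def hflip_pair(image, mask):
    """Spatial-level: flip the image (H, W, C) and the mask (H, W) together."""
    return image[:, ::-1].copy(), mask[:, ::-1].copy()


def adjust_brightness(image, factor):
    """Pixel-level: scale the intensities of a uint8 image; the mask is untouched."""
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)


image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)  # tiny 2x3 RGB image
mask = np.array([[0, 1, 2], [2, 1, 0]], dtype=np.int64)        # class index per pixel

flipped_image, flipped_mask = hflip_pair(image, mask)
bright_image = adjust_brightness(flipped_image, 1.5)  # labels stay as flipped_mask
```

Keeping the mask out of the pixel-level path prevents class indices from being altered by colour operations.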

Usage

Installation

Python >= 3.6
torch >= 1.8.1
torchvision >= 0.9.1

Then, clone the repository and install the project with:

$ git clone https://github.com/sithu31296/semantic-segmentation
$ cd semantic-segmentation
$ pip install -e .
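
To confirm that an environment satisfies the minimum versions listed above, a small check along these lines may help. This is only a sketch: `version_tuple` and `meets_minimum` are hypothetical helpers, and the parsing ignores pre-release ordering.

```python
import re
import sys


def version_tuple(version):
    """Turn '1.8.1+cu111' or '0.9.1a0' into a comparable tuple such as (1, 8, 1)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


# Minimum versions taken from the requirements listed above.
MINIMUMS = {"torch": "1.8.1", "torchvision": "0.9.1"}

python_ok = sys.version_info >= (3, 6)
```

With PyTorch installed, `meets_minimum(torch.__version__, MINIMUMS["torch"])` performs the corresponding check for torch.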

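Configuration

Create a configuration file in configs. A sample configuration for the ADE20K dataset can be found here. Then edit the fields you think need changing. This configuration file is required by all of the training, evaluation, and prediction scripts.

Training

To train with a single GPU: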

$ python tools/train.py --cfg configs/CONFIG_FILE.yaml

To train with multiple GPUs, set the DDP field in the config file to true and run as follows:

$ python -m torch.distributed.launch --nproc_per_node=2 --use_env tools/train.py --cfg configs/<CONFIG_FILE_NAME>.yaml
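
With `--use_env`, the launcher passes each worker's rank through environment variables (`LOCAL_RANK`, `RANK`, `WORLD_SIZE`) rather than a `--local_rank` argument. The sketch below shows one way a training script might read them. `read_ddp_env` is a hypothetical helper for illustration and is not taken from tools/train.py.

```python
import os


def read_ddp_env(environ=None):
    """Read the per-process settings that the launcher exports.

    Falls back to a single-process setup when the variables are absent,
    for example when the script is started with plain `python`.
    """
    env = os.environ if environ is None else environ
    return {
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "rank": int(env.get("RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


# Simulated environment of the second of two workers on one machine.
settings = read_ddp_env({"LOCAL_RANK": "1", "RANK": "1", "WORLD_SIZE": "2"})
is_distributed = settings["world_size"] > 1
```

In recent PyTorch releases, `torchrun` supersedes `torch.distributed.launch` and sets the same variables.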

Evaluation

Make sure to set MODEL_PATH in the config file to your trained model directory.

$ python tools/val.py --cfg configs/<CONFIG_FILE_NAME>.yaml

To evaluate with multi-scale and flip, change the ENABLE field in MSF to true and run the same command as above.
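
Multi-scale and flip evaluation generally means running the model on resized and mirrored copies of each image and averaging the class scores. The NumPy sketch below illustrates the idea under stated assumptions: the scale values, the nearest-neighbour resizing, and the names `resize_nearest`, `predict_msf`, and `toy_model` are all illustrative and do not describe how tools/val.py is implemented.

```python
import numpy as np


def resize_nearest(array, out_h, out_w):
    """Nearest-neighbour resize over the last two axes (H, W)."""
    in_h, in_w = array.shape[-2:]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return array[..., rows[:, None], cols[None, :]]


def predict_msf(model, image, scales=(0.5, 1.0, 1.5), flip=True):
    """Average class scores over several scales and, optionally, a horizontal flip.

    `model` maps an image (C, H, W) to class scores (K, H, W).
    """
    _, height, width = image.shape
    total = None
    count = 0
    for scale in scales:
        new_h = max(1, int(round(height * scale)))
        new_w = max(1, int(round(width * scale)))
        scaled = resize_nearest(image, new_h, new_w)
        views = [(scaled, False)]
        if flip:
            views.append((scaled[..., ::-1], True))
        for view, flipped in views:
            scores = model(view)
            if flipped:
                scores = scores[..., ::-1]  # undo the flip before averaging
            scores = resize_nearest(scores, height, width)
            total = scores if total is None else total + scores
            count += 1
    return total / count


def toy_model(image):
    """Stand-in for a network: two class scores derived from the first channel."""
    channel = image[0].astype(np.float64)
    return np.stack([channel, 1.0 - channel])


image = np.random.default_rng(0).random((3, 4, 6))
averaged = predict_msf(toy_model, image)
prediction = averaged.argmax(axis=0)  # class index per pixel, shape (H, W)
```

A real pipeline would typically use bilinear interpolation and a trained network in place of `toy_model`.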

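Inference

To run inference, edit the following parameters of the config file.

Change MODEL >> NAME and BACKBONE to your desired pretrained model.
Change DATASET >> NAME to the dataset name that matches the pretrained model.
Set TEST >> MODEL_PATH to the pretrained weights of the model to test.
Change TEST >> FILE to the path of an image file or image folder.
Test results will be saved in SAVE_DIR.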
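
The fields named above can be pictured as a nested mapping. The sketch below mirrors that layout as a Python dictionary and checks that nothing is missing before inference is started. Every value is a placeholder, the top-level position of SAVE_DIR is an assumption, and `missing_fields` is a hypothetical helper rather than part of this repository.

```python
# Field names follow the list above; every value is a placeholder.
config = {
    "MODEL": {"NAME": "SegFormer", "BACKBONE": "MiT-B2"},
    "DATASET": {"NAME": "ADE20K"},
    "TEST": {
        "MODEL_PATH": "path/to/pretrained_weights.pth",
        "FILE": "path/to/image_or_folder",
    },
    "SAVE_DIR": "output",  # assumed to sit at the top level
}

REQUIRED = [
    ("MODEL", "NAME"),
    ("MODEL", "BACKBONE"),
    ("DATASET", "NAME"),
    ("TEST", "MODEL_PATH"),
    ("TEST", "FILE"),
    ("SAVE_DIR",),
]


def missing_fields(cfg, required=REQUIRED):
    """Return the dotted names of required fields that are absent or empty."""
    missing = []
    for path in required:
        node = cfg
        for key in path:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if node in (None, ""):
            missing.append(".".join(path))
    return missing


problems = missing_fields(config)
```

An empty `problems` list means that all of the fields mentioned above are filled in.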

## example using ade20k pretrained models
$ python tools/infer.py --cfg configs/ade20k.yaml

Example test result (SegFormer-B2):

[Image: test_result]

Convert to other Frameworks (ONNX, CoreML, OpenVINO, TFLite)

To convert to ONNX and CoreML, run:

$ python tools/export.py --cfg configs/<CONFIG_FILE_NAME>.yaml

To convert to OpenVINO and TFLite, see torch_optimize.

Inference (ONNX, OpenVINO, TFLite)

## ONNX Inference
$ python scripts/onnx_infer.py --model <ONNX_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

## OpenVINO Inference
$ python scripts/openvino_infer.py --model <OpenVINO_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

## TFLite Inference
$ python scripts/tflite_infer.py --model <TFLite_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

References

https://github.com/coincheung/bisenet
https://github.com/open-mmlab/mmsegmentation
https://github.com/rwightman/pytorch-image-models

Citations

@article{xie2021segformer,
  title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal={arXiv preprint arXiv:2105.15203},
  year={2021}
}

@misc{xiao2018unified,
  title={Unified Perceptual Parsing for Scene Understanding}, 
  author={Tete Xiao and Yingcheng Liu and Bolei Zhou and Yuning Jiang and Jian Sun},
  year={2018},
  eprint={1807.10221},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@article{hong2021deep,
  title={Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes},
  author={Hong, Yuanduo and Pan, Huihui and Sun, Weichao and Jia, Yisong},
  journal={arXiv preprint arXiv:2101.06085},
  year={2021}
}

@misc{zhang2021rest,
  title={ResT: An Efficient Transformer for Visual Recognition}, 
  author={Qinglong Zhang and Yubin Yang},
  year={2021},
  eprint={2105.13677},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{huang2021fapn,
  title={FaPN: Feature-aligned Pyramid Network for Dense Image Prediction}, 
  author={Shihua Huang and Zhichao Lu and Ran Cheng and Cheng He},
  year={2021},
  eprint={2108.07058},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{wang2021pvtv2,
  title={PVTv2: Improved Baselines with Pyramid Vision Transformer}, 
  author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
  year={2021},
  eprint={2106.13797},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@article{Liu2021PSA,
  title={Polarized Self-Attention: Towards High-quality Pixel-wise Regression},
  author={Huajun Liu and Fuqiang Liu and Xinyi Fan and Dong Huang},
  journal={Arxiv Pre-Print arXiv:2107.00782 },
  year={2021}
}

@misc{chao2019hardnet,
  title={HarDNet: A Low Memory Traffic Network}, 
  author={Ping Chao and Chao-Yang Kao and Yu-Shan Ruan and Chien-Hsiang Huang and Youn-Long Lin},
  year={2019},
  eprint={1909.00948},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@inproceedings{sfnet,
  title={Semantic Flow for Fast and Accurate Scene Parsing},
  author={Li, Xiangtai and You, Ansheng and Zhu, Zhen and Zhao, Houlong and Yang, Maoke and Yang, Kuiyuan and Tong, Yunhai},
  booktitle={ECCV},
  year={2020}
}

@article{Li2020SRNet,
  title={Towards Efficient Scene Understanding via Squeeze Reasoning},
  author={Xiangtai Li and Xia Li and Ansheng You and Li Zhang and Guang-Liang Cheng and Kuiyuan Yang and Y. Tong and Zhouchen Lin},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.03308}
}

@ARTICLE{Yucondnet21,
  author={Yu, Changqian and Shao, Yuanjie and Gao, Changxin and Sang, Nong},
  journal={IEEE Signal Processing Letters}, 
  title={CondNet: Conditional Classifier for Scene Segmentation}, 
  year={2021},
  volume={28},
  number={},
  pages={758-762},
  doi={10.1109/LSP.2021.3070472}
}

@misc{yan2022lawin,
  title={Lawin Transformer: Improving Semantic Segmentation Transformer with Multi-Scale Representations via Large Window Attention}, 
  author={Haotian Yan and Chuang Zhang and Ming Wu},
  year={2022},
  eprint={2201.01615},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{yu2021metaformer,
  title={MetaFormer is Actually What You Need for Vision}, 
  author={Weihao Yu and Mi Luo and Pan Zhou and Chenyang Si and Yichen Zhou and Xinchao Wang and Jiashi Feng and Shuicheng Yan},
  year={2021},
  eprint={2111.11418},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{wightman2021resnet,
  title={ResNet strikes back: An improved training procedure in timm}, 
  author={Ross Wightman and Hugo Touvron and Hervé Jégou},
  year={2021},
  eprint={2110.00476},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{liu2022convnet,
  title={A ConvNet for the 2020s}, 
  author={Zhuang Liu and Hanzi Mao and Chao-Yuan Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie},
  year={2022},
  eprint={2201.03545},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{li2022uniformer,
  title={UniFormer: Unifying Convolution and Self-attention for Visual Recognition}, 
  author={Kunchang Li and Yali Wang and Junhao Zhang and Peng Gao and Guanglu Song and Yu Liu and Hongsheng Li and Yu Qiao},
  year={2022},
  eprint={2201.09450},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
