
The paper is now online! https://arxiv.org/abs/2111.00595

The documentation is now online! https://mlmed.org/torchxrayvision/

TorchXRayVision


What is it?

A library for chest X-ray datasets and models. Includes pre-trained models.

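TorchXRayVision is an open source software library for working with chest X-ray datasets and deep learning models. It provides a common interface and a common pre-processing chain for a wide set of publicly available chest X-ray datasets. In addition, a number of classification and representation learning models with different architectures, trained on different data combinations, are available through the library to serve as baselines or feature extractors.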

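For researchers addressing clinical questions, training models from scratch is a waste of time. To address this, TorchXRayVision provides pre-trained models that were trained on large cohorts of data and enable 1) rapid analysis of large datasets and 2) feature reuse for few-shot learning.

For researchers developing algorithms, it is important to robustly evaluate models on multiple external datasets. The metadata associated with each dataset can vary greatly, which makes it difficult to apply methods to multiple datasets. TorchXRayVision provides access to many datasets in a uniform way, so that one can be swapped for another with a single line of code. These datasets can also be merged and filtered to construct specific distribution shifts for studying generalization.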

Twitter：@torchxrayvision

Getting started

 $ pip install torchxrayvision

import torchxrayvision as xrv
import skimage.io
import torch, torchvision

# Prepare the image:
img = skimage.io.imread("16747_3_1.jpg")
img = xrv.datasets.normalize(img, 255)  # convert 8-bit image to [-1024, 1024] range
img = img.mean(2)[None, ...]  # Make single color channel (assumes an RGB image of shape (H, W, 3))

transform = torchvision.transforms.Compose([xrv.datasets.XRayCenterCrop(), xrv.datasets.XRayResizer(224)])

img = transform(img)
img = torch.from_numpy(img)

# Load model and process image
model = xrv.models.DenseNet(weights="densenet121-res224-all")
outputs = model(img[None, ...])  # or model.features(img[None, ...])

# Print results
dict(zip(model.pathologies, outputs[0].detach().numpy()))

{'Atelectasis': 0.32797316,
 'Consolidation': 0.42933336,
 'Infiltration': 0.5316924,
 'Pneumothorax': 0.28849724,
 'Edema': 0.024142697,
 'Emphysema': 0.5011832,
 'Fibrosis': 0.51887786,
 'Effusion': 0.27805611,
 'Pneumonia': 0.18569896,
 'Pleural_Thickening': 0.24489835,
 'Cardiomegaly': 0.3645515,
 'Nodule': 0.68982,
 'Mass': 0.6392845,
 'Hernia': 0.00993878,
 'Lung Lesion': 0.011150705,
 'Fracture': 0.51916164,
 'Lung Opacity': 0.59073937,
 'Enlarged Cardiomediastinum': 0.27218717}
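The two preprocessing steps above are the intensity normalization and the center crop. Both can be approximated in plain NumPy. This is a sketch that assumes `normalize` scales linearly from [0, maxval] to [-1024, 1024] and that `XRayCenterCrop` keeps the central square. `normalize_8bit` and `center_crop` are hypothetical helper names, not part of the library.

```python
import numpy as np

def normalize_8bit(img, maxval=255):
    # Assumed linear mapping: 0 -> -1024, maxval -> 1024
    return (2 * (img.astype(np.float32) / maxval) - 1.0) * 1024

def center_crop(img):
    # img has shape (1, H, W); keep the central min(H, W) x min(H, W) square
    _, h, w = img.shape
    size = min(h, w)
    top = (h - size) // 2
    left = (w - size) // 2
    return img[:, top:top + size, left:left + size]
```

For example, `normalize_8bit(np.array([0, 255], dtype=np.uint8))` gives `[-1024., 1024.]`, and a `(1, 10, 6)` image is cropped to `(1, 6, 6)`.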

A sample script that processes images using the pretrained models is process_image.py:

 $ python3 process_image.py ../tests/00000001_000.png
{'preds': {'Atelectasis': 0.50500506,
           'Cardiomegaly': 0.6600903,
           'Consolidation': 0.30575264,
           'Edema': 0.274184,
           'Effusion': 0.4026162,
           'Emphysema': 0.5036339,
           'Enlarged Cardiomediastinum': 0.40989172,
           'Fibrosis': 0.53293407,
           'Fracture': 0.32376793,
           'Hernia': 0.011924741,
           'Infiltration': 0.5154413,
           'Lung Lesion': 0.22231922,
           'Lung Opacity': 0.2772148,
           'Mass': 0.32237658,
           'Nodule': 0.5091847,
           'Pleural_Thickening': 0.5102617,
           'Pneumonia': 0.30947986,
           'Pneumothorax': 0.24847917}}
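The `preds` dictionary printed above is easier to scan when ranked. Below is a minimal sketch that lists the highest-scoring pathologies. `top_findings` is a hypothetical helper, and the values are a subset of the output above.

```python
def top_findings(preds, k=3):
    # Sort (pathology, score) pairs by score, highest first, and keep the first k
    return sorted(preds.items(), key=lambda item: item[1], reverse=True)[:k]

preds = {"Atelectasis": 0.50500506, "Cardiomegaly": 0.6600903,
         "Emphysema": 0.5036339, "Hernia": 0.011924741}
print(top_findings(preds, k=2))  # [('Cardiomegaly', 0.6600903), ('Atelectasis', 0.50500506)]
```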

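Models (demo notebook)

Specify the weights of a pretrained model (currently all DenseNet121). Note: each pretrained model has 18 outputs. The all model has every output trained. However, for the other weights some targets are not trained and will predict randomly, because they do not exist in the training dataset. The only valid outputs are the ones listed in the {dataset}.pathologies field of the dataset that corresponds to the weights.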

# 224x224 models
model = xrv.models.DenseNet(weights="densenet121-res224-all")
model = xrv.models.DenseNet(weights="densenet121-res224-rsna")  # RSNA Pneumonia Challenge
model = xrv.models.DenseNet(weights="densenet121-res224-nih")  # NIH chest X-ray8
model = xrv.models.DenseNet(weights="densenet121-res224-pc")  # PadChest (University of Alicante)
model = xrv.models.DenseNet(weights="densenet121-res224-chex")  # CheXpert (Stanford)
model = xrv.models.DenseNet(weights="densenet121-res224-mimic_nb")  # MIMIC-CXR (MIT)
model = xrv.models.DenseNet(weights="densenet121-res224-mimic_ch")  # MIMIC-CXR (MIT)

# 512x512 models
model = xrv.models.ResNet(weights="resnet50-res512-all")

# DenseNet121 from JF Healthcare for the CheXpert competition
model = xrv.baseline_models.jfhealthcare.DenseNet()

# Official Stanford CheXpert model
model = xrv.baseline_models.chexpert.DenseNet(weights_zip="chexpert_weights.zip")

# Emory HITI lab race prediction model
model = xrv.baseline_models.emory_hiti.RaceModel()
model.targets  # ["Asian", "Black", "White"]

# Riken age prediction model
model = xrv.baseline_models.riken.AgeModel()
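Per the note on valid outputs, one way to discard untrained outputs is to keep only the targets listed in the matching dataset's `.pathologies`. This is a sketch. `valid_outputs` is hypothetical. In practice `model_pathologies` would be `model.pathologies`, `scores` one row of the model output, and `dataset_pathologies` the `.pathologies` of the dataset the weights correspond to.

```python
def valid_outputs(model_pathologies, scores, dataset_pathologies):
    # Keep only outputs whose target appears in the dataset's pathology list
    keep = set(dataset_pathologies)
    return {p: s for p, s in zip(model_pathologies, scores) if p in keep}
```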

Benchmarks of the models are here: Benchmarks.md. The performance of some of the models can also be seen in this paper: arxiv.org/abs/2002.02497.

Autoencoders

You can also load a pre-trained autoencoder that was trained on the PadChest, NIH, CheXpert, and MIMIC datasets.

ae = xrv.autoencoders.ResNetAE(weights="101-elastic")
z = ae.encode(image)
image2 = ae.decode(z)
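A simple way to check an encode/decode round trip is the mean squared error between `image` and `image2`. This NumPy sketch assumes both have already been converted to arrays (for tensors, `.detach().numpy()` first). `reconstruction_mse` is a hypothetical helper.

```python
import numpy as np

def reconstruction_mse(image, image2):
    # Mean squared difference between the input and its decoded reconstruction
    image = np.asarray(image, dtype=np.float64)
    image2 = np.asarray(image2, dtype=np.float64)
    if image.shape != image2.shape:
        raise ValueError("shapes differ: %s vs %s" % (image.shape, image2.shape))
    return float(np.mean((image - image2) ** 2))
```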

Segmentation

You can load pretrained anatomical segmentation models. (Demo notebook)

seg_model = xrv.baseline_models.chestx_det.PSPNet()
output = seg_model(image)
output.shape  # [1, 14, 512, 512]
seg_model.targets  # ['Left Clavicle', 'Right Clavicle', 'Left Scapula', 'Right Scapula',
                   #  'Left Lung', 'Right Lung', 'Left Hilus Pulmonis', 'Right Hilus Pulmonis',
                   #  'Heart', 'Aorta', 'Facies Diaphragmatica', 'Mediastinum', 'Weasand', 'Spine']
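Each of the 14 channels in `output` corresponds to one entry of `seg_model.targets`. The sketch below turns one channel into a binary mask. It assumes the channels hold logits that a sigmoid followed by a 0.5 threshold converts into masks. `target_mask` is hypothetical, and a tensor output would first need `.detach().numpy()`.

```python
import numpy as np

def target_mask(output, targets, name, threshold=0.5):
    # output: array of shape [1, len(targets), H, W] holding per-target logits (assumed)
    probs = 1.0 / (1.0 + np.exp(-output))  # sigmoid
    channel = targets.index(name)          # look up the channel by target name
    return probs[0, channel] > threshold   # boolean [H, W] mask

# Usage with the real model would look like: target_mask(output_np, seg_model.targets, "Heart")
```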

Datasets

See the docstrings of each dataset and the demo notebook for more details, and the example loading script.

transform = torchvision.transforms.Compose([xrv.datasets.XRayCenterCrop(),
                                            xrv.datasets.XRayResizer(224)])

# RSNA Pneumonia Detection Challenge. https://pubs.rsna.org/doi/full/10.1148/ryai.2019180041
d_kaggle = xrv.datasets.RSNA_Pneumonia_Dataset(imgpath="path to stage_2_train_images_jpg",
                                               transform=transform)

# CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison. https://arxiv.org/abs/1901.07031
d_chex = xrv.datasets.CheX_Dataset(imgpath="path to CheXpert-v1.0-small",
                                   csvpath="path to CheXpert-v1.0-small/train.csv",
                                   transform=transform)

# National Institutes of Health ChestX-ray8 dataset. https://arxiv.org/abs/1705.02315
d_nih = xrv.datasets.NIH_Dataset(imgpath="path to NIH images")

# A relabelling of a subset of NIH images from: https://pubs.rsna.org/doi/10.1148/radiol.2019191293
d_nih2 = xrv.datasets.NIH_Google_Dataset(imgpath="path to NIH images")

# PadChest: A large chest x-ray image dataset with multi-label annotated reports. https://arxiv.org/abs/1901.07441
d_pc = xrv.datasets.PC_Dataset(imgpath="path to image folder")

# COVID-19 Image Data Collection. https://arxiv.org/abs/2006.11988
d_covid19 = xrv.datasets.COVID19_Dataset()  # specify imgpath and csvpath for the dataset

# SIIM Pneumothorax Dataset. https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation
d_siim = xrv.datasets.SIIM_Pneumothorax_Dataset(imgpath="dicom-images-train/",
                                                csvpath="train-rle.csv")

# VinDr-CXR: An open dataset of chest X-rays with radiologist's annotations. https://arxiv.org/abs/2012.15029
d_vin = xrv.datasets.VinBrain_Dataset(imgpath=".../train",
                                      csvpath=".../train.csv")

# National Library of Medicine Tuberculosis Datasets. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4256233/
d_nlmtb = xrv.datasets.NLMTB_Dataset(imgpath="path to MontgomerySet or ChinaSet_AllFiles")

Dataset fields

Each dataset contains a number of fields. These fields are maintained when xrv.datasets.subset_dataset and xrv.datasets.merge_dataset are used.

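.pathologies This field is a list of the pathologies contained in this dataset that will be contained in the .labels field.
.labels This field contains a 1, 0, or NaN for each label defined in .pathologies.
.csv This field is a pandas DataFrame of the metadata CSV file that comes with the data. Each row aligns with the elements of the dataset, so indexing with .iloc will work.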

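Where possible, each dataset's .csv will have some common fields. These are aligned across datasets and are listed below:

csv.patientid A unique id that identifies a single patient's samples within this dataset
csv.offset_day_int An integer time offset for the image, in days. This is expected to be a relative time with no absolute meaning, although for some datasets it is the epoch time.
csv.age_years The age of the patient in years.
csv.sex_male If the patient is male
csv.sex_female If the patient is female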
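Each `.csv` row lines up with the matching row of `.labels`, so a filter built from metadata can be reused to select labels. This sketch uses plain NumPy arrays as a stand-in for the pandas `.csv`. With a real dataset you would use the DataFrame columns, for example `d.csv.age_years`. The toy values, the 0/1 encoding of `sex_male`, and `select_rows` are all hypothetical.

```python
import numpy as np

# Hypothetical stand-in for a dataset: row i of labels matches row i of csv
pathologies = ["Effusion", "Pneumonia"]
labels = np.array([[1.0, np.nan],
                   [0.0, 1.0],
                   [np.nan, 0.0]])
csv = {"age_years": np.array([71, 45, 63]),
       "sex_male": np.array([1, 0, 1])}

def select_rows(csv, labels, min_age, male):
    # Build a boolean filter from metadata columns, then apply it to the labels
    keep = (csv["age_years"] >= min_age) & (csv["sex_male"] == int(male))
    return np.flatnonzero(keep), labels[keep]
```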

Dataset tools

relabel_dataset will reorder the labels so that they follow the same order as the pathologies argument.

xrv.datasets.relabel_dataset(xrv.datasets.default_pathologies, d_nih)  # has side effects
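The reordering can be pictured as a column permutation of `.labels`, with NaN columns for pathologies the dataset does not have. This is a sketch that assumes that behaviour. `relabel_dataset` modifies the dataset in place, whereas this hypothetical `relabel` returns a new array.

```python
import numpy as np

def relabel(labels, pathologies, new_pathologies):
    # Reorder label columns to follow new_pathologies; pathologies absent from the dataset become NaN
    out = np.full((labels.shape[0], len(new_pathologies)), np.nan)
    for j, name in enumerate(new_pathologies):
        if name in pathologies:
            out[:, j] = labels[:, pathologies.index(name)]
    return out
```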

Specify a subset of views (demo notebook)

d_kaggle = xrv.datasets.RSNA_Pneumonia_Dataset(imgpath="...",
                                               views=["PA", "AP", "AP Supine"])
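Conceptually, `views=` keeps only the samples whose view is in the list. Below is a minimal sketch over a hypothetical array of per-sample view strings. Real datasets keep the view in their own metadata, and `filter_views` is not a library function.

```python
import numpy as np

def filter_views(view_column, views):
    # Positions of samples whose view is one of the requested views
    return np.flatnonzero(np.isin(view_column, views))
```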

Only 1 image per patient

d_kaggle = xrv.datasets.RSNA_Pneumonia_Dataset(imgpath="...",
                                               unique_patients=True)
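`unique_patients=True` keeps a single image per patient. One way to do this from `csv.patientid` is sketched below. It assumes the first image of each patient is kept; the library may choose differently. `first_image_per_patient` is hypothetical.

```python
import numpy as np

def first_image_per_patient(patient_ids):
    # np.unique returns the index of the first occurrence of each id
    _, first = np.unique(patient_ids, return_index=True)
    return np.sort(first)  # restore the original sample order
```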

Obtain summary statistics per dataset

d_chex = xrv.datasets.CheX_Dataset(imgpath="CheXpert-v1.0-small",
                                   csvpath="CheXpert-v1.0-small/train.csv",
                                   views=["PA", "AP"], unique_patients=False)

CheX_Dataset num_samples=191010 views=['PA', 'AP']
{'Atelectasis': {0.0: 17621, 1.0: 29718},
 'Cardiomegaly': {0.0: 22645, 1.0: 23384},
 'Consolidation': {0.0: 30463, 1.0: 12982},
 'Edema': {0.0: 29449, 1.0: 49674},
 'Effusion': {0.0: 34376, 1.0: 76894},
 'Enlarged Cardiomediastinum': {0.0: 26527, 1.0: 9186},
 'Fracture': {0.0: 18111, 1.0: 7434},
 'Lung Lesion': {0.0: 17523, 1.0: 7040},
 'Lung Opacity': {0.0: 20165, 1.0: 94207},
 'Pleural Other': {0.0: 17166, 1.0: 2503},
 'Pneumonia': {0.0: 18105, 1.0: 4674},
 'Pneumothorax': {0.0: 54165, 1.0: 17693},
 'Support Devices': {0.0: 21757, 1.0: 99747}}
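Counts like these can be derived from `.labels` and `.pathologies` alone, by skipping NaN (unlabeled) entries. The hypothetical `label_totals` helper below mirrors the printed format above, but it is a sketch and not the library's implementation.

```python
import numpy as np
from collections import Counter

def label_totals(labels, pathologies):
    # For each pathology, count the 0.0 and 1.0 labels, skipping NaN
    totals = {}
    for j, name in enumerate(pathologies):
        column = labels[:, j]
        known = column[~np.isnan(column)]
        totals[name] = dict(sorted(Counter(known.tolist()).items()))
    return totals
```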

Pathology masks (demo notebook)

Masks are available in the following datasets:

xrv.datasets.RSNA_Pneumonia_Dataset()  # for Lung Opacity
xrv.datasets.SIIM_Pneumothorax_Dataset()  # for Pneumothorax
xrv.datasets.NIH_Dataset()  # for Cardiomegaly, Mass, Effusion, ...

Example usage:

import numpy as np

d_rsna = xrv.datasets.RSNA_Pneumonia_Dataset(imgpath="stage_2_train_images_jpg",
                                             views=["PA", "AP"],
                                             pathology_masks=True)

# The has_masks column will let you know if any masks exist for that sample
d_rsna.csv.has_masks.value_counts()
# False    20672
# True      6012

# Each sample will have a pathology_masks dictionary where the index
# of each pathology will correspond to a mask of that pathology (if it exists).
# There may be more than one mask per sample, but only one per pathology.
idx = int(np.flatnonzero(d_rsna.csv.has_masks.values)[0])  # first sample that has masks
sample = d_rsna[idx]
sample["pathology_masks"][d_rsna.pathologies.index("Lung Opacity")]

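This also works with data augmentation if you pass data_aug=data_transforms to the dataset constructor. The random seed is matched between the calls for the image and for the mask, so both receive the same random transform.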
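Matching the seed guarantees that an image and its mask receive the identical random transform. This is a toy NumPy sketch of that idea. A random horizontal flip plus shift stands in for the real `data_aug` transforms, and `random_flip_shift` is hypothetical.

```python
import numpy as np

def random_flip_shift(array, seed):
    # The same seed always yields the same flip decision and the same shift
    rng = np.random.default_rng(seed)
    if rng.random() < 0.5:
        array = array[:, ::-1]
    shift = int(rng.integers(-2, 3))
    return np.roll(array, shift, axis=1)

seed = 1234
image = np.arange(16.0).reshape(4, 4)
mask = (image > 7).astype(np.uint8)
image_aug = random_flip_shift(image, seed)
mask_aug = random_flip_shift(mask, seed)  # still lines up with image_aug pixel for pixel
```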

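Distribution shift tools (demo notebook)

The class xrv.datasets.CovariateDataset takes two datasets and two arrays representing the labels. Samples are returned with the desired ratio of images from each site. The goal is to simulate a covariate shift that makes a model focus on an incorrect feature. The shift can then be reversed in the validation data, causing a catastrophic failure in generalization performance.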

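ratio=0.0 means images from d1 will have positive labels
ratio=0.5 means images from d1 will have half of the positive labels
ratio=1.0 means images from d1 will have no positive labels

For any ratio, the number of samples returned will be the same.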

d = xrv.datasets.CovariateDataset(d1=d1,                # dataset1 with a specific condition
                                  d1_target=d1_target,  # target label to predict
                                  d2=d2,                # dataset2 with a specific condition
                                  d2_target=d2_target,  # target label to predict
                                  mode="train",         # train, valid, and test
                                  ratio=0.9)
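The sketch below shows how the ratio can move positives between sites while the sample count stays fixed. It is hypothetical and is not the library's implementation of `CovariateDataset`. It assumes 0/1 label arrays and a fixed number of samples per class. Positives come (1 - ratio) from d1 and ratio from d2, and negatives take the complementary split.

```python
import numpy as np

def covariate_sample(d1_labels, d2_labels, ratio, n_per_class, seed=0):
    # Returns (site, index, label) triples; total is 2 * n_per_class for any ratio
    rng = np.random.default_rng(seed)
    d1_labels = np.asarray(d1_labels)
    d2_labels = np.asarray(d2_labels)

    def pick(labels, value, k):
        # Draw k distinct indices whose label equals `value`
        pool = np.flatnonzero(labels == value)
        return rng.choice(pool, size=k, replace=False)

    n_d1_pos = int(round((1 - ratio) * n_per_class))  # positives drawn from d1
    n_d2_pos = n_per_class - n_d1_pos                  # remaining positives from d2
    n_d1_neg = n_d2_pos                                # d1 gets the complementary share of negatives
    n_d2_neg = n_d1_pos

    samples = []
    for site, labels, n_pos, n_neg in (("d1", d1_labels, n_d1_pos, n_d1_neg),
                                       ("d2", d2_labels, n_d2_pos, n_d2_neg)):
        samples += [(site, int(i), 1) for i in pick(labels, 1, n_pos)]
        samples += [(site, int(i), 0) for i in pick(labels, 0, n_neg)]
    return samples
```

With ratio=0.0 every d1 sample is positive, and with ratio=1.0 no d1 sample is. Reversing the ratio between training and validation reproduces the shift described above.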

Citation

Primary TorchXRayVision paper: https://arxiv.org/abs/2111.00595

 Joseph Paul Cohen, Joseph D. Viviano, Paul Bertin, Paul Morrison, Parsa Torabian, Matteo Guarrera, Matthew P Lungren, Akshay Chaudhari, Rupert Brooks, Mohammad Hashir, Hadrien Bertrand
TorchXRayVision: A library of chest X-ray datasets and models. 
Medical Imaging with Deep Learning
https://github.com/mlmed/torchxrayvision, 2020


@inproceedings{Cohen2022xrv,
title = {{TorchXRayVision: A library of chest X-ray datasets and models}},
author = {Cohen, Joseph Paul and Viviano, Joseph D. and Bertin, Paul and Morrison, Paul and Torabian, Parsa and Guarrera, Matteo and Lungren, Matthew P and Chaudhari, Akshay and Brooks, Rupert and Hashir, Mohammad and Bertrand, Hadrien},
booktitle = {Medical Imaging with Deep Learning},
url = {https://github.com/mlmed/torchxrayvision},
arxivId = {2111.00595},
year = {2022}
}

And this paper, which initiated development of the library: https://arxiv.org/abs/2002.02497

 Joseph Paul Cohen and Mohammad Hashir and Rupert Brooks and Hadrien Bertrand
On the limits of cross-domain generalization in automated X-ray prediction. 
Medical Imaging with Deep Learning 2020 (Online: https://arxiv.org/abs/2002.02497)

@inproceedings{cohen2020limits,
  title={On the limits of cross-domain generalization in automated X-ray prediction},
  author={Cohen, Joseph Paul and Hashir, Mohammad and Brooks, Rupert and Bertrand, Hadrien},
  booktitle={Medical Imaging with Deep Learning},
  year={2020},
  url={https://arxiv.org/abs/2002.02497}
}