research-article

Open access

CheXternal: generalization of deep learning models for chest X-ray interpretation to photos of chest X-rays and external clinical settings

Authors:

Pranav Rajpurkar,

Matthew P. Lungren

CHIL '21: Proceedings of the Conference on Health, Inference, and Learning

Pages 125-132

https://doi.org/10.1145/3450439.3451876

Published: 08 April 2021

Abstract

Recent advances in training deep learning models have demonstrated the potential to provide accurate chest X-ray interpretation and increase access to radiology expertise. However, poor generalization due to data distribution shifts in clinical settings is a key barrier to implementation. In this study, we measured the diagnostic performance of 8 different chest X-ray models when applied, without any fine-tuning, to (1) smartphone photos of chest X-rays and (2) external datasets. All models were developed by different groups and submitted to the CheXpert challenge, and were re-applied to the test datasets without further tuning. We found that (1) on photos of chest X-rays, all 8 models experienced a statistically significant drop in task performance, but only 3 performed significantly worse than radiologists on average, and (2) on the external set, none of the models performed statistically significantly worse than radiologists, and 5 models performed statistically significantly better than radiologists. Our results demonstrate that some chest X-ray models, under clinically relevant distribution shifts, were comparable to radiologists while other models were not. Future work should investigate aspects of model training procedures and dataset collection that influence generalization in the presence of data distribution shifts.
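
To make the phrase "statistically significant drop" concrete, the following is a minimal sketch of one common way such a comparison could be run: a paired bootstrap confidence interval for the difference in a binary classification metric between two sets of predictions on the same studies (for example, a model on standard X-rays versus the same model on photos, or a model versus a radiologist). The abstract does not name the metric or the statistical test, so the choice of the Matthews correlation coefficient, the percentile bootstrap, and the function names `mcc` and `bootstrap_diff_ci` are all assumptions made for illustration and not the authors' procedure.

```python
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    # Convention: return 0.0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def bootstrap_diff_ci(y_true, pred_a, pred_b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mcc(pred_a) - mcc(pred_b).

    Studies are resampled with replacement, and the same resampled
    indices are used for both prediction sets (paired design).
    """
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        a = [pred_a[i] for i in idx]
        b = [pred_b[i] for i in idx]
        diffs.append(mcc(yt, a) - mcc(yt, b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Under these assumptions, an interval that excludes zero would be read as a significant difference at the chosen level. For example, `bootstrap_diff_ci(labels, preds_on_xrays, preds_on_photos)` returning an interval entirely above zero would indicate a drop in performance on photos; the three argument names here are hypothetical placeholders for per-study binary labels and predictions.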

Index Terms

CheXternal: generalization of deep learning models for chest X-ray interpretation to photos of chest X-rays and external clinical settings
1. Applied computing
  1. Life and medical sciences
    1. Health informatics
2. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations

Published In

CHIL '21: Proceedings of the Conference on Health, Inference, and Learning

April 2021

309 pages

ISBN: 9781450383592

DOI: 10.1145/3450439

General Chair: Marzyeh Ghassemi (University of Toronto and Vector Institute)

Program Chairs: Tristan Naumann (Microsoft Research Redmond), Emma Pierson (Stanford University and Microsoft Research New England)

Copyright © 2021 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

ACM: Association for Computing Machinery

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ACM CHIL '21

Sponsor:

ACM

ACM CHIL '21: ACM Conference on Health, Inference, and Learning

April 8 - 10, 2021

Virtual Event, USA

Acceptance Rates

CHIL '21 Paper Acceptance Rate: 27 of 110 submissions (25%)

Overall Acceptance Rate: 27 of 110 submissions (25%)
