DOI: 10.1145/3450439.3451876
research-article
Open access

CheXternal: generalization of deep learning models for chest X-ray interpretation to photos of chest X-rays and external clinical settings

Published: 08 April 2021

Abstract

Recent advances in training deep learning models have demonstrated the potential to provide accurate chest X-ray interpretation and increase access to radiology expertise. However, poor generalization due to data distribution shifts in clinical settings is a key barrier to implementation. In this study, we measured the diagnostic performance of 8 different chest X-ray models when applied, without any fine-tuning, to (1) smartphone photos of chest X-rays and (2) external datasets. All models were developed by different groups and submitted to the CheXpert challenge, then re-applied to the test datasets without further tuning. We found that (1) on photos of chest X-rays, all 8 models experienced a statistically significant drop in task performance, but only 3 performed significantly worse than radiologists on average, and (2) on the external set, none of the models performed statistically significantly worse than radiologists, and 5 models performed statistically significantly better than radiologists. Our results demonstrate that some chest X-ray models, under clinically relevant distribution shifts, were comparable to radiologists while other models were not. Future work should investigate aspects of model training procedures and dataset collection that influence generalization in the presence of data distribution shifts.

      Published In

      CHIL '21: Proceedings of the Conference on Health, Inference, and Learning
      April 2021
      309 pages
      ISBN: 9781450383592
      DOI: 10.1145/3450439
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. chest X-ray interpretation
      2. clinical deployment
      3. distribution shifts
      4. generalizability
      5. radiology

      Qualifiers

      • Research-article

      Conference

      ACM CHIL '21

      Acceptance Rates

      CHIL '21 Paper Acceptance Rate 27 of 110 submissions, 25%;
      Overall Acceptance Rate 27 of 110 submissions, 25%
