DOI: 10.1145/3450439.3451862
Research article · Open access

VisualCheXbert: addressing the discrepancy between radiology report labels and image labels

Published: 08 April 2021

Abstract

Automatic extraction of medical conditions from free-text radiology reports is critical for supervising computer vision models to interpret medical images. In this work, we show that radiologists labeling reports significantly disagree with radiologists labeling corresponding chest X-ray images, which reduces the quality of report labels as proxies for image labels. We develop and evaluate methods to produce labels from radiology reports that have better agreement with radiologists labeling images. Our best performing method, called VisualCheXbert, uses a biomedically-pretrained BERT model to directly map from a radiology report to the image labels, with a supervisory signal determined by a computer vision model trained to detect medical conditions from chest X-ray images. We find that VisualCheXbert outperforms an approach using an existing radiology report labeler by an average F1 score of 0.14 (95% CI 0.12, 0.17). We also find that VisualCheXbert better agrees with radiologists labeling chest X-ray images than do radiologists labeling the corresponding radiology reports by an average F1 score across several medical conditions of between 0.12 (95% CI 0.09, 0.15) and 0.21 (95% CI 0.18, 0.24).
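The abstract's headline comparisons are average F1 scores across conditions with nonparametric bootstrap 95% confidence intervals. As a rough sketch of that style of evaluation (not the paper's actual code; the condition names, data, and helper functions here are hypothetical), it can be computed as:

```python
import random
from statistics import mean

def f1(y_true, y_pred):
    """F1 score for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if 2 * tp + fp + fn == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)

def avg_f1(truth, preds):
    """Average F1 over conditions; truth/preds map condition -> list of 0/1 labels."""
    return mean(f1(truth[c], preds[c]) for c in truth)

def bootstrap_ci(truth, preds, n_boot=1000, alpha=0.05, seed=0):
    """Nonparametric bootstrap CI for average F1: resample patients with replacement."""
    rng = random.Random(seed)
    n = len(next(iter(truth.values())))
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        t = {c: [v[i] for i in idx] for c, v in truth.items()}
        p = {c: [v[i] for i in idx] for c, v in preds.items()}
        stats.append(avg_f1(t, p))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]
```

A difference in average F1 between two labelers (e.g. VisualCheXbert vs. an existing report labeler, each scored against radiologist image labels) can be bootstrapped the same way by resampling patients and recomputing the difference on each replicate.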





    Published In

    CHIL '21: Proceedings of the Conference on Health, Inference, and Learning
    April 2021, 309 pages
    ISBN: 9781450383592
    DOI: 10.1145/3450439
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery, New York, NY, United States



    Author Tags

    1. BERT
    2. chest X-ray diagnosis
    3. medical report labeling
    4. natural language processing


    Conference

    ACM CHIL '21

    Acceptance Rates

    CHIL '21 Paper Acceptance Rate 27 of 110 submissions, 25%;
    Overall Acceptance Rate 27 of 110 submissions, 25%
