DOI: 10.1145/3580305.3599566

Mining Electronic Health Records for Real-World Evidence

Published: 04 August 2023

Abstract

The rapid accumulation of large-scale electronic health records (EHRs) presents considerable opportunities to generate real-world evidence that informs clinical decision-making and accelerates drug development. However, the complexity of EHRs makes them a formidable testing ground for cutting-edge AI algorithms. Furthermore, a significant gap still exists between algorithm development in the computer science community and clinical translation in the healthcare community. This tutorial aims to bridge that divide by fostering mutual understanding between the two communities. It discusses advanced machine learning and data mining technologies tailored to real-world healthcare challenges, including 1) using EHRs and trial emulation to understand long COVID and to repurpose drugs for Alzheimer's disease, and 2) risk prediction and associated issues such as fairness, interpretability, and generalizability. We will conclude the tutorial by discussing opportunities for future research and the prospects of a career as a health data scientist.

Published In

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2023
5996 pages
ISBN: 9798400701030
DOI: 10.1145/3580305
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. causal inference
  2. electronic health records
  3. healthcare
  4. predictive modeling
  5. real-world data
  6. real-world evidence
  7. trial emulation

Qualifiers

  • Abstract

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
