Mining Electronic Health Records (EHRs): A Survey

Published: 03 January 2018

Abstract

The continuously increasing cost of the US healthcare system has received significant attention. Central to the ideas aimed at curbing this trend is the use of technology, in particular the mandate to implement electronic health records (EHRs). EHRs consist of patient information such as demographics, medications, laboratory test results, diagnosis codes, and procedures. Mining EHRs could improve patient health management, as EHRs contain detailed information related to disease prognosis for large patient populations. In this article, we provide a structured and comprehensive overview of data mining techniques for modeling EHRs. We first provide a detailed understanding of the major application areas to which EHR mining has been applied and then discuss the nature of EHR data and its accompanying challenges. Next, we describe major approaches used for EHR mining, the metrics associated with EHRs, and the various study designs. With this foundation, we then provide a systematic and methodological organization of existing data mining techniques used to model EHRs and discuss ideas for future research.

Supplementary Material

a85-yadav-supp.pdf (yadav.zip)
Supplemental movie, appendix, image, and software files for Mining Electronic Health Records (EHRs): A Survey

References

[1]
J. Abramson et al. 2001. Making Sense of Data: A Self-Instruction Manual on the Interpretation of Epidemiological Data. Oxford University Press.
[2]
R. Agrawal et al. 1994. Fast algorithms for mining association rules. In Proc. 20th Int. Conf. Very Large Data Dases, VLDB, Vol. 1215. 487--499.
[3]
D. Albers and G. Hripcsak. 2010. A statistical dynamics approach to the study of human health data: Resolving population scale diurnal variation in laboratory data. Physics Letters A 374, 9 (2010), 1159--1164.
[4]
D. Albers and G. Hripcsak. 2012. Using time-delayed mutual information to discover and interpret temporal correlation structure in complex populations. Chaos 22, 1 (2012), 013111.
[5]
D. Albers et al. 2014. Dynamical phenotyping: Using temporal analysis of clinically collected physiologic data to stratify populations. PloS One 9, 6 (2014), e96443.
[6]
C. Aliferis et al. 2010. Local causal and Markov blanket induction for causal discovery and feature selection for classification part i: Algorithms and empirical evaluation. Journal of Machine Learning Research 11, (Jan. 2010), 171--234.
[7]
D. Anastasiu et al. 2016. Big data and recommender systems.
[8]
P. Austin et al. 2012. Regression trees for predicting mortality in patients with cardiovascular disease: What improvement is achieved by using ensemble-based methods? Biometrical Journal 54, 5 (Sept. 2012), 657--673.
[9]
L. Babuin et al. 2008. Elevated cardiac troponin is an independent risk factor for short-and long-term mortality in medical intensive care unit patients. Critical Care Medicine 36, 3 (2008), 759--765.
[10]
S. Barretto et al. 2003. Linking guidelines to electronic health record design for improved chronic disease management. In Proc. of AMIA, Vol. 2003. 66.
[11]
I. Batal and M. Hauskrecht. 2010. Mining clinical data using minimal predictive rules. AMIA Annual Symposium Proceedings, Vol. 2010, 31--35.
[12]
I. Batal, L. Sacchi, R. Bellazzi, and M. Hauskrecht. 2009. Multivariate time series classification with temporal abstractions. University of Pittsburgh.
[13]
I. Batal et al. 2012. Mining recent temporal patterns for event detection in multivariate time series data. Proc. of KDD (2012), 280.
[14]
A. Bauer-Mehren et al. 2013. Network analysis of unstructured EHR data for clinical research. AMIA Summit on Translational Science 2013 (2013), 14--8.
[15]
A. Bitton and T. Gaziano. 2010. The Framingham heart study’s impact on global risk assessment. Progress in Cardiovascular Diseases 53, 1 (2010), 68--78.
[16]
W. Bobo et al. 2014. An electronic health record driven algorithm to identify incident antidepressant medication users. Journal of the American Medical Informatics Association 21, 5 (2014), 785--791.
[17]
J. L. Breault, C. R. Goodall, and P. J. Fos. 2002. Data mining a diabetic data warehouse. AIM 26, 1 (2002), 37--54.
[18]
R. Byrd et al. 2014. Automatic identification of heart failure diagnostic criteria, using text analysis of clinical notes from electronic health records. IJMI 83, 12 (2014), 983--992.
[19]
H. Cao et al. 2005. Mining a clinical data warehouse to discover disease-finding associations using co-occurrence statistics. AMIA 2005 (Jan. 2005), 106--110.
[20]
R. Carroll et al. 2011. Naïve electronic health record phenotype identification for rheumatoid arthritis. AMIA 2011 (Jan. 2011), 189--196.
[21]
R. Caruana et al. 2015. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proc. of KDD. ACM, 1721--1730.
[22]
M. R. Castro et al. 2016. Statin use, diabetes incidence and overall mortality in normoglycemic and impaired fasting glucose patients. Journal of General Internal Medicine 31, 5 (2016), 502--508.
[23]
Y. Chang et al. 2011. Predicting hospital-acquired infections by scoring system with simple parameters. PloS One 6, 8 (Jan. 2011), e23137.
[24]
Z. Che et al. 2015. Deep computational phenotyping. In Proc. of KDD. 507--516.
[25]
Z. Che et al. 2017. Exploiting convolutional neural network for risk prediction with medical feature embedding. arXiv preprint arXiv:1701.07474 (2017).
[26]
R. Chen et al. 2016. Patient stratification using electronic health records from a chronic disease management program. IEEE Journal of Biomedical and Health Informatics. IEEE.
[27]
Y. Chen et al. 2015. Building bridges across electronic health record systems through inferred phenotypic topics. JBI 55 (2015), 82--93.
[28]
Y. Cheng, F. Wang, P. Zhang, and J. Hu. 2016. Risk prediction with electronic health records: A deep learning approach. In Proceedings of the 2016 SIAM International Conference on Data Mining. SIAM, 432--440.
[29]
Y. Cheng et al. 2016. Risk prediction with electronic health records: A deep learning approach. In Proc. of SIAM-SDM. 432--440.
[30]
E. Choi, M. T. Bahadori, and J. Sun. 2015. Doctor AI: Predicting clinical events via recurrent neural networks. arXiv preprint arXiv:1511.05942 (2015).
[31]
E. Choi et al. 2015. Doctor AI: Predicting clinical events via recurrent neural networks. arXiv preprint arXiv:1511.05942 (2015).
[32]
E. Choi et al. 2016. Multi-layer representation learning for medical concepts. arXiv preprint arXiv:1602.05568 (2016).
[33]
S. Cholleti et al. 2012. Leveraging derived data elements in data analytic models for understanding and predicting hospital readmissions. AMIA 2012 (Jan. 2012), 103--111.
[34]
C. Chung et al. 1991. Survival analysis: A survey. Journal of Quantitative Criminology 7, 1 (1991), 59--98.
[35]
M. Coleman et al. 2008. CONCORD. Lancet Oncology 9, 8 (Aug. 2008), 730--756.
[36]
G. Collins et al. 2011. Developing risk prediction models for type 2 diabetes: A systematic review of methodology and reporting. BMC Medicine 9, 1 (Jan. 2011), 103.
[37]
D. R Cox. 1992. Regression models and life-tables. In Breakthroughs in Statistics. Springer, 527--541.
[38]
D. Dasgupta and N. Chawla. 2014. Disease and medication networks: An insight into disease-drug interactions. In 2nd International Conference on Big Data and Analytics in Healthcare. Singapore.
[39]
R. Dehejia and S. Wahba. 2002. Propensity score-matching methods for nonexperimental causal studies. Review of Economics and Statistics 84, 1 (2002), 151--161.
[40]
D. Demner-Fushman et al. 2009. What can natural language processing do for clinical decision support? JBI 42, 5 (2009), 760--772.
[41]
S. Dey et al. 2015. Predicting the factors of improvement of health status of home health care patients: A holistic data mining approach. In AMIA.
[42]
J. Dormandy et al. 2005. Secondary prevention of macrovascular events in patients with type 2 diabetes in the proactive study (prospective pioglitazone clinical trial in macrovascular events): A randomised controlled trial. Lancet 366, 9493 (2005), 1279--1289.
[43]
F. Doshi-Velez et al. 2014. Comorbidity clusters in autism spectrum disorders: An electronic health record time-series analysis. Pediatrics 133, 1 (Jan. 2014), e54--63.
[44]
G. Dunteman. 1989. Principal Components Analysis. Number 69. Sage.
[45]
S. Ebadollahi et al. 2010. Predicting patients trajectory of physiological data using temporal trends in similar patients: A system for near-term prognostics. In Proc. of AMIA, Vol. 2010. AMIA, 192.
[46]
E. Ekinci et al. 2011. Dietary salt intake and mortality in patients with type 2 diabetes. Diabetes Care 34, 3 (2011), 703--709.
[47]
R. Epstein et al. 2013. Automated identification of drug and food allergies entered using non-standard terminology. JAMIA 20, 5 (2013), 962--968.
[48]
L. Evers and C. Messow. 2008. Sparse kernel methods for high-dimensional survival data. Bioinformatics 24, 14 (2008), 1632--1638.
[49]
K. Feldman and N. Chawla. 2014. Admission duration model for infant treatment (ADMIT). In Proc. of BIBM. IEEE, 583--587.
[50]
M. Field et al. 1992. Guidelines for Clinical Practice: From Development to Use. National Academies Press.
[51]
E. Funai et al. 2001. Distribution of study designs in four major US journals of obstetrics and gynecology. Gynecologic and Obstetric Investigation 51, 1 (2001), 8--11.
[52]
G. Fung et al. 2008. Privacy-preserving predictive models for lung cancer survival analysis. Practical Privacy-Preserving Data Mining (2008), 40.
[53]
E. Gatti et al. 2011. A continuous time Bayesian network model for cardiogenic heart failure. Flexible Services and Manufacturing Journal 24, 4 (Dec. 2011), 496--515.
[54]
M. Ghalwash and Z. Obradovic. 2014. A data-driven model for optimizing therapy duration for septic patients. In Proc. of SDM-DMHH.
[55]
S. Ghassempour et al. 2014. Clustering multivariate time series using hidden Markov models. International Journal of Environmental Research and Public Health 11, 3 (2014), 2741--2763.
[56]
L. Gordon and R. A. Olshen. 1985. Tree-structured survival analysis. Cancer Treatment Reports 69, 10 (1985), 1065--1069.
[57]
D. Gotz. 2016. Adaptive contextualization: Combating bias during high-dimensional visualization and data selection. In Proc. of ICIUI. ACM, 85--95.
[58]
D. Gotz and H. Stavropoulos. 2014. Decisionflow: Visual analytics for high-dimensional temporal event sequence data. TVGC 20, 12 (2014), 1783--1792.
[59]
D. Gotz et al. 2011. Visual cluster analysis in support of clinical decision intelligence. AMIA 2011 (Jan. 2011), 481--490.
[60]
D. Gotz et al. 2014. A methodology for interactive mining and visual analysis of clinical event patterns using electronic health record data. JBI 48 (2014), 148--159.
[61]
D. Goyal et al. 2012. Clinically identified postpartum depression in asian american mothers. JOGNN/NAACOG 41, 3 (2012), 408--416.
[62]
D. Grimes and K. Schulz. 2002. An overview of clinical research: The lay of the land. Lancet 359, 9300 (2002), 57--61.
[63]
S. Gruber et al. 1986. Clinical epidemiology: The architecture of clinical research. Yale Journal of Biology and Medicine 59, 1 (1986), 77.
[64]
K. Haerian et al. 2012. Detection of pharmacovigilance-related adverse events using electronic health records and automated methods. Clinical Pharmacology and Therapeutics 92, 2 (Aug. 2012), 228--234.
[65]
Y. Hagar et al. 2014. Survival analysis with electronic health record data: Experiments with chronic kidney disease. Statistical Analysis and Data Mining 7, 5 (2014), 385--403.
[66]
D. Hanauer and N. Ramakrishnan. 2013. Modeling temporal relationships in large scale clinical associations. JAMIA 20, 2 (2013), 332--341.
[67]
D. Hanauer et al. 2009. Exploring clinical associations using ’-omics’ based enrichment analyses. PloS One 4, 4 (Jan. 2009), e5203.
[68]
D. Hanauer et al. 2013. Describing the relationship between cat bites and human depression using data from an electronic health record. PLoS One 8, 8 (2013), e70585.
[69]
M. Hauskrecht et al. 2013. Outlier detection for patient monitoring and alerting. JBI 46, 1 (2013), 47--55.
[70]
M. Hearst et al. 1998. Support vector machines. ISA 13, 4 (1998), 18--28.
[71]
C. Hennekens et al. 1987. Epidemiology in Medicine. Boston: Little Brown and Company, 1987.
[72]
V. Herasevich et al. 2013. Connecting the dots: Rule-based decision support systems in the modern EMR era. Journal of Clinical Monitoring and Computing 27, 4 (Aug. 2013), 443--448.
[73]
J. Hogan and T. Lancaster. 2004. Instrumental variables and inverse probability weighting for causal inference from longitudinal observational studies. SMMR 13, 1 (2004), 17--48.
[74]
A. Holmes et al. 2011. Discovering disease associations by integrating electronic clinical data and medical literature. PloS One 6, 6 (Jan. 2011), e21132.
[75]
G. Hripcsak. 2013. Correlating electronic health record concepts with healthcare process events. JAMIA 20, e2 (Dec. 2013), e311--318.
[76]
G. Hripcsak and D. Albers. 2013. Next-generation phenotyping of electronic health records. JAMIA 20, 1 (2013), 117--121.
[77]
G. Hripcsak et al. 1997. Automated tuberculosis detection. JAMIA 4, 5 (1997), 376--381.
[78]
G. Hripcsak et al. 2011. Bias associated with mining electronic health records. Journal of Biomedical Discovery and Collaboration 6 (2011), 48.
[79]
G. Hripcsak et al. 2015. Parameterizing time in electronic health record studies. Journal of the American Medical Informatics Association 22, 4 (2015), 794--804.
[80]
S. Huang et al. 2014. Toward personalizing treatment for depression: Predicting diagnosis and severity. Journal of the American Medical Informatics Association 21, 6 (2014), 1069--1075.
[81]
K. Ikeda et al. 1991. Effect of repeated transcatheter arterial embolization on the survival time in patients with hepatocellular carcinoma. An analysis by the Cox proportional hazard model. Cancer 68, 10 (1991), 2150--2154.
[82]
H. Ishwaran et al. 2008. Random survival forests. The Annals of Applied Statistics (2008), 841--860.
[83]
S. Iyer et al. 2014. Mining clinical text for signals of adverse drug-drug interactions. JAMIA 21, 2 (2014), 353--62.
[84]
H. Jackson et al. 2011. Data mining derived treatment algorithms from the electronic medical record improve theoretical empirical therapy for outpatient urinary tract infections. Journal of Urology 186, 6 (2011), 2257--2262.
[85]
H. Jin et al. 2008. Mining unexpected temporal associations: Applications in detecting adverse drug reactions. TITB 12, 4 (July 2008), 488--500.
[86]
L. Kalankesh et al. 2013. Taming EHR data: Using semantic similarity to reduce dimensionality. Studies in Health Technology and Informatics 192 (Jan. 2013), 52--56.
[87]
E. Kaplan and P. Meier. 1958. Nonparametric estimation from incomplete observations. JASA 53, 282 (1958), 457--481.
[88]
S. Karnik et al. 2012. Predicting atrial fibrillation and flutter using electronic health records. EMBC 2012 (Jan. 2012), 5562--5565.
[89]
M. Kattan et al. 1998. Experiments to determine whether recursive partitioning (CART) or an artificial neural network overcomes theoretical limitations of Cox proportional hazards regression. Computers and Biomedical Research 31, 5 (1998), 363--373.
[90]
E. Kawaler et al. 2012. Learning to predict post-hospitalization VTE risk from EHR data. AMIA 2012 (Jan. 2012), 436--445.
[91]
N. Keiding et al. 1997. The role of frailty models and accelerated failure time models in describing heterogeneity due to omitted covariates. SIM 16, 2 (1997), 215--224.
[92]
J. Kelsey. 1996. Methods in Observational Epidemiology. Vol. 26. Oxford University Press.
[93]
F. Khan and V. Zubek. 2008. Support vector regression for censored data (SVRc): A novel tool for survival analysis. In Proc. of ICDM. IEEE, 863--868.
[94]
A. Khosla et al. 2010. An integrated machine learning approach to stroke prediction. In Proc. of KDD. ACM, 183--192.
[95]
J. Kirby et al. 2016. PheKB: A catalog and workflow for creating electronic phenotype algorithms for transportability. Journal of the American Medical Informatics Association 23, 6 (2016), 1046--1052.
[96]
J. Klein and M. Moeschberger. 2005. Survival Analysis: Techniques for Censored and Truncated Data. Springer Science 8 Business Media.
[97]
N. Laird and J. Ware. 1982. Random-effects models for longitudinal data. Biometrics (1982), 963--974.
[98]
T. Lasko et al. 2013. Computational phenotype discovery using unsupervised feature learning over noisy, sparse, and irregular clinical data. PloS One 8, 6 (2013), e66341.
[99]
J. Last et al. 2001. A Dictionary of Epidemiology. Vol. 141. Oxford University Press.
[100]
J. Lee. 1994. Odds ratio or relative risk for cross-sectional data? International Journal of Epidemiology 23, 1 (1994), 201--203.
[101]
B. Letham et al. 2013. An interpretable stroke prediction model using rules and Bayesian analysis. AAAI (Late-Breaking Developments).
[102]
D. Levinson and Inspector General. 2010. Adverse events in hospitals: National incidence among medicare beneficiaries. Department of Health and Human Services Office of the Inspector General (2010).
[103]
K. Levit et al. 2003. Trends in US health care spending, 2001. Health Affairs 22, 1 (2003), 154--164.
[104]
Y. Li et al. 2014. A method for controlling complex confounding effects in the detection of adverse drug reactions using electronic health records. JAMIA 21, 2 (2014), 308--314.
[105]
Y. Li et al. 2015. Constrained elastic net based knowledge transfer for healthcare information exchange. DMKD 29, 4 (2015), 1094--1112.
[106]
Y. Li et al. 2016a. A distributed ensemble approach for mining healthcare data under privacy constraints. Information Sciences 330 (2016), 245--259.
[107]
Y. Li et al. 2016b. Regularized parametric regression for high-dimensional survival analysis. In Proceedings of SIAM International Conference on Data Mining.
[108]
Y. Li et al. 2016c. Regularized weighted linear regression for high-dimensional censored data. In Proceedings of SIAM International Conference on Data Mining. SIAM.
[109]
K. Liang et al. 1990. The Cox proportional hazards model with change point: An epidemiologic application. Biometrics (1990), 783--793.
[110]
Z. Liang et al. 2014. Deep learning for healthcare decision making with EMRs. In Proc. of BIBM. 556--559.
[111]
V. Liao and M. Chen. 2013. Efficient mining gapped sequential patterns for motifs in biological sequences. BMC Systems Biology 7, Suppl. 4 (Jan. 2013), S7.
[112]
D. Lilienfeld and P. Stolley. 1994. Foundations of Epidemiology. Oxford University Press.
[113]
J. Lin and P. Haug. 2008. Exploiting missing clinical data in Bayesian network modeling for predicting medical problems. JBI 41, 1 (2008), 1--14.
[114]
Z. Lipton. 2016. The mythos of model interpretability. arXiv preprint arXiv:1606.03490 (2016).
[115]
C. Liu et al. 2015. Temporal phenotyping from longitudinal electronic health records: A graph based framework. In Proc. of KDD. 705--714.
[116]
P. Lucas et al. 2004. Bayesian networks in biomedicine and health-care. AIM 30, 3 (March 2004), 201--214.
[117]
T. Lumley et al. 2002. A stroke prediction score in the elderly: Validation and web-based application. Journal of Clinical Epidemiology 55, 2 (2002), 129--136.
[118]
Y. Luo et al. 2016. Tensor factorization toward precision medicine. Briefings in Bioinformatics 18, 3 (2016), 511--514.
[119]
S. Mani et al. 2007. Learning causal and predictive clinical practice guidelines from data. In Proceedings of the 12th World Congress on Health (Medical) Informatics; Building Sustainable Health Systems (Medinfo’07). IOS Press, 850.
[120]
S. Mani et al. 2012. Type 2 diabetes risk forecasting from EMR data using machine learning. AMIA 2012 (Jan. 2012), 606--615.
[121]
S. Mani et al. 2014. Medical decision support using machine learning for early detection of late-onset neonatal sepsis. JAMIA 21, 2 (2014), 326--336.
[122]
C. J. Mann. 2003. Observational research methods. Research design II: cohort, cross sectional, and case-control studies. Emergency Medicine Journal 20, 1 (2003), 54--60.
[123]
B. Marlin et al. 2012. Unsupervised pattern discovery in electronic health care data using probabilistic clustering models. In Proc. of SIGHIT. ACM, 389--398.
[124]
J. Maroco et al. 2011. Data mining methods in the prediction of dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests. BMC Research Notes 4 (Jan. 2011), 299.
[125]
M. Matheny et al. 2010. Development of inpatient risk stratification models of acute kidney injury for use in electronic health records. Medical Decision Making 30, 6 (2010), 639--650.
[126]
J. Matthews. 2006. Introduction to Randomized CCT. CRC Press.
[127]
I. Melnyk et al. 2013. Detection of precursors to aviation safety incidents due to human factors. In 2013 IEEE 13th International Conference on Data Mining Workshops (ICDMW’13). IEEE, 407--412.
[128]
G. Melton and G. Hripcsak. 2005. Automated detection of adverse events using natural language processing of discharge summaries. JAMIA 12, 4 (2005), 448--457.
[129]
M. Mercer et al. 2009. Multimorbidity in primary care: Developing the research agenda. Family Practice 26, 2 (2009), 79--80.
[130]
R. Miotto et al. 2016. Deep patient: An unsupervised representation to predict the future of patients from the electronic health records. Scientific Reports 6 (2016).
[131]
D. Moher et al. 1998. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet 352, 9128 (1998), 609--613.
[132]
K. Monsen et al. 2010. Discovering client and intervention patterns in home visiting data. WJNR 32, 8 (2010), 1031--1054.
[133]
K. Monsen et al. 2015. Factors explaining variability in health literacy outcomes of public health nursing clients. Public Health Nursing 32, 2 (2015), 94--100.
[134]
M. Munson et al. 2014. Data mining for identifying novel associations and temporal relationships with charcot foot. Journal of Diabetes Research 2014 (2014).
[135]
S. Nachimuthu et al. 2010. Modeling glucose homeostasis and insulin dosing in an intensive care unit using dynamic Bayesian networks. AMIA 2010 (Jan. 2010), 532--536.
[136]
G. Nadkarni et al. 2014. Development and validation of an electronic phenotyping algorithm for chronic kidney disease. In Proc. of AMIA, Vol. 2014. 907.
[137]
K. Ng et al. 2014. PARAMO: A parallel predictive modeling platform for healthcare analytic research using electronic health records. JBI 48 (2014), 160--170.
[138]
C. Ngufor et al. 2015. A heterogeneous multi-task learning for predicting RBC transfusion and perioperative outcomes. In Proc. of AIM. Springer, 287--297.
[139]
E. Nomura et al. 2003. Population-based study of relationship between hospital surgical volume and 5-year survival of stomach cancer patients in Osaka, Japan. Cancer Science 94, 11 (Nov. 2003), 998--1002.
[140]
W. Oh et al. 2016. Type 2 diabetes mellitus trajectories and associated risk factors. Big Data 4, 1 (2016), 25--30.
[141]
H. Ory. 1977. Association between oral contraceptives and myocardial infarction: A review. JAMA 237, 24 (1977), 2619--2622.
[142]
A. Oztekin et al. 2009. Predicting the graft survival for heart-lung transplantation patients: An integrated data mining methodology. IJMI 78, 12 (Dec. 2009), e84--96.
[143]
S. Pakhomov et al. 2011. The role of the electronic medical record in the assessment of health related quality of life. AMIA 2011 (Jan. 2011), 1080--1088.
[144]
M. Panahiazar et al. 2015. Using EHRs and machine learning for heart failure survival analysis. In MEDINFO, Vol. 216. IOS Press, 40.
[145]
Y. Park and J. Ghosh. 2014. A hierarchical ensemble of -trees for predicting expensive hospital visits. In Brain Informatics and Health. Springer, 178--187.
[146]
J. Pathak et al. 2012. Applying semantic web technologies for phenome-wide scan using an electronic health record linked biobank. JBS 3 (2012), 10.
[147]
J. Pathak et al. 2013. Electronic health records-driven phenotyping: Challenges, recent advances, and perspectives. JAMIA 20, e2 (Dec. 2013), e206--211.
[148]
D. Patnaik et al. 2011. Experiences with mining temporal event sequences from electronic medical records: Initial successes and some challenges. In Proc. of KDD. 360--368.
[149]
C. Paxton et al. 2013a. Developing predictive models using electronic medical records: Challenges and pitfalls. In AMIA, Vol. 2013. 1109.
[150]
C. Paxton et al. 2013b. Developing predictive models using electronic medical records: Challenges and pitfalls. AMIA 2013 (Jan. 2013), 1109--1115.
[151]
L. Peelen et al. 2010. Using hierarchical dynamic Bayesian networks to investigate dynamics of organ failure in patients in the intensive care unit. JBI 43, 2 (Apr. 2010), 273--286.
[152]
D. Peikes et al. 2008. Propensity score matching: A note of caution for evaluators of social programs. The American Statistician 62, 3 (2008), 222--231.
[153]
P. Peissig et al. 2012. Importance of multi-modal approaches to effectively identify cataract cases from electronic health records. JAMIA 19, 2 (2012), 225--234.
[154]
A. Perotte and G. Hripcsak. 2013. Temporal properties of diagnosis code time series in aggregate. JBHI 17, 2 (2013), 477--483.
[155]
R. Pivovarov et al. 2014. Temporal trends of hemoglobin A1c testing. JAMIA 21, 6 (2014), 1038--1044.
[156]
K. Polkinghorne et al. 2004. Vascular access and all-cause mortality: A propensity score analysis. JASN 15, 2 (2004), 477--486.
[157]
L. Pruinelli et al. 2014. Data mining methodologies to discover best practices for diabetic patients with health disparities. In AMIA.
[158]
L. Pruinelli et al. 2015. Clustering health data to discover EBP interventions for sepsis prevention and treatment for health disparities. In AMIA.
[159]
L. Pruinelli et al. 2016. A data mining approach to determine sepsis guideline impact on inpatient mortality and complications. AMIA Summits on Translational Science Proceedings 2016 (2016), 194.
[160]
A. Ramirez et al. 2012. Predicting warfarin dosage in European-Americans and African-Americans using DNA samples linked to an electronic health record. Pharmacogenomics 13, 4 (March 2012), 407--418.
[161]
S. Rana et al. 2015. A predictive framework for modeling healthcare data with evolving clinical interventions. Statistical Analysis and Data Mining 8, 3 (2015), 162--182.
[162]
R. Ranganath et al. 2016. Deep survival analysis. In Machine Learning for Healthcare Conference. 101--114.
[163]
C. Reddy and C. Aggarwal. 2015. Healthcare Data Analytics. Vol. 36. CRC Press.
[164]
C. Rihal et al. 2002. Incidence and prognostic importance of acute renal failure after percutaneous coronary intervention. Circulation 105, 19 (2002), 2259--2264.
[165]
J. Robins et al. 2000. Marginal structural models and causal inference in epidemiology. LWW.
[166]
J. Robinson et al. 2011. Lack of association between 25 (OH) D levels and incident type 2 diabetes in older women. Diabetes Care 34, 3 (2011), 628--634.
[167]
J. Rodrigues. 2009. Health Information Systems: Concepts, Methodologies, Tools, and Applications: Concepts, Methodologies, Tools, and Applications. Vol. 1. IGI Global.
[168]
F. Roque et al. 2011. Using electronic patient records to discover disease correlations and stratify patient cohorts. PLoS Computational Biology 7, 8 (Aug. 2011), e1002141.
[169]
C. Rose et al. 2005. A dynamic Bayesian network for handling uncertainty in a decision support system adapted to the monitoring of patients treated by hemodialysis. In 17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'05). IEEE, 5--pp.
[170]
P. Rossing et al. 1996. Predictors of mortality in insulin dependent diabetes: 10 year observational follow up study. BMJ 313, 7060 (1996), 779--784.
[171]
K. Rothman and S. Greenland. 1986. Modern epidemiology.Boston: Little brown and company. Study Design for Comparison of Orthoses 273 (1986), 101--106.
[172]
L. Sacchi et al. 2007. Data mining with temporal abstractions: Learning rules from time series. DMKD 15, 2 (June 2007), 217--247.
[173]
D. Sackett. 2000. Evidence-Based Medicine. Wiley Online Library.
[174]
D. L Sackett. 1979. Bias in analytic research. Journal of Chronic Diseases 32, 1 (1979), 51--63.
[175]
M. Sandri et al. 2014. Dynamic Bayesian networks to predict sequences of organ failures in patients admitted to ICU. JBI 48 (Apr. 2014), 106--113.
[176]
C. Sarkar and J. Srivastava. 2013. Impact of density of lab data in EHR for prediction of potentially preventable events. In IEEE ICHI. 529--534.
[177]
C. Sarkar et al. 2012. Improved feature selection for hematopoietic cell transplantation outcome prediction using rank aggregation. In Proc. of FedCSIS. IEEE, 221--226.
[178]
A. Sarvestani et al. 2010. Predicting breast cancer survivability using data mining techniques. In 2010 2nd International Conference on Software Technology and Engineering, Vol. 2. IEEE, V2--227--V2--231.
[179]
A. Sathyanarayana et al. 2014. Clinical decision making: A framework for predicting rx response. In Proc. of ICDM-W. IEEE, 1185--1188.
[180]
J. Schrom et al. 2013. Quantifying the effect of statin use in pre-diabetic phenotypes discovered through association rule mining. AMIA 2013 (Jan. 2013), 1249--1257.
[181]
P. Schulam et al. 2015. Clustering longitudinal clinical marker trajectories from electronic health data: Applications to phenotyping and endotype discovery. In IEEE, Proc. of AAAI.
[182]
M. Schuster et al. 1998. How good is the quality of health care in the united states? Milbank Quarterly 76, 4 (1998), 517--563.
[183]
D. Sengupta and P. Naik. 2013. SN algorithm: Analysis of temporal clinical data for mining periodic patterns and impending augury. JCB 3, 1 (Jan. 2013), 24.
[184]
Y. Shahar. 1997. A framework for knowledge-based temporal abstraction. Artificial Intelligence 90, 1--2 (Feb. 1997), 79--133. 00043702
[185]
M. Sharma et al. 2015. Feature-based factorized bilinear similarity model for cold-start top-n item recommendation. In SIAM-SDM. 190--198.
[186]
H. Shiao and V. Cherkassky. 2014. Learning using privileged information (LUPI) for modeling survival data. In Proc. of IJCNN. IEEE, 1042--1049.
[187]
B. Shickel et al. 2017. Deep EHR: A survey of recent advances on deep learning techniques for electronic health record (EHR) analysis. arXiv Preprint arXiv:1706.03446 (2017).
[188]
A. Shin et al. 2010. Diagnostic analysis of patients with essential hypertension using association rule mining. Healthcare Informatics Research 16, 2 (June 2010), 77--81.
[189]
C. Shivade et al. 2014. A review of approaches to identifying patient phenotype cohorts using electronic health records. JAMIA 21, 2 (2014), 221--230.
[190]
P. Shivaswamy et al. 2007. A support vector approach to censored targets. In Proc. of ICDM. IEEE, 655--660.
[191]
G. Simon et al. 2011a. A simple statistical model and association rule filtering for classification. In Proc. of KDD. ACM Press, New York, 823.
[192]
G. Simon et al. 2013. Survival association rule mining towards type 2 diabetes risk assessment. AMIA 2013 (Jan. 2013), 1293--1302.
[193]
N. Simon et al. 2011b. Regularization paths for Coxs proportional hazards model via coordinate descent. Journal of Statistical Software 39, 5 (2011), 1--13.
[194]
M. Skevofilakas et al. 2010. A hybrid decision support system for the risk assessment of retinopathy development as a long term complication of type 1 diabetes mellitus. EMBC 2010 (Jan. 2010), 6713--6716.
[195]
P. Snow et al. 1994. Artificial neural networks in the diagnosis and prognosis of prostate cancer: A pilot study. Journal of Urology 152, 5 Pt 2 (1994), 1923--1926.
[196]
S. Somanchi et al. 2015. Early prediction of cardiac arrest (code blue) using electronic medical records. In Proc. of KDD. 2119--2126.
[197]
P. Stang et al. 2010. Advancing the science for active surveillance: Rationale and design for the observational medical outcomes partnership. Annals of Internal Medicine 153, 9 (2010), 600--606.
[198]
G. Stiglic et al. 2015. Comprehensible predictive modeling using regularized logistic regression and comorbidity based features. PloS One 10, 12 (2015), e0144439.
[199]
J. Sun et al. 2014. Predicting changes in hypertension control using electronic health records from a chronic disease management program. JAMIA 21, 2 (2014), 337--344.
[200]
Y. Sverchkov et al. 2012. A multivariate probabilistic method for comparing two clinical datasets. In Proc. of SIGHIT. ACM, 795--800.
[201]
P. Tan et al. 2006. Introduction to Data Mining. Vol. 1. Pearson Addison Wesley Boston.
[202]
T. Therneau and C. Crowson. 2014. Using time dependent covariates and time dependent coefficients in the Cox model. Red 2 (2014), 1.
[203]
R. Tibshirani et al. 1997. The lasso method for variable selection in the Cox model. Statistics in Medicine 16, 4 (1997), 385--395.
[204]
C. Torio et al. 2006. Trends in Potentially Preventable Hospital Admissions Among Adults and Children, 2005--2010: Statistical Brief #151.
[205]
V. Van Belle et al. 2007. Support vector machines for survival analysis. In Proc. of CIMED. 1--8.
[206]
M. van der Heijden et al. 2014. Learning Bayesian networks for clinical time series analysis. JBI 48 (Apr. 2014), 94--105.
[207]
P. Vellanki et al. 2014. Nonparametric discovery of learning patterns and autism subgroups from therapeutic data. In ICPR. IEEE, 1828--1833.
[208]
M. Verduijn et al. 2007. Temporal abstraction for feature extraction: A comparative case study in prediction from intensive care monitoring data. AIM 41, 1 (Sept. 2007), 1--12.
[209]
L. Vernick et al. 1984. Selection of neighborhood controls: Logistics and fieldwork. Journal of Chronic Diseases 37, 3 (1984), 177--182.
[210]
S. Vilar et al. 2012. Enhancing adverse drug event detection in electronic health records using molecular structure similarity: Application to pancreatitis. PloS One 7, 7 (Jan. 2012), e41471.
[211]
B. Vinzamuri and C. Reddy. 2013. Cox regression with correlation based regularization for electronic health records. In Proc. of ICDM. IEEE, 757--766.
[212]
B. Vinzamuri et al. 2014. Active learning based survival regression for censored data. In Proc. of CIKM. ACM, 241--250.
[213]
S. Wacholder et al. 1992. Selection of controls in case-control studies: II. Types of controls. AJE 135, 9 (1992), 1029--1041.
[214]
J. J. Walline. 2001. Designing Clinical Research: An Epidemiologic Approach. LWW.
[215]
F. Wang et al. 2013a. A framework for mining signatures from event sequences and its applications in healthcare data. IEEE PAMI 35, 2 (2013), 272--285.
[216]
F. Wang et al. 2014a. Clinical risk prediction with multilinear sparse logistic regression. In SIGKDD. ACM, 145--154.
[217]
X. Wang et al. 2013b. Exploring patient risk groups with incomplete knowledge. In Proc. of ICDM. IEEE, 1223--1228.
[218]
X. Wang et al. 2014b. Unsupervised learning of disease progression models. In Proc. of KDD. ACM, 85--94.
[219]
Y. Wang et al. 2015a. Rubik: Knowledge guided tensor factorization and completion for health data analytics. In Proc. of KDD. 1265--1274.
[220]
Z. Wang et al. 2015b. Dynamic Poisson autoregression for influenza-like-illness case count prediction. In Proc. of KDD. ACM, 1285--1294.
[221]
B. Wells et al. 2008. Predicting 6-year mortality risk in patients with type 2 diabetes. Diabetes Care 31, 12 (2008), 2301--2306.
[222]
V. West et al. 2015. Innovative information visualization of electronic health record data: A systematic review. JAMIA 22, 2 (2015), 330--339.
[223]
B. Westra et al. 2011. Interpretable predictive models for knowledge discovery from home-care electronic health records. JHE 2, 1 (2011), 55--74.
[224]
B. Westra et al. 2017. Secondary analysis of an electronic surveillance system combined with multi-focal interventions for early detection of sepsis. ACI 8, 1 (2017), 47--66.
[225]
R. White et al. 2013. Web-scale pharmacovigilance: Listening to signals from the crowd. JAMIA 20, 3 (May 2013), 404--408.
[226]
A. Wilcox and G. Hripcsak. 2003. The role of domain knowledge in automating medical text report classification. JAMIA 10, 4 (2003), 330--338.
[227]
P. Wilson et al. 2008. Prediction of first events of coronary heart disease and stroke with consideration of adiposity. Circulation 118, 2 (2008), 124--130.
[228]
J. Wolff. 2002. Prevalence, expenditures and complications of multiple chronic conditions in the elderly. Archives of Internal Medicine 162, 20 (2002), 2269--2276.
[229]
J. Wooldridge. 1992. Some alternatives to the Box-Cox regression model. International Economic Review (1992), 935--955.
[230]
A. Wright et al. 2010. An automated technique for identifying associations between medications, laboratory results and problems. JBI 43, 6 (Dec. 2010), 891--901.
[231]
Z. Xing et al. 2014. Bayesian modeling of temporal properties of infectious disease in a college student population. Journal of Applied Statistics 41, 6 (2014), 1358--1382.
[232]
H. Xu et al. 2011. Extracting and integrating data from entire electronic health records for detecting colorectal cancer cases. AMIA 2011 (Jan. 2011), 1564--1572.
[233]
P. Yadav. 2017. Causal pattern mining in highly heterogeneous and temporal EHRs data. University of Minnesota.
[234]
P. Yadav et al. 2015a. Forensic style analysis with survival trajectories. In Proc. of ICDM. IEEE, 1069--1074.
[235]
P. Yadav et al. 2015b. Modelling trajectories for diabetes complications. In Proceedings of the 4th Workshop on Data Mining for Medicine and Healthcare. 2015 SIAM International Conference on Data Mining.
[236]
P. Yadav et al. 2016a. Causal inference in observational data. arXiv preprint arXiv:1611.04660 (2016).
[237]
P. Yadav et al. 2016b. Interrogation of bronchoalveolar lavage fluid in acute respiratory distress syndrome using multiplexed proseek immunoassay. In C49. Respiratory Failure: Clinical and Translational Aspects of VILI and Lung Protective MV. American Thoracic Society, A5246--A5246.
[238]
S. Yu et al. 2008. Privacy-preserving Cox regression for survival analysis. In Proc. of KDD. ACM, 1034--1042.
[239]
H. Zhai et al. 2014. Developing and evaluating a machine learning based algorithm to predict the need of pediatric intensive care unit transfer for newly hospitalized children. Resuscitation 85, 8 (Aug. 2014), 1065--1071.
[240]
C. Zhang and S. Zhang. 2002. Association Rule Mining: Models and Algorithms. Springer-Verlag.
[241]
H. Zhang and W. Lu. 2007. Adaptive lasso for Cox’s proportional hazards model. Biometrika 94, 3 (2007), 691--703.
[242]
P. Zhang et al. 2014. Towards personalized medicine: Leveraging patient similarity and drug similarity analytics. AMIA Summits on Translational Science Proceedings, vol. 2014. American Medical Informatics Association, 132.
[243]
P. Zhang et al. 2015. Label propagation prediction of drug-drug interactions based on clinical side effects. Scientific Reports 5 (2015).
[244]
W. Zhang et al. 2013. Network-based survival analysis reveals subnetwork signatures for predicting outcomes of ovarian cancer treatment. PLoS Computational Biology 9, 3 (2013), e1002975.
[245]
D. Zhao et al. 2011. Combining knowledge and EHR data to develop a weighted Bayesian network for pancreatic cancer prediction. JBI 44, 5 (2011), 859--868.
[246]
K. Zolfaghar et al. 2013. Risk-o-meter: An intelligent clinical risk calculator. In Proc. KDD. ACM, 1518--1521.

Information & Contributors

Information

Published In

ACM Computing Surveys, Volume 50, Issue 6
November 2018
752 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3161158
Editor: Sartaj Sahni
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 January 2018
Accepted: 01 July 2017
Revised: 01 July 2017
Received: 01 March 2017
Published in CSUR Volume 50, Issue 6

Author Tags

  1. EHRs
  2. Healthcare analytics
  3. data mining
  4. healthcare informatics
  5. machine learning

Qualifiers

  • Survey
  • Research
  • Refereed

Article Metrics

  • Downloads (Last 12 months): 1,203
  • Downloads (Last 6 weeks): 137
Reflects downloads up to 09 Jan 2025
