research-article

Chronic disease prediction using administrative data and graph theory: The case of type 2 diabetes

Published: 01 December 2019

Highlights

Administrative healthcare data is used to identify patients at high risk of chronic disease.
Graph theory and social network analysis concepts were used to understand disease progression.
The prediction framework achieved 82% to 87% accuracy across the different methods.
Three predictive methods were used: regression, parameter optimization, and tree classification.
Binary tree classification outperformed the other two methods.

Abstract

Clinical diagnosis and regular monitoring of the population at risk of chronic diseases are resource-intensive, both clinically and financially. Mining administrative data could be an effective alternative way to identify this high-risk cohort. In this research, we apply data mining and network analysis techniques to hospital admission and discharge data to understand the disease or comorbidity footprints of chronic patients. Based on this understanding, we have developed a chronic disease risk prediction framework. The framework is then tested in an Australian healthcare context to predict type 2 diabetes (T2D) risk. The dataset contained approximately 1.4 million admission records from 0.75 million patients. From this, we filtered and sampled the records of 2300 patients having comorbidities including T2D and another 2300 patients having comorbidities other than T2D. Along with demographic and behavioral risk factors for prediction, we propose several graph theory and social network-based measures that indicate the prevalence of comorbidities, transition patterns, and clustering membership. We use an exploratory approach to understand the relative impact of these risk factors and evaluate the prediction performance using three different predictive methods: regression, parameter optimization, and tree classification. All three prediction methods gave the highest ranking to the graph theory-based ‘comorbidity prevalence’ and ‘transition pattern match’ scores, showing the effectiveness of the proposed network theory-based measures. Overall, the prediction accuracy of 82% to 87% shows the potential of a framework that utilizes administrative data. The proposed framework could be useful for governments and health insurers to identify high-risk chronic disease cohorts. Preventive strategies developed for these cohorts could then, over time, reduce the burden of acute care hospitalization.
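The abstract does not define the two highest-ranked measures, so the following is only a minimal sketch of how scores of this kind might be computed, not the authors' method. It assumes each patient is a time-ordered list of admissions and each admission is a set of diagnosis codes. The function names (`build_baseline_network`, `prevalence_score`, `transition_match_score`), the scoring formulas, and the toy codes "A", "B", "C" are all hypothetical.

```python
from collections import Counter

def build_baseline_network(cohort):
    """Aggregate a cohort into node and edge frequencies.

    cohort: list of patients; each patient is a time-ordered list of
    admissions; each admission is a set of diagnosis codes.
    """
    node_freq = Counter()  # number of patients in whom each code appears
    edge_freq = Counter()  # transitions between consecutive admissions
    for admissions in cohort:
        node_freq.update(set().union(*admissions))
        for prev, nxt in zip(admissions, admissions[1:]):
            for a in prev:
                for b in nxt:
                    if a != b:
                        edge_freq[(a, b)] += 1
    return node_freq, edge_freq

def prevalence_score(admissions, node_freq, n_patients):
    """Mean cohort prevalence of the codes seen in one patient (assumed formula)."""
    codes = set().union(*admissions)
    if not codes:
        return 0.0
    return sum(node_freq[c] / n_patients for c in codes) / len(codes)

def transition_match_score(admissions, edge_freq):
    """Share of the patient's transitions that also occur in the baseline (assumed formula)."""
    transitions = {
        (a, b)
        for prev, nxt in zip(admissions, admissions[1:])
        for a in prev
        for b in nxt
        if a != b
    }
    if not transitions:
        return 0.0
    return sum(1 for t in transitions if edge_freq[t] > 0) / len(transitions)

# Toy cohort with placeholder codes; not real patient data.
cohort = [
    [{"A"}, {"B"}],
    [{"A", "B"}, {"C"}],
    [{"B"}],
]
node_freq, edge_freq = build_baseline_network(cohort)
new_patient = [{"A"}, {"C"}]
print(prevalence_score(new_patient, node_freq, len(cohort)))
print(transition_match_score(new_patient, edge_freq))
```

Scores like these could then be passed, together with demographic and behavioral risk factors, as features to any of the three predictive methods named in the abstract.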


Published In

Expert Systems with Applications: An International Journal, Volume 136, Issue C
Dec 2019
453 pages

Publisher

Pergamon Press, Inc.
United States

Author Tags

1. Disease prediction
2. Electronic medical records
3. Medical information systems
4. Network theory
5. Prediction theory
6. Type 2 diabetes
