DOI: 10.1145/3167132.3167162
research-article

Discovering, selecting and exploiting feature sequence records of study participants for the classification of epidemiological data on hepatic steatosis

Published: 09 April 2018

Abstract

In longitudinal epidemiological studies, participants undergo repeated medical examinations and are thus represented by a potentially large number of short sequences of examination outcomes. Some of these sequences may carry important information about the disease under study, for example in the form of patterns, while others concern features of little relevance to the outcome. In this work, we propose a framework for Discovery, Selection and Exploitation (DiSelEx) of longitudinal epidemiological data that aims to identify informative patterns among these sequences. DiSelEx combines sequence clustering with supervised learning to identify sequence groups that contribute to class separation. Newly derived and original features are evaluated and selected according to their redundancy and their informativeness regarding the target variable. The selected feature set is then used to learn a classification model on the study data. We evaluate DiSelEx on cohort participants for the disorder "hepatic steatosis" and report how predictive performance changes when sequential data are used, compared with using only the basic classifier.
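The steps named in the abstract (cluster the sequences, derive features from the clusters, then filter features by informativeness and redundancy) can be illustrated with a small sketch. This is not the authors' implementation: the leader clustering, the information-gain scoring, the thresholds, the feature names and the toy data are all assumptions chosen for illustration, and the final classifier is left out.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature, target):
    """Reduction in target entropy after splitting on a discrete feature."""
    n = len(target)
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(target) - conditional

def leader_cluster(sequences, radius):
    """Toy stand-in for sequence clustering (an assumption, not the paper's
    method): assign each sequence to the first leader within `radius`
    (Euclidean distance), otherwise start a new cluster."""
    leaders, assignment = [], []
    for seq in sequences:
        for idx, leader in enumerate(leaders):
            if math.dist(seq, leader) <= radius:
                assignment.append(idx)
                break
        else:
            leaders.append(seq)
            assignment.append(len(leaders) - 1)
    return assignment

def select_features(features, target, min_gain=0.1, max_redundancy=0.9):
    """Greedy filter: keep informative features and skip any feature whose
    entropy is almost fully explained by an already selected one.
    Both thresholds are hypothetical."""
    ranked = sorted(features, key=lambda k: info_gain(features[k], target),
                    reverse=True)
    selected = []
    for name in ranked:
        if info_gain(features[name], target) < min_gain:
            continue
        h = entropy(features[name])
        redundant = any(
            h > 0 and info_gain(features[s], features[name]) / h >= max_redundancy
            for s in selected
        )
        if not redundant:
            selected.append(name)
    return selected

# Hypothetical data: one lab value measured in three examination waves.
sequences = [(1.0, 1.1, 1.0), (1.1, 1.0, 1.2), (1.0, 2.0, 3.0),
             (1.1, 2.1, 3.2), (1.0, 1.0, 1.1), (0.9, 2.0, 3.1)]
target = ["neg", "neg", "pos", "pos", "neg", "pos"]

features = {
    "seq_cluster": leader_cluster(sequences, radius=0.5),  # derived feature
    "last_value_high": [s[-1] > 2.0 for s in sequences],   # duplicates the cluster
    "sex": ["f", "m", "f", "m", "m", "f"],                 # weakly informative
}
selected = select_features(features, target)
print(selected)
```

In this toy setting the cluster membership separates the two classes, the thresholded last value is discarded as redundant with it, and the weakly informative feature falls below the gain threshold. A classifier would then be trained on the selected features only.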


Cited By

  • (2019) A classification framework for exploiting sparse multi-variate temporal features with application to adverse drug event detection in medical records. BMC Medical Informatics and Decision Making 19:1. DOI: 10.1186/s12911-018-0717-4. Online publication date: 10-Jan-2019

      Published In

      SAC '18: Proceedings of the 33rd Annual ACM Symposium on Applied Computing
      April 2018
      2327 pages
      ISBN:9781450351911
      DOI:10.1145/3167132
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. classification
      2. epidemiological studies
      3. feature selection
      4. hepatic steatosis
      5. medical data mining
      6. patient similarity
      7. time-series clustering

      Qualifiers

      • Research-article

      Conference

      SAC 2018: Symposium on Applied Computing
      April 9 - 13, 2018
      Pau, France

      Acceptance Rates

      Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

      Article Metrics

      • Downloads (last 12 months): 5
      • Downloads (last 6 weeks): 0

      Reflects downloads up to 14 Sep 2024
