Computational Health Informatics in the Big Data Age: A Survey

Published: 14 June 2016


The explosive growth and widespread accessibility of digital health data have led to a surge of research activity in the healthcare and data sciences fields. The conventional approaches for health data management have achieved limited success as they are incapable of handling the huge amount of complex data with high volume, high velocity, and high variety. This article presents a comprehensive overview of the existing challenges, techniques, and future directions for computational health informatics in the big data age, with a structured analysis of the historical and state-of-the-art methods. We have summarized the challenges into four Vs (i.e., volume, velocity, variety, and veracity) and proposed a systematic data-processing pipeline for generic big data in health informatics, covering data capturing, storing, sharing, analyzing, searching, and decision support. Specifically, numerous techniques and algorithms in machine learning are categorized and compared. On the basis of this material, we identify and discuss the essential prospects lying ahead for computational health informatics in this big data age.


    Kalman Balogh

    Global information and communications technology (ICT) resources have been changing lives: fast networks and mobile devices offer "always-on" services, extending the possibilities of personal computers (PCs). Besides social networks and the Internet of Things (IoT), healthcare services, both independently and in relation to the former two, are among the fastest emerging areas, exploiting and challenging the development of ICT. This survey gives an overview of the expected progress of healthcare during this decade; for each processing phase of health data, it presents an extensive list of methods and systems existing in 2014.

    According to the forecasts quoted by the authors, the amount of healthcare data is growing exponentially; according to IDC, in 2020 it will be more than ten times (an order of magnitude above) the 2014 amount. Beyond size, new requirements must be met: handling heterogeneous data types; executing distributed transactions; enabling global access while preserving security and confidentiality; improving and spreading global standards and interchanging data among proprietary systems; analyzing data online; providing dedicated decision support systems for both medical practitioners and remote clinical expert teams; using microbiology and genetic data in more personalized medicine; integrating personal diagnostic sensors; offering mobile patient-advisor services in e-health; and meeting archiving requirements and enabling reuse of longitudinal data.

    These requirements cannot be fulfilled by traditional relational databases alone; big data handling in hybrid clouds extends their capabilities with working but still immature technologies. At the start, these tools were completely independent of the traditional ones. They handle diverse kinds of new unstructured data (including biometric data, texts about patient care, professional textbooks and articles, and so on) distributed across file servers in data centers worldwide, but do not support transactions or sophisticated queries. Most of the solutions introduced in the survey are based on these NoSQL systems. In some cases, the method and tool chosen for meeting the needs of a special medical area are rather ad hoc.

    The authors mention that the product repertoires of both the new Internet giants (Amazon, Google, Facebook) and the traditional vendors (IBM, Oracle/Teradata/Sun/Siebel, Microsoft, Dell/EMC, SAP) are converging: they support heterogeneous systems with interfaces to NoSQL, traditional SQL, and NewSQL tools and products in hybrid clouds. These technologies and major application areas, including healthcare, microbiology, and genetics, are introduced, for instance, in Chen et al. [1].

    In my opinion, the main value of the survey is its introduction to a broad range of methods and tools applied in nontraditional healthcare and microbiology. I recommend this overview to ICT professionals who are developing applications in these fields.

    Online Computing Reviews Service


    Published In

    ACM Computing Surveys  Volume 49, Issue 1
    March 2017
    705 pages
    Editor: Sartaj Sahni
    Association for Computing Machinery
    New York, NY, United States

    Publication History

    Published: 14 June 2016
    Accepted: 01 March 2016
    Revised: 01 December 2015
    Received: 01 May 2015
    Published in CSUR Volume 49, Issue 1


    Author Tags

    1. 4V challenges
    2. big data analytics
    3. clinical decision support
    4. computational health informatics
    5. data mining
    6. machine learning
    7. survey


