research-article

Design and Development of a Big Data Platform for Disease Burden Based on the Spark Engine

Authors: Shangcheng Zhou

Academic Editor: Amandeep Kaur

Computational Intelligence and Neuroscience, Volume 2023

https://doi.org/10.1155/2023/8963053

Published: 01 January 2023

Abstract

Objective. This study aims to build a big data platform for disease burden that deeply couples artificial intelligence with public health. The platform is designed to be highly open and shared, covering big data collection, analysis, and result visualization. Methods. Drawing on data mining theory and technology, we analyzed the current state of multisource disease burden data. We propose a disease burden big data management model, its functional modules, and a technical framework, and we use Kafka to improve the transmission efficiency of the underlying data. Embedding Spark MLlib in the Hadoop ecosystem is intended to make the platform efficient and highly scalable for data analysis. Results. Following the concept of “Internet + medical integration,” we propose an overall architecture for a disease burden management big data platform based on the Spark engine and the Python language. Guided by application scenarios and usage requirements, the main system components and application scenarios are described at four levels: multisource data collection, data processing, data analysis, and application. Conclusion. The disease burden management big data platform helps promote the multisource convergence of disease burden data and offers a new path toward a standardized paradigm for disease burden measurement. It also provides methods and ideas for the deep integration of medical big data and the formation of a broader standard paradigm.
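The four-level flow in the abstract (multisource collection, processing, analysis, application) can be illustrated with a small plain-Python sketch. This is a hypothetical illustration, not the authors' implementation: an in-memory `Topic` buffer stands in for a Kafka topic, dictionary aggregation stands in for Spark jobs, and the source names, field aliases (`dx`, `pid`, `diagnosis`), and the `died` flag are invented for the example.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Topic:
    """In-memory FIFO stand-in for a Kafka topic (hypothetical; not the Kafka API)."""
    name: str
    buffer: deque = field(default_factory=deque)

    def produce(self, record: dict) -> None:
        self.buffer.append(record)

    def consume(self) -> Iterator[dict]:
        # Drain records in arrival order.
        while self.buffer:
            yield self.buffer.popleft()


def collect(topic: Topic, sources: Dict[str, List[dict]]) -> None:
    """Collection layer: push raw records from every source into one topic."""
    for source_name, records in sources.items():
        for rec in records:
            topic.produce({"source": source_name, **rec})


# Hypothetical mapping from source-specific field names to a shared schema.
FIELD_ALIASES = {"dx": "disease", "diagnosis": "disease", "pid": "patient_id"}


def process(topic: Topic) -> List[dict]:
    """Processing layer: harmonize fields, drop incomplete rows, deduplicate."""
    seen = set()
    clean = []
    for rec in topic.consume():
        row = {FIELD_ALIASES.get(k, k): v for k, v in rec.items()}
        if not row.get("patient_id") or not row.get("disease"):
            continue  # incomplete record
        key = (row["patient_id"], row["disease"])
        if key in seen:
            continue  # case already seen (e.g. reported by two sources)
        seen.add(key)
        clean.append(row)
    return clean


def analyze(rows: List[dict]) -> Dict[str, Dict[str, int]]:
    """Analysis layer: case and death counts per disease."""
    summary = defaultdict(lambda: {"cases": 0, "deaths": 0})
    for row in rows:
        stats = summary[row["disease"]]
        stats["cases"] += 1
        stats["deaths"] += int(bool(row.get("died", False)))
    return dict(summary)


def report(summary: Dict[str, Dict[str, int]]) -> str:
    """Application layer: plain-text report in place of a visualization."""
    return "\n".join(
        f"{disease}: cases={s['cases']}, deaths={s['deaths']}"
        for disease, s in sorted(summary.items())
    )


if __name__ == "__main__":
    topic = Topic("disease-burden-raw")
    collect(topic, {
        "hospital": [{"pid": "p1", "dx": "stroke", "died": True},
                     {"pid": "p2", "dx": "diabetes"}],
        "registry": [{"patient_id": "p1", "diagnosis": "stroke", "died": True},
                     {"patient_id": "p3", "diagnosis": "stroke"},
                     {"patient_id": "", "diagnosis": "asthma"}],
    })
    print(report(analyze(process(topic))))
    # diabetes: cases=1, deaths=0
    # stroke: cases=2, deaths=1
```

In the architecture the abstract describes, the buffering role is played by Kafka and the analysis role by Spark (with Spark MLlib for modeling); the sketch keeps only the division of responsibilities between the four layers, such as merging the same case reported by two sources before counting.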


Published In

Computational Intelligence and Neuroscience, Volume 2023 (2916 pages)

ISSN: 1687-5265

EISSN: 1687-5273

Copyright © 2023 Chengcheng Li et al.

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Publisher

Hindawi Limited

London, United Kingdom
