Design and Development of a Big Data Platform for Disease Burden Based on the Spark Engine
Abstract
Objective. This study aims to build a big data platform for disease burden that tightly couples artificial intelligence with public health. The platform is designed to be open, shared, and intelligent, covering big data collection, analysis, and result visualization. Methods. Drawing on data mining theory and technology, the current state of multisource disease burden data was analyzed, and a management model, functional modules, and a technical framework for disease burden big data were proposed. Kafka is used to improve the transmission efficiency of the underlying data, and Spark MLlib is embedded in the Hadoop ecosystem to make the platform efficient and highly scalable for data analysis. Results. Following the concept of “Internet + medical integration,” an overall architecture for the disease burden management big data platform was proposed, based on the Spark engine and the Python language. The main system components and application scenarios are described at four levels (multisource data collection, data processing, data analysis, and the application layer), organized according to application scenarios and usage requirements. Conclusion. The disease burden management big data platform helps promote the convergence of multisource disease burden data and offers a new path toward a standardized paradigm for disease burden measurement. It also provides methods and ideas for the deep integration of medical big data and the formation of a broader standard paradigm.
Information
Published In
Copyright © 2023 Chengcheng Li et al.
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Publisher
Hindawi Limited
London, United Kingdom
Publication History
Published: 01 January 2023
Qualifiers
- Research-article