Application of Textual Representation Methods for Clinical Numerical Data in Early Sepsis Diagnosis

Published: 01 September 2024

Abstract

Sepsis is a severe infectious disease with high incidence and mortality worldwide. Early diagnosis in newly admitted intensive care unit (ICU) patients is crucial for reducing mortality and improving patient outcomes. Manual diagnosis relies heavily on subjective clinical experience, while traditional machine learning methods require time-consuming feature engineering, and their performance is limited by the knowledge that can be learned from scarce datasets. To address these issues, this study proposes a novel textual representation method for clinical numerical data that applies pre-trained language models from natural language processing to sepsis prediction. Specifically, the structured clinical numerical data of each patient are transformed into unstructured textual descriptions. This transformation reframes sepsis prediction as a text classification task and draws on the prior semantic knowledge embedded in pre-trained language models to improve prediction performance. The proposed method is validated on real ICU clinical data. With RoBERTa-base, it achieves an F1 score of 79.03%, an improvement of five percentage points over commonly used machine learning classifiers. The experiments indicate that the proposed method improves early sepsis diagnosis and offers new insights for the clinical diagnosis of sepsis.
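
The transformation from structured numerical data to a textual description can be illustrated with a minimal sketch. The abstract does not specify the template, variables, or units used in the study, so the feature names, the unit table, and the function record_to_text below are hypothetical and serve only to show the general idea of serializing one patient record into a sentence that a text classifier could consume.

```python
from typing import Mapping

# Hypothetical feature names and units (assumption): the abstract does not
# list the clinical variables or the wording used in the study.
UNITS = {
    "heart_rate": "beats per minute",
    "temperature": "degrees Celsius",
    "lactate": "mmol/L",
}


def record_to_text(record: Mapping[str, float]) -> str:
    """Serialize one patient's numerical measurements into a sentence."""
    parts = []
    for name, value in record.items():
        label = name.replace("_", " ")   # "heart_rate" -> "heart rate"
        unit = UNITS.get(name, "")       # unknown features get no unit
        parts.append(f"{label} is {value:g} {unit}".strip())
    return "The patient's " + ", ".join(parts) + "."


example = {"heart_rate": 112, "temperature": 38.6, "lactate": 2.4}
text = record_to_text(example)
print(text)
```

In a pipeline of the kind the abstract describes, a string like this would be tokenized and passed to a pre-trained model such as RoBERTa-base that has been fine-tuned for binary text classification; that step is omitted here because the abstract gives no training details.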

Published In

International Journal of Applied Mathematics and Computer Science  Volume 34, Issue 4
Special issue: Future Perspectives for AI in Complex Health Modelling, Guest editors: Marcin WOŹNIAK, Yogesh KUMAR and Muhammad Fazal IJAZ
Sep 2024
160 pages
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.

Publisher

Walter de Gruyter & Co.

United States

Author Tags

  1. sepsis diagnosis
  2. text representation
  3. pre-trained language models
  4. machine learning

Qualifiers

  • Research-article
