DOI: 10.1145/3462203.3475898
research-article

Health Misinformation Detection in Web Content: A Structural-, Content-based, and Context-aware Approach based on Web2Vec

Published: 09 September 2021

Abstract

In recent years, large amounts of online content have been generated directly by users with virtually no form of external control, which makes the spread of misinformation possible. The search for effective solutions to this problem is still ongoing and spans several application areas, from opinion spam to fake news detection. One scenario that has been investigated only more recently, despite the serious risks that exposure to disinformation could entail, is the online dissemination of health information.
Early approaches in this area focused primarily on user-based studies applied to Web page content. More recently, automated approaches have been developed for both Web pages and social media content, particularly with the advent of the COVID-19 pandemic. These approaches are primarily based on handcrafted features extracted from online content in association with Machine Learning. In this scenario, we focus on Web page content, where there is still room for research to study structural-, content- and context-based features to assess the credibility of Web pages.
Therefore, this work aims to study the effectiveness of such features in association with a deep learning model, starting from an embedded representation of Web pages that has been recently proposed in the context of phishing Web page detection, i.e., Web2Vec.

Published In

GoodIT '21: Proceedings of the Conference on Information Technology for Social Good
September 2021
345 pages
ISBN:9781450384780
DOI:10.1145/3462203

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Credibility
  2. Deep Learning
  3. Health Misinformation
  4. Machine Learning
  5. Social Web

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

GoodIT '21