DOI: 10.1145/3477495.3531812
short-paper

Learning Trustworthy Web Sources to Derive Correct Answers and Reduce Health Misinformation in Search

Published: 07 July 2022

Abstract

When searching the web for answers to health questions, people can make incorrect decisions that have a negative effect on their lives if the search results contain misinformation. To reduce health misinformation in search results, we need to be able to detect documents with correct answers and promote them over documents containing misinformation. Determining the correct answer has been a difficult hurdle to overcome for participants in the TREC Health Misinformation Track. In the 2021 track, automatic runs were not allowed to use the known answer to a topic's health question, and as a result, the top automatic run had a compatibility-difference score of 0.043 while the top manual run, which used the known answer, had a score of 0.259. The compatibility-difference measures the ability of methods to rank correct and credible documents before incorrect and non-credible documents. By using an existing set of health questions and their known answers, we show it is possible to learn which web hosts are trustworthy, from which we can predict the correct answers to the 2021 health questions with an accuracy of 76%. Using our predicted answers, we can promote documents that we predict contain this answer and achieve a compatibility-difference score of 0.129, which is a three-fold increase in performance over the best previous automatic method.
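The idea of learning host trustworthiness from questions with known answers, then using that trust to predict answers to new questions, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the "yes"/"no" answer labels, the Laplace smoothing, and the log-odds weighted vote are all assumptions made for illustration, and each document's stance toward its question is assumed to have been detected already.

```python
import math
from collections import defaultdict


def learn_host_trust(observations, known_answers, prior=1.0):
    """Estimate a trust score in (0, 1) for each web host.

    observations: iterable of (question_id, host, stance) tuples, where
        stance is "yes" or "no" (hypothetical labels; stance detection
        is assumed to happen upstream).
    known_answers: dict mapping question_id to its known answer.
    prior: Laplace smoothing count (an illustrative choice).
    """
    agree = defaultdict(float)
    total = defaultdict(float)
    for qid, host, stance in observations:
        total[host] += 1
        if stance == known_answers[qid]:
            agree[host] += 1
    # Smoothed fraction of a host's documents that agree with known answers.
    return {h: (agree[h] + prior) / (total[h] + 2 * prior) for h in total}


def predict_answer(doc_stances, trust, default_trust=0.5):
    """Predict the answer to a new question by a trust-weighted vote.

    doc_stances: iterable of (host, stance) pairs for retrieved documents.
    Hosts not seen during training get default_trust, i.e. zero weight.
    """
    score = 0.0
    for host, stance in doc_stances:
        t = trust.get(host, default_trust)
        weight = math.log(t / (1.0 - t))  # log-odds; negative if distrusted
        score += weight if stance == "yes" else -weight
    return "yes" if score >= 0 else "no"


# Toy data: a.org always agrees with the known answers, b.com never does.
observations = [
    ("q1", "a.org", "yes"), ("q2", "a.org", "no"),
    ("q1", "b.com", "no"), ("q2", "b.com", "yes"),
]
known_answers = {"q1": "yes", "q2": "no"}
trust = learn_host_trust(observations, known_answers)
prediction = predict_answer([("a.org", "no"), ("b.com", "yes")], trust)
```

In this toy setup the distrusted host's "yes" counts as evidence for "no", so both hosts push the vote the same way. Documents whose stance matches the predicted answer could then be promoted in the ranking, which is the step the abstract evaluates with the compatibility-difference score.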

Supplementary Material

MP4 File (SIGIR22-sp1493.mp4)
This is the pre-recorded presentation video of the short paper "Learning Trustworthy Web Sources to Derive Correct Answers and Reduce Health Misinformation in Search". In this video, we will go through some major parts of the paper, including background, methods, experiment, and some key results. To better understand our work, please refer to our short paper for more details.

Cited By

  • (2024) Online Health Search Via Multidimensional Information Quality Assessment Based on Deep Language Models: Algorithm Development and Validation. JMIR AI 3 (e42630). DOI: 10.2196/42630. Online publication date: 2-May-2024
  • (2024) A Flexible Big Data System for Credibility-Based Filtering of Social Media Information According to Expertise. International Journal of Computational Intelligence Systems 17:1. DOI: 10.1007/s44196-024-00483-y. Online publication date: 15-Apr-2024

    Published In

    SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2022
    3569 pages
    ISBN:9781450387323
    DOI:10.1145/3477495

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. health misinformation
    2. stance detection
    3. web search

    Conference

    SIGIR '22

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%
