DOI: 10.1145/3487553.3524672
short-paper

Detection of Infectious Disease Outbreaks in Search Engine Time Series Using Non-Specific Syndromic Surveillance with Effect-Size Filtering

Published: 16 August 2022

Abstract

Novel infectious disease outbreaks, most recently the COVID-19 pandemic, could be detected by non-specific syndromic surveillance systems. Such systems draw on a variety of data sources, ranging from Electronic Health Records to internet data such as aggregated search engine queries, and raise alerts when unusually high rates of symptom reports occur. This is especially important for detecting novel diseases, whose symptoms are not known in advance.
Here we improve upon a set of previously proposed non-specific syndromic surveillance methods by taking into account both how unusual a preponderance of symptoms is and its effect size.
We demonstrate that our method is as accurate as previously proposed methods on low-dimensional data, and show its effectiveness on high-dimensional data by applying it to aggregated time series of health-related search engine queries. We find that in 2019 the method would have raised alerts for several disease outbreaks earlier than health authorities did. During the COVID-19 pandemic, the system quickly identified the beginning of pandemic waves through combinations of symptoms that varied from wave to wave.
Thus, the proposed method could serve decision makers as a practical tool for detecting new disease outbreaks from search engine time series, even in the absence of specific information on the diseases of interest and their symptoms.
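
The general idea of requiring both statistical unusualness and a minimum effect size can be illustrated with a minimal sketch. This is not the method from the paper, whose details are not given in the abstract. It assumes a one-sided two-proportion z-test for unusualness and Cohen's h as the effect size; the function names, the thresholds (alpha = 0.01, h >= 0.2), and the query counts are all hypothetical.

```python
import math


def one_sided_p_value(cur_hits, cur_total, base_hits, base_total):
    """Two-proportion z-test: p-value for 'current rate > baseline rate'."""
    p_cur = cur_hits / cur_total
    p_base = base_hits / base_total
    pooled = (cur_hits + base_hits) / (cur_total + base_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / cur_total + 1 / base_total))
    if se == 0:
        return 1.0  # no variation at all, so nothing unusual to report
    z = (p_cur - p_base) / se
    # Upper-tail probability of a standard normal variable.
    return 0.5 * math.erfc(z / math.sqrt(2))


def cohens_h(p_cur, p_base):
    """Cohen's h, an effect size for the difference between two proportions."""
    return 2 * math.asin(math.sqrt(p_cur)) - 2 * math.asin(math.sqrt(p_base))


def flag_symptoms(current, baseline, cur_total, base_total,
                  alpha=0.01, min_h=0.2):
    """Return symptoms whose rise is both significant and large enough.

    alpha and min_h are assumed thresholds, not values from the paper.
    """
    alerts = []
    for symptom, cur_hits in current.items():
        base_hits = baseline.get(symptom, 0)
        p = one_sided_p_value(cur_hits, cur_total, base_hits, base_total)
        h = cohens_h(cur_hits / cur_total, base_hits / base_total)
        if p < alpha and h >= min_h:
            alerts.append(symptom)
    return alerts


# Hypothetical query counts out of one million searches per period.
baseline = {"cough": 10_000, "anosmia": 1_000, "rash": 5}
current = {"cough": 10_500, "anosmia": 40_000, "rash": 9}

alerts = flag_symptoms(current, baseline, 1_000_000, 1_000_000)
print(alerts)  # -> ['anosmia']
```

In this toy data the rise in "cough" is statistically significant because the sample is large, but its effect size is negligible, so only "anosmia" is flagged. This is the kind of case where a significance test alone would raise an alert on a practically unimportant change.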

Published In

WWW '22: Companion Proceedings of the Web Conference 2022
April 2022
1338 pages
ISBN: 9781450391306
DOI: 10.1145/3487553

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Syndromic surveillance
2. data science
3. outbreak detection
4. search engine data

Qualifiers

• Short-paper
• Research
• Refereed limited

Conference

WWW '22: The ACM Web Conference 2022
April 25 - 29, 2022
Virtual Event, Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
