research-article

A Novel Graph Indexing Approach for Uncovering Potential COVID-19 Transmission Clusters

Published: 20 February 2023

Abstract

The COVID-19 pandemic has caused societal lockdowns and a large number of deaths in many countries. Potential transmission cluster discovery aims to find all users suspected of infection. It is urgently needed to quickly discover virus transmission chains and to prevent COVID-19 outbreaks as early as possible. In this article, we study the problem of potential transmission cluster discovery based on spatio-temporal logs. Given a query patient user q and the timestamp t_q of confirmed infection, the problem is to find all potentially infected users who had close social contact with user q before time t_q. We motivate and formulate the potential transmission cluster model, together with a detailed analysis of transmission cluster properties and model usability. One straightforward way to identify potential clusters is to compute all close contacts on the fly. This is simple but inefficient, because it scans the spatio-temporal logs many times. To improve efficiency, we propose two indexing algorithms that construct a multigraph index and an advanced BCG-index. Our BCG-index approach leverages two well-designed techniques, spatio-temporal compression and graph partitioning on bipartite contact graphs. It achieves a good balance between index construction and online query processing, so potential transmission clusters can be discovered quickly. We theoretically analyze and compare the algorithmic complexity of the three proposed approaches. Extensive experiments on real-world check-in datasets and COVID-19 confirmed cases in the United States validate the effectiveness and efficiency of our potential transmission cluster model and algorithms.
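The contrast between the on-the-fly baseline and an index can be illustrated with a small sketch. This is not the authors' method, and it is not their multigraph index or BCG-index. Everything in it is an illustrative assumption: the names CheckIn, close_contacts, build_index, close_contacts_indexed and potential_cluster, and the parameters delta and hops, are all hypothetical. Two users are assumed to be in close contact if they checked in at the same location within delta time units of each other, with both check-ins before t_q. The cluster is assumed to be a breadth-first expansion from q over at most hops rounds of contacts. The baseline rescans the whole log for every user it expands, which mirrors the inefficiency described in the abstract. The indexed variant returns the same contacts by using a plain per-location list of check-ins sorted by time.

```python
from bisect import bisect_left
from collections import defaultdict, namedtuple

# Hypothetical log record: one check-in of a user at a location at an integer time.
CheckIn = namedtuple("CheckIn", ["user", "location", "time"])


def close_contacts(logs, user, before, delta):
    """On-the-fly baseline: scan the whole log for users who checked in at the
    same location as `user` within `delta` time units, before time `before`."""
    visits = [c for c in logs if c.user == user and c.time < before]
    contacts = set()
    for c in logs:  # full scan of the log for every expanded user
        if c.user == user or c.time >= before:
            continue
        if any(c.location == v.location and abs(c.time - v.time) <= delta
               for v in visits):
            contacts.add(c.user)
    return contacts


def build_index(logs):
    """Simple illustrative index (assumed; not the paper's multigraph or BCG-index):
    per-location (time, user) pairs sorted by time, plus per-user check-ins."""
    by_location, by_user = defaultdict(list), defaultdict(list)
    for c in logs:
        by_location[c.location].append((c.time, c.user))
        by_user[c.user].append(c)
    for entries in by_location.values():
        entries.sort()
    return by_location, by_user


def close_contacts_indexed(index, user, before, delta):
    """Same answer as close_contacts, but it only visits check-ins at the user's
    own locations inside the window [time - delta, time + delta]."""
    by_location, by_user = index
    contacts = set()
    for v in by_user.get(user, []):
        if v.time >= before:
            continue
        entries = by_location[v.location]
        # (t,) sorts before any (t, user) tuple, so this finds the first time >= t - delta.
        i = bisect_left(entries, (v.time - delta,))
        while i < len(entries) and entries[i][0] <= v.time + delta:
            t, other = entries[i]
            if other != user and t < before:
                contacts.add(other)
            i += 1
    return contacts


def potential_cluster(contacts_of, q, hops=2):
    """Expand breadth-first from patient q for at most `hops` rounds of contacts."""
    cluster, frontier = {q}, {q}
    for _ in range(hops):
        frontier = set().union(*(contacts_of(u) for u in frontier)) - cluster
        if not frontier:
            break
        cluster |= frontier
    return cluster - {q}


LOGS = [
    CheckIn("q", "cafe", 10), CheckIn("a", "cafe", 12), CheckIn("b", "cafe", 30),
    CheckIn("a", "gym", 20), CheckIn("c", "gym", 22), CheckIn("d", "gym", 50),
]
T_Q, DELTA = 40, 5
INDEX = build_index(LOGS)
scan = potential_cluster(lambda u: close_contacts(LOGS, u, T_Q, DELTA), "q")
fast = potential_cluster(lambda u: close_contacts_indexed(INDEX, u, T_Q, DELTA), "q")
print(sorted(scan), sorted(fast))  # prints ['a', 'c'] ['a', 'c']
```

In this toy log, user a meets q at the cafe and then meets c at the gym, so both end up in the cluster. User b arrives at the cafe too late to count as a contact. User d checks in after t_q. The index avoids rescanning the log, at the cost of building and storing it up front. That is the same construction-versus-query trade-off the abstract describes, though here it takes a much simpler form.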



Published In

ACM Transactions on Knowledge Discovery from Data  Volume 17, Issue 2
February 2023
355 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3572847

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 February 2023
Online AM: 24 May 2022
Accepted: 08 May 2022
Revised: 24 December 2021
Received: 23 June 2021
Published in TKDD Volume 17, Issue 2


Author Tags

  1. Graph index
  2. transmission cluster
  3. COVID-19

Qualifiers

  • Research-article

Funding Sources

  • Hong Kong RGC


Article Metrics

  • Total Citations: 0
  • Total Downloads: 349
  • Downloads (Last 12 months): 70
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 23 Jan 2025
