research-article

Scaling Up k-Clique Densest Subgraph Detection

Published: 30 May 2023

Abstract

In this paper, we study the k-clique densest subgraph problem, which seeks the subgraph that maximizes the ratio of the number of k-cliques to the number of vertices it contains. The problem has been extensively studied in the literature and has applications in a wide range of fields such as biology and finance. Existing solutions rely heavily on repeatedly computing all the k-cliques and therefore do not scale to large k values on large-scale graphs. By adapting the idea of "pivoting", we propose the SCT*-Index to compactly organize the k-cliques. Based on the SCT*-Index, our SCTL algorithm obtains the k-cliques directly from the index and efficiently achieves a near-optimal approximation. To further improve SCTL, we propose SCTL*, which adds novel graph reductions to shrink the search space and batch-processing optimizations to decrease the number of visited k-cliques. As evaluated in our experiments, SCTL* significantly outperforms existing approaches by up to two orders of magnitude. In addition, we propose a sampling-based approximate algorithm that can provide reasonable approximations for any k value on billion-scale graphs. Extensive experiments on 12 real-world graphs validate both the efficiency and effectiveness of the proposed techniques.
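To make the objective concrete, the following minimal sketch computes the k-clique density of a vertex set (number of k-cliques in the induced subgraph divided by the number of vertices) and finds the densest vertex set by exhaustive search on a toy graph. It is a brute-force illustration of the problem definition only, not the SCT*-Index, SCTL, or SCTL* from the paper. All function names and the example graph are assumptions made for illustration.

```python
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_k_cliques(adj, vertices, k):
    """Count k-cliques in the subgraph induced by `vertices` (brute force)."""
    return sum(
        1
        for group in combinations(sorted(vertices), k)
        if all(v in adj[u] for u, v in combinations(group, 2))
    )


def k_clique_density(adj, vertices, k):
    """Ratio of the number of k-cliques to the number of vertices."""
    vertices = set(vertices)
    if not vertices:
        return 0.0
    return count_k_cliques(adj, vertices, k) / len(vertices)


def densest_subgraph_bruteforce(adj, k):
    """Try every non-empty vertex subset; exponential, toy graphs only."""
    best_set, best_density = set(), 0.0
    nodes = sorted(adj)
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            density = k_clique_density(adj, subset, k)
            if density > best_density:
                best_set, best_density = set(subset), density
    return best_set, best_density


# Hypothetical toy graph: a 4-clique on {0, 1, 2, 3} with a path 3-4-5 attached.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
adj = build_adj(edges)
best_set, best_density = densest_subgraph_bruteforce(adj, k=3)
print(best_set, best_density)
```

On this toy graph the 4-clique contains four triangles over four vertices, so its triangle density is 1.0, while the whole graph has the same four triangles spread over six vertices. The exhaustive search is exponential in the number of vertices, which is exactly the cost that clique-indexing and approximation techniques aim to avoid on large graphs.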

Supplemental Material

MP4 File
Presentation video for the paper "Scaling Up k-Clique Densest Subgraph Detection" in SIGMOD 2023.



Published In

Proceedings of the ACM on Management of Data  Volume 1, Issue 1
PACMMOD
May 2023
2807 pages
EISSN:2836-6573
DOI:10.1145/3603164

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. densest subgraph discovery
  2. graph mining
  3. near-clique detection

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • Australian Research Council
