research-article

KClist++: a simple algorithm for finding k-clique densest subgraphs in large graphs

Published: 01 June 2020

Abstract

The problem of finding densest subgraphs has received increasing attention in recent years, with applications in biology, finance, and social network analysis. The k-clique densest subgraph problem generalizes the densest subgraph problem: the objective is to find a subgraph maximizing the ratio between the number of k-cliques in the subgraph and its number of nodes. It includes as a special case the problem of finding subgraphs with the largest average number of triangles (k = 3), which plays an important role in social network analysis. Moreover, algorithms that handle larger values of k can effectively find quasi-cliques. The densest subgraph problem can be solved in polynomial time with algorithms based on maximum flow, linear programming, or a recent approach based on convex optimization. In particular, the latter approach scales to graphs containing tens of billions of edges. While finding a densest subgraph in large graphs is no longer a bottleneck, the k-clique densest subgraph problem remains challenging even when k = 3. Our work aims at developing near-optimal and exact algorithms for the k-clique densest subgraph problem on large real-world graphs. We give a surprisingly simple procedure that can be employed to find the maximal k-clique densest subgraph in large real-world graphs. By leveraging appealing properties of existing results, we combine it with a recent approach for listing all k-cliques in a graph and a sampling scheme, obtaining state-of-the-art approaches for the aforementioned problem. Our theoretical results are complemented by an extensive experimental evaluation showing the effectiveness of our approach on large real-world graphs.
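To make the objective concrete, the following is a minimal sketch of the k-clique density defined above (number of k-cliques in a subgraph divided by its number of nodes), checked by exhaustive search on a toy graph. It is not the KClist++ procedure: the brute-force search is exponential and only illustrates the quantity being maximized. The function names and the example graph are assumptions made for illustration.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_k_cliques(adj, nodes, k):
    """Count k-cliques induced on `nodes` by brute force (tiny graphs only)."""
    count = 0
    for combo in combinations(sorted(nodes), k):
        # A k-subset is a clique when every pair of its nodes is adjacent.
        if all(v in adj[u] for u, v in combinations(combo, 2)):
            count += 1
    return count


def k_clique_density(adj, nodes, k):
    """Ratio between the number of k-cliques in the subgraph and its number of nodes."""
    nodes = set(nodes)
    if not nodes:
        return 0.0
    return count_k_cliques(adj, nodes, k) / len(nodes)


def densest_by_brute_force(adj, k):
    """Try every non-empty node subset; illustrative only, exponential in the graph size."""
    best_density, best_nodes = 0.0, set()
    all_nodes = sorted(adj)
    for size in range(1, len(all_nodes) + 1):
        for subset in combinations(all_nodes, size):
            density = k_clique_density(adj, subset, k)
            if density > best_density:
                best_density, best_nodes = density, set(subset)
    return best_density, best_nodes


# Hypothetical toy graph: a 4-clique on {0, 1, 2, 3} with a path 3-4-5 attached.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
adj = build_adjacency(edges)

density, nodes = densest_by_brute_force(adj, k=3)
print(density, sorted(nodes))
```

On this toy graph the 4-clique contains four triangles on four nodes, so its triangle density is 1, whereas the whole graph has the same four triangles spread over six nodes. Setting `k=2` counts edges per node, which is the classical densest subgraph objective.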


Published In

Proceedings of the VLDB Endowment, Volume 13, Issue 10
June 2020
193 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
