Memory-Efficient and Accurate Sampling for Counting Local Triangles in Graph Streams: From Simple to Multigraphs

Published: 31 January 2018

Abstract

How can we estimate local triangle counts accurately in a graph stream without storing the whole graph? How can we handle duplicate edges when counting local triangles in a graph stream? Local triangle counting, which computes the number of triangles that each node of a graph participates in, is an important problem with wide applications in social network analysis, anomaly detection, web mining, and related areas.
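As a point of reference for the problem statement, the following is a minimal sketch of exact local triangle counting when the whole graph fits in memory. It is not one of the proposed streaming algorithms; the function name `local_triangle_counts` and the edge-list input format are assumptions made for illustration. Duplicate edges collapse into one, so the result corresponds to the simple graph.

```python
from collections import defaultdict
from itertools import combinations


def local_triangle_counts(edges):
    """Exact number of triangles each node participates in (hypothetical helper).

    Stores the full graph, which is exactly what a streaming algorithm avoids.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    counts = {node: 0 for node in adj}
    for u in adj:
        # Every pair of neighbors of u that is itself connected closes a triangle at u.
        for v, w in combinations(adj[u], 2):
            if w in adj[v]:
                counts[u] += 1
    return counts


if __name__ == "__main__":
    # A triangle 1-2-3 with a pendant node 4; the repeated edge (1, 2) is ignored.
    print(local_triangle_counts([(1, 2), (2, 3), (1, 3), (3, 4), (1, 2)]))
```

The memory cost of this baseline grows with the number of edges, which is the limitation that sampling-based streaming methods are meant to address.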
In this article, we propose edge-sampling algorithms for local triangle counting in a graph stream: Mascot for a simple graph, and MultiBMascot and MultiWMascot for a multigraph. To develop Mascot, we first present two naive local triangle counting algorithms for a graph stream, Mascot-C and Mascot-A. Mascot-C is based on constant edge sampling, and Mascot-A improves on its accuracy by using more memory. Mascot combines the accuracy of Mascot-A with the memory efficiency of Mascot-C by counting triangles unconditionally for each new edge, regardless of whether that edge is sampled. Extending the idea to multigraphs, we develop two algorithms, MultiBMascot and MultiWMascot. MultiBMascot performs local triangle counting on the simple graph corresponding to a streamed multigraph without explicit graph conversion; MultiWMascot treats the number of occurrences of an edge as its weight and counts each triangle as the product of its three edge weights. In contrast to the existing algorithm, which requires prior knowledge of the target graph and appropriately set parameters, our proposed algorithms require only one parameter: the edge sampling probability.
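The idea of unconditional counting for each new edge can be sketched as follows for a simple graph. This is an illustration under stated assumptions, not the article's implementation: the function name `mascot_sketch`, the `seed` argument, and the 1/p^2 scaling (chosen here because a triangle is observed only if its two earlier edges were both sampled) are not given in the abstract.

```python
import random
from collections import defaultdict


def mascot_sketch(edge_stream, p, seed=None):
    """Estimate local triangle counts from one pass over a simple-graph edge stream.

    Hypothetical sketch: p is the edge sampling probability, the only parameter.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    sampled = defaultdict(set)      # adjacency lists of the sampled edges only
    estimate = defaultdict(float)   # per-node triangle estimates
    weight = 1.0 / (p * p)          # assumed scaling for two sampled edges

    for u, v in edge_stream:
        if u == v:
            continue
        # Count first, whether or not this edge will be kept in the sample.
        common = sampled.get(u, set()) & sampled.get(v, set())
        for w in common:
            estimate[u] += weight
            estimate[v] += weight
            estimate[w] += weight
        # Then decide independently whether to store the edge.
        if rng.random() < p:
            sampled[u].add(v)
            sampled[v].add(u)
    return dict(estimate)


if __name__ == "__main__":
    k4 = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4)]
    print(mascot_sketch(k4, p=1.0))          # with p = 1 every edge is kept
    print(mascot_sketch(k4, p=0.5, seed=7))  # a randomized estimate
```

With p = 1 every edge is stored and the estimates coincide with the exact counts; smaller values of p trade accuracy for memory, since only the sampled edges are kept.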
Through extensive experiments, we show that for the same number of sampled edges, Mascot is more accurate than the existing algorithm as well as Mascot-C and Mascot-A. We also demonstrate that the accuracy of MultiBMascot on a multigraph is comparable to that of Mascot-C on the counterpart simple graph, and that MultiWMascot becomes more accurate for higher-degree nodes. Using Mascot, we also discover interesting anomalous patterns in real graphs, including core-peripheries in the web, a bimodal call pattern in a phone call history, and intensive collaboration in DBLP.

        Published In

        ACM Transactions on Knowledge Discovery from Data, Volume 12, Issue 1
        Special Issue (IDEA) and Regular Papers
        February 2018
        363 pages
        ISSN:1556-4681
        EISSN:1556-472X
        DOI:10.1145/3178542
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 31 January 2018
        Accepted: 01 November 2016
        Revised: 01 October 2016
        Received: 01 December 2015

        Author Tags

        1. Local triangle counting
        2. anomaly detection
        3. edge sampling
        4. graph stream mining

        Qualifiers

        • Research-article
        • Research
        • Refereed

        Funding Sources

        • AFOSR/AOARD
        • High Performance Big Data Analytics Platform Performance Acceleration Technologies Development
        • Research Resettlement Fund for the new faculty of Seoul National University
        • Institute for Information communications Technology Promotion (IITP) grant funded by the Korea government (MSIP)