DOI: 10.1145/2939672.2939757
research-article

PTE: Enumerating Trillion Triangles On Distributed Systems

Published: 13 August 2016

Abstract

How can we enumerate triangles from an enormous graph with billions of vertices and edges? Triangle enumeration is an important task for graph data analysis with many applications, including identifying suspicious users in social networks, detecting web spam, and finding communities. However, recent networks are so large that most previous algorithms fail to process them. Several MapReduce algorithms have recently been proposed to handle such large networks; however, they shuffle massive amounts of data, which results in very long processing times. In this paper, we propose PTE (Pre-partitioned Triangle Enumeration), a new distributed algorithm for enumerating triangles in enormous graphs that resolves the structural inefficiency of the previous MapReduce algorithms. PTE enumerates trillions of triangles in a billion-scale graph by decreasing three factors: the amount of shuffled data, total work, and network read.
Experimental results show that PTE is up to 47 times faster than recent distributed algorithms on real-world graphs, and succeeds in enumerating more than 3 trillion triangles on the ClueWeb12 graph with 6.3 billion vertices and 72 billion edges, which no previous triangle computation algorithm has been able to process.
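
To make the task concrete, the following is a minimal single-machine sketch of triangle enumeration in Python. It is not PTE and does not reproduce the paper's partitioning scheme; it uses the common degree-ordering approach, and the function name `enumerate_triangles` and the edge-list input format are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

Edge = Tuple[int, int]


def enumerate_triangles(edges: Iterable[Edge]) -> Iterator[Tuple[int, int, int]]:
    """Yield every triangle of an undirected graph exactly once.

    Illustrative in-memory baseline (hypothetical helper, not the PTE algorithm).
    """
    # Build undirected adjacency sets; self-loops are skipped and
    # duplicate edges collapse automatically because sets are used.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Rank vertices by (degree, id). Ties on degree are broken by id,
    # so the ranking is a strict total order.
    rank = {v: (len(nbrs), v) for v, nbrs in adj.items()}

    # Orient each edge from its lower-ranked to its higher-ranked endpoint.
    out = {v: {w for w in nbrs if rank[w] > rank[v]} for v, nbrs in adj.items()}

    # A triangle is reported only from its lowest-ranked vertex u,
    # via the oriented edge (u, v) and a common out-neighbour w.
    for u, out_u in out.items():
        for v in out_u:
            for w in out_u & out[v]:
                yield tuple(sorted((u, v, w)))


if __name__ == "__main__":
    sample = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4), (4, 5)]
    for triangle in enumerate_triangles(sample):
        print(triangle)
```

This sketch keeps the whole graph in the memory of one machine, which is the limitation that distributed algorithms such as the one proposed here are designed to overcome.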

Supplementary Material

MP4 File (kdd2016_park_trillion_triangles_01-acm.mp4)

Published In

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN:9781450342322
DOI:10.1145/2939672
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. big data
  2. distributed algorithm
  3. graph algorithm
  4. network analysis
  5. scalable algorithm
  6. triangle enumeration

Acceptance Rates

KDD '16 Paper Acceptance Rate 66 of 1,115 submissions, 6%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
