DOI: 10.1145/3398682.3399164
research-article

ELite: Cost-effective Approximation of Exploration-based Graph Analysis

Published: 14 June 2020

Abstract

Vertex-centric bulk synchronous processing systems, exemplified by Pregel and Giraph, have received extensive attention for graph processing. These systems let programmers think only about the operations that take place at a single vertex, and they provide the underlying computation framework: a sequence of iterations (supersteps), with messages exchanged between neighboring vertices from one superstep to the next. As graphs grow to billions of vertices and trillions of edges, processing them in this model faces two challenges: (1) the latency of a superstep is dominated by the tasks performed on high-degree vertices or densely connected components; and (2) the network communication among vertices is overwhelming, and much of it can be shown to be redundant. For many applications approximate results are acceptable, and if they can be computed rapidly they may even be preferable. Many existing approximate solutions, however, are algorithm-specific rather than generic, or lack theoretical guarantees on the quality of their results. In this paper we tackle this problem with a generic approach that can be incorporated into the graph processing platform itself. The approach we advocate communicates vertex states to only a subset of the neighbors at each superstep; this is called selective edge lookup. We show how this approach can be incorporated into two primitive graph operators, BFS and DFS, which form the basis of many graph analysis workloads. Extensive experiments over real-world and synthetic graphs validate the effectiveness and efficiency of the selective edge lookup approach.
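
To make the idea of selective edge lookup concrete, the following is a minimal single-machine sketch of a superstep-style BFS in which each active vertex forwards its state to only a sample of its neighbors. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not say how the subset of neighbors is chosen, so uniform random sampling with a fixed probability is assumed here, and the names selective_bfs, keep_prob, and seed are hypothetical.

```python
import random


def selective_bfs(adj, source, keep_prob=1.0, seed=0):
    """Superstep-style BFS with sampled message passing.

    adj maps each vertex to a list of its neighbors. In every superstep,
    each newly reached vertex sends a message along each outgoing edge
    with probability keep_prob (an assumed policy, not taken from the paper).
    Returns (hops, messages): the hop count at which each reached vertex
    was first reached, and the total number of messages sent.
    """
    rng = random.Random(seed)
    hops = {source: 0}
    frontier = [source]
    superstep = 0
    messages = 0
    while frontier:
        superstep += 1
        inbox = []
        # Send phase: active vertices message a sampled subset of neighbors.
        for v in frontier:
            for u in adj.get(v, ()):
                if rng.random() < keep_prob:
                    inbox.append(u)
                    messages += 1
        # Receive phase: vertices reached for the first time become active.
        frontier = []
        for u in inbox:
            if u not in hops:
                hops[u] = superstep
                frontier.append(u)
    return hops, messages


# Small undirected example graph, stored as adjacency lists.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
exact, exact_msgs = selective_bfs(adj, 0, keep_prob=1.0)
approx, approx_msgs = selective_bfs(adj, 0, keep_prob=0.5, seed=1)
```

With keep_prob set to 1.0 the sketch reduces to an ordinary level-synchronous BFS. With a smaller value it sends fewer messages, and every hop count it reports is an upper bound on the true distance, because any path found through the sampled edges is also a path in the full graph. Some vertices may not be reached at all, which is the accuracy cost being traded for reduced communication in this simplified setting.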

    Published In

    GRADES-NDA'20: Proceedings of the 3rd Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)
    June 2020
    93 pages
    ISBN:9781450380218
    DOI:10.1145/3398682
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Approximated Algorithms
    2. Vertex-centric Graph Computing

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SIGMOD/PODS '20

    Acceptance Rates

    GRADES-NDA'20 Paper Acceptance Rate 9 of 15 submissions, 60%;
    Overall Acceptance Rate 29 of 61 submissions, 48%
