skip to main content
10.1145/2851141.2851145acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
research-article
Public Access

Gunrock: a high-performance graph processing library on the GPU

Published: 27 February 2016 Publication History

Abstract

For large-scale graph analytics on the GPU, the irregularity of data access/control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock," our high-level bulk-synchronous graph-processing system targeting the GPU, takes a new approach to abstracting GPU graph analytics: rather than designing an abstraction around computation, Gunrock instead implements a novel data-centric abstraction centered on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We evaluate Gunrock on five graph primitives (BFS, BC, SSSP, CC, and PageRank) and show that Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library.

Supplementary Material

Supplemental material. (a11-wang.zip)

References

[1]
S. Beamer, K. Asanović, and D. Patterson. Direction-optimizing breadth-first search. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC '12, pages 12:1--12:10, Nov. 2012.
[2]
U. Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 25(2):163--177, 2001.
[3]
M. Burtscher, R. Nasre, and K. Pingali. A quantitative study of irregular programs on GPUs. In IEEE International Symposium on Workload Characterization, IISWC-2012, pages 141--151, Nov. 2012.
[4]
D. Cederman and P. Tsigas. On dynamic load-balancing on graphics processors. In Graphics Hardware 2008, pages 57--64, June 2008.
[5]
A. Davidson, S. Baxter, M. Garland, and J. D. Owens. Work-efficient parallel GPU methods for single source shortest paths. In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium, pages 349--359, May 2014.
[6]
D. Delling, A. V. Goldberg, A. Nowatzyk, and R. F. Werneck. PHAST: Hardware-accelerated shortest path trees. Journal of Parallel and Distributed Computing, 73:940--952, Sept. 2010.
[7]
E. Elsen and V. Vaidyanathan. A vertex-centric CUDA/C++ API for large graph analytics on GPUs using the gather-apply-scatter abstraction, 2013. http://www.github.com/RoyalCaliber/vertexAPI2.
[8]
Z. Fu, M. Personick, and B. Thompson. MapGraph: A high level API for fast development of high performance graph analytics on GPUs. In Proceedings of the Workshop on GRAph Data Management Experiences and Systems, GRADES '14, pages 2:1--2:6, June 2014.
[9]
A. Geil, Y. Wang, and J. D. Owens. WTF, GPU! Computing Twitter's who-to-follow on the GPU. In Proceedings of the Second ACM Conference on Online Social Networks, COSN '14, Oct. 2014.
[10]
R. Geisberger, P. Sanders, and D. Schultes. Better approximation of betweenness centrality. In Proceedings of the Tenth Workshop on Algorithm Engineering and Experiments, ALENEX08, pages 90--100, Jan. 2008.
[11]
A. Goel. The "who-to-follow" system at Twitter: Algorithms, impact, and further research. WWW 2014 industry track, 2014.
[12]
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI '12, pages 17--30. USENIX Association, Oct. 2012.
[13]
D. Gregor and A. Lumsdaine. The parallel BGL: A generic library for distributed graph computations. In Parallel Object-Oriented Scientific Computing (POOSC), July 2005.
[14]
J. Greiner. A comparison of parallel algorithms for connected components. In Proceedings of the Sixth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA '94, pages 16--25, June 1994.
[15]
S. Hong, H. Chafi, E. Sedlar, and K. Olukotun. Green-Marl: A DSL for easy and efficient graph analysis. In Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVII, pages 349--362, Mar. 2012.
[16]
Y. Jia, V. Lu, J. Hoberock, M. Garland, and J. C. Hart. Edge v. node parallelism for graph centrality metrics. In W. W. Hwu, editor, GPU Computing Gems Jade Edition, chapter 2, pages 15--28. Morgan Kaufmann, Oct. 2011.
[17]
S. W. Keckler, W. J. Dally, B. Khailany, M. Garland, and D. Glasco. GPUs and the future of parallel computing. IEEE Micro, 31(5):7--17, Sept. 2011.
[18]
F. Khorasani, K. Vora, R. Gupta, and L. N. Bhuyan. CuSha: Vertex-centric graph processing on GPUs. In Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing, HPDC '14, pages 239--252, June 2014.
[19]
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A new parallel framework for machine learning. In Proceedings of the Twenty-Sixth Annual Conference on Uncertainty in Artificial Intelligence, UAI-10, pages 340--349, July 2010.
[20]
G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD '10, pages 135--146, June 2010.
[21]
R. C. McColl, D. Ediger, J. Poovey, D. Campbell, and D. A. Bader. A performance evaluation of open source graph databases. In Proceedings of the First Workshop on Parallel Programming for Analytics Applications, PPAA '14, pages 11--18, Feb. 2014.
[22]
A. McLaughlin and D. A. Bader. Scalable and high performance betweenness centrality on the GPU. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC14, pages 572--583, Nov. 2014.
[23]
A. McLaughlin, J. Riedy, and D. A. Bader. A fast, energy-efficient abstraction for simultaneous breadth-first searches. In 2015 IEEE High Performance Extreme Computing Conference, HPEC '15, Sept. 2015.
[24]
D. Merrill, M. Garland, and A. Grimshaw. Scalable GPU graph traversal. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 117--128, Feb. 2012.
[25]
U. Meyer and P. Sanders. Δ-stepping: a parallelizable shortest path algorithm. Journal of Algorithms, 49(1):114--152, Oct. 2003. 1998 European Symposium on Algorithms.
[26]
D. Nguyen, A. Lenharth, and K. Pingali. A lightweight infrastructure for graph analytics. In Proceedings of ACM Symposium on Operating Systems Principles, SOSP '13, pages 456--471, Nov. 2013.
[27]
P. R. Pande and D. A. Bader. Computing betweenness centrality for small world networks on a GPU. In HPEC, 2011.
[28]
K. Pingali, D. Nguyen, M. Kulkarni, M. Burtscher, M. A. Hassaan, R. Kaleem, T.-H. Lee, A. Lenharth, R. Manevich, M. Méndez-Lojo, D. Prountzos, and X. Sui. The tao of parallelism in algorithms. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '11, pages 12--25, June 2011.
[29]
S. Salihoglu and J. Widom. HelP: High-level primitives for large-scale graph processing. In Proceedings of the Workshop on GRAph Data Management Experiences and Systems, GRADES '14, pages 3:1--3:6, June 2014.
[30]
S. Sallinen, A. Gharaibeh, and M. Ripeanu. Accelerating direction-optimized breadth first search on hybrid architectures. CoRR, abs/1503.04359(1503.04359v1), Mar. 2015.
[31]
A. E. Sariyüce, K. Kaya, E. Saule, and U. V. Çatalyürek. Betweenness centrality on GPUs and heterogeneous architectures. In Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU-6, pages 76--85, Mar. 2013.
[32]
J. Shun and G. E. Blelloch. Ligra: a lightweight graph processing framework for shared memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '13, pages 135--146, Feb. 2013.
[33]
J. G. Siek, L.-Q. Lee, and A. Lumsdaine. The Boost Graph Library: User Guide and Reference Manual. Addison-Wesley, Dec. 2001.
[34]
J. Soman, K. Kishore, and P. J. Narayanan. A fast GPU algorithm for graph connectivity. In 24th IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum, IPDPSW 2010, pages 1--8, Apr. 2010.
[35]
S. Tzeng, B. Lloyd, and J. D. Owens. A GPU task-parallel model with dependency resolution. IEEE Computer, 45(8):34--41, Aug. 2012.
[36]
Y. Wu, Y. Wang, Y. Pan, C. Yang, and J. D. Owens. Performance characterization for high-level programming models for GPU graph analytics. In IEEE International Symposium on Workload Characterization, IISWC-2015, pages 66--75, Oct. 2015.
[37]
J. Zhong and B. He. Medusa: Simplified graph processing on GPUs. IEEE Transactions on Parallel and Distributed Systems, 25(6):1543--1552, June 2014.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2016
420 pages
ISBN:9781450340922
DOI:10.1145/2851141
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 February 2016

Permissions

Request permissions for this article.

Check for updates

Badges

  • Distinguished Paper

Qualifiers

  • Research-article

Funding Sources

Conference

PPoPP '16
Sponsor:

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)716
  • Downloads (Last 6 weeks)82
Reflects downloads up to 27 Dec 2024

Other Metrics

Citations

Cited By

View all

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media