Gunrock: a high-performance graph processing library on the GPU

Authors:

Yangzihao Wang,

Andrew Davidson,

John D. Owens

PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Article No.: 11, Pages 1 - 12

https://doi.org/10.1145/2851141.2851145

Published: 27 February 2016

Abstract

For large-scale graph analytics on the GPU, the irregularity of data access/control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock," our high-level bulk-synchronous graph-processing system targeting the GPU, takes a new approach to abstracting GPU graph analytics: rather than designing an abstraction around computation, Gunrock instead implements a novel data-centric abstraction centered on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We evaluate Gunrock on five graph primitives (BFS, BC, SSSP, CC, and PageRank) and show that Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library.
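To make the frontier-centric idea concrete, the following is a minimal CPU-side sketch of breadth-first search written as repeated bulk-synchronous steps over a vertex frontier. It is an illustration only and not Gunrock's API: the function names `advance`, `filter_unvisited`, and `bfs`, and the dictionary-based graph representation, are assumptions made for this example, and nothing here runs on a GPU.

```python
from typing import Dict, List

# Hypothetical adjacency representation for this sketch: vertex id -> neighbor ids.
Graph = Dict[int, List[int]]


def advance(graph: Graph, frontier: List[int]) -> List[int]:
    """Expand the frontier: gather every neighbor of every frontier vertex."""
    return [v for u in frontier for v in graph.get(u, [])]


def filter_unvisited(candidates: List[int], depth: Dict[int, int], level: int) -> List[int]:
    """Keep each not-yet-visited vertex once and record its BFS depth."""
    next_frontier = []
    for v in candidates:
        if v not in depth:
            depth[v] = level
            next_frontier.append(v)
    return next_frontier


def bfs(graph: Graph, source: int) -> Dict[int, int]:
    """Run BFS as a loop of frontier operations, one bulk step per level."""
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        # Each iteration transforms the current frontier into the next one.
        frontier = filter_unvisited(advance(graph, frontier), depth, level)
    return depth


if __name__ == "__main__":
    example = {0: [1, 2], 1: [3], 2: [3], 3: []}
    print(bfs(example, 0))
```

The point of the sketch is that the algorithm is expressed entirely as transformations of a frontier, so the per-vertex work inside each step is what a GPU implementation would parallelize.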

Supplementary Material

Supplemental material. (a11-wang.zip)


Published In

PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 2016

420 pages

ISBN:9781450340922

DOI:10.1145/2851141

General Chair: Rafael Asenjo (University of Málaga, Spain)

Program Chair: Tim Harris (Oracle Labs, Cambridge, UK)

ACM SIGPLAN Notices Volume 51, Issue 8
PPoPP '16
August 2016
405 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3016078
Editor: Matthew Fluet

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

Distinguished Paper

Qualifiers

Research-article

Funding Sources

Defense Advanced Research Projects Agency
U.S. Army
UC Lab Fees Research Program Award
National Science Foundation

Conference

PPoPP '16

PPoPP '16: 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

March 12 - 16, 2016

Barcelona, Spain

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%
