The Combinatorial BLAS: design, implementation, and applications

Published: 01 November 2011

Abstract

This paper presents a scalable high-performance software library to be used for graph analysis and data mining. Large combinatorial graphs appear in many applications of high-performance computing, including computational biology, informatics, analytics, web search, dynamical systems, and sparse matrix methods. Graph computations are difficult to parallelize using traditional approaches due to their irregular nature and low operational intensity. Many graph computations, however, contain sufficient coarse-grained parallelism for thousands of processors, which can be uncovered by using the right primitives. We describe the parallel Combinatorial BLAS, which consists of a small but powerful set of linear algebra primitives specifically targeting graph and data mining applications. We provide an extensible library interface and some guiding principles for future development. The library is evaluated using two important graph algorithms, in terms of both performance and ease-of-use. The scalability and raw performance of the example applications, using the Combinatorial BLAS, are unprecedented on distributed memory clusters.
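
To make the idea of linear algebra primitives for graph computations concrete, here is a minimal sketch of breadth-first search written as repeated sparse matrix-vector products over the Boolean (OR, AND) semiring. It is an illustration only and does not use the Combinatorial BLAS interface: the function names `spmv_or_and` and `bfs_levels`, the dictionary-of-sets storage, and the example graph are all assumptions made for this sketch, and Python is used purely for brevity.

```python
from typing import Dict, Set


def spmv_or_and(out_edges: Dict[int, Set[int]], frontier: Set[int]) -> Set[int]:
    """Sparse matrix-vector product y = A^T x over the Boolean (OR, AND) semiring.

    The adjacency matrix A is stored as a map from each vertex u to the set
    of vertices v with A[u][v] = 1. The sparse vectors x and y are stored as
    sets holding the indices of their nonzero entries.
    """
    result: Set[int] = set()
    for u in frontier:                      # iterate over the nonzeros of x
        result |= out_edges.get(u, set())   # OR in the nonzeros of row u
    return result


def bfs_levels(out_edges: Dict[int, Set[int]], source: int) -> Dict[int, int]:
    """Return the BFS level of every vertex reachable from source."""
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        reached = spmv_or_and(out_edges, frontier)
        # Mask out vertices that were already visited in an earlier step.
        frontier = reached.difference(levels)
        for v in frontier:
            levels[v] = depth
    return levels


# Hypothetical example graph: 0 -> {1, 2}, 1 -> 3, 2 -> 3, 3 -> 4.
graph = {0: {1, 2}, 1: {3}, 2: {3}, 3: {4}, 4: set()}
levels = bfs_levels(graph, 0)
```

Each loop iteration is one matrix-vector product followed by an elementwise mask, so the whole traversal is expressed with two bulk primitives rather than per-vertex queue operations. That coarse-grained structure is the kind of parallelism the abstract refers to.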

Published In

International Journal of High Performance Computing Applications, Volume 25, Issue 4
November 2011
157 pages

Publisher

Sage Publications, Inc.

United States

Author Tags

  1. Betweenness centrality
  2. Markov clustering
  3. combinatorial BLAS
  4. combinatorial scientific computing
  5. graph analysis
  6. mathematical software
  7. parallel graph library
  8. software framework
  9. sparse matrices
