skip to main content
research-article

Rectangular full packed format for cholesky's algorithm: factorization, solution, and inversion

Published: 23 April 2010 Publication History

Abstract

We describe a new data format for storing triangular, symmetric, and Hermitian matrices called Rectangular Full Packed Format (RFPF). The standard two-dimensional arrays of Fortran and C (also known as full format) that are used to represent triangular and symmetric matrices waste nearly half of the storage space but provide high performance via the use of Level 3 BLAS. Standard packed format arrays fully utilize storage (array space) but provide low performance as there is no Level 3 packed BLAS. We combine the good features of packed and full storage using RFPF to obtain high performance via using Level 3 BLAS as RFPF is a standard full-format representation. Also, RFPF requires exactly the same minimal storage as packed the format. Each LAPACK full and/or packed triangular, symmetric, and Hermitian routine becomes a single new RFPF routine based on eight possible data layouts of RFPF. This new RFPF routine usually consists of two calls to the corresponding LAPACK full-format routine and two calls to Level 3 BLAS routines. This means no new software is required. As examples, we present LAPACK routines for Cholesky factorization, Cholesky solution, and Cholesky inverse computation in RFPF to illustrate this new work and to describe its performance on several commonly used computer platforms. Performance of LAPACK full routines using RFPF versus LAPACK full routines using the standard format for both serial and SMP parallel processing is about the same while using half the storage. Performance gains are roughly one to a factor of 43 for serial and one to a factor of 97 for SMP parallel times faster using vendor LAPACK full routines with RFPF than with using vendor and/or reference packed routines.

References

[1]
Agarwal, R. C., Gustavson, F. G., and Zubair, M. 1994. Exploiting functional parallelism on power2 to design high-performance numerical algorithms. IBM J. Res. Develop. 38, 5, 563--576.
[2]
Andersen, B. S., Gunnels, J. A., Gustavson, F., and Waśniewski, J. 2002. A recursive formulation of the inversion of symmetric positive definite matrices in packed storage data format. In Proceedings of the 6th International Conference on Applied Parallel Computing, J. Fagerholm, J. Haataja, J. Järvinen, M. Lyly, and P. R. V. Savolainen, Eds. Lecture Notes in Computer Science, vol. 2367, Springer, Berlin, Germany, 287--296.
[3]
Andersen, B. S., Gustavson, F. G., Reid, J. K., and Waśniewski, J. 2005. A fully portable high performance minimal storage hybrid format Cholesky algorithm. ACM Trans. Math. Softw. 31, 201--227.
[4]
Andersen, B. S., Gustavson, F. G., and Waśniewski, J. 2001. A recursive formulation of cholesky facorization of a matrix in packed storage. ACM Trans. Math. Softw. 27, 2, 214--244.
[5]
Anderson, E., Bai, Z., Bischof, C., Blackford, L. S., Demmel, J., Dongarra, J. J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., and Sorensen, D. 1999. LAPACK Users' Guide, 3rd Ed. Society for Industrial and Applied Mathematics, Philadelphia, PA.
[6]
Barker, V. A., Blackford, L. S., Dongarra, J. J., Croz, J. D., Hammarling, S., Marinova, M., Waśniewski, J., and Yalamov, P. 2001. LAPACK95 Users' Guide, 1st ed. Society for Industrial and Applied Mathematics, Philadelphia, PA.
[7]
Buttari, A., Langou, J., Kurzak, J., and Dongarra, J. 2009. A class of parallel tiled linear algebra algorithms for multicore architectures. Parall. Comput. 35, 38--53.
[8]
Chan, E., Quintana-Ortí, E., Quintana-Ortí, G., and van de Geijn, R. 2007. Super-matrix out-of-order scheduling of matrix operations for SMP and multi-core architectures. In Proceedings of the 19th ACM Symposium on Parallelism in Algorithms and Architecture. 116--125.
[9]
Demmel, J. W. 1997. Applied Numerical Linear Algebra. SIAM, Philadelphia, PA.
[10]
Dongarra, J. J., Bunch, J. R., Moler, C. B., and Stewart, G. W. 1979. Linpack Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA.
[11]
Dongarra, J. J., Du Croz, J., Duff, I. S., and Hammarling, S. 1990a. Algorithm 679: A set of Level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. 16, 1, 18--28.
[12]
Dongarra, J. J., Du Croz, J., Duff, I. S., and Hammarling, S. 1990b. A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. 16, 1, 1--17.
[13]
Dongarra, J. J., Du Croz, J., Hammarling, S., and Hanson, R. J. 1988. An extended set of Fortran basic linear algebra subroutines. ACM Trans. Math. Softw. 14, 1, 1--17.
[14]
Dongarra, J. J., Duff, I. S., Sorensen, D. C., and van der Vorst, H. A. 1998. Numerical Linear Algebra for High Performance Computers. SIAM, Philadelphia, PA.
[15]
Elmroth, E., Gustavson, F. G., Kagstrom, B., and Jonsson, I. 2004. Recursive blocked algorithms and hybrid data structures for dense matrix library software. SIAM Rev. 46, 1, 3--45.
[16]
Golub, G. and Van Loan, C. F. 1996. Matrix Computations 3rd Ed. Johns Hopkins University Press, Baltimore, MD.
[17]
Gunnels, J. A. and Gustavson, F. G. 2004. A new array format for symmetric and triangular matrices. In Proceedings of the International Conference on Applied Parallel Computing (PARA'04) J. W. J. J. Dongare and K. Madsen, eds. Lecture Nots in Computer Science, vol. 3732. Springer-Verlag, Berlin, Heidelberg, Germany, 247--255.
[18]
Gustavson, F. G. 2003. High performance linear algebra algorithms using new generalized data structures for matrices. IBM J. Res. Develop. 47, 1, 823--849.
[19]
Gustavson, F. G., Gunnels, J., and Sexton, J. 2007a. Minimal data copy for dense linear algebra factorization. In Proceedings of the International Conference on Applied Parallel Computing, (PARA'06). Lecture Nots in Computer Science, vol. 4699, Springer-Verlag, Berlin Heidelberg, Germany, 540--549.
[20]
Gustavson, F. G. and Jonsson, I. 2000. Minimal storage high performance Cholesky via blocking and recursion. IBM J. Res. Develop. 44, 6, 823--849.
[21]
Gustavson, F. G., Reid, J. K., and Waśniewski, J. 2007b. Algorithm 865: Fortran 95 subroutines for Cholesky factorization in blocked hybrid format. ACM Trans. Math. Softw. 33, 1, 5.
[22]
Gustavson, F. G. and Waśniewski, J. 2007. Rectangular full packed format for LAPACK algorithms timings on several computers. In Proceedings of the International Conference on Applied Parallel Computing (PARA'06). Lecture Nots in Computer Science, vol. 4699. Springer-Verlag, Berlin Heidelberg, Germany, 570--579.
[23]
Herrero, J. R. 2006. A framework for efficient execution of matrix computations. Ph.D. dissertation. Universitat Politècnica de Catalunya, Bancelona, Spain.
[24]
Higham, N. J. 1996. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, PA.
[25]
IBM. 1997. Engineering and Scientific Subroutine Library for AIX, Version 3, Volume 1. IBM, Yorktown Heights, NY. IBM. Pub. number SA22--7272--0.
[26]
Lawson, C. L., Hanson, R. J., Kincaid, D., and Krogh, F. T. 1979. Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Softw. 5, 308--323.
[27]
Trefethen, L. N. and Bau, D. 1997. Numerical Linear Algebra. SIAM, Philadelphia, PA.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Mathematical Software
ACM Transactions on Mathematical Software  Volume 37, Issue 2
April 2010
281 pages
ISSN:0098-3500
EISSN:1557-7295
DOI:10.1145/1731022
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 April 2010
Accepted: 01 November 2009
Revised: 01 July 2009
Received: 01 January 2009
Published in TOMS Volume 37, Issue 2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. BLAS
  2. Cholesky factorization and solution
  3. LAPACK
  4. Real symmetric matrices
  5. Rectangular Full Packed Format
  6. complex Hermitian matrices
  7. linear algebra libraries
  8. novel packed matrix data structures
  9. positive definite matrices
  10. recursive algorithms

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)5
  • Downloads (Last 6 weeks)0
Reflects downloads up to 06 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media