research-article

Automatically adapting programs for mixed-precision floating-point computation

Authors:

Michael O. Lam,

Jeffrey K. Hollingsworth,

Bronis R. de Supinski,

Matthew P. Legendre

ICS '13: Proceedings of the 27th international ACM conference on International conference on supercomputing

Pages 369–378

https://doi.org/10.1145/2464996.2465018

Published: 10 June 2013

Abstract

As scientific computation continues to scale, efficient use of floating-point arithmetic processors is critical. Lower precision allows streaming architectures to perform more operations per second and can reduce memory bandwidth pressure on all architectures. However, using a precision that is too low for a given algorithm and data set leads to inaccurate results. In this paper, we present a framework that uses binary instrumentation and modification to build mixed-precision configurations of existing binaries that were originally developed to use only double-precision. This framework allows developers to explore mixed-precision configurations without modifying their source code, and it permits autotuning of floating-point precision. We include a simple search algorithm to automate identification of code regions that can use lower precision. Our results for several benchmarks show that our framework is effective and incurs low overhead (less than 10X in most cases). In addition, we demonstrate that our tool can replicate manual conversions and suggest further optimization; in one case, we achieve a speedup of 2X.
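The search idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' framework, which rewrites compiled binaries through instrumentation. Instead, it uses a hypothetical toy program whose three named regions (`scale`, `accumulate`, `sqrt`) emulate single precision by rounding their results through Python's `struct` module. A greedy search, assumed here for simplicity, lowers one region at a time and keeps it only if the final result stays within an assumed relative tolerance of 1e-6 of the all-double baseline.

```python
import math
import struct

# Hypothetical sketch: region names, the toy program, the tolerance and the
# greedy strategy are illustrative assumptions, not the paper's implementation.
REGIONS = ["scale", "accumulate", "sqrt"]


def to_single(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest IEEE single value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def run_program(lowered, n=100_000):
    """Toy computation: sqrt(sum(0.1 * x_i)) over n ones.

    Every region listed in `lowered` rounds its results to single
    precision, emulating a lowered-precision version of that code.
    """
    def rnd(region, value):
        return to_single(value) if region in lowered else value

    data = [1.0] * n
    scaled = [rnd("scale", x * 0.1) for x in data]
    total = 0.0
    for v in scaled:
        # Long running sum: rounding error accumulates if this region is lowered.
        total = rnd("accumulate", total + v)
    return rnd("sqrt", math.sqrt(total))


def passes(lowered, baseline, tol=1e-6):
    """Accept a configuration if its relative error versus the baseline is within tol."""
    result = run_program(lowered)
    return abs(result - baseline) <= tol * abs(baseline)


def greedy_search(regions, tol=1e-6):
    """Lower one region at a time, keeping it only if the combined
    configuration still passes the accuracy check."""
    baseline = run_program([])  # all regions in double precision
    lowered = []
    for region in regions:
        candidate = lowered + [region]
        if passes(candidate, baseline, tol):
            lowered = candidate
    return lowered


if __name__ == "__main__":
    print(greedy_search(REGIONS))  # prints ['scale', 'sqrt']
```

Each candidate is checked as part of the combined configuration rather than in isolation, because regions that are individually safe can interact. In this toy, the long running sum is the region that fails: accumulating 100,000 terms in single precision drifts well beyond the tolerance, while rounding the inputs or the final square root does not.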




Information

Published In


ICS '13: Proceedings of the 27th international ACM conference on International conference on supercomputing

June 2013

512 pages

ISBN: 9781450321303

DOI: 10.1145/2464996

General Chair:
Allen D. Malony, University of Oregon, USA

Program Chairs:
Mario Nemirovsky, Barcelona Supercomputing Center, Spain
Sam Midkiff, Purdue University, USA

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Research-article

Conference

ICS'13

Sponsor:

SIGARCH

ICS'13: International Conference on Supercomputing

June 10–14, 2013

Eugene, Oregon, USA

Acceptance Rates

ICS '13 paper acceptance rate: 43 of 202 submissions (21%)

Overall acceptance rate: 629 of 2,180 submissions (29%)

Bibliometrics

Total citations: 101

Total downloads: 745

Downloads (last 12 months): 89

Downloads (last 6 weeks): 7

Reflects downloads up to 28 Jan 2025

