DOI: 10.1145/2464996.2465018
research-article

Automatically adapting programs for mixed-precision floating-point computation

Published: 10 June 2013

Abstract

As scientific computation continues to scale, efficient use of floating-point arithmetic processors is critical. Lower precision allows streaming architectures to perform more operations per second and can reduce memory bandwidth pressure on all architectures. However, using a precision that is too low for a given algorithm and data set leads to inaccurate results. In this paper, we present a framework that uses binary instrumentation and modification to build mixed-precision configurations of existing binaries that were originally developed to use only double-precision. This framework allows developers to explore mixed-precision configurations without modifying their source code, and it permits autotuning of floating-point precision. We include a simple search algorithm to automate identification of code regions that can use lower precision. Our results for several benchmarks show that our framework is effective and incurs low overhead (less than 10X in most cases). In addition, we demonstrate that our tool can replicate manual conversions and suggest further optimization; in one case, we achieve a speedup of 2X.
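
The idea of searching for regions that tolerate lower precision can be illustrated with a small sketch. This is not the paper's search algorithm and it does not touch binaries: it is a toy greedy search over two hypothetical named regions (`scale` and `accumulate`) of a made-up computation, where single precision is simulated by rounding each intermediate result to IEEE binary32. The helper names `to_single`, `run`, and `search`, the region names, and the relative-error tolerance are all assumptions made for illustration.

```python
import struct

# Hypothetical region names for a toy computation (not taken from the paper).
REGIONS = ("scale", "accumulate")


def to_single(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def run(config: dict) -> float:
    """Run the toy computation; config maps region name to 'single' or 'double'."""

    def rnd(region: str, x: float) -> float:
        # Simulate single precision by rounding the result of each operation.
        return to_single(x) if config[region] == "single" else x

    # Region "scale": 0.5, 1.0, 1.5, 2.0 are exact in both precisions.
    terms = [rnd("scale", 0.5 * i) for i in range(1, 5)]

    # Region "accumulate": add small terms to a large running total.
    # In binary32 the spacing between values near 1e8 is 8, so every small
    # addition is rounded away and the total never moves.
    total = rnd("accumulate", 1.0e8)
    for _ in range(250):
        for t in terms:
            total = rnd("accumulate", total + t)
    return total


def search(tolerance: float) -> dict:
    """Greedily demote regions to single precision while the result stays
    within a relative tolerance of the all-double reference."""
    config = {region: "double" for region in REGIONS}
    reference = run(config)
    for region in REGIONS:
        trial = dict(config)
        trial[region] = "single"
        if abs(run(trial) - reference) <= tolerance * abs(reference):
            config = trial  # keep the demotion
    return config


if __name__ == "__main__":
    print(search(1e-9))
    print(search(1e-3))
```

With a tight tolerance such as `1e-9`, the sketch keeps `accumulate` in double precision and demotes only `scale`; with a loose tolerance such as `1e-3`, both regions are demoted. This mirrors the trade-off described in the abstract: a precision that is too low for a given algorithm and data set produces inaccurate results, so the acceptable configuration depends on the accuracy the user requires.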


Published In

ICS '13: Proceedings of the 27th international ACM conference on International conference on supercomputing
June 2013
512 pages
ISBN: 9781450321303
DOI: 10.1145/2464996

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. binary instrumentation
  2. floating-point
  3. mixed precision

Conference

ICS'13: International Conference on Supercomputing
June 10 - 14, 2013
Eugene, Oregon, USA

Acceptance Rates

ICS '13 Paper Acceptance Rate 43 of 202 submissions, 21%;
Overall Acceptance Rate 629 of 2,180 submissions, 29%
