DOI: 10.1145/1944862.1944880
research-article

Automated empirical tuning of scientific codes for performance and power consumption

Published: 24 January 2011

Abstract

Automatic empirical tuning of compiler optimizations has been widely used to achieve portable high performance for scientific applications. However, as power dissipation becomes increasingly important in modern architecture design, few have attempted to empirically tune optimization configurations to reduce the power consumption of applications. We provide an automated empirical tuning framework that can be configured to optimize for both performance and energy efficiency. In particular, we extensively parameterize the configuration of a large number of compiler optimizations, including loop parallelization, blocking, unroll-and-jam, array copying, scalar replacement, strength reduction, and loop unrolling. We then use hardware counters combined with elapsed time to estimate both the performance and the power consumption of differently optimized code, and use these estimates to automatically discover desirable configurations for these optimizations. We use a power meter to verify our tuning results on two multi-core computers and show that our approach can effectively balance performance and energy efficiency on modern CMP machines.
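
The tuning loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the linear counter-based power model, the combined score `seconds**alpha * joules**(1 - alpha)`, the exhaustive search, and the names `Measurement`, `estimate_power_watts`, `tune`, `idle_watts`, `watts_per_rate`, and `alpha` are all hypothetical. The `run` callback stands in for compiling a code variant with a given optimization configuration, executing it, and reading elapsed time and hardware counters.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


@dataclass(frozen=True)
class Measurement:
    seconds: float              # elapsed wall-clock time of one run
    counters: Dict[str, float]  # hardware-counter totals for that run


def estimate_power_watts(m: Measurement, idle_watts: float,
                         watts_per_rate: Dict[str, float]) -> float:
    """Hypothetical linear model: idle power plus a weighted sum of event rates."""
    return idle_watts + sum(
        weight * m.counters.get(name, 0.0) / m.seconds
        for name, weight in watts_per_rate.items()
    )


def tune(space: Dict[str, List[int]],
         run: Callable[[Config], Measurement],
         idle_watts: float,
         watts_per_rate: Dict[str, float],
         alpha: float = 0.5) -> Tuple[Config, float]:
    """Try every configuration in `space` and keep the lowest-scoring one.

    score = seconds**alpha * joules**(1 - alpha)
    alpha = 1 tunes for elapsed time only, alpha = 0 for estimated energy only.
    """
    names = list(space)
    best_cfg, best_score = None, float("inf")
    for values in product(*(space[n] for n in names)):
        cfg = dict(zip(names, values))
        m = run(cfg)  # assumed to build, execute, and measure one variant
        joules = estimate_power_watts(m, idle_watts, watts_per_rate) * m.seconds
        score = (m.seconds ** alpha) * (joules ** (1.0 - alpha))
        if score < best_score:
            best_cfg, best_score = cfg, score
    if best_cfg is None:
        raise ValueError("empty search space")
    return best_cfg, best_score
```

Calling `tune` with a search space such as `{"unroll": [1, 2, 4], "threads": [1, 2, 4]}` and different `alpha` values shows how the preferred configuration can shift between the fastest variant and the one with the lowest estimated energy. A real tuner would likely replace the exhaustive loop with a guided search, since the space of parameterized optimizations grows quickly.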



    Published In

    HiPEAC '11: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
    January 2011
    226 pages
    ISBN:9781450302418
    DOI:10.1145/1944862

    Sponsors

    • HiPEAC: HiPEAC Network of Excellence

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. compiler optimizations
    2. empirical tuning
    3. performance
    4. power consumption

    Qualifiers

    • Research-article

    Conference

    HIPEAC '11
    Sponsor:
    • HiPEAC
