DOI: 10.1109/GreenCom-CPSCom.2010.102

Kernel Fusion: An Effective Method for Better Power Efficiency on Multithreaded GPU

Published: 18 December 2010

Abstract

As one of the most popular accelerators, the Graphics Processing Unit (GPU) has demonstrated high computing power in several application fields. However, GPUs also consume a great deal of power and have become one of the largest power consumers in desktop and supercomputer systems, yet software power optimization methods targeting GPUs have not been well studied. In this work, we propose a kernel fusion method to reduce energy consumption and improve power efficiency on GPU architectures. By fusing two or more independent kernels, the method achieves higher utilization and a much more balanced demand for hardware resources, which creates more room for power optimizations such as dynamic voltage and frequency scaling (DVFS). Based on the CUDA programming model, this paper also presents several fusion methods targeted at different situations. To choose a judicious fusion strategy, we formulate the process of fusing multiple independent kernels as a dynamic programming problem, which can be solved with many existing tools and easily embedded into a compiler or runtime system. To reduce the overhead introduced by kernel fusion, we also propose an effective method to reduce shared memory usage and coordinate the thread spaces of the kernels to be fused. Detailed experimental evaluation shows that the proposed kernel fusion method can reduce energy consumption without performance loss for several typical kernels.
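The abstract does not spell out the dynamic programming formulation, so the following is only a minimal sketch of how choosing which kernels to fuse could be posed that way. Everything in it is an assumption made for illustration: the cost model (`group_energy`), the constants `STATIC_POWER` and `FUSION_OVERHEAD`, the restriction to contiguous groups in a fixed kernel order, and the `max_group` limit are hypothetical and are not taken from the paper.

```python
# Hypothetical toy cost model (an assumption, not the paper's model):
# each kernel is a pair (compute_work, memory_work) in arbitrary units.
STATIC_POWER = 1.0      # energy per time unit while the GPU is busy
FUSION_OVERHEAD = 0.5   # extra energy per additional kernel in a fused group


def group_energy(group):
    """Estimated energy of running the kernels in `group` as one fused kernel."""
    compute = sum(c for c, _ in group)
    memory = sum(m for _, m in group)
    # Assume compute and memory work overlap perfectly inside a fused kernel,
    # so run time is bounded by the busier of the two resources.
    run_time = max(compute, memory)
    return STATIC_POWER * run_time + FUSION_OVERHEAD * (len(group) - 1)


def plan_fusion(kernels, max_group=3):
    """Split `kernels` (kept in order) into contiguous groups with minimal total energy."""
    n = len(kernels)
    best = [0.0] + [float("inf")] * n   # best[i]: minimal energy for kernels[:i]
    cut = [0] * (n + 1)                 # cut[i]: start index of the last group in that plan
    for i in range(1, n + 1):
        for j in range(max(0, i - max_group), i):
            cost = best[j] + group_energy(kernels[j:i])
            if cost < best[i]:
                best[i], cut[i] = cost, j
    # Walk the cut points backwards to recover the chosen groups.
    groups, i = [], n
    while i > 0:
        groups.append(list(range(cut[i], i)))
        i = cut[i]
    return best[n], groups[::-1]


if __name__ == "__main__":
    # One compute-bound kernel, one memory-bound kernel, one balanced kernel.
    energy, groups = plan_fusion([(8, 2), (2, 8), (5, 5)])
    print(energy, groups)
```

Under this toy model, pairing the compute-bound kernel with the memory-bound one balances resource demand, which mirrors the abstract's argument that fusion helps most when the fused kernels stress different hardware resources. A realistic planner would replace `group_energy` with measured or modeled power data.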


Published In

GREENCOM-CPSCOM '10: Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
December 2010
974 pages
ISBN: 9780769543314

Publisher

IEEE Computer Society

United States

Author Tags

  1. GPGPU
  2. Kernel Fusion
  3. Power Efficiency
  4. Power Optimization
