Exploration of CPU/GPU co-execution: from the perspective of performance, energy, and temperature

Authors:

Cheol Hong Kim,

Sung Woo Chung,

Joong Chae Na

RACS '11: Proceedings of the 2011 ACM Symposium on Research in Applied Computation

Pages 38 - 43

https://doi.org/10.1145/2103380.2103388

Published: 02 November 2011

Abstract

In recent computing systems, CPUs have encountered situations in which they cannot meet increasing throughput demands. To overcome the limits of CPUs in processing heavy tasks, especially computer graphics, GPUs have been widely used. Therefore, the performance of up-to-date computing systems can be maximized when task scheduling between the CPU and the GPU is optimized. In this paper, we analyze the system from the perspective of performance, energy efficiency, and temperature according to the execution methods between the CPU and the GPU. Experimental results show that the GPU leads to better efficiency than the CPU when a single application is executed. However, when two applications are executed, the GPU does not guarantee better efficiency than the CPU; the outcome depends on the application characteristics.

Cited By

Valera H, Dalmau M, Roose P, Larracoechea J, Herzog C, Hung C, Hong J, Bechini A, Song E (2021). An energy saving approach. Proceedings of the 36th Annual ACM Symposium on Applied Computing, pp. 69-78. DOI: 10.1145/3412841.3441888. Online publication date: 22-Mar-2021
https://dl.acm.org/doi/10.1145/3412841.3441888
Tsai T, Chen Y, He X, Li C (2018). STEM: A Thermal-Constrained Real-Time Scheduling for 3D Heterogeneous-ISA Multicore Processors. IEEE Transactions on Computers, 67:6, pp. 874-889. DOI: 10.1109/TC.2017.2783941. Online publication date: 1-Jun-2018
https://doi.org/10.1109/TC.2017.2783941

Index Terms

Exploration of CPU/GPU co-execution: from the perspective of performance, energy, and temperature
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Scheduling

Published In

RACS '11: Proceedings of the 2011 ACM Symposium on Research in Applied Computation

November 2011

355 pages

ISBN:9781450310871

DOI:10.1145/2103380

General Chairs:
Rex E. Gantenbein (University of Wyoming),
Tei-Wei Kuo (National Taiwan University, Taiwan)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGAPP: ACM Special Interest Group on Applied Computing
ACCT: Association of Convergent Computing Technology
CUSST: University of Suwon: Center for U-city Security & Surveillance Technology of the University of Suwon
KIISE: Korean Institute of Information Scientists and Engineers
KISTI

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

RACS '11

Sponsor:

SIGAPP
ACCT
CUSST
KIISE

RACS '11: Research in Applied Computation Symposium

November 2 - 5, 2011

Miami, Florida

Acceptance Rates

Overall Acceptance Rate 393 of 1,581 submissions, 25%

Article Metrics

Total Citations: 2
Total Downloads: 370
Downloads (Last 12 months): 7
Downloads (Last 6 weeks): 0

Reflects downloads up to 20 Jan 2025
