Article

Pegasus: coordinated scheduling for virtualized accelerator-based systems

Authors: Vishakha Gupta, Karsten Schwan, Parthasarathy Ranganathan

USENIXATC'11: Proceedings of the 2011 USENIX conference on USENIX annual technical conference

Page 3

Published: 15 June 2011

Abstract

Heterogeneous multi-cores (platforms composed of both general-purpose and accelerator cores) are becoming increasingly common. While applications wish to freely utilize all cores present on such platforms, operating systems continue to view accelerators as specialized devices. The Pegasus system described in this paper uses an alternative approach that offers a uniform resource usage model for all cores on heterogeneous chip multiprocessors. Operating at the hypervisor level, its novel scheduling methods fairly and efficiently share accelerators across multiple virtual machines, thereby making accelerators first-class schedulable entities of choice for many-core applications. Using NVIDIA GPGPUs coupled with x86-based general-purpose host cores, a Xen-based implementation of Pegasus demonstrates improved application performance through better management of combined platform resources. With moderate virtualization penalties, performance improvements range from 18% to 140% over base GPU driver scheduling when the GPUs are shared.
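The abstract states that Pegasus shares accelerators fairly across virtual machines from the hypervisor, but it does not spell out the policies. As a rough illustration of what weighted, hypervisor-level accelerator sharing can look like, the Python sketch below simulates a generic credit-based scheduler in the spirit of Xen's weight-based credit scheduler. It is not the Pegasus algorithm: the `VM` class, `replenish`, `pick_next`, `run`, the per-VM `weight`, the credit cap, and the assumption that each accelerator call is a fixed, non-preemptible duration are all hypothetical simplifications.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class VM:
    """A guest VM competing for one shared accelerator (hypothetical model)."""
    name: str
    weight: int
    queue: deque = field(default_factory=deque)  # pending accelerator calls, durations in ms
    credits: float = 0.0  # accelerator time the VM is currently entitled to
    gpu_time: float = 0.0  # accelerator time consumed so far


def replenish(vms, period_ms):
    """Grant each VM its weighted share of one period, capped at that share."""
    total = sum(vm.weight for vm in vms)
    for vm in vms:
        share = period_ms * vm.weight / total
        # Cap so an idle VM cannot bank an unbounded amount of accelerator time.
        vm.credits = min(vm.credits + share, share)


def pick_next(vms):
    """Return the VM with pending work and the most credits, or None if all are idle."""
    runnable = [vm for vm in vms if vm.queue]
    if not runnable:
        return None
    return max(runnable, key=lambda vm: vm.credits)


def run(vms, period_ms, periods):
    """Simulate `periods` scheduling periods, dispatching one call at a time."""
    for _ in range(periods):
        replenish(vms, period_ms)
        elapsed = 0.0
        while elapsed < period_ms:
            vm = pick_next(vms)
            if vm is None:
                break  # accelerator idles for the rest of the period
            call_ms = vm.queue.popleft()
            vm.credits -= call_ms  # charge the VM for the time its call held the GPU
            vm.gpu_time += call_ms
            elapsed += call_ms


if __name__ == "__main__":
    a = VM("vm-a", weight=2, queue=deque([1.0] * 1000))
    b = VM("vm-b", weight=1, queue=deque([1.0] * 1000))
    run([a, b], period_ms=30.0, periods=10)
    print(a.name, a.gpu_time, b.name, b.gpu_time)
```

Run as a script, the example prints `vm-a 200.0 vm-b 100.0`: with weights 2 and 1 and both VMs saturated, accelerator time splits 2:1. If one VM has no pending calls, the other may use the whole period, so the sketch is work-conserving; its credits can then go negative, which a production scheduler would need to bound more carefully.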


Published In

USENIXATC'11: Proceedings of the 2011 USENIX conference on USENIX annual technical conference, June 2011, 36 pages

Program Chairs: Jason Nieh (Columbia University), Carl Waldspurger

Publisher: USENIX Association, United States

Article Metrics

33 total citations; 0 total downloads (last 12 months: 0; last 6 weeks: 0). Reflects downloads up to 04 Feb 2025.

Cited By

Hunt T, Jia Z, Miller V, Szekely A, Hu Y, Rossbach C, Witchel E, Bhagwan R, Porter G (2020). Telekine. Proceedings of the 17th USENIX Conference on Networked Systems Design and Implementation, pp. 817-834. Online publication date: 25-Feb-2020. https://dl.acm.org/doi/10.5555/3388242.3388301

Xu X, Zhang N, Cui M, He J, Surana R, Delimitrou C, Ports D (2019). Characterization and prediction of performance interference on mediated passthrough GPUs for interference-aware scheduler. Proceedings of the 11th USENIX Conference on Hot Topics in Cloud Computing, p. 14. Online publication date: 8-Jul-2019. https://dl.acm.org/doi/10.5555/3357034.3357051

Zhang K, He B, Hu J, Wang Z, Hua B, Meng J, Yang L, Seshan S, Banerjee S (2018). G-net. Proceedings of the 15th USENIX Conference on Networked Systems Design and Implementation, pp. 187-200. Online publication date: 9-Apr-2018. https://dl.acm.org/doi/10.5555/3307441.3307458

Hu Y, Rallapalli S, Ko B, Govindan R, Pierre G, Ferreira P, Shrira L (2018). Olympian. Proceedings of the 19th International Middleware Conference, pp. 53-65. Online publication date: 26-Nov-2018. https://dl.acm.org/doi/10.1145/3274808.3274813

Farooqui N, Roy I, Chen Y, Talwar V, Barik R, Lewis B, Shpeisman T, Schwan K (2018). Accelerating Data Analytics on Integrated GPU Platforms via Runtime Specialization. International Journal of Parallel Programming 46(2), pp. 336-375. Online publication date: 1-Apr-2018. https://dl.acm.org/doi/10.1007/s10766-016-0482-x

Hong C, Spence I, Nikolopoulos D (2017). GPU Virtualization and Scheduling Methods. ACM Computing Surveys 50(3), pp. 1-37. Online publication date: 29-Jun-2017. https://dl.acm.org/doi/10.1145/3068281

Panneerselvam S, Swift M, Zaks A, Mendelson B, Rauchwerger L, Hwu W (2016). Rinnegan. Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, pp. 373-386. Online publication date: 11-Sep-2016. https://dl.acm.org/doi/10.1145/2967938.2967964

Farooqui N, Roy I, Chen Y, Talwar V, Schwan K, Palermo G, Feo J, Tumeo A, Franke H (2016). Accelerating graph applications on integrated GPU platforms via instrumentation-driven optimizations. Proceedings of the ACM International Conference on Computing Frontiers, pp. 19-28. Online publication date: 16-May-2016. https://dl.acm.org/doi/10.1145/2903150.2903152

Farooqui N, Kaeli D, Cavazos J (2016). A systems perspective on GPU computing. Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, pp. 72-81. Online publication date: 12-Mar-2016. https://dl.acm.org/doi/10.1145/2884045.2884057

Gao Y, Sherwood T (2016). Hardware-Assisted Context Management for Accelerator Virtualization. Proceedings of the 29th International Conference on Architecture of Computing Systems (ARCS 2016), Volume 9637, pp. 72-83. Online publication date: 4-Apr-2016. https://dl.acm.org/doi/10.1007/978-3-319-30695-7_6
