DOI: 10.1145/2451116.2451155
research-article

To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach

Published: 16 March 2013

Abstract

Most hardware and software vendors suggest disabling hardware prefetching in virtualized environments. They claim that prefetching is detrimental to application performance because workload diversity and VM interference in the shared cache make its predictions inaccurate. However, no comprehensive or quantitative measurements have been performed to support this belief.
This paper is the first to systematically measure the influence of hardware prefetching in virtualized environments. We examine a wide variety of benchmarks on three types of chip-multiprocessors (CMPs) to analyze the hardware prefetching performance. We conduct extensive experiments by taking into account a number of important virtualization factors. We find that hardware prefetching has minimal destructive influence under most configurations. Only with certain application combinations does prefetching influence the overall performance.
To leverage these findings and make hardware prefetching effective across a diversity of virtualized environments, we propose a dynamic prefetching-aware VCPU-core binding approach (PAVCB), which includes two phases: classifying and binding. The workload of each VM is classified into one of several cache sharing constraint categories based upon its cache access characteristics, considering both prefetch requests and demand requests. Then, following heuristic rules, the VCPUs of each VM are scheduled onto appropriate cores subject to the cache sharing constraints. We show that the proposed approach can improve performance by 12% on average over the default scheduler and 46% over manual system administrator bindings across different workload combinations in the presence of hardware prefetching.
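
The two phases described above can be illustrated with a minimal sketch. It is not the paper's PAVCB algorithm: the abstract does not give the category names, thresholds, or heuristic rules, so everything below is an assumption made for illustration. That includes `VMStats`, the "aggressive"/"moderate"/"insensitive" labels, the threshold values, the greedy placement rule, and the simplification of one VCPU per VM.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VMStats:
    """Hypothetical per-VM counters (requests to the shared cache per 1000 instructions)."""
    name: str
    demand_per_kinst: float    # demand requests
    prefetch_per_kinst: float  # hardware prefetch requests

    @property
    def pressure(self) -> float:
        # Assumption: shared-cache pressure is the sum of both request types.
        return self.demand_per_kinst + self.prefetch_per_kinst


def classify(vm: VMStats, heavy: float = 10.0, light: float = 1.0) -> str:
    """Phase 1: map a VM to an (assumed) cache sharing constraint category."""
    if vm.pressure >= heavy:
        return "aggressive"
    if vm.pressure <= light:
        return "insensitive"
    return "moderate"


def bind(vms: List[VMStats], cache_domains: List[List[int]]) -> Dict[str, int]:
    """Phase 2: greedy binding, one VCPU per VM (a simplification).

    cache_domains lists the core IDs that share one last-level cache.
    VMs are placed in order of decreasing pressure, each onto the domain
    that still has a free core and the lowest accumulated pressure, so
    the heaviest workloads end up behind different shared caches.
    """
    free = [list(cores) for cores in cache_domains]
    load = [0.0] * len(cache_domains)
    placement: Dict[str, int] = {}
    for vm in sorted(vms, key=lambda v: v.pressure, reverse=True):
        candidates = [i for i, cores in enumerate(free) if cores]
        if not candidates:
            raise ValueError("more VCPUs than available cores")
        target = min(candidates, key=lambda i: load[i])
        placement[vm.name] = free[target].pop(0)
        load[target] += vm.pressure
    return placement
```

On Linux, a placement such as the one returned by `bind` could then be applied with a standard affinity mechanism (for example `os.sched_setaffinity` for a process); how the paper's implementation applies its bindings is not stated in the abstract.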


Published In

ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
March 2013
574 pages
ISBN: 9781450318709
DOI: 10.1145/2451116
  • ACM SIGARCH Computer Architecture News, Volume 41, Issue 1
    ASPLOS '13
    March 2013
    540 pages
    ISSN: 0163-5964
    DOI: 10.1145/2490301
  • ACM SIGPLAN Notices, Volume 48, Issue 4
    ASPLOS '13
    April 2013
    540 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/2499368

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. prefetching
  2. shared cache
  3. virtualization

Qualifiers

  • Research-article

Conference

ASPLOS '13

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%
