DOI: 10.1145/1006209.1006246
Article

CQoS: a framework for enabling QoS in shared caches of CMP platforms

Published: 26 June 2004

Abstract

Cache hierarchies have traditionally been designed for use by a single application, thread or core. As multi-threaded (MT) and multi-core (CMP) platform architectures emerge and their workloads range from single-threaded and multi-threaded applications to complex virtual machines (VMs), a shared cache will be consumed by these different entities, which generate heterogeneous memory access streams with different locality properties and varying memory sensitivity. As a result, conventional cache management approaches that treat all memory accesses equally are bound to result in inefficient space utilization and poor performance, even for applications with good locality. To address this problem, this paper presents a new cache management framework (CQoS) that (1) recognizes the heterogeneity in memory access streams, (2) introduces the notion of QoS to handle varying degrees of locality and latency sensitivity, and (3) assigns and enforces priorities for streams based on latency sensitivity, degree of locality and application performance needs. To achieve this, we propose CQoS options for priority classification, priority assignment and priority enforcement. We briefly describe the CQoS priority classification and assignment options, which range from user-driven and developer-driven to compiler-detected and flow-based approaches. Our focus in this paper is on CQoS mechanisms for priority enforcement: (1) selective cache allocation, (2) static/dynamic set partitioning and (3) heterogeneous cache regions. We discuss the architectural design and implementation complexity of these CQoS options. To evaluate their performance trade-offs, we modeled these options in a cache simulator and evaluated them on CMP platforms running network-intensive server workloads.
Our simulation results show the effectiveness of the proposed options and make the case for CQoS in future multi-threaded/multi-core platforms, since it improves shared cache efficiency and, as a result, increases overall system performance.
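
The idea of priority enforcement in a shared cache can be illustrated with a toy simulator. The following is a minimal sketch, not the paper's implementation, assuming two priority classes (HIGH and LOW), LRU replacement, and a simplified interpretation of two of the mechanisms: selective cache allocation is modeled as a per-class probability that a miss allocates a line, and static partitioning is modeled as a per-class cap on the ways a class may occupy within each set (the paper's set partitioning may be organized differently). The names PriorityCache, alloc_prob, way_quota and demo are hypothetical, and heterogeneous cache regions are not modeled.

```python
import random
from collections import OrderedDict

HIGH, LOW = 0, 1  # hypothetical two-class priority encoding


class PriorityCache:
    """Toy set-associative LRU cache with two CQoS-style enforcement knobs.

    Assumed, simplified knobs (not the paper's exact design):
      alloc_prob[p]: probability that a miss from class p allocates a line
                     (selective cache allocation)
      way_quota[p]:  maximum ways per set that class p may occupy
                     (static partitioning, modeled here as way quotas)
    """

    def __init__(self, num_sets, ways, line_size=64,
                 alloc_prob=None, way_quota=None, seed=0):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        self.alloc_prob = alloc_prob or {HIGH: 1.0, LOW: 1.0}
        self.way_quota = way_quota or {HIGH: ways, LOW: ways}
        self.rng = random.Random(seed)  # seeded for reproducible bypass decisions
        # Each set maps tag -> owning class, ordered from LRU (first) to MRU (last).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = {HIGH: 0, LOW: 0}
        self.misses = {HIGH: 0, LOW: 0}

    def access(self, addr, prio):
        """Simulate one access from class `prio`; return True on a hit."""
        line = addr // self.line_size
        index, tag = line % self.num_sets, line // self.num_sets
        s = self.sets[index]
        if tag in s:
            s.move_to_end(tag)  # promote to MRU
            self.hits[prio] += 1
            return True
        self.misses[prio] += 1
        # Selective allocation (or a zero quota): this miss bypasses the cache.
        if self.rng.random() >= self.alloc_prob[prio] or self.way_quota[prio] == 0:
            return False
        owned = sum(1 for p in s.values() if p == prio)
        if owned >= self.way_quota[prio]:
            # At quota: replace this class's own LRU line.
            victim = next(t for t, p in s.items() if p == prio)
            del s[victim]
        elif len(s) >= self.ways:
            # Under quota but the set is full: replace the set's global LRU line.
            s.popitem(last=False)
        s[tag] = prio
        return False


def demo(**knobs):
    """Return HIGH hits after a LOW streaming burst (hypothetical toy trace)."""
    cache = PriorityCache(num_sets=1, ways=4, **knobs)
    high_lines = [0, 64, 128]  # small HIGH working set with reuse
    for a in high_lines:
        cache.access(a, HIGH)
    for a in range(1024, 1024 + 64 * 50, 64):  # 50 LOW lines, no reuse
        cache.access(a, LOW)
    for a in high_lines:  # HIGH re-references its working set
        cache.access(a, HIGH)
    return cache.hits[HIGH]


if __name__ == "__main__":
    print("unmanaged:          ", demo())
    print("LOW capped at 1 way:", demo(way_quota={HIGH: 4, LOW: 1}))
    print("LOW never allocates:", demo(alloc_prob={HIGH: 1.0, LOW: 0.0}))
```

Under these assumptions, the unmanaged run returns 0 HIGH hits because the LOW streaming burst evicts the HIGH working set, while both throttled configurations return 3. This echoes the abstract's motivation that treating all accesses equally can hurt streams with good locality, but the toy trace says nothing about the paper's measured results.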

Published In

ICS '04: Proceedings of the 18th annual international conference on Supercomputing
June 2004
360 pages
ISBN: 1581138393
DOI: 10.1145/1006209

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. CMP
2. QoS
3. cache
4. partitioning
5. performance
6. sharing

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
