DOI: 10.1145/2694344.2694382
research-article

Supporting Differentiated Services in Computers via Programmable Architecture for Resourcing-on-Demand (PARD)

Published: 14 March 2015

Abstract

This paper presents PARD, a programmable architecture for resourcing-on-demand that provides a new programming interface to convey an application's high-level information like quality-of-service requirements to the hardware. PARD enables new functionalities like fully hardware-supported virtualization and differentiated services in computers. PARD is inspired by the observation that a computer is inherently a network in which hardware components communicate via packets (e.g., over the NoC or PCIe). We apply principles of software-defined networking to this intra-computer network and address three major challenges. First, to deal with the semantic gap between high-level applications and underlying hardware packets, PARD attaches a high-level semantic tag (e.g., a virtual machine or thread ID) to each memory-access, I/O, or interrupt packet. Second, to make hardware components more manageable, PARD implements programmable control planes that can be integrated into various shared resources (e.g., cache, DRAM, and I/O devices) and can differentially process packets according to tag-based rules. Third, to facilitate programming, PARD abstracts all control planes as a device file tree to provide a uniform programming interface via which users create and apply tag-based rules.
Full-system simulation results show that, by co-locating latency-critical memcached applications with other workloads, PARD can improve a four-core computer's CPU utilization by up to a factor of four without significantly increasing tail latency. FPGA emulation based on a preliminary RTL implementation demonstrates that the cache control plane introduces no extra latency and that the memory control plane can reduce queueing delay for high-priority memory-access requests by up to a factor of 5.6.
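The three mechanisms named in the abstract (tagged packets, per-resource control planes driven by tag-based rules, and a file-tree programming interface) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design or its actual interface: the class names, the `priority` parameter, and the `memory/tag1/priority` path layout are all hypothetical and chosen for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """A hardware request carrying a high-level semantic tag (hypothetical model)."""
    tag: int    # e.g. a virtual machine or thread ID
    kind: str   # "mem", "io", or "irq"
    addr: int


class ControlPlane:
    """Per-resource table of tag-based rules: tag -> {parameter: value}."""

    def __init__(self, name, defaults):
        self.name = name
        self.defaults = dict(defaults)
        self.table = {}

    def set_param(self, tag, param, value):
        if param not in self.defaults:
            raise KeyError(f"unknown parameter {param!r} for {self.name}")
        # First write for a tag starts from a copy of the defaults.
        self.table.setdefault(tag, dict(self.defaults))[param] = value

    def lookup(self, tag, param):
        # Tags without an explicit rule fall back to the defaults.
        return self.table.get(tag, self.defaults)[param]


class MemoryQueue:
    """Serves requests highest priority first, FIFO within one priority."""

    def __init__(self, control_plane):
        self.cp = control_plane
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def enqueue(self, pkt):
        prio = self.cp.lookup(pkt.tag, "priority")
        heapq.heappush(self._heap, (-prio, next(self._seq), pkt))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]


class DeviceTree:
    """Exposes control planes through file-like paths (assumed layout:
    '<plane>/tag<N>/<parameter>')."""

    def __init__(self, planes):
        self.planes = {cp.name: cp for cp in planes}

    def _parse(self, path):
        plane, tag_dir, param = path.strip("/").split("/")
        if not tag_dir.startswith("tag"):
            raise ValueError(f"bad tag directory {tag_dir!r}")
        return self.planes[plane], int(tag_dir[3:]), param

    def write(self, path, value):
        plane, tag, param = self._parse(path)
        plane.set_param(tag, param, int(value))

    def read(self, path):
        plane, tag, param = self._parse(path)
        return str(plane.lookup(tag, param))


# Usage: raise the priority of tag 1, then watch it overtake tag 2.
mem = ControlPlane("memory", {"priority": 0})
tree = DeviceTree([mem])
tree.write("memory/tag1/priority", "1")

queue = MemoryQueue(mem)
queue.enqueue(Packet(tag=2, kind="mem", addr=0x1000))  # arrives first
queue.enqueue(Packet(tag=1, kind="mem", addr=0x2000))  # arrives second
served = [queue.dequeue().tag for _ in range(2)]
print(served)
```

In this model the request from tag 1 is served before the earlier request from tag 2, because the rule written through the path interface gives tag 1 a higher priority. The design choice worth noting is that the queue never inspects the application directly; it sees only the tag and the rule table, which mirrors the separation between data path and control plane that the abstract borrows from software-defined networking.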

Published In

ASPLOS '15: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems
March 2015
720 pages
ISBN:9781450328357
DOI:10.1145/2694344

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. QoS
  2. data center
  3. hardware/software interface

Qualifiers

  • Research-article

Conference

ASPLOS '15

Acceptance Rates

ASPLOS '15 paper acceptance rate: 48 of 287 submissions (17%)
Overall acceptance rate: 535 of 2,713 submissions (20%)
