research-article

SPACE: Semi-Partitioned CachE for Energy Efficient, Hard Real-Time Systems

Authors:

Israel Cidon

IEEE Transactions on Computers, Volume 66, Issue 4

Pages 717 - 730

https://doi.org/10.1109/TC.2016.2608775

Published: 01 April 2017

Abstract

Multi-core processors are increasingly popular because they deliver higher performance, but they pose new challenges for hard real-time systems: they make it much harder to estimate a task's worst-case execution time (WCET). Partitioned cache architectures ease the problem by giving each thread an isolated execution environment. Although simple to implement and use, this approach can be sub-optimal in both energy consumption and performance, because it prevents threads from exploiting the instructions and data that they share. This work presents a new cache architecture, SPACE (Semi-Partitioned CachE), that makes it possible to leverage this sharing and in turn yields a tighter WCET. Together with our new WCET algorithm, the SPACE architecture can be used to maintain the predictability of the execution time of parallel threads while reducing the overall energy consumption of the system. The proposed cache architecture was implemented in Verilog and deployed on a Xilinx MicroBlaze multi-core design for testing, validation, and measurement. The application-level experiments were conducted using the Chronos tool for estimation and the Wattch/SimpleScalar simulator for execution. Using three real-time programs (a radar tracker, a DES encryption algorithm, and an FM radio), we show that SPACE together with the enhanced WCET algorithm reduces the average system WCET of these applications by 31 percent and the actual energy consumption by 18 percent in comparison with other cache architectures.
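
To make the cost of strict partitioning concrete, the following is a minimal toy sketch and not the SPACE design itself, whose details are not given in this abstract. It assumes fully associative LRU partitions and a hypothetical policy in which a local miss may be served from another thread's partition. The names `LRUPartition`, `count_memory_fetches`, and `allow_remote_reads` are illustrative assumptions.

```python
from collections import OrderedDict


class LRUPartition:
    """A tiny fully associative LRU cache partition holding block addresses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def contains(self, block):
        return block in self.blocks

    def touch(self, block):
        """Mark a block as most recently used, inserting and evicting if needed."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        self.blocks[block] = True


def count_memory_fetches(trace, num_threads, capacity, allow_remote_reads):
    """Replay (thread_id, block) accesses and count fetches from main memory.

    allow_remote_reads=False models fully isolated partitions.
    allow_remote_reads=True models a hypothetical toy policy (an assumption,
    not the published SPACE mechanism) where a local miss can be served from
    another thread's partition without going to memory.
    """
    partitions = [LRUPartition(capacity) for _ in range(num_threads)]
    fetches = 0
    for thread_id, block in trace:
        local = partitions[thread_id]
        if local.contains(block):
            local.touch(block)
            continue
        remote_hit = allow_remote_reads and any(
            p.contains(block) for i, p in enumerate(partitions) if i != thread_id
        )
        if not remote_hit:
            fetches += 1
            local.touch(block)  # bring the block into the local partition
    return fetches


# Two threads interleave reads of the same four shared blocks.
trace = [(t, b) for b in range(4) for t in (0, 1)]
isolated = count_memory_fetches(trace, num_threads=2, capacity=4, allow_remote_reads=False)
shared = count_memory_fetches(trace, num_threads=2, capacity=4, allow_remote_reads=True)
print(isolated, shared)
```

In this toy trace the isolated configuration fetches every shared block once per thread, while the sharing-aware configuration fetches each block only once. The sketch says nothing about timing analysis; it only illustrates the duplicated traffic that isolation can cause.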

Cited By

J. Hoffmann and A. Fröhlich (2022). Online Machine Learning for Energy-Aware Multicore Real-Time Embedded Systems. IEEE Transactions on Computers 71:2 (493-505). DOI: 10.1109/TC.2021.3056070. Online publication date: 1-Feb-2022.
https://dl.acm.org/doi/10.1109/TC.2021.3056070
S. Ghosh, L. Bhargava, and V. Sahula (2021). SRCP: sharing and reuse-aware replacement policy for the partitioned cache in multicore systems. Design Automation for Embedded Systems 25:3 (193-211). DOI: 10.1007/s10617-021-09251-z. Online publication date: 1-Sep-2021.
https://dl.acm.org/doi/10.1007/s10617-021-09251-z

Published In

IEEE Transactions on Computers Volume 66, Issue 4

April 2017

185 pages

ISSN:0018-9340

Copyright © 2017.

Publisher

IEEE Computer Society

United States
