research-article

SPACE: Semi-Partitioned CachE for Energy Efficient, Hard Real-Time Systems

Published: 01 April 2017

Abstract

Multi-core processors are increasingly popular because they yield higher performance, but they also present new challenges for hard real-time systems: they make it much more difficult to estimate a task's worst-case execution time (WCET). Partitioned cache architectures are used to ease the problem by providing an isolated execution environment for each thread. Although simple to implement and use, this method may be sub-optimal with respect to both energy consumption and performance, since it prevents threads from taking advantage of instructions and data that they share. This work presents a new cache architecture, termed SPACE (Semi-Partitioned CachE), that makes it possible to leverage information sharing, which in turn yields a tighter WCET. The SPACE architecture, together with our new WCET algorithm, can be used to maintain the predictability of the execution time of the parallel threads while reducing the overall energy consumption of the system. The proposed cache architecture was implemented in Verilog and deployed on a Xilinx MicroBlaze multi-core design for testing, validation, and measurements. The application-level experiments were conducted using the Chronos tool for estimation and the Wattch/SimpleScalar simulator for execution. Using three real-time programs (a radar tracker, a DES encryption algorithm, and an FM radio), we showed that SPACE, together with the enhanced WCET algorithm, reduces the average system WCET of these applications by 31 percent and reduces the actual energy consumption by 18 percent in comparison with other cache architectures.
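To make the sharing argument concrete, the following is a minimal toy sketch in Python, not the SPACE design itself. The abstract does not describe the hardware organization, so everything here is an assumption made for illustration: the fully associative LRU model, the idea of a common region for blocks tagged as shared, the names `LRUCache`, `misses_partitioned`, and `misses_semi_partitioned`, and the example traces. The sketch counts misses only; it is not a WCET analysis and says nothing about the reported 31 percent and 18 percent figures.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny fully associative LRU cache that counts misses (toy model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, block):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        if block in self.lines:
            self.lines.move_to_end(block)  # mark as most recently used
            return True
        self.misses += 1
        self.lines[block] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used
        return False


def misses_partitioned(traces, lines_per_thread):
    """Each thread owns a private partition; nothing is shared."""
    total = 0
    for trace in traces:
        cache = LRUCache(lines_per_thread)
        for block in trace:
            cache.access(block)
        total += cache.misses
    return total


def misses_semi_partitioned(traces, private_lines, shared_lines, shared_blocks):
    """Hypothetical layout: blocks in shared_blocks use one common region,
    all other blocks use the accessing thread's private partition."""
    shared = LRUCache(shared_lines)
    total = 0
    for trace in traces:  # threads are replayed one after another (simplification)
        private = LRUCache(private_lines)
        for block in trace:
            target = shared if block in shared_blocks else private
            target.access(block)
        total += private.misses
    return total + shared.misses


# Two threads that both execute the same two "library" blocks.
traces = [
    ["lib0", "lib1", "a0", "a1", "lib0", "lib1"],
    ["lib0", "lib1", "b0", "b1", "lib0", "lib1"],
]

# Same total capacity in both cases: 4 + 4 lines versus 3 + 3 + 2 lines.
print(misses_partitioned(traces, 4))                            # 8
print(misses_semi_partitioned(traces, 3, 2, {"lib0", "lib1"}))  # 6
```

In this toy setting the second thread finds the shared blocks already loaded, so the total miss count drops from 8 to 6 at equal capacity. A real design would also have to bound interference in the shared region for the timing analysis to remain safe, which this sketch does not model.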

Published In

IEEE Transactions on Computers, Volume 66, Issue 4
April 2017
185 pages

Publisher

IEEE Computer Society, United States
