research-article

Application-aware snoop filtering for low-power cache coherence in embedded multiprocessors

Published: 06 February 2008

Abstract

Maintaining coherence among local caches in shared-memory multiprocessors consumes significant power. The customization methodology we propose exploits the fact that, in embedded systems, system designers have important knowledge about how tasks share memory. We demonstrate how snoop-induced cache probes can be significantly reduced by deterministically identifying and exploiting the memory regions shared between processors: snoop activity is enabled only for accesses to known shared regions. The supporting hardware is not only cost-efficient but also software-programmable, so it can be reprogrammed and customized for different tasks and applications.



        Published In

        ACM Transactions on Design Automation of Electronic Systems, Volume 13, Issue 1
        January 2008
        496 pages
        ISSN:1084-4309
        EISSN:1557-7309
        DOI:10.1145/1297666
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Publication History

        Published: 06 February 2008
        Accepted: 01 July 2007
        Revised: 01 May 2007
        Received: 01 May 2006
        Published in TODAES Volume 13, Issue 1


        Author Tags

        1. Cache coherence
        2. embedded multiprocessors
        3. low-power embedded systems
        4. snoop filtering

        Qualifiers

        • Research-article
        • Research
        • Refereed



        Article Metrics

        • Downloads (last 12 months): 7
        • Downloads (last 6 weeks): 1
        Reflects downloads up to 03 Jan 2025
