
Runtime Reconfiguration of Multiprocessors Based on Compile-Time Analysis

Published: 01 September 2010

Abstract

In multiprocessors, performance improvement is typically achieved by exploiting parallelism at a fixed granularity, such as instruction-level, task-level, or data-level parallelism. We introduce a new reconfiguration mechanism that allows this granularity to vary, so that resource utilization is optimized in addition to performance. Our reconfigurable multiprocessor, QuadroCore, combines the advantages of reconfigurability and parallel processing. This article presents a unified hardware-software approach to the design of the QuadroCore. The design flow is enabled by compiler-driven reconfiguration, which matches application-specific characteristics to a fixed set of architectural variations. A special reconfiguration mechanism has been developed that alters the architecture within a single clock cycle.
The QuadroCore has been implemented on a Xilinx XC2V6000 FPGA for functional validation and in UMC’s 90 nm standard cell technology for performance estimation. A diverse set of applications has been mapped onto the reconfigurable multiprocessor to meet orthogonal performance characteristics in terms of time and power. Speedup measurements show a 2 to 11 times performance increase over a single processor. Additionally, the reconfiguration scheme has been applied to save power in data-parallel applications. Gate-level simulations have been performed to measure the power-performance trade-offs for two computationally complex applications. The power reports confirm that this reconfiguration scheme yields power savings in the range of 15 to 24%.

Cited By

  • (2014) Methods for the Design and Development. Design Methodology for Intelligent Technical Systems, 183-350. DOI: 10.1007/978-3-642-45435-6_5. Online publication date: 2014

    Published In

    ACM Transactions on Reconfigurable Technology and Systems, Volume 3, Issue 3
    September 2010
    231 pages
    ISSN:1936-7406
    EISSN:1936-7414
    DOI:10.1145/1839480

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 September 2010
    Accepted: 01 May 2009
    Revised: 01 March 2009
    Received: 01 July 2008
    Published in TRETS Volume 3, Issue 3

    Author Tags

    1. Reconfigurable multiprocessors
    2. Compilation for multiprocessors

    Qualifiers

    • Research-article
    • Research
    • Refereed
