Runtime Reconfiguration of Multiprocessors Based on Compile-Time Analysis

Authors: Madhura Purnaprajna, Mario Porrmann, Ulrich Rueckert, Michael Hussmann, Uwe Kastens

ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 3, Issue 3

Article No.: 17, Pages 1 - 25

https://doi.org/10.1145/1839480.1839487

Published: 01 September 2010

Abstract

In multiprocessors, performance improvement is typically achieved by exploring parallelism with fixed granularities, such as instruction-level, task-level, or data-level parallelism. We introduce a new reconfiguration mechanism that allows these granularities to vary, so that resource utilization is optimized in addition to performance. Our reconfigurable multiprocessor, QuadroCore, combines the advantages of reconfigurability and parallel processing. In this article, we present a unified hardware-software approach for the design of the QuadroCore. The design flow is enabled by compiler-driven reconfiguration, which matches application-specific characteristics to a fixed set of architectural variations. A special reconfiguration mechanism has been developed that alters the architecture within a single clock cycle.

The QuadroCore has been implemented on a Xilinx XC2V6000 for functional validation and in UMC’s 90 nm standard cell technology for performance estimation. A diverse set of applications has been mapped onto the reconfigurable multiprocessor to meet orthogonal performance characteristics in terms of time and power. Speedup measurements show a performance increase of 2 to 11 times in comparison to a single processor. Additionally, the reconfiguration scheme has been applied to save power in data-parallel applications. Gate-level simulations have been performed to measure the power-performance trade-offs for two computationally complex applications. The power reports confirm that introducing this reconfiguration scheme results in power savings in the range of 15 to 24%.
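To make the reported ranges concrete, the following minimal Python sketch converts a speedup factor into a runtime and a fractional power saving into a power figure. The baseline runtime and baseline power used here are hypothetical placeholders, not measurements from the article, and the function names are illustrative only. The sketch treats the speedup and the power saving as two separate quantities, since the abstract does not state how they combine for a given application.

```python
def runtime_after_speedup(t_single_s: float, speedup: float) -> float:
    """Runtime on the multiprocessor, given the single-processor runtime
    and a speedup factor (speedup = t_single / t_multi)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return t_single_s / speedup


def power_after_saving(p_base_mw: float, saving_fraction: float) -> float:
    """Power after applying a fractional saving, e.g. 0.15 for 15%."""
    if not 0.0 <= saving_fraction < 1.0:
        raise ValueError("saving_fraction must be in [0, 1)")
    return p_base_mw * (1.0 - saving_fraction)


# Hypothetical baselines, chosen only to make the arithmetic easy to follow.
T_SINGLE_S = 1.0   # assumed single-processor runtime in seconds
P_BASE_MW = 100.0  # assumed power without reconfiguration in milliwatts

# Speedup range reported in the abstract: 2 to 11 times.
runtime_range_s = (
    runtime_after_speedup(T_SINGLE_S, 11),
    runtime_after_speedup(T_SINGLE_S, 2),
)

# Power-saving range reported in the abstract: 15% to 24%.
power_range_mw = (
    power_after_saving(P_BASE_MW, 0.24),
    power_after_saving(P_BASE_MW, 0.15),
)

if __name__ == "__main__":
    print("runtime range (s):", runtime_range_s)
    print("power range (mW):", power_range_mw)
```

Under these assumed baselines, a 1 s single-processor run would take between roughly 0.09 s and 0.5 s, and a 100 mW baseline would drop to between about 76 mW and 85 mW.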

Cited By

Anacker H, Dellnitz M, Flaßkamp K, Groesbrink S, Hartmann P, Heinzemann C, Horenkamp C, Kleinjohann B, Kleinjohann L, Korf S, Krüger M, Müller W, Ober-Blöbaum S, Oberthür S, Porrmann M, Priesterjahn C, Radkowski R, Rasche C, Rieke J, Ringkamp M, Stahl K, Steenken D, Stöcklein J, Timmermann R, Trächtler A, Witting K, Xie T, Ziegert S. (2014). Methods for the Design and Development. Design Methodology for Intelligent Technical Systems, 183-350. Online publication date: 2014.
https://doi.org/10.1007/978-3-642-45435-6_5

Index Terms

Runtime Reconfiguration of Multiprocessors Based on Compile-Time Analysis
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data

Published In

ACM Transactions on Reconfigurable Technology and Systems Volume 3, Issue 3

September 2010

231 pages

ISSN:1936-7406

EISSN:1936-7414

DOI:10.1145/1839480

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 2010

Accepted: 01 May 2009

Revised: 01 March 2009

Received: 01 July 2008

Published in TRETS Volume 3, Issue 3

Qualifiers

Research-article
Research
Refereed

Article Metrics

Total Citations: 1
Total Downloads: 320
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 06 Nov 2024
