Towards minimizing execution delays on dynamically reconfigurable processors: a case study on REDEFINE
Pages 77 - 86
Abstract
In Dynamically Reconfigurable Processors (DRPs), compilation involves breaking an application into sub-tasks for piecewise execution on the fabric. These sub-tasks are sequenced according to their data and control dependences. DRPs prefetch a sub-task while another sub-task executes, which hides the reconfiguration time. In REDEFINE, our target DRP, sub-tasks are referred to as HyperOps. Determining the successor of a HyperOp requires merging information from the control flow graph and the HyperOp dataflow graph, and in many cases succession is data dependent. Because these branches are non-binary, hardware branch predictors cannot be applied; we instead employ a speculative prefetch unit together with a profile-based prediction scheme. Simulation results show a reduction of around 7-33% in overall execution time compared to execution without prefetching. We observe better performance when fewer resources on the fabric are used to execute prefetched HyperOps.
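As a rough illustration of what a profile-based successor prediction could look like, the sketch below counts, for each HyperOp in a profiling trace, how often each successor followed it, and then predicts the most frequent one as the prefetch candidate. This is not the paper's implementation: the function names, the trace format, and the HyperOp labels (`H0`, `H1`, ...) are all assumptions made for the example, and the abstract does not specify how the actual scheme selects a successor.

```python
from collections import Counter, defaultdict

# Hypothetical profiling trace: the order in which HyperOps were launched.
# H0 has three distinct successors (H1, H2, H4), i.e. a non-binary branch.
PROFILE_TRACE = [
    "H0", "H1", "H3",
    "H0", "H1", "H3",
    "H0", "H2", "H3",
    "H0", "H4", "H3",
    "H0", "H1", "H3",
]


def build_profile(trace):
    """Count how often each HyperOp was followed by each successor."""
    counts = defaultdict(Counter)
    for current, successor in zip(trace, trace[1:]):
        counts[current][successor] += 1
    return counts


def predict_successor(counts, hyperop):
    """Return the most frequently observed successor, or None if unseen."""
    successors = counts.get(hyperop)
    if not successors:
        return None
    return successors.most_common(1)[0][0]


def prefetch_hit_rate(counts, trace):
    """Fraction of transitions in `trace` where the prediction was correct."""
    hits = 0
    total = 0
    for current, successor in zip(trace, trace[1:]):
        total += 1
        if predict_successor(counts, current) == successor:
            hits += 1
    return hits / total if total else 0.0


profile = build_profile(PROFILE_TRACE)
predicted = predict_successor(profile, "H0")
hit_rate = prefetch_hit_rate(profile, PROFILE_TRACE)
```

In this toy trace, `H0` is followed by `H1` three times and by `H2` and `H4` once each, so the sketch would prefetch `H1` after `H0`. A misprediction corresponds to a prefetched HyperOp that must be discarded, which is where the reconfiguration delay would reappear.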
Published In
October 2010
276 pages
Copyright © 2010 ACM.
Sponsors
In-Cooperation
- CEDA
- IEEE CAS
- IEEE CS
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
Published: 24 October 2010
Qualifiers
- Research-article
Conference
Acceptance Rates
Overall Acceptance Rate 52 of 230 submissions, 23%