Post-Fabrication Microarchitecture

Authors:

Chanchal Kumar,

Anirudh Seshadri,

Aayush Chaudhary,

Shubham Bhawalkar,

Eric Rotenberg

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

Pages 1270 - 1281

https://doi.org/10.1145/3466752.3480119

Published: 17 October 2021


Abstract

Superscalar processor design favors microarchitectural enhancements that improve performance generally, across many workloads. Targeting general performance is necessary, but it also constrains some microarchitectural innovation. We explore relieving this constraint via a new paradigm called Post-Fabrication Microarchitecture (PFM). A high-performance superscalar core is coupled with a reconfigurable logic fabric, RF. A programmable interface, or Agent, allows RF to observe and microarchitecturally intervene at key pipeline stages of the superscalar core. New application-specific microarchitectural components are synthesized on demand to RF. All instructions still flow through the superscalar pipeline as usual, but their execution is streamlined (higher instructions per cycle, IPC) through microarchitectural intervention by RF. Our research shows that large speedups of individual applications can be achieved by analyzing their bottlenecks and providing customized microarchitectural solutions that target them. Examples of PFM use cases explored in this paper include custom branch predictors and data prefetchers.
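To make the observe-and-intervene idea concrete, the following is a minimal software sketch, not the paper's design. It assumes a toy model in which an Agent consults an application-specific predictor for the branches that predictor covers and falls back to a general-purpose predictor otherwise. All names here (`TwoBitPredictor`, `LoopExitPredictor`, `Agent`, `make_loop_trace`, `count_mispredictions`) are hypothetical, and the 2-bit counter baseline and fixed-trip-count loop branch are illustrative choices rather than anything taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TwoBitPredictor:
    """General-purpose baseline (assumed): one 2-bit saturating counter per branch PC."""
    counters: dict = field(default_factory=dict)

    def predict(self, pc: int) -> bool:
        # Counter values 2 and 3 mean "predict taken"; unseen branches start at 2.
        return self.counters.get(pc, 2) >= 2

    def update(self, pc: int, taken: bool) -> None:
        c = self.counters.get(pc, 2)
        self.counters[pc] = min(3, c + 1) if taken else max(0, c - 1)


@dataclass
class LoopExitPredictor:
    """Hypothetical custom component standing in for logic synthesized to RF.

    It covers one branch that is taken (trip_count - 1) times and then not taken.
    """
    pc: int
    trip_count: int
    seen: int = 0  # consecutive taken outcomes observed since the last loop exit

    def covers(self, pc: int) -> bool:
        return pc == self.pc

    def predict(self, pc: int) -> bool:
        return self.seen < self.trip_count - 1

    def update(self, pc: int, taken: bool) -> None:
        self.seen = self.seen + 1 if taken else 0


@dataclass
class Agent:
    """Toy interface: lets the custom component observe outcomes and override predictions."""
    baseline: TwoBitPredictor
    custom: Optional[LoopExitPredictor] = None

    def predict(self, pc: int) -> bool:
        if self.custom is not None and self.custom.covers(pc):
            return self.custom.predict(pc)  # intervention by the custom component
        return self.baseline.predict(pc)    # usual path through the core's predictor

    def update(self, pc: int, taken: bool) -> None:
        self.baseline.update(pc, taken)
        if self.custom is not None and self.custom.covers(pc):
            self.custom.update(pc, taken)   # observation of the resolved outcome


def make_loop_trace(pc: int, trip_count: int, iterations: int) -> List[Tuple[int, bool]]:
    """Build a synthetic (pc, taken) trace for a loop-closing branch."""
    one_loop = [(pc, True)] * (trip_count - 1) + [(pc, False)]
    return one_loop * iterations


def count_mispredictions(agent: Agent, trace: List[Tuple[int, bool]]) -> int:
    misses = 0
    for pc, taken in trace:
        if agent.predict(pc) != taken:
            misses += 1
        agent.update(pc, taken)
    return misses


if __name__ == "__main__":
    trace = make_loop_trace(pc=0x40, trip_count=3, iterations=100)
    print("baseline only:", count_mispredictions(Agent(TwoBitPredictor()), trace))
    custom = LoopExitPredictor(pc=0x40, trip_count=3)
    print("with custom  :", count_mispredictions(Agent(TwoBitPredictor(), custom), trace))
```

On this synthetic trace the 2-bit counter mispredicts every loop exit, while the custom component predicts the covered branch correctly throughout. The sketch only illustrates the division of labor described in the abstract: every branch still goes through the same prediction path, and the application-specific component changes the outcome only for the branches it targets.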


Cited By

Goudarzi M, Azimi R, Humecki J, Rehman F, Zhang R, Sethi C, Bomman T, Yang Y. (2023). By-Software Branch Prediction in Loops. IEEE Computer Architecture Letters 22:2 (129-132). DOI: 10.1109/LCA.2023.3304613. Online publication date: 1-Jul-2023.
https://dl.acm.org/doi/10.1109/LCA.2023.3304613
Li A, Ning A, Wentzlaff D. (2023). Duet: Creating Harmony between Processors and Embedded FPGAs. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (745-758). DOI: 10.1109/HPCA56546.2023.10070989. Online publication date: Mar-2023.
https://doi.org/10.1109/HPCA56546.2023.10070989



Published In


MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

October 2021

1322 pages

ISBN:9781450385572

DOI:10.1145/3466752

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Research-article
Research
Refereed limited


Conference

MICRO '21

Sponsor:

SIGMICRO

MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture

October 18 - 22, 2021

Virtual Event, Greece

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
