DOI: 10.1145/3466752.3480119

Post-Fabrication Microarchitecture

Published: 17 October 2021

Abstract

Microarchitectural enhancements that improve performance generally, across many workloads, are favored in superscalar processor design. Targeting general performance is necessary but it also constrains some microarchitecture innovation. We explore relieving this constraint, via a new paradigm called Post-Fabrication Microarchitecture (PFM). A high-performance superscalar core is coupled with a reconfigurable logic fabric, RF. A programmable interface, or Agent, allows for RF to observe and microarchitecturally intervene at key pipeline stages of the superscalar core. New microarchitectural components, specific to applications, are synthesized on-demand to RF. All instructions still flow through the superscalar pipeline, as usual, but their execution is streamlined (better instructions per cycle (IPC)) through microarchitectural intervention by RF. Our research shows that one can achieve large speedups of individual applications, by analyzing their bottlenecks and providing customized microarchitectural solutions to target these bottlenecks. Examples of PFM use-cases explored in this paper include custom branch predictors and data prefetchers.
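The idea of an application-specific component intervening in the pipeline can be illustrated with a toy software model. The sketch below is not the paper's design: the class names, the 2-bit baseline predictor, the override interface, and the synthetic branch trace are all assumptions made for illustration. It only shows how a custom predictor that observes branch outcomes could override a general-purpose predictor for one hard-to-predict branch.

```python
from typing import Dict, List, Optional, Tuple


class TwoBitPredictor:
    """Baseline (hypothetical): one 2-bit saturating counter per branch PC."""

    def __init__(self) -> None:
        self.counters: Dict[int, int] = {}

    def predict(self, pc: int) -> bool:
        # Counters start at 2 (weakly taken); values 2 and 3 predict taken.
        return self.counters.get(pc, 2) >= 2

    def update(self, pc: int, taken: bool) -> None:
        c = self.counters.get(pc, 2)
        self.counters[pc] = min(3, c + 1) if taken else max(0, c - 1)


class PeriodicPatternComponent:
    """Hypothetical custom component: tracks one branch with a fixed repeating pattern."""

    def __init__(self, pc: int, pattern: List[bool]) -> None:
        self.pc = pc
        self.pattern = pattern
        self.pos = 0

    def predict(self, pc: int) -> Optional[bool]:
        # Return None to signal "no intervention" for branches it does not target.
        if pc != self.pc:
            return None
        return self.pattern[self.pos]

    def observe(self, pc: int, taken: bool) -> None:
        # Advance through the pattern each time the targeted branch resolves.
        if pc == self.pc:
            self.pos = (self.pos + 1) % len(self.pattern)


def count_mispredictions(
    trace: List[Tuple[int, bool]],
    baseline: TwoBitPredictor,
    component: Optional[PeriodicPatternComponent] = None,
) -> int:
    """Replay a (pc, taken) trace and count mispredictions."""
    mispredictions = 0
    for pc, taken in trace:
        prediction = baseline.predict(pc)
        if component is not None:
            override = component.predict(pc)
            if override is not None:
                prediction = override  # the custom component intervenes
        if prediction != taken:
            mispredictions += 1
        baseline.update(pc, taken)
        if component is not None:
            component.observe(pc, taken)
    return mispredictions


# Synthetic trace: one branch repeating taken, taken, not-taken.
PATTERN = [True, True, False]
BRANCH_PC = 0x400
trace = [(BRANCH_PC, outcome) for _ in range(100) for outcome in PATTERN]

baseline_misses = count_mispredictions(trace, TwoBitPredictor())
custom_misses = count_mispredictions(
    trace, TwoBitPredictor(), PeriodicPatternComponent(BRANCH_PC, PATTERN)
)
print(baseline_misses, custom_misses)
```

In this toy trace the baseline mispredicts the not-taken outcome in every repetition, while the pattern-aware component predicts the targeted branch correctly. The model ignores timing, speculation, and wrong-path effects, so it says nothing about the speedups reported in the paper.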

Published In

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture
October 2021
1322 pages
ISBN:9781450385572
DOI:10.1145/3466752
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. branch prediction
  2. field-programmable gate array (FPGA)
  3. instruction-level parallelism (ILP)
  4. pre-execution
  5. prefetching
  6. reconfigurable logic
  7. superscalar processor

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MICRO '21
Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
