Cost effective speculation with the omnipredictor

Authors:

André Seznec

PACT '18: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques

Article No.: 25, Pages 1 - 13

https://doi.org/10.1145/3243176.3243208

Published: 01 November 2018

Abstract

Modern superscalar processors rely heavily on out-of-order and speculative execution to achieve high performance. The conditional branch predictor, the indirect branch predictor, and the memory dependency predictor are among the key structures that enable efficient speculative out-of-order execution. Processors therefore implement these three predictors as distinct hardware components.

In this paper, we propose the omnipredictor, which predicts conditional branches, memory dependencies, and indirect branches at state-of-the-art accuracy without paying the hardware cost of a separate memory dependency predictor and a separate indirect jump predictor.

We first show that the TAGE prediction scheme based on global branch history can be used to concurrently predict both branch directions and memory dependencies. Thus, we unify these two predictors within a regular TAGE conditional branch predictor whose prediction is interpreted according to the type of the instruction accessing the predictor. Memory dependency prediction is provided at almost no hardware overhead.
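The idea of one predictor whose output is decoded according to the instruction type can be illustrated with a small sketch. This is a toy model, not the design evaluated in the paper: the class and method names, table sizes, hash functions, and simplified allocation policy are all hypothetical, and the binary decoding for loads ("wait_for_store" versus "issue_early") is an assumption, since the abstract does not say how a prediction is interpreted for memory instructions.

```python
from dataclasses import dataclass
from enum import Enum


class InstKind(Enum):
    COND_BRANCH = 1
    LOAD = 2


@dataclass
class Entry:
    tag: int = -1  # -1 marks an empty entry
    ctr: int = 0   # signed saturating counter in [-4, 3]


class SharedPredictor:
    """Toy TAGE-like predictor shared by branches and loads (hypothetical)."""

    def __init__(self, hist_lens=(4, 8, 16), log_size=8):
        self.hist_lens = hist_lens
        self.mask = (1 << log_size) - 1
        self.tables = [[Entry() for _ in range(1 << log_size)] for _ in hist_lens]
        self.base = [-1] * (1 << log_size)  # tagless fallback counters
        self.ghist = 0                      # global branch history, newest bit is bit 0

    def _slot(self, pc, t):
        # Index and tag mix the PC with the t-th history window (arbitrary hashes).
        h = self.ghist & ((1 << self.hist_lens[t]) - 1)
        idx = ((pc >> 2) ^ h ^ (h >> 3)) & self.mask
        tag = ((pc >> 2) ^ (h * 31)) & 0xFF
        return idx, tag

    def _provider(self, pc):
        # Longest-history table whose tag matches, or None for the base table.
        for t in reversed(range(len(self.tables))):
            idx, tag = self._slot(pc, t)
            if self.tables[t][idx].tag == tag:
                return t
        return None

    def _raw(self, pc):
        t = self._provider(pc)
        if t is None:
            return self.base[(pc >> 2) & self.mask] >= 0
        return self.tables[t][self._slot(pc, t)[0]].ctr >= 0

    def predict(self, pc, kind):
        bit = self._raw(pc)
        if kind is InstKind.COND_BRANCH:
            return "taken" if bit else "not_taken"
        return "wait_for_store" if bit else "issue_early"  # assumed decoding for loads

    def update(self, pc, kind, outcome):
        t = self._provider(pc)
        wrong = self._raw(pc) != outcome
        step = 1 if outcome else -1
        if t is None:
            i = (pc >> 2) & self.mask
            self.base[i] = max(-4, min(3, self.base[i] + step))
        else:
            e = self.tables[t][self._slot(pc, t)[0]]
            e.ctr = max(-4, min(3, e.ctr + step))
        nxt = 0 if t is None else t + 1
        if wrong and nxt < len(self.tables):
            # On a misprediction, allocate in the next longer-history table.
            idx, tag = self._slot(pc, nxt)
            self.tables[nxt][idx] = Entry(tag, 0 if outcome else -1)
        if kind is InstKind.COND_BRANCH:
            # Only conditional branches feed the global history.
            limit = (1 << max(self.hist_lens)) - 1
            self.ghist = ((self.ghist << 1) | int(outcome)) & limit
```

In this sketch the same tables serve both instruction types; only `predict` knows the type, and it changes nothing except how the sign of the counter is read. Features of a real TAGE predictor such as useful bits and the alternate prediction are omitted for brevity.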

We further show that the TAGE conditional predictor can accurately predict indirect branches by using TAGE entries as pointers to Branch Target Buffer entries. Indirect target prediction can be blended into the conditional predictor along with memory dependency prediction, forming the omnipredictor.
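A minimal sketch of the pointer idea follows, under stated assumptions. The predictor stores a small slot number rather than a full target address, and the target itself is read from the Branch Target Buffer. Everything here is hypothetical: `TargetBuffer`, `PointerPredictor`, the round-robin replacement, and the dictionary keyed by (PC, history), which stands in for tagged predictor entries. The abstract does not describe the actual organization.

```python
class TargetBuffer:
    """Toy BTB: a small array of (branch pc, target) pairs."""

    def __init__(self, size=16):
        self.entries = [None] * size
        self.victim = 0  # round-robin replacement pointer

    def find_or_insert(self, pc, target):
        pair = (pc, target)
        if pair in self.entries:
            return self.entries.index(pair)
        slot = self.victim
        self.entries[slot] = pair
        self.victim = (slot + 1) % len(self.entries)
        return slot

    def read(self, pc, slot):
        entry = self.entries[slot]
        if entry is None or entry[0] != pc:
            return None  # stale pointer: the slot now belongs to another branch
        return entry[1]


class PointerPredictor:
    """History-indexed entries that hold BTB slot numbers, not targets."""

    def __init__(self, btb, hist_bits=4):
        self.btb = btb
        self.hist_mask = (1 << hist_bits) - 1
        self.hist = 0
        self.pointers = {}  # (pc, history) -> BTB slot; stand-in for tagged entries

    def note_conditional(self, taken):
        # Conditional branch outcomes form the global history.
        self.hist = ((self.hist << 1) | int(taken)) & self.hist_mask

    def predict(self, pc):
        slot = self.pointers.get((pc, self.hist))
        return None if slot is None else self.btb.read(pc, slot)

    def update(self, pc, target):
        self.pointers[(pc, self.hist)] = self.btb.find_or_insert(pc, target)
```

In this toy model several history contexts of one indirect branch can point at the same BTB slot, so each distinct target is stored once in the BTB while each predictor entry needs only enough bits to name a slot.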

Cited By

S. Kim and A. Ros, "Effective Context-Sensitive Memory Dependence Prediction," in 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 515-527, 2024. Online publication date: 2-Mar-2024.
https://doi.org/10.1109/HPCA57654.2024.00045

L. Panayi, R. Gandhi, J. Whittaker, V. Chouliaras, M. Berger, and P. Kelly, "Improving Memory Dependence Prediction with Static Analysis," in Architecture of Computing Systems, pp. 301-315, 2024. Online publication date: 1-Aug-2024.
https://doi.org/10.1007/978-3-031-66146-4_20

K. Kalaitzidis and A. Seznec, "Leveraging Value Equality Prediction for Value Speculation," ACM Transactions on Architecture and Code Optimization, vol. 18, no. 1, pp. 1-20, 2020. Online publication date: 30-Dec-2020.
https://dl.acm.org/doi/10.1145/3436821

Published In

PACT '18: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques

November 2018

494 pages

ISBN:9781450359863

DOI:10.1145/3243176

General Chair: Skevos Evripidou, University of Cyprus, Cyprus

Program Chairs: Per Stenström, Chalmers University of Technology, Sweden; Michael O'Boyle, University of Edinburgh, UK

Copyright © 2018 ACM.

© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

In-Cooperation

IFIP WG 10.3
IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

PACT '18

Sponsor:

SIGARCH

PACT '18: International conference on Parallel Architectures and Compilation Techniques

November 1 - 4, 2018

Limassol, Cyprus

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%
