DOI:10.1145/3243176.3243208

Cost effective speculation with the omnipredictor

Published: 01 November 2018

Abstract

Modern superscalar processors rely heavily on out-of-order and speculative execution to achieve high performance. The conditional branch predictor, the indirect branch predictor and the memory dependency predictor are among the key structures that enable efficient speculative out-of-order execution. Processors therefore implement these three predictors as distinct hardware components.
In this paper, we propose the omnipredictor that predicts conditional branches, memory dependencies and indirect branches at state-of-the-art accuracies without paying the hardware cost of the memory dependency predictor and the indirect jump predictor.
We first show that the TAGE prediction scheme based on global branch history can be used to concurrently predict both branch directions and memory dependencies. Thus, we unify these two predictors within a regular TAGE conditional branch predictor whose prediction is interpreted according to the type of the instruction accessing the predictor. Memory dependency prediction is provided at almost no hardware overhead.
We further show that the TAGE conditional predictor can accurately predict indirect branches by using TAGE entries as pointers to Branch Target Buffer entries. Indirect target prediction can be blended into the conditional predictor along with memory dependency prediction, forming the omnipredictor.
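
To make the idea of one predictor serving three instruction types more concrete, here is a minimal Python sketch. It is not the design from the paper: it uses a single tagged table indexed by a hash of the PC and global branch history, whereas TAGE uses several tagged tables with geometrically increasing history lengths. The class `ToyUnifiedPredictor`, the hash functions, the table size, the 3-bit field width and the allocation policy are all assumptions chosen for illustration. The only point it shows is that the same stored field can be read as a taken/not-taken bit, a dependent/independent bit, or a pointer into a set of Branch Target Buffer slots, depending on the instruction that accesses it.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Kind(Enum):
    COND_BRANCH = auto()   # field read as taken / not taken
    LOAD = auto()          # field read as "wait for older stores" / "issue early"
    INDIRECT = auto()      # field read as an index into candidate BTB slots


FIELD_BITS = 3                       # assumed width of the shared field
FIELD_MAX = (1 << FIELD_BITS) - 1
MID = FIELD_MAX // 2                 # values above MID mean "true"
TABLE_SIZE = 1024                    # assumed number of entries


@dataclass
class Entry:
    tag: int = -1                    # -1 marks an unallocated entry
    field: int = 0


class ToyUnifiedPredictor:
    """Hypothetical single-table stand-in for a TAGE-like predictor."""

    def __init__(self, hist_len=8):
        self.hist_len = hist_len
        self.ghist = 0               # global branch history, newest bit in bit 0
        self.table = [Entry() for _ in range(TABLE_SIZE)]

    def _index_tag(self, pc):
        # Illustrative hashes of PC and recent history, not the paper's.
        h = self.ghist & ((1 << self.hist_len) - 1)
        index = (pc ^ h ^ (h << 3)) % TABLE_SIZE
        tag = ((pc >> 4) ^ h) & 0xFF
        return index, tag

    def predict(self, pc, kind):
        index, tag = self._index_tag(pc)
        entry = self.table[index]
        field = entry.field if entry.tag == tag else 0   # miss: default to 0
        if kind is Kind.INDIRECT:
            return field             # which BTB slot holds the predicted target
        return field > MID           # branch: taken? load: depends on a store?

    def update(self, pc, kind, outcome):
        index, tag = self._index_tag(pc)
        entry = self.table[index]
        if entry.tag != tag:         # toy policy: always allocate on a miss
            entry.tag, entry.field = tag, MID
        if kind is Kind.INDIRECT:
            entry.field = outcome & FIELD_MAX            # remember the BTB slot
        elif outcome:
            entry.field = min(FIELD_MAX, entry.field + 1)
        else:
            entry.field = max(0, entry.field - 1)

    def push_history(self, taken):
        # Called once per resolved conditional branch.
        self.ghist = ((self.ghist << 1) | int(taken)) & ((1 << 64) - 1)
```

In this sketch a caller would invoke `predict(pc, Kind.LOAD)` when a load is fetched and `predict(pc, Kind.INDIRECT)` for an indirect jump, then look up the returned slot in a separately modeled BTB. A real design would also need confidence, replacement and multi-table selection logic, none of which is modeled here.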

Published In

PACT '18: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques
November 2018
494 pages
ISBN:9781450359863
DOI:10.1145/3243176
© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

  • IFIP WG 10.3
  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

PACT '18

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%
