Randomized Cache Placement for Eliminating Conflicts

Published: 01 February 1999

Abstract

Applications with regular patterns of memory access can experience high levels of cache conflict misses. In shared-memory multiprocessors, conflict misses can be increased significantly by the data transpositions required for parallelization. Techniques such as blocking, which are introduced within a single thread to improve locality, can result in yet more conflict misses. The tension between minimizing cache conflicts and the other transformations needed for efficient parallelization leads to complex optimization problems for parallelizing compilers. This paper shows how the introduction of a pseudorandom element into the cache index function can effectively eliminate repetitive conflict misses and produce a cache in which the miss ratio depends solely on working-set behavior. We examine the impact of pseudorandom cache indexing on processor cycle times and present practical solutions to some of the major implementation issues for this type of cache. Our conclusions are supported by simulations of a superscalar out-of-order processor executing the SPEC95 benchmarks, as well as by cache simulations of individual loop kernels that illustrate specific effects. We present comparative measurements of instructions committed per cycle (IPC) for the different cache architectures on whole-program benchmarks from the SPEC95 suite.
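
The abstract's central idea is replacing the conventional modulo set-index function with a pseudorandom one. As a minimal sketch only, not the placement function evaluated in the paper, the following C fragment contrasts conventional modulo indexing with a simple XOR-based index that folds a slice of the tag bits into the set index; the cache geometry, bit fields, and function names are assumptions chosen purely for illustration.

    /*
     * Illustrative sketch (not the paper's design): compare a
     * conventional modulo cache index with an XOR-based pseudorandom
     * index.  Assumed geometry: 1,024 sets of 32-byte lines.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define OFFSET_BITS 5                       /* 32-byte cache lines */
    #define INDEX_BITS  10                      /* 1,024 sets          */
    #define NUM_SETS    (1u << INDEX_BITS)

    /* Conventional placement: the address bits just above the line offset. */
    static uint32_t modulo_index(uint32_t addr)
    {
        return (addr >> OFFSET_BITS) & (NUM_SETS - 1);
    }

    /* XOR-based placement: fold a second (tag) slice of the address into
     * the index, so power-of-two strides no longer map to a single set.
     * The choice of bit fields here is hypothetical. */
    static uint32_t xor_index(uint32_t addr)
    {
        uint32_t low  = (addr >> OFFSET_BITS) & (NUM_SETS - 1);
        uint32_t high = (addr >> (OFFSET_BITS + INDEX_BITS)) & (NUM_SETS - 1);
        return low ^ high;
    }

    int main(void)
    {
        /* Addresses one 32 KB stride apart (NUM_SETS * line size). */
        for (uint32_t i = 0; i < 4; i++) {
            uint32_t addr = i * (NUM_SETS << OFFSET_BITS);
            printf("addr 0x%08x  modulo set %4u  xor set %4u\n",
                   (unsigned)addr,
                   (unsigned)modulo_index(addr),
                   (unsigned)xor_index(addr));
        }
        return 0;
    }

With this assumed geometry, the four example addresses are one 32 KB stride apart: modulo placement maps all of them to set 0, while the XOR-based index scatters them across distinct sets, which is the kind of repetitive conflict behavior the abstract says a pseudorandom index function can eliminate.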

Published In

IEEE Transactions on Computers, Volume 48, Issue 2
Special issue on cache memory and related problems
February 1999
168 pages
ISSN: 0018-9340

Publisher

IEEE Computer Society

United States

Author Tags

  1. Conflict avoidance
  2. Cache architectures
  3. Performance evaluation

Qualifiers

  • Research-article
