Randomized Cache Placement for Eliminating Conflicts

Published: 01 February 1999

Abstract

Applications with regular patterns of memory access can experience high levels of cache conflict misses. In shared-memory multiprocessors, conflict misses can be increased significantly by the data transpositions required for parallelization. Techniques such as blocking, which are introduced within a single thread to improve locality, can result in yet more conflict misses. The tension between minimizing cache conflicts and the other transformations needed for efficient parallelization leads to complex optimization problems for parallelizing compilers. This paper shows how the introduction of a pseudorandom element into the cache index function can effectively eliminate repetitive conflict misses and produce a cache in which the miss ratio depends solely on working-set behavior. We examine the impact of pseudorandom cache indexing on processor cycle times and present practical solutions to some of the major implementation issues for this type of cache. Our conclusions are supported by simulations of a superscalar out-of-order processor executing the SPEC95 benchmarks, as well as by cache simulations of individual loop kernels that illustrate specific effects. We present comparative measurements of instructions committed per cycle (IPC) for the different cache architectures on whole-program benchmarks from the SPEC95 suite.
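
The abstract's central idea is replacing the conventional modulo set-index function with a pseudorandom one. As a minimal sketch only, not the placement function evaluated in the paper, the following C fragment contrasts conventional modulo indexing with a simple XOR-based index that folds a slice of the tag bits into the set index; the cache geometry, bit fields, and function names are assumptions chosen purely for illustration.

    /*
     * Illustrative sketch (not the paper's design): compare a
     * conventional modulo cache index with an XOR-based pseudorandom
     * index.  Assumed geometry: 1,024 sets of 32-byte lines.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define OFFSET_BITS 5                       /* 32-byte cache lines */
    #define INDEX_BITS  10                      /* 1,024 sets          */
    #define NUM_SETS    (1u << INDEX_BITS)

    /* Conventional placement: the address bits just above the line offset. */
    static uint32_t modulo_index(uint32_t addr)
    {
        return (addr >> OFFSET_BITS) & (NUM_SETS - 1);
    }

    /* XOR-based placement: fold a second (tag) slice of the address into
     * the index, so power-of-two strides no longer map to a single set.
     * The choice of bit fields here is hypothetical. */
    static uint32_t xor_index(uint32_t addr)
    {
        uint32_t low  = (addr >> OFFSET_BITS) & (NUM_SETS - 1);
        uint32_t high = (addr >> (OFFSET_BITS + INDEX_BITS)) & (NUM_SETS - 1);
        return low ^ high;
    }

    int main(void)
    {
        /* Addresses one 32 KB stride apart (NUM_SETS * line size). */
        for (uint32_t i = 0; i < 4; i++) {
            uint32_t addr = i * (NUM_SETS << OFFSET_BITS);
            printf("addr 0x%08x  modulo set %4u  xor set %4u\n",
                   (unsigned)addr,
                   (unsigned)modulo_index(addr),
                   (unsigned)xor_index(addr));
        }
        return 0;
    }

With this assumed geometry, the four example addresses are one 32 KB stride apart: modulo placement maps all of them to set 0, while the XOR-based index scatters them across distinct sets, which is the kind of repetitive conflict behavior the abstract says a pseudorandom index function can eliminate.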

Published In

IEEE Transactions on Computers, Volume 48, Issue 2
Special issue on cache memory and related problems
February 1999
168 pages
ISSN: 0018-9340

Publisher

IEEE Computer Society

United States

Author Tags

  1. Conflict avoidance
  2. Cache architectures
  3. Performance evaluation

Qualifiers

  • Research-article
