research-article

Transactional prefetching: narrowing the window of contention in hardware transactional memory

Authors: Adrià Armejach, Adrián Cristal, Osman S. Unsal, Per Stenstrom

PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques

Pages 181 - 190

https://doi.org/10.1145/2370816.2370844

Published: 19 September 2012

Abstract

Memory access latency is the primary performance bottleneck in modern computer systems. Prefetching data before it is needed by a processing core allows substantial performance gains by overlapping significant portions of memory latency with useful work. Prior work has investigated this technique and measured potential benefits in a variety of scenarios. However, its use in speeding up Hardware Transactional Memory (HTM) has remained hitherto unexplored. In several HTM designs, transactions invalidate speculatively updated cache lines when they abort. Such cache lines tend to have high locality and are likely to be accessed again when the transaction re-executes. Coarse-grained transactions that update several cache lines are particularly susceptible to performance degradation even under moderate contention. However, such transactions show strong locality of reference, especially when contention is high. Prefetching cache lines with high locality can, therefore, improve overall concurrency by speeding up transactions and, thereby, narrowing the window of time in which such transactions persist and can cause contention. Such transactions are important since they are likely to form a common TM use case. We note that traditional prefetch techniques may not be able to track such lines adequately or issue prefetches quickly enough. This paper investigates the use of prefetching in HTMs, proposes a simple design to identify and request prefetch candidates, and measures the performance gains to be had for several representative TM workloads.

Cited By

Shimchenko M, Titos-Gil R, Fernández-Pascual R, Acacio M, Kaxiras S, Ros A, Jimborean A (2021). Analysing software prefetching opportunities in hardware transactional memory. The Journal of Supercomputing. DOI: 10.1007/s11227-021-03897-z. Online publication date: 2-Jun-2021.
https://doi.org/10.1007/s11227-021-03897-z

Atoofian E (2014). Adaptive Snoop Granularity and Transactional Snoop Filtering in Hardware Transactional Memory. Canadian Journal of Electrical and Computer Engineering 37:2 (76-85). DOI: 10.1109/CJECE.2014.2312217. Online publication date: Sep-2015.
https://doi.org/10.1109/CJECE.2014.2312217

Armejach A, Titos-Gil R, Negi A, Unsal O, Cristal A (2013). Techniques to improve performance in requester-wins hardware transactional memory. ACM Transactions on Architecture and Code Optimization 10:4 (1-25). DOI: 10.1145/2541228.2555299. Online publication date: 1-Dec-2013.
https://dl.acm.org/doi/10.1145/2541228.2555299

Index Terms

Transactional prefetching: narrowing the window of contention in hardware transactional memory
1. Computer systems organization
  1. Architectures
    1. Parallel architectures

Published In

PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques

September 2012

512 pages

ISBN:9781450311823

DOI:10.1145/2370816

General Chairs: Pen-Chung Yew (University of Minnesota), Sangyeun Cho (University of Pittsburgh)

Program Chairs: Luiz DeRose (Cray, Inc.), David J. Lilja (University of Minnesota)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IFIP WG 10.3: IFIP WG 10.3
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE CS TCPP: IEEE Computer Society Technical Committee on Parallel Processing
IEEE CS TCAA: IEEE CS technical committee on architectural acoustics

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

PACT '12

PACT '12: International Conference on Parallel Architectures and Compilation Techniques

September 19 - 23, 2012

Minneapolis, Minnesota, USA

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Bibliometrics

Article Metrics

Total Citations: 3
Total Downloads: 212
Downloads (Last 12 months): 5
Downloads (Last 6 weeks): 1

Reflects downloads up to 24 Dec 2024
