DOI: 10.1145/2370816.2370844
research-article

Transactional prefetching: narrowing the window of contention in hardware transactional memory

Published: 19 September 2012 Publication History

Abstract

Memory access latency is the primary performance bottleneck in modern computer systems. Prefetching data before it is needed by a processing core allows substantial performance gains by overlapping significant portions of memory latency with useful work. Prior work has investigated this technique and measured potential benefits in a variety of scenarios. However, its use in speeding up Hardware Transactional Memory (HTM) has so far remained unexplored. In several HTM designs, transactions invalidate speculatively updated cache lines when they abort. Such cache lines tend to have high locality and are likely to be accessed again when the transaction re-executes. Coarse-grained transactions that update several cache lines are particularly susceptible to performance degradation even under moderate contention. However, such transactions show strong locality of reference, especially when contention is high. Prefetching cache lines with high locality can, therefore, improve overall concurrency by speeding up transactions and thereby narrowing the window of time in which such transactions persist and can cause contention. Such transactions are important since they are likely to form a common TM use case. We note that traditional prefetch techniques may not be able to track such lines adequately or issue prefetches quickly enough. This paper investigates the use of prefetching in HTMs, proposes a simple design to identify and request prefetch candidates, and measures the performance gains achievable for several representative TM workloads.
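The effect described in the abstract can be illustrated with a toy model. The sketch below is not the design proposed in the paper, whose details are not given here. It is a minimal illustration under stated assumptions: the latencies `HIT_CYCLES` and `MISS_CYCLES`, the `Core` class, and the idea of using the aborted transaction's write set as the prefetch candidate list are all hypothetical, and prefetches are optimistically assumed to complete before re-execution at no cost.

```python
from dataclasses import dataclass, field

# Assumed latencies, chosen only for illustration.
HIT_CYCLES = 2
MISS_CYCLES = 100


@dataclass
class Core:
    """Toy private cache with a transactional write set (hypothetical model)."""
    cache: set = field(default_factory=set)                # lines currently valid
    write_set: set = field(default_factory=set)            # speculatively updated lines
    prefetch_candidates: set = field(default_factory=set)  # lines to refetch on retry

    def access(self, line: int, is_write: bool) -> int:
        """Access one cache line and return the latency paid."""
        cycles = HIT_CYCLES if line in self.cache else MISS_CYCLES
        self.cache.add(line)
        if is_write:
            self.write_set.add(line)
        return cycles

    def abort(self, prefetch: bool) -> None:
        """Abort: invalidate speculatively updated lines, as in several HTM designs."""
        self.cache -= self.write_set
        if prefetch:
            # Assumption: remember the invalidated lines as prefetch candidates.
            self.prefetch_candidates = set(self.write_set)
        self.write_set.clear()

    def restart(self) -> None:
        """Before re-execution, bring candidates back (assumed to finish in time)."""
        self.cache |= self.prefetch_candidates
        self.prefetch_candidates.clear()


def run_transaction(core: Core, accesses) -> int:
    """Execute a list of (line, is_write) accesses and return total cycles."""
    return sum(core.access(line, is_write) for line, is_write in accesses)


def retry_cost(prefetch: bool, lines: int = 8) -> int:
    """Cycles spent on memory accesses by the re-executed transaction."""
    core = Core()
    accesses = [(line, True) for line in range(lines)]
    run_transaction(core, accesses)  # first attempt, later aborted
    core.abort(prefetch)
    core.restart()
    return run_transaction(core, accesses)


if __name__ == "__main__":
    print("retry without prefetch:", retry_cost(False), "cycles")
    print("retry with prefetch:   ", retry_cost(True), "cycles")
```

In this model the retried transaction misses on every line it wrote unless those lines are prefetched, so its duration, and with it the window in which it can conflict with other transactions, shrinks when prefetching is enabled. A real prefetcher would only partly hide the miss latency and would compete for bandwidth, neither of which is modeled here.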

    Published In

    PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques
    September 2012
    512 pages
    ISBN:9781450311823
    DOI:10.1145/2370816

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. hardware transactional memory
    2. multicores
    3. prefetching

    Qualifiers

    • Research-article

    Conference

    PACT '12
    Sponsor:
    • IFIP WG 10.3
    • SIGARCH
    • IEEE CS TCPP
    • IEEE CS TCAA

    Acceptance Rates

    Overall Acceptance Rate 121 of 471 submissions, 26%
