Research article, DOI: 10.1145/2628071.2628073

XStream: cross-core spatial streaming based MLC prefetchers for parallel applications in CMPs

Published: 24 August 2014

Abstract

Hardware prefetchers are commonly used to hide and tolerate off-chip memory latency. Prefetching techniques in the literature are designed for multiple independent sequential applications running on a multicore system. A single parallel application running on a multicore system behaves differently: its cores share and communicate data and code among themselves, so the demand miss streams of different cores have much in common. This creates an opportunity to predict the demand miss streams and communicate the predicted streams from one core to another, which we refer to as cross-core stream communication.
We propose cross-core spatial streaming (XStream), a practical and storage-efficient cross-core prefetching technique. XStream detects and predicts cross-core spatial streams at the private mid-level caches (MLCs) and sends the predicted streams in advance to the MLC prefetchers of the predicted cores. We compare the effectiveness of XStream with that of an ideal cross-core spatial streamer. Experimental results demonstrate that, on average (geometric mean), compared to state-of-the-art spatial memory streaming, the storage-efficient XStream reduces execution time by 11.3% (as high as 24%) and 9% (as high as 29.09%) for 4-core and 8-core systems, respectively.
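The idea of forwarding a predicted spatial stream to another core's prefetcher can be illustrated with a toy software model. The sketch below is not the XStream hardware design, which the abstract does not specify. The class name `CrossCoreStreamSketch`, the 64-byte block and 2048-byte region sizes, the explicit `end_stream` call, and the "last observed follower" rule for predicting the consumer core are all assumptions made for illustration only.

```python
from collections import defaultdict

BLOCK_BYTES = 64      # assumed cache block size (not taken from the paper)
REGION_BYTES = 2048   # assumed spatial region size (not taken from the paper)


class CrossCoreStreamSketch:
    """Toy model: record which blocks of a region a core misses on, learn
    which core tends to touch a region after a given producer core, and
    forward the recorded blocks to that core's prefetch queue."""

    def __init__(self):
        # (core, region) -> block offsets missed in the current stream
        self.active = defaultdict(set)
        # region -> core that most recently finished a stream there
        self.last_producer = {}
        # producer core -> core predicted to consume its streams next
        self.consumer_of = {}
        # core -> block addresses queued for its (modelled) MLC prefetcher
        self.prefetch_queue = defaultdict(list)

    def on_miss(self, core, addr):
        """Record a demand miss and learn producer -> consumer pairs."""
        region, offset = divmod(addr, REGION_BYTES)
        self.active[(core, region)].add(offset // BLOCK_BYTES)
        producer = self.last_producer.get(region)
        if producer is not None and producer != core:
            # A different core touched a region the producer streamed earlier.
            self.consumer_of[producer] = core

    def end_stream(self, core, region):
        """Close a stream; forward it to the predicted consumer, if any."""
        blocks = sorted(self.active.pop((core, region), set()))
        self.last_producer[region] = core
        target = self.consumer_of.get(core)
        if target is None or not blocks:
            return None
        addrs = [region * REGION_BYTES + b * BLOCK_BYTES for b in blocks]
        self.prefetch_queue[target].extend(addrs)
        return target, addrs
```

In this model, the first stream a core finishes is forwarded nowhere, because no consumer has been learned yet. Once another core misses in a region that the producer streamed earlier, later streams from that producer are pushed to the learned consumer's queue before it issues its own demand misses.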

    Published In

    PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation
    August 2014
    514 pages
    ISBN: 9781450328098
    DOI: 10.1145/2628071

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. multi-core
    2. prefetching

    Qualifiers

    • Research-article

    Conference

    PACT '14
    Sponsor:
    • IFIP WG 10.3
    • SIGARCH
    • IEEE CS TCPP
    • IEEE CS TCAA

    Acceptance Rates

    PACT '14 paper acceptance rate: 54 of 144 submissions (38%)
    Overall acceptance rate: 121 of 471 submissions (26%)
