DOI: 10.1145/1854273.1854312
research-article

Efficient sequential consistency using conditional fences

Published: 11 September 2010

Abstract

Among the various memory consistency models, the sequential consistency (SC) model, in which memory operations appear to take place in the order specified by the program, is the most intuitive and best enables programmers to reason about their parallel programs. Nevertheless, processor designers often choose to support relaxed memory consistency models because the weaker ordering constraints imposed by such models allow more instructions to be reordered and enable higher performance. Programs running on machines that support weaker consistency models can be transformed into ones in which SC is enforced. The compiler does this by computing a minimal set of memory access pairs whose ordering automatically guarantees SC. To ensure that these memory access pairs are not reordered, memory fences are inserted. Unfortunately, inserting such memory fences can significantly slow down the program.
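To make the role of a fence concrete, the following is a minimal sketch of the classic store-buffering (Dekker-style) access pattern, written with standard C++11 atomics. It is not taken from the paper: the names `run_once` and `count_sc_violations` are hypothetical, and `std::atomic_thread_fence` stands in for the fences a compiler would insert between a store and a later load that must not be reordered. Under SC, at least one thread must observe the other thread's write, so both loads returning 0 would be a violation.

```cpp
#include <atomic>
#include <thread>

// Results of the two loads in one run of the litmus test.
struct Outcome { int r0; int r1; };

// Store-buffering pattern: each thread writes its own variable and then
// reads the other thread's variable. The relaxed accesses alone would
// allow the store and the load to be reordered; the seq_cst fence between
// them forbids the outcome r0 == 0 && r1 == 0.
Outcome run_once() {
    std::atomic<int> x{0}, y{0};
    int r0 = -1, r1 = -1;

    std::thread t0([&] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // orders store before load
        r0 = y.load(std::memory_order_relaxed);
    });
    std::thread t1([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // orders store before load
        r1 = x.load(std::memory_order_relaxed);
    });
    t0.join();
    t1.join();
    return {r0, r1};
}

// Repeats the test and counts runs in which neither thread saw the
// other's write, which SC does not permit.
int count_sc_violations(int iterations) {
    int violations = 0;
    for (int i = 0; i < iterations; ++i) {
        Outcome o = run_once();
        if (o.r0 == 0 && o.r1 == 0) ++violations;
    }
    return violations;
}
```

Calling `count_sc_violations(2000)` from a `main` function should return 0 with the fences in place. Each fence here stalls unconditionally on every execution, which is the cost that motivates deciding dynamically whether a stall is needed.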
We observe that the ordering of the minimal set of memory accesses that the compiler strives to enforce is typically already enforced in the normal course of program execution. A study we conducted on programs with compiler-inserted memory fences shows that only 8% of the executed instances of the memory fences are really necessary to ensure SC. Motivated by this study, we propose the conditional fence mechanism (C-Fence), which utilizes compiler information to decide dynamically whether there is a need to stall at each fence. Our experiments with SPLASH-2 benchmarks show that, with C-Fences, programs can be transformed to enforce SC while incurring only a 12% slowdown, as opposed to a 43% slowdown using normal fence instructions. Our approach requires very little hardware support (<300 bytes of on-chip storage) and avoids the use of speculation and its associated costs.
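The reported slowdowns can be put side by side with a short calculation. This is a sketch under one assumption that the abstract does not state explicitly: both percentages are measured against the same baseline execution time $T_0$ of the program without SC enforcement. The symbols $T_0$, $T_{\text{fence}}$, and $T_{\text{C-Fence}}$ are introduced here for illustration only.

```latex
\begin{align*}
T_{\text{fence}}   &= 1.43\,T_0, \qquad T_{\text{C-Fence}} = 1.12\,T_0 \\
\frac{T_{\text{fence}}}{T_{\text{C-Fence}}} &= \frac{1.43}{1.12} \approx 1.28 \\
\frac{0.43 - 0.12}{0.43} &= \frac{0.31}{0.43} \approx 0.72
\end{align*}
```

Under that assumption, a program using C-Fences would run about 1.28 times faster than the same program using normal fences, and roughly 72% of the fence-induced overhead would be eliminated.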

Published In

PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques
September 2010
596 pages
ISBN:9781450301787
DOI:10.1145/1854273
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. active table
  2. associates
  3. conditional fences
  4. interprocessor delay
  5. memory consistency
  6. sequential consistency

Qualifiers

  • Research-article

Conference

PACT '10
Sponsor:
  • IFIP WG 10.3
  • IEEE CS TCPP
  • SIGARCH
  • IEEE CS TCAA

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%
