DOI: 10.1145/1375527.1375568
research-article

Phasers: a unified deadlock-free construct for collective and point-to-point synchronization

Published: 07 June 2008

Abstract

Coordination and synchronization of parallel tasks is a major source of complexity in parallel programming. These constructs take many forms in practice including mutual exclusion in accesses to shared resources, termination detection of child tasks, collective barrier synchronization, and point-to-point synchronization. In this paper, we introduce phasers, a new coordination construct that unifies collective and point-to-point synchronizations. We establish two safety properties for phasers: deadlock-freedom and phase-ordering. Performance results obtained from a portable implementation of phasers on three different SMP platforms demonstrate that phasers can deliver superior performance to existing barrier implementations, in addition to the productivity benefits that result from their generality and safety properties.
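The two synchronization styles named in the abstract can be illustrated with a minimal sketch. It uses `java.util.concurrent.Phaser`, a related construct in the Java standard library, and not the phaser API or implementation evaluated in the paper. The class name `PhaserSketch`, the methods `barrierPhases` and `pointToPoint`, and the values exchanged are assumptions made for illustration only.

```java
import java.util.concurrent.Phaser;

// Illustrative only: uses the JDK's Phaser, not the paper's own phaser construct.
class PhaserSketch {

    // Collective use: every worker signals and waits at the end of each phase.
    // Returns seen[w][p], the sum of all slots that worker w observed after phase p.
    static long[][] barrierPhases(int workers, int phases) {
        Phaser phaser = new Phaser(workers);        // register all parties up front
        long[][] slots = new long[phases][workers];
        long[][] seen = new long[workers][phases];
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int id = w;
            threads[w] = new Thread(() -> {
                for (int p = 0; p < phases; p++) {
                    slots[p][id] = id + 1;          // this worker's contribution to phase p
                    phaser.arriveAndAwaitAdvance(); // signal, then wait for all other workers
                    long sum = 0;
                    for (long v : slots[p]) sum += v;
                    seen[id][p] = sum;              // every phase-p write is visible here
                }
                phaser.arriveAndDeregister();       // leave the phaser when done
            });
            threads[w].start();
        }
        joinAll(threads);
        return seen;
    }

    // Point-to-point use: the producer only signals, the consumer only waits.
    static int pointToPoint() {
        Phaser phaser = new Phaser(1);              // the producer is the only registered party
        int[] box = new int[1];
        int[] received = new int[1];
        Thread consumer = new Thread(() -> {
            phaser.awaitAdvance(0);                 // wait-only: returns once phase 0 has completed
            received[0] = box[0];
        });
        consumer.start();
        box[0] = 42;                                // produce a value
        phaser.arrive();                            // signal-only: completes phase 0 without blocking
        joinAll(consumer);
        return received[0];
    }

    private static void joinAll(Thread... threads) {
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }
}
```

With four workers, each worker should observe the full phase sum 1 + 2 + 3 + 4 = 10 after every barrier, and `pointToPoint()` should return the value the producer wrote before signalling. The same object serves both the barrier-style and the producer-consumer-style pattern, which is the kind of unification the abstract describes.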

Published In

ICS '08: Proceedings of the 22nd annual international conference on Supercomputing
June 2008
390 pages
ISBN:9781605581583
DOI:10.1145/1375527
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. barriers
  2. semaphores

Qualifiers

  • Research-article

Conference

ICS08
ICS08: International Conference on Supercomputing
June 7 - 12, 2008
Island of Kos, Greece

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
