DOI: 10.1145/70082.68187

The fuzzy barrier: a mechanism for high speed synchronization of processors

Published: 01 April 1989

Abstract

Parallel programs are commonly written using barriers to synchronize parallel processes. Upon reaching a barrier, a processor must stall until all participating processors reach the barrier. A software implementation of the barrier mechanism using shared variables has two major drawbacks. First, the execution of the barrier may be slow, as it may not only require the execution of several instructions but also result in hot-spot accesses. Second, processors that are stalled waiting for other processors to reach the barrier are essentially idling and cannot do any useful work. This paper presents the notion of the fuzzy barrier, which avoids both drawbacks. The first problem is avoided by implementing the mechanism in hardware. The second problem is solved by extending the barrier concept to include a region of statements that a processor can execute while it awaits synchronization. The barrier regions are constructed by a compiler and consist of several instructions such that a processor is ready to synchronize upon reaching the first instruction in the region and must synchronize before exiting the region. When synchronization does occur, the processors could be executing at any point in their respective barrier regions. The larger the barrier region, the more likely it is that none of the processors will have to stall. Preliminary investigations show that barrier regions can be large and that program transformations can significantly increase their size. Examples of situations where such a mechanism can result in improved performance are presented. Results based on a software implementation of the fuzzy barrier on the Encore multiprocessor indicate that the mechanism can greatly reduce the synchronization overhead.
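The barrier-region idea described in the abstract can be illustrated with a small software analogue. The sketch below is not the paper's hardware mechanism or its Encore implementation; it is a minimal split-phase barrier written with Python's standard `threading.Condition`. The names `FuzzyBarrier`, `enter_region`, `exit_region`, and `run_demo` are hypothetical, and the placement of the region is done by hand here, whereas the paper has a compiler construct it.

```python
import threading


class FuzzyBarrier:
    """Illustrative split-phase barrier; all names here are assumptions."""

    def __init__(self, parties):
        self._parties = parties      # number of participating threads
        self._arrived = 0            # arrivals in the current episode
        self._episode = 0            # index of the current barrier episode
        self._cond = threading.Condition()

    def enter_region(self):
        """First instruction of the barrier region: signal readiness, never block."""
        with self._cond:
            ticket = self._episode
            self._arrived += 1
            if self._arrived == self._parties:
                # Last arrival completes the episode and releases any waiters.
                self._arrived = 0
                self._episode += 1
                self._cond.notify_all()
            return ticket

    def exit_region(self, ticket):
        """Last instruction of the barrier region: stall only if someone is late.

        Returns True if the caller had to stall, False otherwise.
        """
        with self._cond:
            stalled = self._episode == ticket
            while self._episode == ticket:
                self._cond.wait()
            return stalled


def run_demo(n_workers=4, n_steps=3):
    """Each worker publishes a value, then reads its neighbour's after the barrier."""
    barrier = FuzzyBarrier(n_workers)
    shared = [[0] * n_workers for _ in range(n_steps)]   # one row per step
    seen = [[None] * n_steps for _ in range(n_workers)]

    def worker(i):
        for step in range(n_steps):
            shared[step][i] = (step + 1) * (i + 1)   # must precede the barrier
            ticket = barrier.enter_region()
            _ = sum(range(1000 * (i + 1)))           # independent work inside the region
            barrier.exit_region(ticket)
            seen[i][step] = shared[step][(i + 1) % n_workers]  # safe only after the barrier

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

With a conventional barrier, `enter_region` and `exit_region` would collapse into a single blocking call and the independent work would run only after every thread had arrived. Splitting the call lets that work overlap with the wait, so a thread stalls in `exit_region` only when its region was too short to cover the slowest arrival.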


Published In

ASPLOS III: Proceedings of the third international conference on Architectural support for programming languages and operating systems
April 1989
303 pages
ISBN: 0897913000
DOI: 10.1145/70082
  • ACM SIGARCH Computer Architecture News, Volume 17, Issue 2
    Special issue: Proceedings of ASPLOS-III: the third international conference on architecture support for programming languages and operating systems
    April 1989
    291 pages
    ISSN: 0163-5964
    DOI: 10.1145/68182
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ASPLOS89
Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%
