DOI: 10.1145/263764.263771
Article
Free access

Dynamic pointer alignment: tiling and communication optimizations for parallel pointer-based computations

Published: 21 June 1997

Abstract

Loop tiling and communication optimizations, such as message pipelining and aggregation, can achieve robust memory performance by proactively managing storage and data movement. In this paper, we generalize these techniques to pointer-based data structures (PBDSs). Our approach, dynamic pointer alignment (DPA), has two components. The compiler decomposes a program into non-blocking threads that operate on specific pointers and labels thread-creation sites with their corresponding pointers. At runtime, an explicit mapping from pointers to dependent threads is updated at thread creation and is used to dynamically schedule both threads and communication, so that threads using the same objects execute together, communication overlaps with local work, and messages are aggregated. We have implemented DPA to optimize remote reads to global PBDSs on parallel machines. Our empirical results on the force-computation phases of two applications that use sophisticated PBDSs, Barnes-Hut and FMM, show that DPA achieves good absolute performance and speedups on the CRAY T3D by enabling tiling and communication optimization.
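Since the paper itself is not reproduced on this page, the following is only a minimal sketch of the runtime idea the abstract describes: a pointer-to-dependent-threads map, updated at thread-creation sites and drained in batches so that threads touching the same remote object run together and their remote reads are aggregated. It is not the paper's implementation, and all names (GlobalPtr, Thread, RemoteFetcher, DPAScheduler, spawn_on, flush) are hypothetical.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Opaque name for a global (possibly remote) object in a pointer-based data
// structure; a real runtime would encode the owning node and local address.
using GlobalPtr = std::uint64_t;

// A non-blocking thread body: runs to completion once its object is local.
using Thread = std::function<void(const void*)>;

struct RemoteFetcher {
    // Stand-in for the communication layer. A real fetcher would pack all
    // requested pointers into one aggregated request message and deliver
    // replies asynchronously; here it synchronously delivers null payloads
    // so the sketch stays self-contained.
    void fetch_all(const std::vector<GlobalPtr>& ptrs,
                   const std::function<void(GlobalPtr, const void*)>& deliver) {
        for (GlobalPtr p : ptrs) deliver(p, nullptr);
    }
};

class DPAScheduler {
public:
    explicit DPAScheduler(RemoteFetcher& net) : net_(net) {}

    // Called at every labeled thread-creation site: instead of running the
    // new thread (and possibly blocking on a remote read), record it under
    // the pointer it depends on.
    void spawn_on(GlobalPtr p, Thread t) {
        pending_[p].push_back(std::move(t));
    }

    // Drain one "tile" of work: aggregate the remote reads for every pointer
    // that has waiting threads, overlap that communication with the caller's
    // remaining local work, then run all dependent threads on the arrived
    // objects so threads using the same object execute together.
    void flush(const std::function<void()>& local_work) {
        std::vector<GlobalPtr> wanted;
        wanted.reserve(pending_.size());
        for (const auto& entry : pending_) wanted.push_back(entry.first);

        local_work();  // local computation overlapped with the bulk fetch

        net_.fetch_all(wanted, [this](GlobalPtr p, const void* obj) {
            for (Thread& t : pending_[p]) t(obj);
        });
        pending_.clear();
    }

private:
    RemoteFetcher& net_;
    std::unordered_map<GlobalPtr, std::vector<Thread>> pending_;  // ptr -> dependent threads
};
```

In this sketch, a traversal (e.g., a tree walk in an N-body force computation) would call spawn_on as it creates threads that need remote objects, and call flush once per tile with its remaining local compute passed as local_work.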

                        Published In

PPOPP '97: Proceedings of the Sixth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
June 1997
287 pages
ISBN: 0897919068
DOI: 10.1145/263764
                        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                        Publisher

                        Association for Computing Machinery

                        New York, NY, United States

                        Conference

PPoPP97: Principles and Practice of Parallel Programming
June 18-21, 1997
Las Vegas, Nevada, USA

                        Acceptance Rates

PPOPP '97 Paper Acceptance Rate: 26 of 86 submissions, 30%
Overall Acceptance Rate: 230 of 1,014 submissions, 23%
