DOI: 10.1145/263764.263771
Article
Free access

Dynamic pointer alignment: tiling and communication optimizations for parallel pointer-based computations

Published: 21 June 1997

Abstract

Loop tiling and communication optimizations, such as message pipelining and aggregation, can achieve robust memory performance by proactively managing storage and data movement. In this paper, we generalize these techniques to pointer-based data structures (PBDSs). Our approach, dynamic pointer alignment (DPA), has two components. The compiler decomposes a program into non-blocking threads that operate on specific pointers and labels thread-creation sites with their corresponding pointers. At runtime, an explicit mapping from pointers to dependent threads is updated at thread creation and is used to dynamically schedule both threads and communication, so that threads using the same objects execute together, communication overlaps with local work, and messages are aggregated. We have implemented DPA to optimize remote reads to global PBDSs on parallel machines. Our empirical results on the force-computation phases of two applications that use sophisticated PBDSs, Barnes-Hut and FMM, show that DPA achieves good absolute performance and speedups on the CRAY T3D by enabling tiling and communication optimization.
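Since the paper itself is not reproduced on this page, the following is only a minimal sketch of the runtime idea the abstract describes: a pointer-to-dependent-threads map, updated at thread-creation sites and drained in batches so that threads touching the same remote object run together and their remote reads are aggregated. It is not the paper's implementation, and all names (GlobalPtr, Thread, RemoteFetcher, DPAScheduler, spawn_on, flush) are hypothetical.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Opaque name for a global (possibly remote) object in a pointer-based data
// structure; a real runtime would encode the owning node and local address.
using GlobalPtr = std::uint64_t;

// A non-blocking thread body: runs to completion once its object is local.
using Thread = std::function<void(const void*)>;

struct RemoteFetcher {
    // Stand-in for the communication layer. A real fetcher would pack all
    // requested pointers into one aggregated request message and deliver
    // replies asynchronously; here it synchronously delivers null payloads
    // so the sketch stays self-contained.
    void fetch_all(const std::vector<GlobalPtr>& ptrs,
                   const std::function<void(GlobalPtr, const void*)>& deliver) {
        for (GlobalPtr p : ptrs) deliver(p, nullptr);
    }
};

class DPAScheduler {
public:
    explicit DPAScheduler(RemoteFetcher& net) : net_(net) {}

    // Called at every labeled thread-creation site: instead of running the
    // new thread (and possibly blocking on a remote read), record it under
    // the pointer it depends on.
    void spawn_on(GlobalPtr p, Thread t) {
        pending_[p].push_back(std::move(t));
    }

    // Drain one "tile" of work: aggregate the remote reads for every pointer
    // that has waiting threads, overlap that communication with the caller's
    // remaining local work, then run all dependent threads on the arrived
    // objects so threads using the same object execute together.
    void flush(const std::function<void()>& local_work) {
        std::vector<GlobalPtr> wanted;
        wanted.reserve(pending_.size());
        for (const auto& entry : pending_) wanted.push_back(entry.first);

        local_work();  // local computation overlapped with the bulk fetch

        net_.fetch_all(wanted, [this](GlobalPtr p, const void* obj) {
            for (Thread& t : pending_[p]) t(obj);
        });
        pending_.clear();
    }

private:
    RemoteFetcher& net_;
    std::unordered_map<GlobalPtr, std::vector<Thread>> pending_;  // ptr -> dependent threads
};
```

In this sketch, a traversal (e.g., a tree walk in an N-body force computation) would call spawn_on as it creates threads that need remote objects, and call flush once per tile with its remaining local compute passed as local_work.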

                        Published In

PPOPP '97: Proceedings of the Sixth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
June 1997
287 pages
ISBN: 0897919068
DOI: 10.1145/263764
                        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                        Publisher

                        Association for Computing Machinery

                        New York, NY, United States

                        Conference

PPoPP97: Principles and Practice of Parallel Programming
June 18-21, 1997
Las Vegas, Nevada, USA

                        Acceptance Rates

PPOPP '97 Paper Acceptance Rate: 26 of 86 submissions, 30%
Overall Acceptance Rate: 230 of 1,014 submissions, 23%
