MCAMP: communication optimization on massively parallel machines with hierarchical scratch-pad memory

Authors: Hiroshige Hayashizaki, Yutaka Sugawara, Mary Inaba, Kei Hiraki

PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques

Pages 102 - 111

https://doi.org/10.1145/1454115.1454132

Published: 25 October 2008

Abstract

Massively parallel machines that integrate a large number of simple processors and small scratch-pad memories (SPMs) into a single chip can achieve a high peak performance per watt of power. In these machines, communication optimizations are important because the communication bandwidth tends to be a bottleneck. Previously proposed communication optimizations using copy candidates, which have been shown to be effective, detect frequently reused array regions by compile-time analysis and copy the regions to scratch-pad memories nearer to the processors. However, they have been proposed for uniprocessor systems or small parallel machines with one or more layers of scratch-pad memories, and the analysis time increases when they are applied to massively parallel machines. In this paper, we propose Multilayer Copy-candidate Analysis for Massively Parallel machines (MCAMP), a communication optimization method for massively parallel machines. MCAMP re-formalizes the framework used in earlier works and improves the scalability of the analysis by assuming the homogeneity of the target systems. We implemented an MCAMP optimizer, which takes an input program that consists of perfectly nested loops containing array references and computation codes, and generates optimized communication. We measured the performance of the output programs of the MCAMP optimizer by executing them on a real massively parallel machine GRAPE-DR using a software tool chain that we also implemented. We showed that MCAMP can achieve optimal data transfer patterns and comparable performance to that of hand-optimized codes with a short analysis time.
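
As a rough illustration of the copy-candidate idea mentioned in the abstract (copying a frequently reused array region into a memory nearer to the processor), the following sketch counts far-memory reads for a matrix-vector product with and without a local copy. It is not the MCAMP analysis, which the abstract does not detail; the function names, the `spm_words` parameter, and the list slice standing in for a scratch-pad memory are all hypothetical.

```python
def matvec_direct(A, x):
    """Baseline: every reference to x[j] is served from far memory.

    Returns (y, far_reads), where far_reads counts reads of x only.
    """
    n, m = len(A), len(x)
    far_reads = 0
    y = [0] * n
    for i in range(n):
        for j in range(m):
            far_reads += 1  # x[j] is fetched again for every row i
            y[i] += A[i][j] * x[j]
    return y, far_reads


def matvec_with_copy(A, x, spm_words):
    """Copy a reused tile of x into a small local buffer (a stand-in for
    a scratch-pad memory) once, then reuse it across all rows.

    spm_words is an assumed capacity of the local buffer, in elements.
    """
    n, m = len(A), len(x)
    far_reads = 0
    y = [0] * n
    for j0 in range(0, m, spm_words):
        tile = x[j0:j0 + spm_words]  # one bulk copy into the local buffer
        far_reads += len(tile)
        for i in range(n):
            for jj, xv in enumerate(tile):
                y[i] += A[i][j0 + jj] * xv  # x is now read locally
    return y, far_reads


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
x = [1, 0, 2, 1]
y_direct, reads_direct = matvec_direct(A, x)
y_copy, reads_copy = matvec_with_copy(A, x, spm_words=2)
print(y_direct, reads_direct)
print(y_copy, reads_copy)
```

In this toy setting each element of `x` is reused once per row, so copying it locally reduces far-memory reads of `x` from `n * m` to `m`. A real optimizer would additionally have to choose which regions to copy and to which memory layer under capacity constraints.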

Index Terms

MCAMP: communication optimization on massively parallel machines with hierarchical scratch-pad memory
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation

Published In

PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques

October 2008

328 pages

ISBN:9781605582825

DOI:10.1145/1454115

General Chair: Andreas Moshovos (University of Toronto, Canada)

Program Chairs: David Tarditi (Microsoft, USA), Kunle Olukotun (Stanford University, USA)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

PACT '08

PACT '08: International Conference on Parallel Architectures and Compilation Techniques

October 25 - 29, 2008

Toronto, Ontario, Canada

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Article Metrics

Total Citations: 0

Total Downloads: 247

Downloads (last 12 months): 6

Downloads (last 6 weeks): 1

Reflects downloads up to 03 Jan 2025
