DOI: 10.1145/2751205.2751225
Research Article | Public Access

PaCMap: Topology Mapping of Unstructured Communication Patterns onto Non-contiguous Allocations

Published: 08 June 2015

Abstract

In high performance computing (HPC), applications typically run many parallel tasks across multiple machine nodes. Because these tasks communicate intensively with each other, communication overhead has a significant impact on an application's execution time. This overhead is determined by the application's communication pattern as well as the network distances between communicating tasks. Mapping the tasks onto the available machine nodes in a communication-aware manner can significantly reduce these network distances and, in turn, the execution time.
Existing techniques first allocate available nodes to an application and then map its tasks onto the allocated nodes. In this paper, we discuss the potential benefits of simultaneous allocation and mapping for applications with irregular communication patterns. We also propose a novel graph-based allocation and mapping technique to reduce execution time on HPC machines that use non-contiguous allocation, such as the Cray XK series. Simulations calibrated with real-life experiments show that our technique reduces hop-bytes by up to 30% compared to the state of the art.
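
The hop-bytes metric referenced above is the sum, over all communicating task pairs, of the data volume exchanged multiplied by the number of network hops between the nodes those tasks are mapped to. The sketch below is only an illustration of how that quantity can be computed for a given task-to-node mapping; it is not the paper's implementation, and the 3D-torus Manhattan distance, the dictionary-based input format, and the function names are assumptions made here for concreteness.

# Illustrative hop-bytes calculation for a task-to-node mapping (not the paper's code).
# Assumed inputs: comm_pattern maps (task_a, task_b) -> bytes exchanged,
# mapping assigns each task an (x, y, z) node coordinate, and distance is
# the Manhattan hop count on a 3D torus with wrap-around links.

def torus_distance(a, b, dims):
    """Hop count between two node coordinates on a torus with wrap-around links."""
    hops = 0
    for coord_a, coord_b, size in zip(a, b, dims):
        delta = abs(coord_a - coord_b)
        hops += min(delta, size - delta)  # take the shorter direction around the ring
    return hops

def hop_bytes(comm_pattern, mapping, dims):
    """Sum of (bytes exchanged * network hops) over all communicating task pairs."""
    total = 0
    for (task_a, task_b), volume in comm_pattern.items():
        total += volume * torus_distance(mapping[task_a], mapping[task_b], dims)
    return total

# Example: four tasks with an irregular communication pattern on a 4x4x4 torus.
comm_pattern = {(0, 1): 1024, (0, 2): 512, (1, 3): 2048, (2, 3): 256}
mapping = {0: (0, 0, 0), 1: (0, 0, 1), 2: (3, 0, 0), 3: (0, 3, 1)}
print(hop_bytes(comm_pattern, mapping, dims=(4, 4, 4)))

A lower hop-bytes total indicates that heavily communicating tasks sit on nearby nodes; this is the quantity the paper's simulations report reducing by up to 30%.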

Published In

ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing
June 2015
446 pages
ISBN: 9781450335591
DOI: 10.1145/2751205

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. non-contiguous allocation
  2. performance
  3. task mapping
  4. topology mapping
  5. unstructured communication pattern

Qualifiers

  • Research-article

Conference

ICS '15: 2015 International Conference on Supercomputing
June 8-11, 2015
Newport Beach, California, USA

Acceptance Rates

ICS '15 paper acceptance rate: 40 of 160 submissions (25%)
Overall acceptance rate: 629 of 2,180 submissions (29%)
