DOI: 10.1145/2751205.2751225
Research Article | Public Access

PaCMap: Topology Mapping of Unstructured Communication Patterns onto Non-contiguous Allocations

Published: 08 June 2015

Abstract

In high performance computing (HPC), applications typically run many parallel tasks across multiple machine nodes. Because these tasks communicate intensively with each other, communication overhead has a significant impact on an application's execution time. This overhead is determined by the application's communication pattern as well as the network distances between communicating tasks. Mapping the tasks onto the available machine nodes in a communication-aware manner can significantly reduce these network distances and, in turn, the execution time.
Existing techniques first allocate available nodes to an application and then map its tasks onto the allocated nodes. In this paper, we discuss the potential benefits of simultaneous allocation and mapping for applications with irregular communication patterns. We also propose a novel graph-based allocation and mapping technique to reduce execution time on HPC machines that use non-contiguous allocation, such as the Cray XK series. Simulations calibrated with real-life experiments show that our technique reduces hop-bytes by up to 30% compared to the state of the art.
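
The hop-bytes metric referenced above is the sum, over all communicating task pairs, of the data volume exchanged multiplied by the number of network hops between the nodes those tasks are mapped to. The sketch below is only an illustration of how that quantity can be computed for a given task-to-node mapping; it is not the paper's implementation, and the 3D-torus Manhattan distance, the dictionary-based input format, and the function names are assumptions made here for concreteness.

# Illustrative hop-bytes calculation for a task-to-node mapping (not the paper's code).
# Assumed inputs: comm_pattern maps (task_a, task_b) -> bytes exchanged,
# mapping assigns each task an (x, y, z) node coordinate, and distance is
# the Manhattan hop count on a 3D torus with wrap-around links.

def torus_distance(a, b, dims):
    """Hop count between two node coordinates on a torus with wrap-around links."""
    hops = 0
    for coord_a, coord_b, size in zip(a, b, dims):
        delta = abs(coord_a - coord_b)
        hops += min(delta, size - delta)  # take the shorter direction around the ring
    return hops

def hop_bytes(comm_pattern, mapping, dims):
    """Sum of (bytes exchanged * network hops) over all communicating task pairs."""
    total = 0
    for (task_a, task_b), volume in comm_pattern.items():
        total += volume * torus_distance(mapping[task_a], mapping[task_b], dims)
    return total

# Example: four tasks with an irregular communication pattern on a 4x4x4 torus.
comm_pattern = {(0, 1): 1024, (0, 2): 512, (1, 3): 2048, (2, 3): 256}
mapping = {0: (0, 0, 0), 1: (0, 0, 1), 2: (3, 0, 0), 3: (0, 3, 1)}
print(hop_bytes(comm_pattern, mapping, dims=(4, 4, 4)))

A lower hop-bytes total indicates that heavily communicating tasks sit on nearby nodes; this is the quantity the paper's simulations report reducing by up to 30%.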

Published In

ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing
June 2015
446 pages
ISBN: 9781450335591
DOI: 10.1145/2751205

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. non-contiguous allocation
  2. performance
  3. task mapping
  4. topology mapping
  5. unstructured communication pattern

Qualifiers

  • Research-article

Conference

ICS '15: 2015 International Conference on Supercomputing
June 8-11, 2015
Newport Beach, California, USA

Acceptance Rates

ICS '15 paper acceptance rate: 40 of 160 submissions (25%)
Overall acceptance rate: 629 of 2,180 submissions (29%)
