Invasive Compute Balancing for Applications with Shared and Hybrid Parallelization

Authors: Martin Schreiber, Christoph Riesinger, Hans-Joachim Bungartz, Alexander Breuer

International Journal of Parallel Programming, Volume 43, Issue 6

Pages 1004 - 1027

https://doi.org/10.1007/s10766-014-0336-3

Published: 01 December 2015

Abstract

Achieving high scalability with dynamically adaptive algorithms in high-performance computing (HPC) is a non-trivial task. For such algorithms, the invasive paradigm using compute migration represents an efficient alternative to classical data migration approaches. We present a core-distribution scheduler that realizes the migration of computational power by distributing the cores according to the requirements specified by one or more parallel program instances. We validate our approach with different benchmark suites for simulations with artificial workload as well as applications based on dynamically adaptive shallow water simulations, and we investigate concurrently executed adaptivity parameter studies on realistic tsunami simulations. The invasive approach results in significantly faster overall execution times and higher hardware utilization than alternative approaches. Dynamic resource management is therefore mandatory for a more efficient execution of scenarios similar to our simulations, e.g. several tsunami simulations in urgent computing, to overcome strong scalability challenges in the area of HPC. The optimizations obtained by invasive migration of cores can be generalized to similar classes of algorithms with dynamic resource requirements.
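
To make the idea of distributing cores according to per-instance requirements more concrete, the following is a minimal sketch, not the scheduler from the article. It assumes that each program instance reports a single non-negative workload number, that every instance keeps at least one core, and that the remaining cores are split proportionally to workload with a largest-remainder rule. The function name `distribute_cores` and the workload values are hypothetical.

```python
def distribute_cores(total_cores, workloads):
    """Split total_cores among program instances proportionally to workload.

    Illustrative assumption: one workload number per instance; the policy
    below is a generic largest-remainder split, not the article's scheduler.
    """
    n = len(workloads)
    if n == 0:
        return []
    if total_cores < n:
        raise ValueError("need at least one core per instance")
    if any(w < 0 for w in workloads):
        raise ValueError("workloads must be non-negative")

    # Every instance keeps one core so that it can always make progress.
    remaining = total_cores - n
    total_work = sum(workloads)

    # Ideal (fractional) share of the remaining cores for each instance.
    if total_work == 0:
        shares = [remaining / n] * n
    else:
        shares = [remaining * w / total_work for w in workloads]

    # Hand out the integer parts first.
    floors = [int(s) for s in shares]
    cores = [1 + f for f in floors]

    # Give leftover cores to the instances with the largest fractional parts.
    leftover = remaining - sum(floors)
    by_remainder = sorted(range(n), key=lambda i: shares[i] - floors[i], reverse=True)
    for i in by_remainder[:leftover]:
        cores[i] += 1
    return cores


if __name__ == "__main__":
    # Two instances share 10 cores; the workload shifts between two time steps.
    print(distribute_cores(10, [1, 3]))
    print(distribute_cores(10, [3, 1]))
```

Calling the function again whenever the reported workloads change moves cores between instances while the data stays where it is, which is the contrast with data migration drawn in the abstract.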

Published In

International Journal of Parallel Programming, Volume 43, Issue 6

December 2015

283 pages

ISSN: 0885-7458

Copyright © 2015 Springer Science+Business Media New York.

Publisher

Kluwer Academic Publishers

United States
