DOI: 10.1145/3093172.3093237
research-article
Open access

Load Balancing and Patch-Based Parallel Adaptive Mesh Refinement for Tsunami Simulation on Heterogeneous Platforms Using Xeon Phi Coprocessors

Published: 26 June 2017

Abstract

We present a patch-based approach for tsunami simulation with parallel adaptive mesh refinement on the Salomon supercomputer. The special architecture of Salomon, with two Intel Xeon CPUs (Haswell architecture) and two Intel Xeon Phi coprocessors (Knights Corner) per compute node, suggests truly heterogeneous load balancing instead of offload approaches, because host and accelerator achieve comparable performance for our simulations.
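
The "comparable performance" argument can be made concrete with an idealised estimate that ignores communication and load imbalance. The symbols here are introduced for illustration and are not taken from the paper: P_H is the throughput of a node's host CPUs and P_C the combined throughput of its coprocessors, both in, say, cell updates per second. If work is split in proportion to throughput, the node processes P_H + P_C, compared with max(P_H, P_C) when only the better device type is used:

```latex
S = \frac{P_H + P_C}{\max(P_H, P_C)}
  = 1 + \frac{\min(P_H, P_C)}{\max(P_H, P_C)},
\qquad 1 < S \le 2 \quad \text{for } P_H, P_C > 0
```

When P_H ≈ P_C, S approaches its maximum of 2. If the coprocessors were instead ten times faster than the hosts, S = 1.1, so heterogeneous execution would gain only 10% over using the coprocessors alone. Slow communication between devices lowers the attainable S in practice.
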
We use a tree-structured mesh refinement strategy based on newest-vertex bisection of triangular grid cells, but place small uniform grid patches in the leaves of the tree so that the Finite Volume solver can be vectorised over grid cells. In particular, we implemented vectorised versions of the approximate Riemann solvers, exploiting Fortran's array notation where possible. Large patches increase computational performance through vectorisation, improved memory access and reduced meshing overhead, but they also increase the overall number of processed cells, because the mesh can only be refined in units of whole patches. The patch size is therefore a trade-off. We studied the time-to-solution for different patch sizes in a simulation of the 2011 Tohoku tsunami and found that relatively small patches with 8² (64) cells resulted in the smallest execution times.
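
Within a patch, the Riemann problems at the cell edges are independent of each other, so their left and right states can be stored as flat arrays and processed in one dependency-free loop. The Python sketch below illustrates that layout under stated assumptions: it uses a Rusanov (local Lax-Friedrichs) flux for the one-dimensional shallow water equations in the edge-normal direction, swapped in for brevity rather than the approximate Riemann solvers used in the paper, and the names rusanov_fluxes, velocity, GRAVITY and DRY_TOLERANCE, as well as the structure-of-arrays inputs, are hypothetical. Python itself does not vectorise this loop; the point is that each iteration touches only its own array entries, which is what lets a compiler vectorise the equivalent Fortran array expression or C loop.

```python
import math

GRAVITY = 9.81         # gravitational acceleration in m/s^2
DRY_TOLERANCE = 1e-8   # hypothetical depth below which a cell counts as dry


def velocity(h, hu):
    """Normal velocity u = hu / h, set to zero in (nearly) dry cells."""
    return hu / h if h > DRY_TOLERANCE else 0.0


def rusanov_fluxes(h_left, hu_left, h_right, hu_right):
    """Rusanov fluxes for all edges of one patch (structure-of-arrays layout).

    Entry i of each input list holds the depth h and the edge-normal momentum
    hu on the left/right side of edge i; depths are assumed non-negative.
    Returns the mass and momentum fluxes through every edge.
    """
    flux_h, flux_hu = [], []
    # No iteration reads another edge's data, so the equivalent Fortran or C
    # loop has no cross-iteration dependencies and can be vectorised.
    for hl, hul, hr, hur in zip(h_left, hu_left, h_right, hu_right):
        ul, ur = velocity(hl, hul), velocity(hr, hur)
        # Physical flux f(q) = (hu, hu*u + g*h^2/2) on each side of the edge
        fl_h, fl_hu = hul, hul * ul + 0.5 * GRAVITY * hl * hl
        fr_h, fr_hu = hur, hur * ur + 0.5 * GRAVITY * hr * hr
        # Largest local wave speed |u| + sqrt(g*h) over both sides
        s = max(abs(ul) + math.sqrt(GRAVITY * hl), abs(ur) + math.sqrt(GRAVITY * hr))
        flux_h.append(0.5 * (fl_h + fr_h) - 0.5 * s * (hr - hl))
        flux_hu.append(0.5 * (fl_hu + fr_hu) - 0.5 * s * (hur - hul))
    return flux_h, flux_hu


# Two edges: a lake at rest and a dam break with the deeper water on the left
fh, fhu = rusanov_fluxes([1.0, 2.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0])
```

For the lake at rest the mass flux vanishes and the momentum flux reduces to the hydrostatic term g·h²/2; for the dam-break edge the mass flux is positive, i.e. water flows from the deeper left side to the right.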
We use the Xeon Phis in symmetric mode and apply heterogeneous load balancing between hosts and coprocessors, identifying the relative load distribution either from on-the-fly runtime measurements or from a priori exhaustive testing. Both approaches perform better than homogeneous load balancing and better than using only the CPUs or only the Xeon Phi coprocessors in native mode. In all set-ups, however, the absolute speedups are impeded by the slow MPI communication between Xeon Phi coprocessors.
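
A minimal sketch of throughput-proportional load distribution follows. It assumes that the cost of a rank is proportional to its number of cells and that the cells are kept in a one-dimensional order (for example, from traversing the refinement tree) that is cut into contiguous chunks. The function names target_cell_counts and contiguous_ranges and the measurement inputs (cells processed and elapsed seconds per rank) are hypothetical and not taken from the paper. A fixed ratio obtained from a priori testing could be supplied instead, for example as cells_processed with every entry of seconds set to 1.

```python
import math


def target_cell_counts(cells_processed, seconds, total_cells):
    """Split total_cells among ranks in proportion to measured throughput.

    cells_processed[i] and seconds[i] are hypothetical runtime measurements
    for rank i (for example, cell updates and wall-clock time over recent
    time steps). Returns integer cell counts that sum to total_cells.
    """
    throughput = [c / t for c, t in zip(cells_processed, seconds)]
    total_throughput = sum(throughput)
    exact = [total_cells * tp / total_throughput for tp in throughput]
    counts = [math.floor(x) for x in exact]
    # Distribute the cells lost to rounding down, largest fractional part first
    leftover = total_cells - sum(counts)
    by_fraction = sorted(range(len(exact)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


def contiguous_ranges(counts):
    """Convert per-rank cell counts into [start, end) ranges of the 1D cell order."""
    ranges, start = [], 0
    for n in counts:
        ranges.append((start, start + n))
        start += n
    return ranges


# Hypothetical measurements: one host rank, two coprocessor ranks
counts = target_cell_counts([100, 100, 100], [1.0, 2.0, 2.0], 1000)
print(counts)                     # [500, 250, 250]
print(contiguous_ranges(counts))  # [(0, 500), (500, 750), (750, 1000)]
```

In this made-up example the host rank processed its 100 cells twice as fast as each coprocessor rank, so it receives half of the cells. Recomputing the counts from fresh measurements every few time steps corresponds to the on-the-fly variant; reusing one fixed ratio corresponds to the a priori variant.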

Published In

ACM Conferences
PASC '17: Proceedings of the Platform for Advanced Scientific Computing Conference
June 2017
136 pages
ISBN:9781450350624
DOI:10.1145/3093172
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Parallel adaptive mesh refinement
  2. Xeon Phi coprocessor
  3. load balancing on heterogeneous systems
  4. patch-based adaptivity
  5. tsunami simulation
  6. vectorisation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PASC '17
Acceptance Rates

PASC '17 paper acceptance rate: 13 of 33 submissions (39%)
Overall acceptance rate: 109 of 221 submissions (49%)

Article Metrics

  • Downloads (last 12 months): 51
  • Downloads (last 6 weeks): 14
Reflects downloads up to 06 Nov 2024
