skip to main content
10.1145/3592979.3593417acmconferencesArticle/Chapter ViewAbstractPublication PagespascConference Proceedingsconference-collections
research-article

Cornerstone: Octree Construction Algorithms for Scalable Particle Simulations

Published: 26 June 2023 Publication History

Abstract

This paper presents an octree construction method, called Cornerstone, that facilitates global domain decomposition and interactions between particles in mesh-free numerical simulations. Our method is based on algorithms developed for 3D computer graphics, which we extend to distributed high performance computing (HPC) systems. Cornerstone yields global and locally essential octrees and is able to operate on all levels of tree hierarchies in parallel. The resulting octrees are suitable for supporting the computation of various kinds of short and long range interactions in N-body methods, such as Barnes-Hut and the Fast Multipole Method (FMM). While we provide a CPU implementation, Cornerstone may run entirely on GPUs. This results in significantly faster tree construction compared to execution on CPUs and serves as a powerful building block for the design of simulation codes that move beyond an offloading approach, where only numerically intensive tasks are dispatched to GPUs. With data residing exclusively in GPU memory, Cornerstone eliminates data movements between CPUs and GPUs. As an example, we employ Cornerstone to generate locally essential octrees for a Barnes-Hut treecode running on almost the full LUMI-G system with up to 8 trillion particles.

References

[1]
C Apetrei. 2014. Fast and Simple Agglomerative LBVH Construction. In Computer Graphics and Visual Computing (CGVC). The Eurographics Association.
[2]
J Barnes and P Hut. 1986. A hierarchical O (N log N) force-calculation algorithm. Nature 324, 6096 (1986), 446--449.
[3]
Joshua E Barnes. 1990. A modified tree code: Don't laugh; it runs. J. Comput. Phys. 87, 1 (1990), 161--170.
[4]
Jeroen Bedorf, Evghenii Gaburov, Michiko S. Fujii, Keigo Nitadori, Tomoaki Ishiyama, and Simon Portegies Zwart. 2014. 24.77 Pflops on a Gravitational Tree-Code to Simulate the Milky Way Galaxy with 18600 GPUs. In SC14: International conference for high performance computing, networking, storage and analysis. 54--65. International Conference on High Performance Computing, Networking, Storage and Analysis, New Orleans, LA, Nov 16--21, 2014.
[5]
Jeroen Bedorf, Evghenii Gaburov, and Simon Portegies Zwart. 2012. A sparse octree gravitational N-body code that runs entirely on the GPU processor. J Comput. Phys. 231, 7 (Apr 1 2012), 2825--2839.
[6]
N Bell and J Hoberock. 2022. Thrust 1.17. https://github.com/nvidia/thrust.
[7]
D Blackston and T Suel. 1997. Highly portable and efficient implementations of parallel adaptive n-body methods. In Proceedings of the 1997 ACM/IEEE Conference on Supercomputing. 1--20.
[8]
Martin Burtscher and Keshav Pingali. 2011. An efficient CUDA implementation of the tree-based barnes hut n-body algorithm. In GPU computing Gems Emerald edition. Elsevier, 75--92.
[9]
RA Gingold and JJ Monaghan. 1977. Smoothed particle hydrodynamics: theory and application to non-spherical stars. MNRAS 181, 3 (1977), 375--389.
[10]
L Greengard and V Rokhlin. 1987. A fast algorithm for particle simulations. J. Comput. Phys. 73, 2 (1987), 325--348.
[11]
Herman Haverkort. 2011. An inventory of three-dimensional Hilbert space-filling curves. arXiv preprint arXiv:1109.2323 (2011).
[12]
Lars Hernquist. 1987. Performance characteristics of tree codes. The Astrophysical Journal Supplement Series 64 (1987), 715--734.
[13]
David Hilbert. 1891. Ueber die stetige Abbildung einer Line auf ein Flächenstuck. Math. Ann. 38 (1891), 459--460.
[14]
T Karras. 2012. Maximizing parallelism in the construction of BVHs, octrees, and k-d trees. In Proceedings of the Fourth ACM SIGGRAPH/Eurographics conference on High-Performance Graphics. 33--37.
[15]
Steffen R Knollmann and Alexander Knebe. 2009. AHF: Amiga's halo finder. The Astrophysical Journal Supplement Series 182, 2 (2009), 608.
[16]
LB Lucy. 1977. A numerical approach to the testing of the fission hypothesis. AJ 82 (1977), 1013--1024.
[17]
Guy M Morton. 1966. A computer oriented geodetic data base and a new technique in file sequencing. (1966).
[18]
G Peano. 1890. Sur une courbe, qui remplit toute une aire plane. Math. Ann. 36 (1890), 157--160.
[19]
Albino Perego, Emanuel Gafton, Rubén Cabezón, Stephan Rosswog, and Matthias Liebendörfer. 2014. MODA: a new algorithm to compute optical depths in multidimensional hydrodynamic simulations. aap 568, Article A11 (Aug. 2014), A11 pages. arXiv:1403.1297 [astroph.HE]
[20]
Douglas Potter, Joachim Stadel, and Romain Teyssier. 2017. PKDGRAV3: beyond trillion particle cosmological simulations for the next era of galaxy surveys. Computational Astrophysics and Cosmology 4, 1 (2017), 1--13.
[21]
DS Shamshirgar, R Yokota, AK Tornberg, and B Hess. 2019. Regularizing the fast multipole method for use in molecular simulation. The Journal of Chemical Physics 151, 23 (2019), 234113. arXiv:https://doi.org/10.1063/1.5122859
[22]
Robert Speck, Lukas Arnold, and Paul Gibbon. 2011. Towards a petascale tree code: Scaling and efficiency of the PEPC library. Journal of Computational Science 2, 2 (2011), 138--143.
[23]
MS Warren and JK Salmon. 1993. A Parallel Hashed Oct-Tree N-Body Algorithm. In Supercomputing '93, Proceedings (Supercomputing Proceedings). 12--21. Supercomputing 93 Conference, Portland, OR, Nov 15--19, 1993.
[24]
Michael S Warren and John K Salmon. 1992. Astrophysical N-body simulations using hierarchical tree data structures. SC 92 (1992), 570--576.
[25]
Rio Yokota. 2013. An FMM based on dual tree traversal for many-core architectures. Journal of Algorithms & Computational Technology 7, 3 (2013), 301--324.

Cited By

View all
  • (2024)SIMCoV-GPU: Accelerating an Agent-Based Model for ExascaleProceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing10.1145/3625549.3658692(322-333)Online publication date: 3-Jun-2024
  • (2024)Increasing Energy Efficiency of Astrophysics Simulations Through GPU Frequency ScalingSC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SCW63240.2024.00229(1826-1834)Online publication date: 17-Nov-2024
  • (2024)Efficient Tree-based Parallel Algorithms for N-Body Simulations Using C++ Standard ParallelismSC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SCW63240.2024.00099(708-717)Online publication date: 17-Nov-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PASC '23: Proceedings of the Platform for Advanced Scientific Computing Conference
June 2023
274 pages
ISBN:9798400701900
DOI:10.1145/3592979
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 June 2023

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article

Conference

PASC '23
Sponsor:

Acceptance Rates

Overall Acceptance Rate 109 of 221 submissions, 49%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)134
  • Downloads (Last 6 weeks)15
Reflects downloads up to 04 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)SIMCoV-GPU: Accelerating an Agent-Based Model for ExascaleProceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing10.1145/3625549.3658692(322-333)Online publication date: 3-Jun-2024
  • (2024)Increasing Energy Efficiency of Astrophysics Simulations Through GPU Frequency ScalingSC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SCW63240.2024.00229(1826-1834)Online publication date: 17-Nov-2024
  • (2024)Efficient Tree-based Parallel Algorithms for N-Body Simulations Using C++ Standard ParallelismSC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SCW63240.2024.00099(708-717)Online publication date: 17-Nov-2024
  • (2024)Alternative Quadrant Representations with Morton Index and AVX2 Vectorization for AMR Algorithms within the p4est Software Library2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00071(301-310)Online publication date: 27-May-2024
  • (2023)Accurate Measurement of Application-level Energy Consumption for Energy-Aware Large-Scale SimulationsProceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis10.1145/3624062.3624272(1881-1884)Online publication date: 12-Nov-2023
  • (2023)Optimizing the gravitational tree algorithm for many-core processorsMonthly Notices of the Royal Astronomical Society10.1093/mnras/stad4001528:1(821-832)Online publication date: 29-Dec-2023

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media