Computer Science, 2017, Vol. 44, Issue (10): 64-70. doi: 10.11896/j.issn.1002-137X.2017.10.012


Porting and Optimizing OpenFOAM on Sunway TaihuLight System

MENG De-long, WEN Min-hua, WEI Jian-wen and James LIN   

  • Online:2018-12-01 Published:2018-12-01

Abstract: The Sunway TaihuLight supercomputer, built on Chinese-designed many-core processors, is the world’s fastest system, with a peak performance of 125.4 PFlops. OpenFOAM (Open Source Field Operation and Manipulation) is one of the most popular open-source computational fluid dynamics (CFD) packages. It is written in C++ and is not fully compatible with the compilers for the heterogeneous many-core processor SW26010. This paper ports OpenFOAM to the SW26010 based on its MPE (management processing element)/CPE (computing processing element) cluster architecture. To overcome the compilation incompatibility, we adopt a mixed-language application design. We also apply several SW26010-specific optimizations to the hotspot of OpenFOAM to deliver high performance, including register communication, vectorization, and double buffering. Experiments on the SW26010 with real datasets show that the single-CG (core group) code runs 8.03x faster than the well-tuned version on the MPE, and that single-CG performance is 1.18x higher than that of the serial implementation on an Intel(R) Xeon(R) CPU E5-2695 v3. We also optimized the parallel implementation of OpenFOAM, which yields a speedup of 184.9x on 256 CGs. The porting methods and optimizations presented here can also serve as a reference for bringing other complex C++ programs to high performance on the SW26010.
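Of the optimizations named in the abstract, double buffering is the easiest to illustrate in portable code. The following is a minimal sketch of the general technique only, not the authors' implementation and not the Sunway programming interface. The names `kTile`, `fetch_tile`, and `sum_squares_double_buffered` are hypothetical, and a plain synchronous `std::memcpy` stands in for the asynchronous memory transfer that a CPE would issue into its local store.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical tile size standing in for a small on-chip local store.
constexpr std::size_t kTile = 256;

// Hypothetical stand-in for an asynchronous transfer from main memory.
// Here it is a synchronous copy; it returns the number of elements copied.
std::size_t fetch_tile(const double* src, std::size_t n,
                       std::size_t offset, double* dst) {
    if (offset >= n) return 0;
    const std::size_t len = std::min(kTile, n - offset);
    std::memcpy(dst, src + offset, len * sizeof(double));
    return len;
}

// Sum of squares over `data`, processed tile by tile with two buffers.
double sum_squares_double_buffered(const std::vector<double>& data) {
    double buf[2][kTile];
    std::size_t len[2] = {0, 0};
    int cur = 0;
    double total = 0.0;
    const std::size_t n = data.size();

    // Prefetch the first tile before entering the loop.
    len[cur] = fetch_tile(data.data(), n, 0, buf[cur]);
    for (std::size_t offset = 0; len[cur] > 0; offset += kTile) {
        const int nxt = 1 - cur;
        // Request the next tile first. With a truly asynchronous transfer,
        // this load would overlap with the compute loop below.
        len[nxt] = fetch_tile(data.data(), n, offset + kTile, buf[nxt]);
        for (std::size_t i = 0; i < len[cur]; ++i) {
            total += buf[cur][i] * buf[cur][i];
        }
        cur = nxt;  // Swap buffer roles for the next iteration.
    }
    return total;
}
```

The point of the two buffers is that the transfer into one never touches the data being computed on in the other, so on hardware with asynchronous transfers the memory latency can be hidden behind computation. In this sketch the copy is synchronous, so only the control structure is shown, not the overlap itself.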

Key words: CFD, OpenFOAM, Heterogeneous many-core processor, Sunway supercomputer

