The aim of this paper is to provide a strategy for overcoming the limits of codes that employ the FFTW library, by implementing a more powerful parallel domain decomposition algorithm and by refining the auto-tuning mechanism already implemented in the library. In the first part of the paper we identify some of the major performance bottlenecks in the current FFTW implementation, in particular in the auto-tuning mechanism that FFTW provides. To do this, we tested, for the first time on a Blue Gene/Q system, a 2D parallel domain decomposition algorithm provided by the 2DECOMP&FFT library. We found that on massively parallel supercomputers such as Blue Gene/Q clusters the performance of this new algorithm is significantly higher. To demonstrate the benefits of the algorithm in a real application, we included the library in a CFD code, BlowupNS, where we found a marked improvement in parallel scalability.
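As an illustration of why a 2D decomposition can scale further, the following minimal Python sketch compares how many MPI ranks a 1D (slab) and a 2D (pencil) decomposition can use on a cubic grid, and computes the local block held by one rank in an x-pencil layout. It is not taken from the paper or from either library: the function names `split_evenly`, `x_pencil_shape` and `max_ranks` are hypothetical, and the grid is assumed to be cubic with n points per side.

```python
def split_evenly(n, parts):
    """Distribute n grid points over `parts` ranks as evenly as possible."""
    base, extra = divmod(n, parts)
    # The first `extra` ranks each take one additional point.
    return [base + 1 if i < extra else base for i in range(parts)]


def x_pencil_shape(n, p_row, p_col, row, col):
    """Local block of an n^3 grid owned by rank (row, col) in an x-pencil layout.

    Assumed layout: x is kept whole on every rank, y is split over p_row
    ranks and z is split over p_col ranks.
    """
    return (n, split_evenly(n, p_row)[row], split_evenly(n, p_col)[col])


def max_ranks(n):
    """Upper bound on usable ranks for an n^3 grid (illustrative only).

    A slab decomposition splits a single axis, so at most n ranks can hold
    data; a pencil decomposition splits two axes, so up to n * n ranks can.
    """
    return {"slab": n, "pencil": n * n}


if __name__ == "__main__":
    print(max_ranks(1024))
    print(x_pencil_shape(8, p_row=2, p_col=4, row=0, col=0))
```

On a 1024^3 grid, this bound allows a slab decomposition at most 1024 ranks, whereas a pencil decomposition can in principle use up to 1024^2, which is the regime relevant to machines of Blue Gene/Q scale.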