计算机科学 ›› 2020, Vol. 47 ›› Issue (1): 31-39.doi: 10.11896/jsjkx.190900179
龚彤艳1,2,张广婷2,贾海鹏2,袁良2
GONG Tong-yan1,2,ZHANG Guang-ting2,JIA Hai-peng2,YUAN Liang2
摘要: 快速傅里叶变换(Fast Fourier Transform,FFT)是最重要的基础算法之一,在科学计算、信号处理、图像处理等领域都有着广泛的应用。随着这些应用领域对实时性需求的进一步提高,FFT算法面临着越来越高的性能要求。在现有的FFT算法库中,FFT算法的求解速度和计算精度受到一定程度的限制,而且也少有研究者对偶数基Cooley-Tukey FFT的高性能实现提出相应的优化策略并对技术进行深入研究。基于此,文中提出了一套针对偶数基的Cooley-Tukey FFT的优化策略和方法。首先构建一个SIMD(Single Instruction Multiple Data)友好、支持混合基的蝶形网络,然后根据偶数基旋转因子特性最大限度地降低蝶形计算的复杂度,接着通过SIMD汇编优化、汇编指令重排及选择、寄存器分配策略制定、高性能矩阵转置算法等方法来优化应用,最后实现一个高性能的FFT算法库。目前,最流行、应用最广的FFT有FFTW和Intel MKL。实验结果表明,在X86计算平台上,新提出的这套针对偶数基Cooley-Tukey FFT的技术所实现的FFT算法库的性能全面优于MKL和FFTW。所提出的这套高性能算法优化和实现技术体系,可推广到除偶数基以外的其他基的实现和优化上,为进一步的研究开发工作奠定一定的基础,进而突破FFT算法在硬件平台上的性能瓶颈,实现一套针对特定平台的高性能FFT算法库。
中图分类号:
[1]COOLEY J W,TUKEY J W.An algorithm for the machine calculation of complex Fourier series[J].Mathematics of Computation,1965,19(90):297-301. [2]COCHRAN W T,COOLEY J W,FAVIN D L,et al.What is the fast Fourier transform?[J].Proceedings of the IEEE,1967,55(10):1664-1674. [3]DUHAMEL P,VETTERLI M.Fast Fourier transforms:a tutorial review and a state of the art[J].Signal Processing,1990,19(4):259-299. [4]STRANG,GILBERT.“Wavelets”[J].American Scientist, 1994,2(3):250-255. [5]KENT R D,READ C.Acoustic Analysis of Speech[M].Singular Publishing Group,2002. [6]DONGARRA J,SULLIVAN F.Guest editors' introduction:The top 10 algorithms[J].Computing in Science & Engineering,2000,2(1):22-23. [7]FRIGO M,JOHNSON S G.FFTW:An adaptive software architecture for the FFT[C]∥Proceedings of the 1998 IEEE International Conference on Acoustics,Speech and Signal Processing.IEEE,1998,3:1381-1384. [8]Intel® Math Kernel Library Release Notes and New Features .(2017-09-10) .https://software.intel.com/en-us/articles/intel-math-kernel-library-release-notes-and-new-features. [9]Developer Reference for Intel® Math Kernel Library - C .(2019-09-10) .https://software.intel.com/zhcn/download/developer-reference-for-intel-math-kernel-libra-ry-c. [10]FRIGO M,JOHNSON S G.The Design and Implementation of FFTW3∥Proceedings of the IEEE.2005,216-231. [11]JOHNSON S G,FRIGO M.Implementing FFTs in practice.https://scholar.google.com/scholar?q=Implementing+FFTs+in+practice&hl=zh-CN&as_sdt=0&as_vis=1&oi=scholart. [12]MOHAMED K,ALI F H H M,ARIFFIN S,et al.An Improved AES S-box Based on Fibonacci Numbers and Prime Factor.IJ Network Security,2018,20(6):1206-1214. [13]JOHNSON S G,FRIGO M.A modified split-radix FFT with fewer arithmetic operations.Signal Processing,2007,55(1):111-119. [14]DUHAMEL P,HOLLMANN H.Split radix'FFT algorithm.Electronics letters,1984,20(1):14-16. [15]BADAR S,DANDEKAR D R.High speed FFT processor design using radix? 4 pipelined architecture[C]∥2015 International Conference on Industrial Instrumentation and Control (ICIC).IEEE,2015:1050-1055. [16]NUGTEREN C.Clblast:A tuned opencl BLAS library[C]∥Proceedings of the International Workshop on OpenCL.ACM,2018:5. [17]NAKATA M.Basics and Practice of Linear Algebra Calculation Library BLAS and LAPACK[M]∥The Art of High Performance Computing for Computational Science. Singapore:Springer,2019:83-112. [18]BUJANOVIC' Z,DRMACˇ Z.New robust ScaLAPACK routine for computing the QR factorization with column pivoting.arXiv:1910.05623,2019. [19]MULTIPLICATION M,SICARD-RAMÍREZ A.Intel R G Math Kernel Library.Universidad EAFIT,2018,2018:3-21. [20]INTEL R.Math kernel library[J/OL].https://scholar.google.com.hk/scholar?hl=zh-CN&as_sdt=0%2C5&as_yhi=2009&q=Intel+Math+Kernel+Library&btnG=. [21]FOG A. Intel’s “cripple AMD” function.(2009-12-30). https://www.agner.org/optimize/blog/read.php?i=49#49. [22]YANG M R.MKL has bad performances on an AMD cpu[EB/QL].(2019-02-02).https://sites.google.com/a/uci.edu/mi ngru-yang/programming/mkl-has-bad-performance-on-an-amd-cpu. [23]The NVIDIA CUDA Fast Fourier Transform library [EB/OL].(2019-08-14) .https://developer.nvidia.com/cufft. [24]ALGLIB User Guide online [EB/OL].(2019-02-21) .http://www.alglib.net/fasttransforms/fft.php. [25]EISL J,GRIMMER M,SIMON D,et al.Trace-based Register Allocation in a JIT Compiler[C]∥Proceedings of the 13th International Conference on Principles and Practices of Programming on the Java Platform:Virtual Machines,Languages,and Tools.ACM,2016:14. [26]EISL J,LEOPOLDSEDER D,MÖSSENBÖCK H.Parallel trace register allocation[C]∥Conference on Managed Languages & Runtimes (ManLang’18).2018. |
[1] | 余敦辉, 成涛, 袁旭. 基于排序学习的软件众包任务推荐算法 Software Crowdsourcing Task Recommendation Algorithm Based on Learning to Rank 计算机科学, 2020, 47(12): 106-113. https://doi.org/10.11896/jsjkx.200300107 |
[2] | 王栋, 商红慧, 张云泉, 李琨, 贺新福, 贾丽霞. 原子动力学蒙特卡洛程序MISA-KMC在反应堆压力容器钢辐照损伤研究中的应用 Application of Atomic Dynamics Monte Carlo Program MISA-KMC in Study of Irradiation Damage of Reactor Pressure Vessel Steel 计算机科学, 2020, 47(4): 30-35. https://doi.org/10.11896/jsjkx.191100045 |
[3] | 曲广强, 孙斌. 基于区块链的实验教学经费可信任回溯机制研究 Study on Trustworthy Backtracking Mechanism of Experimental Teaching Fund Based on Blockchain 计算机科学, 2019, 46(11A): 553-556. |
[4] | 侯禹臣, 吴伟. 静态图像行为标注众包系统的设计与实现 Design and Implementation of Crowdsourcing System for Still Image Activity Annotation 计算机科学, 2019, 46(11A): 580-583. |
[5] | 吕小虎, 韩笑冬, 宫江雷, 王志杰, 刘小鲲. 基于系统多维要素的安全关键软件验证方法 Systemic Muti-factors Based Verification Method for Safety-critical Software 计算机科学, 2019, 46(9): 156-161. https://doi.org/10.11896/j.issn.1002-137X.2019.09.022 |
[6] | 姚庆, 郑凯, 刘垚, 王肃, 孙军, 徐梦轩. SOM算法在申威众核上的实现和优化 Implementation and Optimization of SOM Algorithm on Sunway Many-core Processors 计算机科学, 2018, 45(11A): 591-596. |
[7] | 刘正浩, 张广泉. 基于WSAN的物联网软件分布式知识框架及其实现方法 Distributed Knowledge Framework of IOT Software Based on WSAN and Its Implementation Strategy 计算机科学, 2019, 46(1): 148-154. https://doi.org/10.11896/j.issn.1002-137X.2019.01.023 |
[8] | 郝俊生,李冰锋,陈曦,高文娟. 基于Android平台的高校网络订餐系统的设计与实现 Design and Implementation of Network Subscription System Based on Android Platform 计算机科学, 2018, 45(6A): 591-594. |
|