GSWO

Authors:

Po Yang,

Zhikun Deng

Image Communication, Volume 47, Issue C

Pages 332 - 345

https://doi.org/10.1016/j.image.2016.05.003

Published: 01 September 2016

Abstract

Sliding Window Operations (SWOs) are widely used in image processing applications. They often have to be performed repeatedly across the target image, which can demand significant computing resources when large images are processed with large windows. In applications where real-time performance is essential, running these filters on a CPU often fails to deliver results within an acceptable timeframe. The emergence of sophisticated graphics processing units (GPUs) presents an opportunity to address this challenge. However, GPU programming has a steep learning curve and is error-prone for novices, so a tool that automatically produces a GPU implementation from the original CPU source code offers an attractive way to harness GPU power effectively. This paper presents a GPU-enabled programming model, called GSWO, which assists GPU novices by converting their SWO-based image processing applications from the original C/C++ source code to CUDA code in a highly automated manner. The model includes a new set of simple SWO pragmas to generate GPU kernels and to support effective GPU memory management. We have implemented this programming model on top of a CPU-to-GPU translator (C2GPU). Evaluations have been performed on a number of typical SWO image filters and applications. The experimental results show that the GSWO model accelerates these applications efficiently, with improved applicability and a speed-up compared to several leading CPU-to-GPU source-to-source translators.
A programming model is presented for automated CPU-to-GPU translation of SWO image processing.
New easy-to-use pragmas are applicable to diversely parallelizable operations in SWO.
A memory management hierarchy supports effective memory creation and data transfer between CPU and GPU.
A thorough performance evaluation of the model is carried out using benchmarks and practical applications.
Results show performance gains and improved applicability and usability compared with state-of-the-art translators.
