research-article

GSWO

Published: 01 September 2016

Abstract

Sliding Window Operations (SWOs) are widely used in image processing applications. They often have to be performed repeatedly across the target image, which can demand significant computing resources when processing large images with large windows. In applications where real-time performance is essential, running these filters on a CPU often fails to deliver results within an acceptable timeframe. The emergence of sophisticated graphics processing units (GPUs) presents an opportunity to address this challenge. However, GPU programming has a steep learning curve and is error-prone for novices, so a tool that can automatically produce a GPU implementation from the original CPU source code offers an attractive way to harness GPU power effectively. This paper presents a GPU-enabled programming model, called GSWO, which assists GPU novices by converting their SWO-based image processing applications from the original C/C++ source code to CUDA code in a highly automated manner. The model includes a new set of simple SWO pragmas to generate GPU kernels and to support effective GPU memory management. We have implemented this programming model based on a CPU-to-GPU translator (C2GPU). Evaluations have been performed on a number of typical SWO image filters and applications. The experimental results show that the GSWO model efficiently accelerates these applications, with improved applicability and a performance speed-up compared with several leading CPU-to-GPU source-to-source translators.
  • A programming model is presented for automated CPU-to-GPU translation of SWO image processing.
  • New easy-to-use pragmas are applicable to diversely parallelizable operations in SWO.
  • Memory management hierarchy for effective memory creation and data transfer between CPU and GPU.
  • A thorough performance evaluation of the model using benchmarks and practical applications.
  • Results show performance gains and improved applicability and usability relative to the state of the art.
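
To make the term concrete, the following is a minimal sketch of one sliding window operation, a mean (box) filter, written as plain CPU code of the kind a CPU-to-GPU translator would take as input. It is an illustration only and is not taken from the paper: the function name `mean_filter`, its parameters, and the border-clamping policy are assumptions. The GSWO pragma syntax is not given in this abstract and is deliberately not shown.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the paper): apply a (2r+1) x (2r+1) mean
// filter to a row-major 8-bit grayscale image. Border pixels are handled
// by clamping window coordinates to the image edge (an assumed policy).
std::vector<std::uint8_t> mean_filter(const std::vector<std::uint8_t>& src,
                                      int width, int height, int r) {
    std::vector<std::uint8_t> dst(src.size());
    const int window_area = (2 * r + 1) * (2 * r + 1);
    // The two outer loops slide the window over every output pixel. Each
    // iteration reads only `src` and writes one element of `dst`, so the
    // iterations do not depend on each other.
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int sum = 0;
            // The two inner loops visit the pixels covered by the window.
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    const int yy = std::min(std::max(y + dy, 0), height - 1);
                    const int xx = std::min(std::max(x + dx, 0), width - 1);
                    sum += src[yy * width + xx];
                }
            }
            dst[y * width + x] = static_cast<std::uint8_t>(sum / window_area);
        }
    }
    return dst;
}
```

Calling `mean_filter(image, width, height, 1)` applies a 3 x 3 window. The cost grows with both the image size and the window area, which is the scaling problem the abstract describes, and the independence of the outer-loop iterations is what makes this class of filter a natural fit for one GPU thread per output pixel.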


Cited By

  • (2017) Automatic Nucleus Detection of Pap Smear Images using Stacked Sparse Autoencoder (SSAE). Proceedings of the 1st International Conference on Algorithms, Computing and Systems, pp. 9-13. doi:10.1145/3127942.3127946. Online publication date: 10-Aug-2017

Published In

Image Communication, Volume 47, Issue C
September 2016
556 pages

Publisher

Elsevier Science Inc.

United States

Author Tags

  1. Automatic translation
  2. CUDA
  3. Parallel computing
  4. Sliding window operation

Qualifiers

  • Research-article
