DOI:10.1145/2485922.2485925

Convolution engine: balancing efficiency & flexibility in specialized computing

Published: 23 June 2013

Abstract

This paper focuses on the trade-off between flexibility and efficiency in specialized computing. We observe that specialized units achieve most of their efficiency gains by tuning data storage and compute structures and their connectivity to the data-flow and data-locality patterns in the kernels. Hence, by identifying key data-flow patterns used in a domain, we can create efficient engines that can be programmed and reused across a wide range of applications.
We present an example, the Convolution Engine (CE), specialized for the convolution-like data-flow that is common in computational photography, image processing, and video processing applications. CE achieves energy efficiency by capturing data reuse patterns, eliminating data transfer overheads, and enabling a large number of operations per memory access. We quantify the trade-offs in efficiency and flexibility and demonstrate that CE is within a factor of 2-3x of the energy and area efficiency of custom units optimized for a single kernel. CE improves energy and area efficiency by 8-15x over a SIMD engine for most applications.
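
The idea of data reuse raising the number of operations per memory access can be illustrated in software. The following is a minimal sketch, not the CE hardware or the authors' method: it computes the same 1D sliding dot product two ways and counts simulated memory reads. The function names, the `loads` counter, and the `deque` used as a stand-in for local storage are all assumptions made for illustration.

```python
from collections import deque


def convolve_reload(signal, taps):
    """Sliding dot product that re-reads every input element for each output."""
    loads = 0
    out = []
    for i in range(len(signal) - len(taps) + 1):
        acc = 0
        for j, tap in enumerate(taps):
            acc += signal[i + j] * tap  # one simulated memory read per multiply
            loads += 1
        out.append(acc)
    return out, loads


def convolve_windowed(signal, taps):
    """Same result, but each input is read once into a small local window."""
    window = deque(maxlen=len(taps))  # hypothetical stand-in for local storage
    loads = 0
    out = []
    for sample in signal:
        window.append(sample)  # one simulated memory read per input element
        loads += 1
        if len(window) == len(taps):
            out.append(sum(w * t for w, t in zip(window, taps)))
    return out, loads
```

For a 6-element input and 3 taps, both functions perform 12 multiplies, but the first counts 12 reads while the second counts 6. The gap widens as the number of taps grows, which is the kind of reuse the abstract refers to.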

Published In

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
June 2013
686 pages
ISBN:9781450320795
DOI:10.1145/2485922
  • ACM SIGARCH Computer Architecture News, Volume 41, Issue 3
    ISCA '13
    June 2013
    666 pages
    ISSN:0163-5964
    DOI:10.1145/2508148
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. H.264
  2. computational photography
  3. convolution
  4. demosaic
  5. energy efficiency
  6. specialized computing
  7. tensilica

Qualifiers

  • Research-article

Acceptance Rates

ISCA '13 Paper Acceptance Rate 56 of 288 submissions, 19%;
Overall Acceptance Rate 543 of 3,203 submissions, 17%
