A templated programmable architecture for highly constrained embedded HD video processing

Published: 01 February 2019

Abstract

A video reconstruction pipeline is required to improve the quality of images delivered by highly constrained devices. These algorithms demand high computing capacity: several dozen GOPs for real-time HD 1080p video streams. Today's embedded design constraints limit both silicon budget and power consumption, typically to 2 mm² for half a watt. This paper presents the eISP architecture, which can reach 188 MOPs/mW with 94 GOPs/mm² and 378 GOPs/mW using TSMC 65-nm integration technology. This fully programmable and modular architecture is based on an analysis of video-processing algorithms. Synthesizable VHDL is generated from a set of design parameters, which simplifies the sizing and characterization of the architecture.
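As a rough cross-check of these figures, the sketch below converts the quoted efficiencies into absolute throughput under the stated 2 mm² / 0.5 W budget and into an operations-per-pixel figure. The 30 frames/s rate is an assumption (the abstract says only "HD 1080p"), and taking the smaller of the two bounds assumes the quoted MOPs/mW and GOPs/mm² figures describe the same operating point, which the abstract does not state. All function and constant names are illustrative, not part of the eISP tool flow.

```python
# Back-of-envelope check of the figures quoted in the abstract.
# Assumption (not stated in the abstract): 1080p means 1920x1080 pixels at 30 frames/s.

WIDTH, HEIGHT, FPS = 1920, 1080, 30      # assumed video format
AREA_BUDGET_MM2 = 2.0                    # silicon budget quoted in the abstract
POWER_BUDGET_MW = 500.0                  # "half a watt"
EFF_MOPS_PER_MW = 188.0                  # energy efficiency quoted for eISP
DENSITY_GOPS_PER_MM2 = 94.0              # area efficiency quoted for eISP


def pixel_rate(width: int, height: int, fps: float) -> float:
    """Pixels per second for a progressive video stream."""
    return width * height * fps


def gops_at_power(mops_per_mw: float, power_mw: float) -> float:
    """Throughput (GOPs) sustainable at a given power, from MOPs/mW."""
    return mops_per_mw * power_mw / 1000.0   # MOPs -> GOPs


def gops_in_area(gops_per_mm2: float, area_mm2: float) -> float:
    """Throughput (GOPs) that fits in a given silicon area."""
    return gops_per_mm2 * area_mm2


def ops_per_pixel(gops: float, pixels_per_s: float) -> float:
    """Operations available per pixel at a given throughput."""
    return gops * 1e9 / pixels_per_s


if __name__ == "__main__":
    px = pixel_rate(WIDTH, HEIGHT, FPS)
    power_bound = gops_at_power(EFF_MOPS_PER_MW, POWER_BUDGET_MW)
    area_bound = gops_in_area(DENSITY_GOPS_PER_MM2, AREA_BUDGET_MM2)
    usable = min(power_bound, area_bound)    # the tighter of the two budgets
    print(f"pixel rate    : {px / 1e6:.1f} Mpixel/s")
    print(f"power-limited : {power_bound:.0f} GOPs")
    print(f"area-limited  : {area_bound:.0f} GOPs")
    print(f"ops per pixel : {ops_per_pixel(usable, px):.0f}")
```

With these inputs the script reports about 62.2 Mpixel/s, a power-limited bound of 94 GOPs, an area-limited bound of 188 GOPs, and roughly 1511 operations per pixel. Under these assumptions the power budget, not the area budget, is the binding constraint, and the resulting throughput is of the same order as the several dozen GOPs the abstract cites for real-time 1080p.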

Published In

Journal of Real-Time Image Processing, Volume 16, Issue 1
Feb 2019
233 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

1. Low silicon footprint
2. Low-power
3. Programmable
4. SIMD
5. VLIW
