DOI:10.1145/1995896.1995945
research-article

A QHD-capable parallel H.264 decoder

Published: 31 May 2011

Abstract

Video coding demands higher performance with every new generation and could therefore benefit from many-core processors. A complete parallelization of H.264, which is the most advanced video coding standard, has proven difficult because of the complexity of the standard. This paper presents a parallel implementation of a complete H.264 decoder. Our parallelization strategy exploits both function-level and data-level parallelism. Function-level parallelism is used to pipeline the H.264 decoding stages. Data-level parallelism is exploited within the two most time-consuming stages, the entropy decoding stage and the macroblock decoding stage. The strategy has been implemented and optimized on three platforms with very different memory architectures: an 8-core SMP, a 64-core cc-NUMA, and an 18-core Cell platform. Evaluations were performed using 4kx2k QHD sequences. On the SMP platform a maximum speedup of 4.5x is achieved. The SMP implementation is reasonably performance-portable, as it achieves a speedup of 26.6x on the cc-NUMA system. However, obtaining the highest performance (a speedup of 33.4x and a throughput of 200 QHD frames per second) requires several cc-NUMA-specific optimizations, such as optimizing the page placement and statically assigning threads to cores. Finally, on the Cell platform a near-ideal speedup of 16.5x is achieved by completely hiding the communication latency.
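
The combination of function-level and data-level parallelism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' decoder: the stage functions `entropy_decode` and `decode_row`, the queue sizes, and the worker count are hypothetical placeholders, and the rows of a frame are treated as fully independent, which ignores the dependencies between neighbouring macroblocks that a real H.264 decoder must respect. The sketch only shows the structure: two stages connected by queues form a pipeline, and the second stage fans work out to a pool.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

SENTINEL = None  # end-of-stream marker passed through the queues


def entropy_decode(frame_id):
    # Hypothetical stand-in for entropy decoding: yields 4 "rows" per frame.
    return frame_id, [f"f{frame_id}-row{r}" for r in range(4)]


def decode_row(row):
    # Hypothetical stand-in for macroblock reconstruction of one unit.
    return row.upper()


def entropy_stage(frame_ids, out_q):
    # Pipeline stage 1: runs in its own thread, one frame at a time.
    for fid in frame_ids:
        out_q.put(entropy_decode(fid))
    out_q.put(SENTINEL)


def macroblock_stage(in_q, out_q, workers=4):
    # Pipeline stage 2: data-level parallelism across the rows of a frame.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            item = in_q.get()
            if item is SENTINEL:
                break
            fid, rows = item
            out_q.put((fid, list(pool.map(decode_row, rows))))
    out_q.put(SENTINEL)


def run_pipeline(num_frames):
    # A bounded queue between the stages limits how far stage 1 runs ahead.
    q1, q2 = queue.Queue(maxsize=2), queue.Queue()
    t1 = threading.Thread(target=entropy_stage, args=(range(num_frames), q1))
    t2 = threading.Thread(target=macroblock_stage, args=(q1, q2))
    t1.start()
    t2.start()
    results = []
    while (item := q2.get()) is not SENTINEL:
        results.append(item)
    t1.join()
    t2.join()
    return results
```

Calling `run_pipeline(3)` returns the decoded frames in stream order, because each stage has a single consumer and `pool.map` preserves input order. In CPython the threads mainly demonstrate the control structure; compute-bound stages would need processes or native code to obtain real speedup.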

Published In

ICS '11: Proceedings of the international conference on Supercomputing
May 2011
398 pages
ISBN:9781450301022
DOI:10.1145/1995896
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. 4k x 2k
  2. cell
  3. decoding
  4. h.264
  5. numa
  6. parallel
  7. smp

Qualifiers

  • Research-article

Conference

ICS '11: International Conference on Supercomputing
May 31 - June 4, 2011
Tucson, Arizona, USA

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
