
ST-GREED: Space-Time Generalized Entropic Differences for Frame Rate Dependent Video Quality Prediction

Published: 01 January 2021

Abstract

We consider the problem of conducting frame rate dependent video quality assessment (VQA) on videos of diverse frame rates, including high frame rate (HFR) videos. More generally, we study how perceptual quality is affected by frame rate, and how frame rate and compression combine to affect perceived quality. We devise an objective VQA model called Space-Time GeneRalized Entropic Difference (GREED) which analyzes the statistics of spatial and temporal band-pass video coefficients. A generalized Gaussian distribution (GGD) is used to model band-pass responses, while entropy variations between reference and distorted videos under the GGD model are used to capture video quality variations arising from frame rate changes. The entropic differences are calculated across multiple temporal and spatial subbands, and merged using a learned regressor. We show through extensive experiments that GREED achieves state-of-the-art performance on the LIVE-YT-HFR Database when compared with existing VQA models. The features used in GREED are highly generalizable and obtain competitive performance even on standard, non-HFR VQA databases. The implementation of GREED has been made available online: https://github.com/pavancm/GREED.
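The core computation the abstract describes, fitting a GGD to band-pass coefficients and differencing entropies between reference and distorted videos, can be illustrated with a short sketch. The Python code below is a minimal illustration under stated assumptions, not the authors' released implementation (see the GitHub link above): the function names, the moment-matching GGD fit, and the frame-differencing "band-pass" stand-in are choices of this sketch.

```python
import numpy as np
from scipy.special import gamma
from scipy.optimize import brentq

def fit_ggd(coeffs):
    """Moment-matching fit of a zero-mean generalized Gaussian.
    Returns (alpha, beta) = (scale, shape)."""
    x = np.ravel(coeffs).astype(np.float64)
    sigma2 = np.mean(x ** 2)
    r = np.mean(np.abs(x)) ** 2 / sigma2  # E[|x|]^2 / E[x^2]
    # Solve Gamma(2/b)^2 / (Gamma(1/b) * Gamma(3/b)) = r for the shape b.
    # The bracket [0.05, 10] covers heavy-tailed through near-Gaussian data.
    f = lambda b: gamma(2.0 / b) ** 2 / (gamma(1.0 / b) * gamma(3.0 / b)) - r
    beta = brentq(f, 0.05, 10.0)
    alpha = np.sqrt(sigma2 * gamma(1.0 / beta) / gamma(3.0 / beta))
    return alpha, beta

def ggd_entropy(alpha, beta):
    """Differential entropy (in nats) of a GGD with scale alpha, shape beta."""
    return 1.0 / beta + np.log(2.0 * alpha * gamma(1.0 / beta) / beta)

def entropic_difference(ref_band, dis_band):
    """Entropy difference between matching reference/distorted subbands."""
    h_ref = ggd_entropy(*fit_ggd(ref_band))
    h_dis = ggd_entropy(*fit_ggd(dis_band))
    return abs(h_ref - h_dis)

# Hypothetical usage: a crude temporal band-pass via frame differencing.
# ref, dis: arrays of shape (T, H, W); the actual model uses proper
# spatial and temporal filter banks at multiple scales.
def entropic_difference_feature(ref, dis):
    ref_band = np.diff(ref.astype(np.float64), axis=0)
    dis_band = np.diff(dis.astype(np.float64), axis=0)
    return entropic_difference(ref_band, dis_band)
```

In the full model, per the abstract, such differences would be computed across multiple spatial and temporal subbands, pooled over space-time, and the resulting feature vector merged by a learned regressor (e.g., a support vector regressor) trained against subjective quality scores.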


Published In

IEEE Transactions on Image Processing, Volume 30, 2021 (5053 pages)

Publisher

IEEE Press
