research-article

Recursive Fusion and Deformable Spatiotemporal Attention for Video Compression Artifact Reduction

Authors:

Shuigeng Zhou

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 5646 - 5654

https://doi.org/10.1145/3474085.3475710

Published: 17 October 2021

Abstract

A number of deep learning based algorithms have been proposed to recover high-quality videos from low-quality compressed ones. Among them, some restore the missing details of each frame by exploiting the spatiotemporal information of neighboring frames. However, these methods usually suffer from a narrow temporal scope and thus may miss useful details in frames outside the neighboring ones. In this paper, to boost artifact removal, we first propose a Recursive Fusion (RF) module to model temporal dependency over a long temporal range. Specifically, RF utilizes both the current reference frames and the preceding hidden state to conduct better spatiotemporal compensation. Second, we design an efficient and effective Deformable Spatiotemporal Attention (DSTA) module so that the model can devote more effort to restoring artifact-rich areas, such as the boundary of a moving object. Extensive experiments show that our method outperforms existing ones on the MFQE 2.0 dataset in terms of both fidelity and perceptual quality. Code is available at https://github.com/zhaominyiz/RFDA-PyTorch.
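The two ideas in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that); it only shows the general pattern of carrying a hidden state across temporal windows and reweighting features by attention scores. The names `recursive_fusion`, `attention_gate`, and `enhance_sequence`, the fixed blending weight `alpha`, and the flattened list-of-floats features are all assumptions made for illustration; a real model would use learned convolutional layers and deformable sampling instead.

```python
import math
from typing import List, Optional

Feature = List[float]  # a flattened toy feature map (assumption)


def recursive_fusion(current: Feature, hidden: Optional[Feature],
                     alpha: float = 0.5) -> Feature:
    """Fuse the current window's features with the preceding hidden state.

    `alpha` is a hypothetical fixed blending weight standing in for the
    learned fusion layers of a real model.
    """
    if hidden is None:  # first window: there is no history yet
        return list(current)
    return [alpha * c + (1.0 - alpha) * h for c, h in zip(current, hidden)]


def attention_gate(feature: Feature, scores: Feature) -> Feature:
    """Scale each position by the sigmoid of its attention score (toy stand-in)."""
    return [f * (1.0 / (1.0 + math.exp(-s))) for f, s in zip(feature, scores)]


def enhance_sequence(windows: List[Feature], alpha: float = 0.5) -> List[Feature]:
    """Run the recursion over a sequence, carrying the hidden state forward."""
    hidden: Optional[Feature] = None
    outputs: List[Feature] = []
    for current in windows:
        hidden = recursive_fusion(current, hidden, alpha)
        outputs.append(hidden)
    return outputs
```

With `alpha = 0.5`, information from the first window still contributes to later outputs at a decaying weight, which is the sense in which the recursion widens the temporal scope beyond the immediate neighbors.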

Supplementary Material

ZIP File (mfp3026aux.zip), 12.90 MB

supp.pdf - Supplementary file
RFDA_Visualization_DEMO.mp4 - Visualization video

Index Terms

Recursive Fusion and Deformable Spatiotemporal Attention for Video Compression Artifact Reduction
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Reconstruction

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN:9781450386517

DOI:10.1145/3474085

General Chairs:
Heng Tao Shen, University of Electronic Science and Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA

Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI & TU Delft, The Netherlands
Florian Metze, FACEBOOK, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Zhejiang Lab

Conference

MM '21

Sponsor:

SIGMM

MM '21: ACM Multimedia Conference

October 20 - 24, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Total Citations: 33
Total Downloads: 350
Downloads (Last 12 months): 63
Downloads (Last 6 weeks): 15

Reflects downloads up to 24 Dec 2024

Citations

Cited By

Lan C, Yan H, Luo C, Zhao T. (2025). GAN-based multi-view video coding with spatio-temporal EPI reconstruction. Signal Processing: Image Communication, 132 (117242). Online publication date: Mar-2025.
https://doi.org/10.1016/j.image.2024.117242
Wang W, Jing M, Fan Y, Weng W. (2024). PixRevive: Latent Feature Diffusion Model for Compressed Video Quality Enhancement. Sensors, 24:6 (1907). Online publication date: 16-Mar-2024.
https://doi.org/10.3390/s24061907
Li H, He X, Bi X, Xiong S, Chen H. (2024). Spatio-temporal enhancement method based on dense connection structure for compressed video. Journal of Electronic Imaging, 33:04. Online publication date: 1-Jul-2024.
https://doi.org/10.1117/1.JEI.33.4.043054
Ehrlich M, Barker J, Padmanabhan N, Davis L, Tao A, Catanzaro B, Shrivastava A. (2024). Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (1506-1516). Online publication date: 3-Jan-2024.
https://doi.org/10.1109/WACV57701.2024.00154
Lin L, Wang M, Yang J, Zhang K, Zhao T. (2024). Toward Efficient Video Compression Artifact Detection and Removal: A Benchmark Dataset. IEEE Transactions on Multimedia, 26 (10816-10827). Online publication date: 2024.
https://doi.org/10.1109/TMM.2024.3414549
Liu W, Gao W, Li G, Ma S, Zhao T, Yuan H. (2024). Enlarged Motion-Aware and Frequency-Aware Network for Compressed Video Artifact Reduction. IEEE Transactions on Circuits and Systems for Video Technology, 34:10 (10339-10352). Online publication date: Oct-2024.
https://doi.org/10.1109/TCSVT.2024.3406425
Fang X, Chen P, Wang M, Xie X, Wang S, Wang S, Ma S. (2024). Exploiting Bidirectional Quality Impulse for Reference Picture Resampled Gaming Video Coding. IEEE Transactions on Circuits and Systems for Video Technology, 34:9 (8808-8820). Online publication date: Sep-2024.
https://doi.org/10.1109/TCSVT.2024.3379971
Zhang T, He X, Teng Q, Cheng J, Ren C. (2024). Spatio-Temporal Adaptive Weighted Fusion Network for Compressed Video Quality Enhancement. IEEE Transactions on Circuits and Systems II: Express Briefs, 71:12 (5064-5068). Online publication date: Dec-2024.
https://doi.org/10.1109/TCSII.2024.3444052
Wei L, Ye M, Ji L, Gan Y, Li S, Li X. (2024). Multi-Level Alignments for Compressed Video Super-Resolution. IEEE Transactions on Consumer Electronics, 70:3 (5101-5114). Online publication date: Aug-2024.
https://doi.org/10.1109/TCE.2024.3411144
Liu C, Jia K. (2024). Multi-Frame Quality Recovery Model for Compressed Video Enhancement. IEEE Transactions on Consumer Electronics, 70:3 (6354-6362). Online publication date: Aug-2024.
https://doi.org/10.1109/TCE.2024.3409313
