DOI: 10.1145/3474085.3475710
research-article

Recursive Fusion and Deformable Spatiotemporal Attention for Video Compression Artifact Reduction

Published: 17 October 2021

Abstract

A number of deep-learning-based algorithms have been proposed to recover high-quality videos from low-quality compressed ones. Some of them restore the missing details of each frame by exploiting the spatiotemporal information of neighboring frames. However, these methods usually have a narrow temporal scope and may therefore miss useful details in frames outside the neighboring ones. In this paper, to improve artifact removal, we first propose a Recursive Fusion (RF) module that models temporal dependency over a long temporal range. Specifically, RF uses both the current reference frames and the preceding hidden state to perform better spatiotemporal compensation. Second, we design an efficient and effective Deformable Spatiotemporal Attention (DSTA) module so that the model devotes more effort to restoring artifact-rich areas, such as the boundary of a moving object. Extensive experiments show that our method outperforms existing methods on the MFQE 2.0 dataset in terms of both fidelity and perceptual quality. Code is available at https://github.com/zhaominyiz/RFDA-PyTorch.
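To make the recursive idea concrete, the following is a minimal toy sketch, not the authors' RF module. It assumes frames are flattened to lists of floats, replaces the learned alignment and fusion with a plain average over the local window, and blends that result with the previous hidden state using a fixed weight. The names `fuse_window`, `recursive_fusion`, `radius`, and `alpha` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional

Frame = List[float]  # toy stand-in: one frame flattened to a list of values


def fuse_window(window: List[Frame]) -> Frame:
    """Average the frames of a local window (stand-in for learned fusion)."""
    n = len(window)
    return [sum(values) / n for values in zip(*window)]


def recursive_fusion(frames: List[Frame], radius: int = 1,
                     alpha: float = 0.5) -> List[Frame]:
    """Blend each local-window result with the hidden state of the previous step.

    alpha is an assumed fixed blending weight; a real model would learn
    how to combine the two inputs.
    """
    hidden: Optional[Frame] = None
    outputs: List[Frame] = []
    for t in range(len(frames)):
        lo, hi = max(0, t - radius), min(len(frames), t + radius + 1)
        local = fuse_window(frames[lo:hi])
        if hidden is None:
            hidden = local  # first step: no previous state to reuse
        else:
            hidden = [alpha * l + (1 - alpha) * h
                      for l, h in zip(local, hidden)]
        outputs.append(hidden)
    return outputs


if __name__ == "__main__":
    # Only frame 0 carries signal; later windows no longer contain it.
    frames = [[8.0], [0.0], [0.0], [0.0], [0.0]]
    for t, out in enumerate(recursive_fusion(frames)):
        print(t, out)
```

In this toy setting, the window around frame 3 covers only frames 2 to 4, yet the output at step 3 is still nonzero because the hidden state carries a decayed contribution from frame 0. That is the sense in which a recursive state widens the temporal scope beyond the neighboring frames.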

Supplementary Material

ZIP File (mfp3026aux.zip)
supp.pdf - Supplementary file
RFDA_Visualization_DEMO.mp4 - Visualization video

    Information

    Published In

    MM '21: Proceedings of the 29th ACM International Conference on Multimedia
    October 2021
    5796 pages
    ISBN: 9781450386517
    DOI: 10.1145/3474085
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. attention mechanism
    2. compressed video
    3. deep learning
    4. quality enhancement
    5. video artifact reduction
    6. video enhancement

    Qualifiers

    • Research-article

    Funding Sources

    • Zhejiang Lab

    Conference

    MM '21
    MM '21: ACM Multimedia Conference
    October 20 - 24, 2021
    Virtual Event, China

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 63
    • Downloads (Last 6 weeks): 15
    Reflects downloads up to 24 Dec 2024
