
MSEConv: A Unified Warping Framework for Video Frame Interpolation

Online AM: 14 February 2024

Abstract

In video frame interpolation, complex motion modeling is the task of capturing where the moving objects of a video sequence are located in the interpolated frame, and of maintaining the temporal consistency of their motion. Existing video frame interpolation methods typically model complex motions with either a fixed-size motion kernel or a refined optical flow; however, these approaches suffer from data redundancy and inaccurate motion representation. This paper introduces a unified warping framework, named multi-scale expandable deformable convolution (MSEConv), that performs complex motion modeling and frame interpolation simultaneously. In the proposed framework, a deep fully convolutional neural network with global attention estimates multiple small-scale kernel weights with different expansion degrees, along with an adaptive weight allocation, for the synthesis of each pixel. Moreover, most kernel-based interpolation methods can be treated as special cases of the proposed MSEConv; thus, MSEConv can easily be transferred to other kernel-based frame interpolation methods to improve their performance. To further improve robustness to motion occlusions, a mask occlusion operation is introduced. As a result, the proposed MSEConv performs on par with or better than state-of-the-art kernel-based frame interpolation methods on public datasets. Our source code and visual comparison results are available at https://github.com/Pumpkin123709/MSEConv.
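The pixel-synthesis idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration only, not the paper's implementation: the function names, the 3×3 kernel shape, the dilation set `(1, 2, 4)` standing in for the "expansion degrees", and the scalar occlusion mask are all hypothetical. Each output pixel is synthesized from small neighborhoods of both input frames sampled at several dilations, the per-scale responses are combined with adaptive weights, and the two frame candidates are blended by an occlusion mask.

```python
import numpy as np

def dilated_patch(img, y, x, d):
    """3x3 neighborhood around (y, x) sampled with dilation d (zero padding)."""
    H, W = img.shape
    out = np.zeros((3, 3))
    for i, dy in enumerate((-d, 0, d)):
        for j, dx in enumerate((-d, 0, d)):
            yy, xx = y + dy, x + dx
            if 0 <= yy < H and 0 <= xx < W:
                out[i, j] = img[yy, xx]
    return out

def synthesize_pixel(f0, f1, y, x, kernels0, kernels1, scale_w, occ,
                     dilations=(1, 2, 4)):
    """Synthesize one intermediate-frame pixel from frames f0 and f1.

    kernels0/kernels1: one 3x3 weight kernel per dilation scale, per frame.
    scale_w: adaptive per-scale combination weights (assumed to sum to 1).
    occ: occlusion mask value in [0, 1] blending the two frame candidates.
    """
    c0 = sum(scale_w[s] * np.sum(kernels0[s] * dilated_patch(f0, y, x, d))
             for s, d in enumerate(dilations))
    c1 = sum(scale_w[s] * np.sum(kernels1[s] * dilated_patch(f1, y, x, d))
             for s, d in enumerate(dilations))
    return occ * c0 + (1.0 - occ) * c1
```

With averaging kernels and weights summing to one, a constant image is reproduced exactly, which is a useful sanity check; in the actual method the per-pixel kernels, scale weights, and mask would all be predicted by the attention-equipped network.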


Published In

ACM Transactions on Asian and Low-Resource Language Information Processing (Just Accepted)
EISSN: 2375-4702

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Online AM: 14 February 2024
Accepted: 07 February 2024
Revised: 26 January 2024
Received: 13 August 2022


Author Tags

  1. Video Frame Interpolation
  2. Unified Framework
  3. Complex Motion Modeling
  4. Mask Occlusion

Qualifiers

  • Research-article

Article Metrics

  • Downloads (Last 12 months): 315
  • Downloads (Last 6 weeks): 49
Reflects downloads up to 16 Oct 2024
