
Rethinking two-dimensional camera motion estimation assessment for digital video stabilization: A camera motion field-based metric

Published: 28 November 2023

Abstract

Digital video stabilization aims to remove camera motion jitter through software. The first step of the classical video stabilization pipeline, camera motion estimation, is usually performed using only the RGB frames of the unstable video. Despite recent advances in camera motion estimation strategies, methods classified as two-dimensional are still not properly evaluated, even though motion estimation is well known to be a crucial step in classical approaches to video stabilization. The main purpose of this work is to draw attention to the assessment of two-dimensional camera motion estimation and to reinforce its importance to progress in video stabilization. We propose a new approach that performs this evaluation through a pixel-by-pixel comparison of camera motion fields, and we demonstrate experimentally that our metrics are reliable across diverse scenarios by comparing them against image similarity metrics. In addition, we show and analyze the results of our metrics for both a global and a local camera motion estimation method. We believe the assessment and study presented in this work are an important starting point for a more rigorous analysis of this task, as well as a foundation for upcoming deep learning-based 2D camera motion estimation methods.
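The pixel-by-pixel comparison of camera motion fields described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `camera_motion_field` and `motion_field_error` are hypothetical names, a 3x3 homography is assumed as the 2D camera motion model, and mean endpoint error is assumed as the per-pixel distance.

```python
import numpy as np

def camera_motion_field(homography, height, width):
    """Build a dense camera motion field from a 3x3 homography.

    Each pixel (x, y) is mapped through the homography; the field stores
    the per-pixel displacement (dx, dy) induced by the camera motion
    between two consecutive frames.
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (H, W, 3) homogeneous
    warped = pts @ homography.T                          # apply the motion model
    warped = warped[..., :2] / warped[..., 2:3]          # perspective divide
    return warped - pts[..., :2]                         # displacement field (H, W, 2)

def motion_field_error(estimated, reference):
    """Pixel-by-pixel comparison of two motion fields: the mean Euclidean
    distance (endpoint error) between corresponding displacement vectors."""
    return float(np.mean(np.linalg.norm(estimated - reference, axis=-1)))

# Example: a reference motion of a pure (2, 1)-pixel translation versus an
# identity (no-motion) estimate.
H_true = np.array([[1.0, 0.0, 2.0],
                   [0.0, 1.0, 1.0],
                   [0.0, 0.0, 1.0]])
H_est = np.eye(3)

ref_field = camera_motion_field(H_true, 48, 64)
est_field = camera_motion_field(H_est, 48, 64)
print(motion_field_error(est_field, ref_field))  # sqrt(2^2 + 1^2) ≈ 2.2361
```

Because the comparison is dense rather than based on a handful of feature correspondences, it penalizes local estimation errors that a global image similarity metric can average away.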



Published In

Neurocomputing, Volume 559, Issue C
Nov 2023
442 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands


Author Tags

  1. Camera motion estimation
  2. Two-dimensional camera motion
  3. Video stabilization

Qualifiers

  • Research-article
