DOI: 10.1007/978-3-030-58558-7_25

Unified Image and Video Saliency Modeling

Published: 23 August 2020

Abstract

Visual saliency modeling for images and videos is treated as two independent tasks in recent computer vision literature. While image saliency modeling is a well-studied problem and progress on benchmarks such as SALICON and MIT300 is slowing, video saliency models have shown rapid gains on the recent DHF1K benchmark. Here, we take a step back and ask: Can image and video saliency modeling be approached via a unified model, with mutual benefit? We identify different sources of domain shift between image and video saliency data, and between different video saliency datasets, as a key challenge for effective joint modeling. To address this, we propose four novel domain adaptation techniques (Domain-Adaptive Priors, Domain-Adaptive Fusion, Domain-Adaptive Smoothing, and Bypass-RNN) in addition to an improved formulation of learned Gaussian priors. We integrate these techniques into a simple and lightweight encoder-RNN-decoder-style network, UNISAL, and train it jointly on image and video saliency data. We evaluate our method on the video saliency datasets DHF1K, Hollywood-2, and UCF-Sports, and on the image saliency datasets SALICON and MIT300. With a single set of parameters, UNISAL achieves state-of-the-art performance on all video saliency datasets and is on par with the state of the art on the image saliency datasets, despite a faster runtime and a 5- to 20-fold smaller model size than all competing deep methods. We provide retrospective analyses and ablation studies that confirm the importance of modeling the domain shift. The code is available at https://github.com/rdroste/unisal.
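To make two of the domain-adaptation ideas named above concrete, the following is a minimal PyTorch-style sketch: a Bypass-RNN, which skips the recurrent unit for static image batches, and domain-adaptive learned Gaussian priors, with one set of prior parameters per dataset (domain). The module names, tensor shapes, and the exact parameterization are illustrative assumptions for exposition, not the authors' implementation; see the linked repository for the actual code.

# Illustrative sketch only; names and parameterization are assumptions,
# not the UNISAL implementation (see https://github.com/rdroste/unisal).
import torch
import torch.nn as nn


class BypassRNN(nn.Module):
    """Recurrent block that is bypassed entirely for static image batches."""

    def __init__(self, channels):
        super().__init__()
        self.rnn = nn.GRU(channels, channels, batch_first=True)

    def forward(self, x, is_video):
        # x: (batch, time, channels); time == 1 for image datasets.
        if not is_video:
            return x  # images carry no temporal information, so skip the RNN
        out, _ = self.rnn(x)
        return out


class DomainAdaptivePrior(nn.Module):
    """One set of learned 2D Gaussian prior maps per dataset (domain)."""

    def __init__(self, num_domains, num_gaussians=4):
        super().__init__()
        # Per-domain Gaussian centers and log standard deviations in [0, 1] coords.
        self.mu = nn.Parameter(torch.rand(num_domains, num_gaussians, 2))
        self.log_sigma = nn.Parameter(torch.zeros(num_domains, num_gaussians, 2))

    def forward(self, feat, domain_idx):
        # feat: (batch, channels, H, W); prior maps are appended as extra channels.
        b, _, h, w = feat.shape
        ys = torch.linspace(0, 1, h, device=feat.device)
        xs = torch.linspace(0, 1, w, device=feat.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack([gy, gx], dim=-1)                    # (H, W, 2)
        mu = self.mu[domain_idx]                                # (G, 2)
        sigma = self.log_sigma[domain_idx].exp()                # (G, 2)
        diff = grid[None] - mu[:, None, None]                   # (G, H, W, 2)
        priors = torch.exp(-0.5 * (diff / sigma[:, None, None]).pow(2).sum(-1))
        return torch.cat([feat, priors[None].expand(b, -1, -1, -1)], dim=1)

In a joint training setup of this kind, each mini-batch would be drawn from a single dataset so that the matching domain index and is_video flag can be passed through; domain-adaptive fusion or smoothing layers could be added analogously by indexing per-domain convolution weights.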



Published In

Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V
Aug 2020
843 pages
ISBN: 978-3-030-58557-0
DOI: 10.1007/978-3-030-58558-7

Publisher

Springer-Verlag, Berlin, Heidelberg

Publication History

Published: 23 August 2020

Author Tags

  1. Visual saliency
  2. Video saliency
  3. Domain adaptation

Qualifiers

  • Article
