research-article

Multi-Scale Spatiotemporal Conv-LSTM Network for Video Saliency Detection

Authors:

Xia Li

ICMR '18: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval

Pages 362 - 369

https://doi.org/10.1145/3206025.3206052

Published: 05 June 2018

Abstract

Deep neural networks have recently become crucial techniques for image saliency detection. However, two difficulties hinder the development of deep learning for video saliency detection. First, traditional static networks cannot perform robust motion estimation in videos. Second, data-driven deep learning lacks sufficient manually annotated pixel-wise ground truth for training video saliency networks. In this paper, we propose a multi-scale spatiotemporal convolutional LSTM network (MSST-ConvLSTM) that incorporates spatial and temporal cues for video salient object detection. Furthermore, because manual pixel-wise labeling is very time-consuming, we assign a large number of coarse labels and mix them with fine labels to train a robust saliency prediction model. Experiments on widely used, challenging benchmark datasets (e.g., FBMS and DAVIS) demonstrate that the proposed approach performs competitively with state-of-the-art saliency models in video saliency detection.
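To make the convolutional LSTM building block named in the abstract more concrete, the following is a minimal sketch of a single generic ConvLSTM step, written with the Python standard library only. It is not the authors' MSST-ConvLSTM: the multi-scale design, channel counts, and training on mixed coarse and fine labels are not reproduced here. The sketch assumes single-channel feature maps, zero padding, and no peephole connections. The function names (`conv2d_same`, `convlstm_step`, `run_sequence`) and the `params` layout are hypothetical and chosen for illustration only.

```python
import math


def conv2d_same(grid, kernel):
    """Zero-padded 'same' cross-correlation of one 2D map with an odd-sized square kernel."""
    height, width = len(grid), len(grid[0])
    size = len(kernel)
    radius = size // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in range(size):
                for dx in range(size):
                    yy, xx = y + dy - radius, x + dx - radius
                    # Positions outside the map count as zeros (zero padding).
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += kernel[dy][dx] * grid[yy][xx]
            out[y][x] = acc
    return out


def sigmoid(value):
    # Numerically stable logistic function expressed through tanh.
    return 0.5 * (1.0 + math.tanh(0.5 * value))


def convlstm_step(x, h_prev, c_prev, params):
    """One ConvLSTM step on single-channel maps (peephole terms omitted).

    params maps each gate name in "ifog" (input, forget, output, candidate)
    to a hypothetical (input_kernel, hidden_kernel, bias) triple.
    """
    height, width = len(x), len(x[0])
    pre = {}
    for gate in "ifog":
        w_x, w_h, bias = params[gate]
        from_x = conv2d_same(x, w_x)        # spatial cue from the current frame
        from_h = conv2d_same(h_prev, w_h)   # temporal cue from the previous state
        pre[gate] = [[from_x[r][c] + from_h[r][c] + bias for c in range(width)]
                     for r in range(height)]
    h_new = [[0.0] * width for _ in range(height)]
    c_new = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            i = sigmoid(pre["i"][r][c])
            f = sigmoid(pre["f"][r][c])
            o = sigmoid(pre["o"][r][c])
            g = math.tanh(pre["g"][r][c])
            c_new[r][c] = f * c_prev[r][c] + i * g    # update the cell memory
            h_new[r][c] = o * math.tanh(c_new[r][c])  # emit the hidden map
    return h_new, c_new


def run_sequence(frames, params):
    """Feed frames through the cell in order, carrying (h, c) across time."""
    height, width = len(frames[0]), len(frames[0][0])
    h = [[0.0] * width for _ in range(height)]
    c = [[0.0] * width for _ in range(height)]
    for frame in frames:
        h, c = convlstm_step(frame, h, c, params)
    return h, c
```

Because the gates are computed by convolutions rather than dense matrix products, the hidden and cell states keep the spatial layout of the input frames, which is what lets a recurrent model of this kind produce a per-pixel map for each video frame. A practical implementation would use a tensor library with multi-channel kernels instead of nested lists.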

Index Terms

Multi-Scale Spatiotemporal Conv-LSTM Network for Video Saliency Detection
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Interest point and salient region detections
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Information

Published In

ICMR '18: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval

June 2018

550 pages

ISBN:9781450350464

DOI:10.1145/3206025

Conference Chairs:
Kiyoharu Aizawa (The Univ. of Tokyo, Japan)
Michael Lew (Leiden Univ., Netherlands)
Shin'ichi Satoh (National Inst. of Informatics, Japan)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 June 2018

Qualifiers

Research-article

Conference

ICMR '18

Sponsor:

SIGMM

ICMR '18: International Conference on Multimedia Retrieval

June 11 - 14, 2018

Yokohama, Japan

Bibliometrics

Total Citations: 7
Total Downloads: 605
Downloads (Last 12 months): 34
Downloads (Last 6 weeks): 6

Reflects downloads up to 18 Jan 2025

Cited By

Singh H, Verma M, Cheruku R. 2024. DSFNet: Video Salient Object Detection Using a Novel Lightweight Deformable Separable Fusion Network. IEEE Transactions on Instrumentation and Measurement Vol. 73 (2024), 1--12. Online publication date: 2024.
https://doi.org/10.1109/TIM.2024.3470045
Liu B, Mu K, Xu M, Wang F, Feng L. 2022. A novel spatiotemporal attention enhanced discriminative network for video salient object detection. Applied Intelligence Vol. 52, 6 (2022), 5922--5937. Online publication date: 1-Apr-2022.
https://dl.acm.org/doi/10.1007/s10489-021-02649-z
Turan M, Erzin E. 2021. Domain Adaptation for Food Intake Classification With Teacher/Student Learning. IEEE Transactions on Multimedia Vol. 23 (2021), 4220--4231. Online publication date: 2021.
https://doi.org/10.1109/TMM.2020.3038315
Wang B, Liu W, Han G, He S. 2020. Learning Long-Term Structural Dependencies for Video Salient Object Detection. IEEE Transactions on Image Processing Vol. 29 (2020), 9017--9031. Online publication date: 2020.
https://doi.org/10.1109/TIP.2020.3023591
Ramaswamy A, Seemakurthy K, Gubbi J, Purushothaman B. 2020. Spatio-temporal action detection and localization using a hierarchical LSTM. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 3303--3312. Online publication date: Jun-2020.
https://doi.org/10.1109/CVPRW50498.2020.00390
Startsev M, Dorr M. 2020. Supersaliency: A Novel Pipeline for Predicting Smooth Pursuit-Based Attention Improves Generalisability of Video Saliency. IEEE Access Vol. 8 (2020), 1276--1289. Online publication date: 2020.
https://doi.org/10.1109/ACCESS.2019.2961835
He Z, Chow C, Zhang J. 2019. STCNN: A Spatio-Temporal Convolutional Neural Network for Long-Term Traffic Prediction. 2019 20th IEEE International Conference on Mobile Data Management (MDM), 226--233. Online publication date: Jun-2019.
https://doi.org/10.1109/MDM.2019.00-53
Guan J, Lai R, Xiong A. 2019. Learning Spatiotemporal Features for Single Image Stripe Noise Removal. IEEE Access Vol. 7 (2019), 144489--144499. Online publication date: 2019.
https://doi.org/10.1109/ACCESS.2019.2944239
