DOI: 10.1145/1180639.1180824
Article

Visual attention detection in video sequences using spatiotemporal cues

Published: 23 October 2006

Abstract

The human visual system actively seeks out interesting regions in images to reduce the search effort in tasks such as object detection and recognition. Similarly, prominent actions in video sequences are more likely to attract our attention first than their surrounding neighbors. In this paper, we propose a spatiotemporal video attention detection technique for detecting the attended regions that correspond to both interesting objects and actions in video sequences. Both spatial and temporal saliency maps are constructed and then fused in a dynamic fashion to produce the overall spatiotemporal attention model. In the temporal attention model, motion contrast is computed based on the planar motions (homographies) between images, which are estimated by applying RANSAC to point correspondences in the scene. To compensate for the non-uniform spatial distribution of interest points, the spanning areas of motion segments are incorporated into the motion contrast computation. In the spatial attention model, a fast method for computing pixel-level saliency maps is developed using the color histograms of images. A hierarchical spatial attention representation is established to reveal interesting points in images as well as interesting regions. Finally, a dynamic fusion technique is applied to combine the temporal and spatial saliency maps, in which temporal attention dominates the spatial model when large motion contrast exists, and vice versa. The proposed spatiotemporal attention framework has been applied to more than 20 test video sequences, and the detected attended regions highlight the interesting objects and motions present in the sequences with a very high user satisfaction rate.
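
The pipeline outlined in the abstract lends itself to a short illustration. The following is a minimal sketch, not the authors' implementation: it assumes OpenCV and NumPy, uses an ORB detector in place of the paper's interest points, a histogram-contrast style spatial measure, and an averaging-based fusion weight, all of which are illustrative assumptions; the paper's motion-segment spanning areas and hierarchical spatial representation are omitted.

import cv2
import numpy as np

def temporal_saliency(prev_gray, curr_gray, shape):
    # Motion contrast: residual of matched points w.r.t. the dominant planar
    # motion (homography) estimated with RANSAC, as described in the abstract.
    orb = cv2.ORB_create(nfeatures=1000)                 # detector choice is an assumption
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    if des1 is None or des2 is None:
        return np.zeros(shape, np.float32)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    if len(matches) < 4:
        return np.zeros(shape, np.float32)
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    if H is None:
        return np.zeros(shape, np.float32)
    pred = cv2.perspectiveTransform(src, H)              # where planar motion predicts each point
    residual = np.linalg.norm(dst - pred, axis=2).ravel()
    sal = np.zeros(shape, np.float32)
    for (x, y), r in zip(dst.reshape(-1, 2), residual):
        cv2.circle(sal, (int(x), int(y)), 9, float(r), -1)  # splat sparse contrast scores
    return cv2.GaussianBlur(sal, (0, 0), 7)

def spatial_saliency(frame, n_bins=8):
    # Pixel-level saliency from a global color histogram: a color is salient in
    # proportion to its distance from frequent colors (a histogram-contrast style
    # reading of the paper's "fast method using color histograms"; an assumption).
    q = (frame // (256 // n_bins)).astype(np.int32)
    idx = q[..., 0] * n_bins * n_bins + q[..., 1] * n_bins + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=n_bins ** 3).astype(np.float32)
    hist /= hist.sum()
    centers = np.stack(np.meshgrid(*(np.arange(n_bins),) * 3, indexing='ij'), -1)
    centers = centers.reshape(-1, 3).astype(np.float32)
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    color_sal = (dist * hist[None, :]).sum(axis=1)       # histogram-weighted color contrast
    return color_sal[idx]

def fuse(temporal, spatial, eps=1e-6):
    # Dynamic fusion: the temporal map dominates when motion contrast is large,
    # otherwise the spatial map takes over (the weighting scheme is an assumption).
    t = temporal / (temporal.max() + eps)
    s = spatial / (spatial.max() + eps)
    w = t.mean() / (t.mean() + s.mean() + eps)
    return w * t + (1.0 - w) * s

For a pair of consecutive frames read with cv2.VideoCapture, converting each to grayscale with cv2.cvtColor and calling fuse(temporal_saliency(prev_gray, curr_gray, frame.shape[:2]), spatial_saliency(frame)) yields a per-frame spatiotemporal saliency map of the kind the paper thresholds to obtain attended regions.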

Published In

MM '06: Proceedings of the 14th ACM international conference on Multimedia
October 2006
1072 pages
ISBN: 1595934472
DOI: 10.1145/1180639
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 October 2006

Author Tags

  1. spatiotemporal saliency map
  2. video attention detection

Qualifiers

  • Article

Conference

MM06
MM06: The 14th ACM International Conference on Multimedia 2006
October 23 - 27, 2006
Santa Barbara, CA, USA

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
