DOI: 10.1145/1180639.1180824
Article

Visual attention detection in video sequences using spatiotemporal cues

Published: 23 October 2006

Abstract

The human visual system actively seeks out interesting regions in images to reduce the search effort in tasks such as object detection and recognition. Similarly, prominent actions in video sequences are more likely to attract our attention first than their surrounding neighbors. In this paper, we propose a spatiotemporal video attention detection technique for detecting the attended regions that correspond to both interesting objects and actions in video sequences. Both spatial and temporal saliency maps are constructed and then fused in a dynamic fashion to produce the overall spatiotemporal attention model. In the temporal attention model, motion contrast is computed based on the planar motions (homographies) between images, which are estimated by applying RANSAC to point correspondences in the scene. To compensate for the non-uniform spatial distribution of interest points, the spanning areas of motion segments are incorporated into the motion contrast computation. In the spatial attention model, a fast method for computing pixel-level saliency maps is developed using the color histograms of images. A hierarchical spatial attention representation is established to reveal interesting points in images as well as interesting regions. Finally, a dynamic fusion technique is applied to combine the temporal and spatial saliency maps, in which temporal attention dominates the spatial model when large motion contrast exists, and vice versa. The proposed spatiotemporal attention framework has been applied to more than 20 test video sequences, and the detected attended regions highlight the interesting objects and motions present in the sequences with a very high user satisfaction rate.
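
The pipeline outlined in the abstract lends itself to a short illustration. The following is a minimal sketch, not the authors' implementation: it assumes OpenCV and NumPy, uses an ORB detector in place of the paper's interest points, a histogram-contrast style spatial measure, and an averaging-based fusion weight, all of which are illustrative assumptions; the paper's motion-segment spanning areas and hierarchical spatial representation are omitted.

import cv2
import numpy as np

def temporal_saliency(prev_gray, curr_gray, shape):
    # Motion contrast: residual of matched points w.r.t. the dominant planar
    # motion (homography) estimated with RANSAC, as described in the abstract.
    orb = cv2.ORB_create(nfeatures=1000)                 # detector choice is an assumption
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    if des1 is None or des2 is None:
        return np.zeros(shape, np.float32)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    if len(matches) < 4:
        return np.zeros(shape, np.float32)
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    if H is None:
        return np.zeros(shape, np.float32)
    pred = cv2.perspectiveTransform(src, H)              # where planar motion predicts each point
    residual = np.linalg.norm(dst - pred, axis=2).ravel()
    sal = np.zeros(shape, np.float32)
    for (x, y), r in zip(dst.reshape(-1, 2), residual):
        cv2.circle(sal, (int(x), int(y)), 9, float(r), -1)  # splat sparse contrast scores
    return cv2.GaussianBlur(sal, (0, 0), 7)

def spatial_saliency(frame, n_bins=8):
    # Pixel-level saliency from a global color histogram: a color is salient in
    # proportion to its distance from frequent colors (a histogram-contrast style
    # reading of the paper's "fast method using color histograms"; an assumption).
    q = (frame // (256 // n_bins)).astype(np.int32)
    idx = q[..., 0] * n_bins * n_bins + q[..., 1] * n_bins + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=n_bins ** 3).astype(np.float32)
    hist /= hist.sum()
    centers = np.stack(np.meshgrid(*(np.arange(n_bins),) * 3, indexing='ij'), -1)
    centers = centers.reshape(-1, 3).astype(np.float32)
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    color_sal = (dist * hist[None, :]).sum(axis=1)       # histogram-weighted color contrast
    return color_sal[idx]

def fuse(temporal, spatial, eps=1e-6):
    # Dynamic fusion: the temporal map dominates when motion contrast is large,
    # otherwise the spatial map takes over (the weighting scheme is an assumption).
    t = temporal / (temporal.max() + eps)
    s = spatial / (spatial.max() + eps)
    w = t.mean() / (t.mean() + s.mean() + eps)
    return w * t + (1.0 - w) * s

For a pair of consecutive frames read with cv2.VideoCapture, converting each to grayscale with cv2.cvtColor and calling fuse(temporal_saliency(prev_gray, curr_gray, frame.shape[:2]), spatial_saliency(frame)) yields a per-frame spatiotemporal saliency map of the kind the paper thresholds to obtain attended regions.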

Published In

MM '06: Proceedings of the 14th ACM international conference on Multimedia
October 2006
1072 pages
ISBN: 1595934472
DOI: 10.1145/1180639
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 October 2006

Author Tags

  1. spatiotemporal saliency map
  2. video attention detection

Qualifiers

  • Article

Conference

MM06
MM06: The 14th ACM International Conference on Multimedia 2006
October 23 - 27, 2006
Santa Barbara, CA, USA

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
