
Hierarchical space-time model enabling efficient search for human actions

Published: 01 June 2009

Abstract

We propose a five-layer hierarchical space-time model (HSTM) for representing and searching human actions in videos. From a feature point of view, invariance and selectivity are both desirable characteristics, yet they seem to contradict each other. To make them coexist, we introduce a coarse-to-fine search and verification scheme for action search based on the HSTM. Because descending through the layers of the hierarchy progressively turns the knob between invariance and selectivity, this strategy enables search for human actions ranging from the rapid movements of sports to the subtle motions of facial expressions. The introduction of the Histogram of Gabor Orientations (HIGO) feature allows the search to proceed smoothly across the hierarchical layers, and efficient matching is achieved by applying integral histograms to compute the features in the top two layers. The HSTM model was tested on three challenging video sequences and on the KTH human action database, where it achieved improvement over other state-of-the-art algorithms. These promising results validate that the HSTM model is both selective and robust for searching human actions.
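The integral histogram [7] is what makes the top-layer matching efficient: after a single pass over a frame, the histogram of any axis-aligned rectangle can be read out in constant time per bin. The following is a minimal 2-D sketch of that general technique with illustrative names, not the paper's implementation (which applies the idea to space-time HIGO features):

```python
import numpy as np

def integral_histogram(bin_map, n_bins):
    """Build per-bin summed-area tables from a map of bin indices.

    bin_map: 2-D int array; each entry is the quantized feature bin
             (e.g. a Gabor orientation index) at that pixel.
    Returns an (H+1, W+1, n_bins) cumulative array so the histogram
    of any axis-aligned rectangle is four lookups per bin.
    """
    h, w = bin_map.shape
    one_hot = np.zeros((h, w, n_bins))
    # one_hot[i, j, bin_map[i, j]] = 1 via advanced indexing
    one_hot[np.arange(h)[:, None], np.arange(w)[None, :], bin_map] = 1.0
    ih = np.zeros((h + 1, w + 1, n_bins))
    ih[1:, 1:] = one_hot.cumsum(axis=0).cumsum(axis=1)
    return ih

def region_histogram(ih, top, left, bottom, right):
    """Histogram of bin_map[top:bottom, left:right] in O(n_bins)."""
    return (ih[bottom, right] - ih[top, right]
            - ih[bottom, left] + ih[top, left])
```

Because every candidate window reuses the same cumulative tables, a sliding-window search over many window positions and sizes avoids recomputing histograms from scratch.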

References

[1]
K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biol. Cybern., vol. 36, no. 4, pp. 193-202, 1980.
[2]
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[3]
M. Riesenhuber and T. Poggio, "Hierarchical models of object recognition in cortex," Nat. Neurosci., vol. 2, no. 11, pp. 1019-1025, 1999.
[4]
J. K. Tsotsos, S. M. Culhane, W. Y. K. Wai, Y. Lai, N. Davis, and F. Nuflo, "Modeling visual attention via selective tuning," Artificial Intell., vol. 78, no. 1-2, pp. 507-545, 1995.
[5]
D. Hubel and T. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," J. Neurophysiol., vol. 160, pp. 106-154, 1962.
[6]
D. Hubel and T. Wiesel, "Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat," J. Neurophysiol., vol. 28, pp. 229-289, 1965.
[7]
F. Porikli, "Integral histogram: A fast way to extract histograms in Cartesian spaces," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2005, pp. 829-836.
[8]
C. Schuldt, I. Laptev, and B. Caputo, "Recognizing human actions: A local SVM approach," in Proc. Intern. Conf. Pattern Recognition, 2004, pp. 32-36.
[9]
M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, "Actions as space-time shapes," in Proc. IEEE Intern. Conf. Comput. Vision, 2005, pp. 1395-1402.
[10]
A. Yilmaz and M. Shah, "Recognizing human actions in videos acquired by uncalibrated moving cameras," in Proc. IEEE Intern. Conf. Comput. Vision, 2005, pp. 150-157.
[11]
N. Vaswani, A. RoyChowdhury, and R. Chellappa, "Activity recognition using the dynamics of the configuration of interacting objects," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2003, pp. 633-640.
[12]
D. Ramanan and D. A. Forsyth, "Automatic annotation of everyday movements," in Proc. Advances Neural Inform. Process. Syst., 2004.
[13]
N. P. Cuntoor and R. Chellappa, "Epitomic representation of human activities," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2007, pp. 1-8.
[14]
J. C. Niebles, H. Wang, and L. Fei-Fei, "Unsupervised learning of human action categories using spatial-temporal words," in Proc. Brit. Mach. Vision Conf., 2006, pp. 1249-1258.
[15]
D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Intern. J. Comput. Vision, vol. 60, no. 2, pp. 91-110, 2004.
[16]
C. Harris and M. J. Stephens, "A combined corner and edge detector," in Proc. 4th Alvey Vision Conf., 1988, pp. 147-151.
[17]
I. Laptev and T. Lindeberg, "Space-time interest points," in Proc. IEEE Intern. Conf. Comput. Vision, 2003, pp. 432-439.
[18]
J. C. Niebles and L. Fei-Fei, "A hierarchical model of shape and appearance for human action classification," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2007, pp. 1-8.
[19]
S.-F. Wong, T.-K. Kim, and R. Cipolla, "Learning motion categories using both semantic and structural information," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2007, pp. 1-6.
[20]
P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie, "Behavior recognition via sparse spatio-temporal features," in Proc. IEEE Intern. Workshop Visual Surveillance Performance Evaluation Tracking Surveillance, 2005, pp. 65-72.
[21]
Y. Ke, R. Sukthankar, and M. Hebert, "Efficient visual event detection using volumetric features," in Proc. IEEE Intern. Conf. Comput. Vision, 2005, pp. 166-173.
[22]
E. Shechtman and M. Irani, "Space-time behavior based correlation," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2005, pp. 405-412.
[23]
O. Chomat and J. L. Crowley, "Probabilistic recognition of activity using local appearance," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 1999, pp. 104-109.
[24]
O. Chomat, J. Martin, and J. L. Crowley, "A probabilistic sensor for the perception and recognition of activities," in Proc. Eur. Conf. Comput. Vision, 2000, pp. 487-503.
[25]
O. Boiman and M. Irani, "Detecting irregularities in images and in video," in Proc. IEEE Intern. Conf. Comput. Vision, 2005, pp. 462-469.
[26]
A. A. Efros, A. C. Berg, G. Mori, and J. Malik, "Recognizing action at a distance," in Proc. IEEE Intern. Conf. Comput. Vision, 2003, pp. 726-733.
[27]
L. Zelnik-Manor and M. Irani, "Event-based analysis of video," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2001, pp. 123-130.
[28]
T.-K. Kim, S.-F. Wong, and R. Cipolla, "Tensor canonical correlation analysis for action classification," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2007, pp. 1-8.
[29]
T. Serre, L. Wolf, and T. Poggio, "Object recognition with features inspired by visual cortex," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2005, pp. 994-1000.
[30]
T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio, "Robust object recognition with cortex-like mechanisms," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 3, pp. 411-426, 2007.
[31]
J. Mutch and D. G. Lowe, "Multiclass object recognition with sparse, localized features," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2006, pp. 11-18.
[32]
M. A. Giese and T. Poggio, "Neural mechanisms for the recognition of biological movements," Nat. Rev. Neurosci., vol. 4, no. 3, pp. 179-192, 2003.
[33]
H. Jhuang, T. Serre, L. Wolf, and T. Poggio, "A biologically inspired system for action recognition," in Proc. IEEE Intern. Conf. Comput. Vision, 2007, pp. 1-8.
[34]
N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2005, pp. 886-893.
[35]
B. Schiele and J. L. Crowley, "Recognition without correspondence using multidimensional receptive field histograms," Intern. J. Comput. Vision, vol. 36, no. 1, pp. 31-50, 2000.
[36]
S. Kullback, Information Theory and Statistics. New York: Dover Publications, 1968.
[37]
A. Agarwal and B. Triggs, "Learning to track 3-D human motion from silhouettes," in Proc. 21st Intern. Conf. Mach. Learning, 2004, pp. 9-16.
[38]
H. Ning, "Hierarchical space-time model enabling efficient search for human actions," 2007. [Online]. Available: http://www.ifp.uiuc.edu/~hning2/hstm.htm.




      Published In

      IEEE Transactions on Circuits and Systems for Video Technology  Volume 19, Issue 6
      June 2009
      136 pages

      Publisher

      IEEE Press

      Publication History

      Published: 01 June 2009
      Revised: 18 May 2008
      Received: 05 February 2008

      Author Tags

      1. Action recognition
      2. action search
      3. hierarchical space-time model (HSTM)
      4. histogram of Gabor orientations (HIGO)

      Qualifiers

      • Research-article
