
Hierarchical space-time model enabling efficient search for human actions

Published: 01 June 2009

Abstract

We propose a five-layer hierarchical space-time model (HSTM) for representing and searching human actions in videos. From a feature point of view, invariance and selectivity are both desirable characteristics, yet they seem to contradict each other. To make them coexist, we introduce a coarse-to-fine search and verification scheme for action search based on the HSTM. Because descending through the layers of the hierarchy progressively turns the knob between invariance and selectivity, this strategy enables search for human actions ranging from the rapid movements of sports to the subtle motions of facial expressions. The introduction of the Histogram of Gabor Orientations (HIGO) feature allows the search to proceed smoothly across the hierarchical layers, and efficient matching is achieved by applying integral histograms to compute the features in the top two layers. The HSTM model was tested on three challenging video sequences and on the KTH human action database, where it achieved improvement over other state-of-the-art algorithms. These promising results validate that the HSTM model is both selective and robust for searching human actions.
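The integral histogram [7] is what makes the top-layer matching efficient: after a single pass over a frame, the histogram of any axis-aligned rectangle can be read out in constant time per bin. The following is a minimal 2-D sketch of that general technique with illustrative names, not the paper's implementation (which applies the idea to space-time HIGO features):

```python
import numpy as np

def integral_histogram(bin_map, n_bins):
    """Build per-bin summed-area tables from a map of bin indices.

    bin_map: 2-D int array; each entry is the quantized feature bin
             (e.g. a Gabor orientation index) at that pixel.
    Returns an (H+1, W+1, n_bins) cumulative array so the histogram
    of any axis-aligned rectangle is four lookups per bin.
    """
    h, w = bin_map.shape
    one_hot = np.zeros((h, w, n_bins))
    # one_hot[i, j, bin_map[i, j]] = 1 via advanced indexing
    one_hot[np.arange(h)[:, None], np.arange(w)[None, :], bin_map] = 1.0
    ih = np.zeros((h + 1, w + 1, n_bins))
    ih[1:, 1:] = one_hot.cumsum(axis=0).cumsum(axis=1)
    return ih

def region_histogram(ih, top, left, bottom, right):
    """Histogram of bin_map[top:bottom, left:right] in O(n_bins)."""
    return (ih[bottom, right] - ih[top, right]
            - ih[bottom, left] + ih[top, left])
```

Because every candidate window reuses the same cumulative tables, a sliding-window search over many window positions and sizes avoids recomputing histograms from scratch.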

References

[1]
K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biol. Cybern., vol. 36, no. 4, pp. 193-202, 1980.
[2]
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[3]
M. Riesenhuber and T. Poggio, "Hierarchical models of object recognition in cortex," Nat. Neurosci., vol. 2, no. 11, pp. 1019-1025, 1999.
[4]
J. K. Tsotsos, S. M. Culhane, W. Y. K. Wai, Y. Lai, N. Davis, and F. Nuflo, "Modeling visual attention via selective tuning," Artificial Intell., vol. 78, no. 1-2, pp. 507-545, 1995.
[5]
D. Hubel and T. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," J. Neurophysiol., vol. 160, pp. 106-154, 1962.
[6]
D. Hubel and T. Wiesel, "Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat," J. Neurophysiol., vol. 28, pp. 229-289, 1965.
[7]
F. Porikli, "Integral histogram: A fast way to extract histograms in Cartesian spaces," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2005, pp. 829-836.
[8]
C. Schuldt, I. Laptev, and B. Caputo, "Recognizing human actions: A local SVM approach," in Proc. Intern. Conf. Pattern Recognition, 2004, pp. 32-36.
[9]
M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, "Actions as space-time shapes," in Proc. IEEE Intern. Conf. Comput. Vision, 2005, pp. 1395-1402.
[10]
A. Yilmaz and M. Shah, "Recognizing human actions in videos acquired by uncalibrated moving cameras," in Proc. IEEE Intern. Conf. Comput. Vision, 2005, pp. 150-157.
[11]
N. Vaswani, A. RoyChowdhury, and R. Chellappa, "Activity recognition using the dynamics of the configuration of interacting objects," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2003, pp. 633-640.
[12]
D. Ramanan and D. A. Forsyth, "Automatic annotation of everyday movements," in Proc. Advances Neural Inform. Process. Syst., 2004.
[13]
N. P. Cuntoor and R. Chellappa, "Epitomic representation of human activities," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2007, pp. 1-8.
[14]
J. C. Niebles, H. Wang, and L. Fei-Fei, "Unsupervised learning of human action categories using spatial-temporal words," in Proc. Brit. Mach. Vision Conf., 2006, pp. 1249-1258.
[15]
D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Intern. J. Comput. Vision, vol. 60, no. 2, pp. 91-110, 2004.
[16]
C. Harris and M. J. Stephens, "A combined corner and edge detector," in Proc. 4th Alvey Vision Conf., 1988, pp. 147-151.
[17]
I. Laptev and T. Lindeberg, "Space-time interest points," in Proc. IEEE Intern. Conf. Comput. Vision, 2003, pp. 432-439.
[18]
J. C. Niebles and L. Fei-Fei, "A hierarchical model of shape and appearance for human action classification," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2007, pp. 1-8.
[19]
S.-F. Wong, T.-K. Kim, and R. Cipolla, "Learning motion categories using both semantic and structural information," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2007, pp. 1-6.
[20]
P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie, "Behavior recognition via sparse spatio-temporal features," in Proc. IEEE Intern. Workshop Visual Surveillance Performance Evaluation Tracking Surveillance, 2005, pp. 65-72.
[21]
Y. Ke, R. Sukthankar, and M. Hebert, "Efficient visual event detection using volumetric features," in Proc. IEEE Intern. Conf. Comput. Vision, 2005, pp. 166-173.
[22]
E. Shechtman and M. Irani, "Space-time behavior based correlation," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2005, pp. 405-412.
[23]
O. Chomat and J. L. Crowley, "Probabilistic recognition of activity using local appearance," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 1999, pp. 104-109.
[24]
O. Chomat, J. Martin, and J. L. Crowley, "A probabilistic sensor for the perception and recognition of activities," in Proc. Eur. Conf. Comput. Vision, 2000, pp. 487-503.
[25]
O. Boiman and M. Irani, "Detecting irregularities in images and in video," in Proc. IEEE Intern. Conf. Comput. Vision, 2005, pp. 462-469.
[26]
A. A. Efros, A. C. Berg, G. Mori, and J. Malik, "Recognizing action at a distance," in Proc. IEEE Intern. Conf. Comput. Vision, 2003, pp. 726-733.
[27]
L. Zelnik-Manor and M. Irani, "Event-based analysis of video," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2001, pp. 123-130.
[28]
T.-K. Kim, S.-F. Wong, and R. Cipolla, "Tensor canonical correlation analysis for action classification," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2007, pp. 1-8.
[29]
T. Serre, L. Wolf, and T. Poggio, "Object recognition with features inspired by visual cortex," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2005, pp. 994-1000.
[30]
T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio, "Robust object recognition with cortex-like mechanisms," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 3, pp. 411-426, 2007.
[31]
J. Mutch and D. G. Lowe, "Multiclass object recognition with sparse, localized features," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2006, pp. 11-18.
[32]
M. A. Giese and T. Poggio, "Neural mechanisms for the recognition of biological movements," Nat. Rev. Neurosci., vol. 4, no. 3, pp. 179-192, 2003.
[33]
H. Jhuang, T. Serre, L. Wolf, and T. Poggio, "A biologically inspired system for action recognition," in Proc. IEEE Intern. Conf. Comput. Vision, 2007, pp. 1-8.
[34]
N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition, 2005, pp. 886-893.
[35]
B. Schiele and J. L. Crowley, "Recognition without correspondence using multidimensional receptive field histograms," Intern. J. Comput. Vision, vol. 36, no. 1, pp. 31-50, 2000.
[36]
S. Kullback, Information Theory and Statistics. New York: Dover Publications, 1968.
[37]
A. Agarwal and B. Triggs, "Learning to track 3-D human motion from silhouettes," in Proc. 21st Intern. Conf. Mach. Learning, 2004, pp. 9-16.
[38]
H. Ning, "Hierarchical space-time model enabling efficient search for human actions," 2007. [Online]. Available: http://www.ifp.uiuc.edu/~hning2/hstm.htm.




      Published In

      IEEE Transactions on Circuits and Systems for Video Technology  Volume 19, Issue 6
      June 2009
      136 pages

      Publisher

      IEEE Press

      Publication History

      Published: 01 June 2009
      Revised: 18 May 2008
      Received: 05 February 2008

      Author Tags

      1. Action recognition
      2. action search
      3. hierarchical space-time model (HSTM)
      4. histogram of Gabor orientations (HIGO)

      Qualifiers

      • Research-article
