
Discriminative Dictionary Learning With Motion Weber Local Descriptor for Violence Detection

Published: 01 March 2017

Abstract

Automatic violence detection from video is an active research topic in video surveillance. However, there has been little success in developing algorithms that reliably detect violence in surveillance videos. In this paper, building on our recently proposed motion Weber local descriptor (WLD), we make two major improvements and propose a more effective and efficient algorithm for detecting violence from motion images. First, we propose an improved WLD (IWLD) to better depict low-level image appearance, and then extend the spatial IWLD descriptor with a temporal component that captures local motion, forming the motion IWLD. Second, we propose a modified sparse-representation-based classification model that both controls the reconstruction error of the coding coefficients and minimizes the classification error. Based on this sparse model, a class-specific dictionary, whose atoms correspond to the class labels, is learned from the labeled training samples. With this learned dictionary, both the representation residual and the representation coefficients become discriminative, and a classification scheme integrating the modified sparse model is developed to exploit this discriminative information. Experimental results on three benchmark data sets demonstrate the superior performance of the proposed approach over state-of-the-art methods.
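The classification idea sketched in the abstract can be illustrated with a minimal, hedged example of sparse-representation-based classification: a test sample is coded over a dictionary whose atoms carry class labels, and it is assigned to the class whose sub-dictionary yields the smallest reconstruction residual. This is a generic illustration of the residual-based decision rule, not the paper's learned motion-IWLD dictionary; the dictionaries here are random toy data, and plain least squares stands in for the sparse coding step.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, atoms_per_class, n_classes = 20, 15, 2

# Column-normalized class-specific sub-dictionaries D_c (toy data).
D = [rng.standard_normal((n_features, atoms_per_class)) for _ in range(n_classes)]
D = [d / np.linalg.norm(d, axis=0) for d in D]

def classify(y):
    """Assign y to the class whose sub-dictionary reconstructs it best."""
    residuals = []
    for d in D:
        # Least-squares coding (a stand-in for the sparse coding step).
        alpha, *_ = np.linalg.lstsq(d, y, rcond=None)
        residuals.append(np.linalg.norm(y - d @ alpha))
    return int(np.argmin(residuals))

# A sample built from class-1 atoms lies in the span of D[1],
# so its class-1 residual is near zero and it is assigned to class 1.
y = D[1] @ rng.standard_normal(atoms_per_class)
print(classify(y))  # → 1
```

The paper's contribution adds a discriminative term to this basic scheme, so that the coding coefficients themselves, not just the residuals, separate the classes.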



Published In

IEEE Transactions on Circuits and Systems for Video Technology, Volume 27, Issue 3
March 2017
305 pages

Publisher

IEEE Press


Qualifiers

  • Research-article
