Sparse Coding Guided Spatiotemporal Feature Learning for Abnormal Event Detection in Large Videos

Published: 01 January 2019

Abstract

Abnormal event detection in large videos is an important task in both research and industrial applications and has attracted considerable attention in recent years. Existing methods usually solve this problem by extracting local features and then learning an outlier detection model on training videos. However, most previous approaches rely on hand-crafted visual features, whose limited representation capacity is a clear disadvantage. In this paper, we present a novel unsupervised deep feature learning algorithm for the abnormal event detection problem. To exploit the spatiotemporal information of the inputs, we use a deep three-dimensional convolutional network (C3D) for feature extraction. The key problem is then how to train the C3D network without any category labels. We employ the sparse coding results of hand-crafted features computed from the inputs to guide the unsupervised feature learning. Specifically, we define a multilevel similarity relationship between inputs according to the statistics of their shared dictionary atoms. We then introduce the quadruplet concept to model this multilevel similarity structure, from which a generalized triplet loss is constructed for training the C3D network. The C3D network can in turn generate features for sparse coding again, and this pipeline can be iterated several times. By jointly optimizing sparse coding and unsupervised feature learning, we obtain robust and rich feature representations. Based on the learned representations, the sparse reconstruction error is applied to predict the anomaly score of each testing input. Experiments on several publicly available video surveillance datasets, in comparison with a number of existing works, demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.
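The two computational ideas in the abstract can be illustrated with short sketches. First, a minimal PyTorch-style sketch of a two-margin loss over quadruplets (anchor, strongly similar, weakly similar, dissimilar), in the spirit of the generalized triplet loss described above; the margin values, the Euclidean distance, and the function name are illustrative assumptions, since the abstract does not give the exact formulation.

```python
import torch
import torch.nn.functional as F

def generalized_triplet_loss(anchor, pos, weak, neg, m1=0.2, m2=0.4):
    """Two-level ranking loss over quadruplets (anchor, pos, weak, neg):
    inputs sharing many dictionary atoms with the anchor (pos) should lie
    closer than those sharing only a few (weak), which in turn should lie
    closer than those sharing none (neg). m1 and m2 are assumed margins."""
    d_pos = F.pairwise_distance(anchor, pos)    # shape: (batch,)
    d_weak = F.pairwise_distance(anchor, weak)
    d_neg = F.pairwise_distance(anchor, neg)
    # hinge penalties on both levels of the similarity ordering
    loss = F.relu(d_pos - d_weak + m1) + F.relu(d_weak - d_neg + m2)
    return loss.mean()
```

Second, a hedged sketch of anomaly scoring by sparse reconstruction error, using scikit-learn's SparseCoder with OMP as one possible sparse solver. The dictionary D (rows assumed to be l2-normalized atoms learned on normal training features) and the sparsity level n_nonzero are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import SparseCoder

def anomaly_scores(X, D, n_nonzero=10):
    """Score each row of X (test features) by its sparse reconstruction
    error under a dictionary D learned on normal data; a higher score
    indicates a more anomalous input."""
    coder = SparseCoder(dictionary=D,
                        transform_algorithm="omp",
                        transform_n_nonzero_coefs=n_nonzero)
    codes = coder.transform(X)          # sparse codes, shape (n, n_atoms)
    residual = X - codes @ D            # reconstruction: X approx codes @ D
    return (residual ** 2).sum(axis=1)  # squared reconstruction error
```

In the iterative pipeline the abstract describes, the learned C3D features would take the place of X, and the dictionary would be re-learned (e.g., with K-SVD or another dictionary learner) at each round of the joint optimization.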



Published In

IEEE Transactions on Multimedia, Volume 21, Issue 1, Jan. 2019, 268 pages
Publisher: IEEE Press
Published: 01 January 2019
Qualifiers: Research-article
