tutorial

Towards Efficient Learning of Optimal Spatial Bag-of-Words Representations

Authors:

Alexander G. Hauptmann

ICMR '14: Proceedings of International Conference on Multimedia Retrieval

Pages 121 - 128

https://doi.org/10.1145/2578726.2578739

Published: 01 April 2014

Abstract

Spatial Pyramid Matching (SPM) assumes that the spatial Bag-of-Words (BoW) representation is independent of the data. However, evidence shows that this assumption usually leads to a suboptimal representation. In this paper, we propose a novel method called Jensen-Shannon (JS) Tiling that learns the BoW representation from data directly at the BoW level. JS Tiling is especially appropriate for large-scale datasets because it is orders of magnitude faster than existing methods while achieving comparable or even better classification precision. Experimental results on four benchmarks, including two TRECVID12 datasets, validate that JS Tiling outperforms SPM and state-of-the-art methods. The runtime comparison demonstrates that selecting BoW representations by JS Tiling is more than 1,000 times faster than running classifiers. JS Tiling was also an important component of the CMU team's final submission to TRECVID 2012 Multimedia Event Detection.
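
As a rough illustration of the quantity the method is named after, the Python sketch below computes the Jensen-Shannon divergence between two BoW histograms. It is not the authors' implementation: the abstract does not say how JS Tiling applies the divergence when choosing a tiling, and the function names (normalize, kl_divergence, js_divergence) and the toy histograms are assumptions made for this example.

```python
import math


def normalize(hist):
    """Scale a non-negative histogram so that its entries sum to 1."""
    total = float(sum(hist))
    if total <= 0:
        raise ValueError("histogram must have positive total mass")
    return [h / total for h in hist]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Bins where p is zero contribute nothing to the sum.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(hist_a, hist_b):
    """Jensen-Shannon divergence between two histograms of equal length."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    p = normalize(hist_a)
    q = normalize(hist_b)
    # The mixture m is positive wherever p or q is positive,
    # so both KL terms below are well defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy visual-word histograms (hypothetical counts over four words).
print(js_divergence([3, 1, 0, 0], [3, 1, 0, 0]))  # identical histograms -> 0.0
print(js_divergence([4, 0, 0, 0], [0, 0, 0, 4]))  # disjoint support -> 1.0
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (no shared support), and it is symmetric in its two arguments.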

Index Terms

Towards Efficient Learning of Optimal Spatial Bag-of-Words Representations
1. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing
2. Software and its engineering
  1. Software notations and tools
    1. Context specific languages

Published In

ICMR '14: Proceedings of International Conference on Multimedia Retrieval

April 2014

564 pages

ISBN: 9781450327824

DOI: 10.1145/2578726

Conference Chairs:
Mohan Kankanhalli, National University of Singapore
Stefan Rueger, The Open University, UK
R. Manmatha, A9.com, USA

General Chairs:
Joemon Jose, University of Glasgow, UK
Keith van Rijsbergen, University of Glasgow, UK

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Tutorial
Research
Refereed limited

Conference

ICMR '14

ICMR '14: International Conference on Multimedia Retrieval

April 1 - 4, 2014

Glasgow, United Kingdom

Acceptance Rates

ICMR '14 Paper Acceptance Rate 21 of 111 submissions, 19%;

Overall Acceptance Rate 254 of 830 submissions, 31%

Bibliometrics

Total Citations: 12
Total Downloads: 163
Downloads (Last 12 months): 3
Downloads (Last 6 weeks): 0

Reflects downloads up to 25 Dec 2024

Cited By

Liu Z, Yu Y (2018). An Autonomous Selection Method for Spatial Pooling Based on Chi-Square Test. 2018 IEEE International Conference on Information and Automation (ICIA), pp. 580-585. Online publication date: Aug-2018.
https://doi.org/10.1109/ICInfA.2018.8812586
Yu Y, Sun Z, Zhu W, Gu J (2018). A Homotopy Iterative Hard Thresholding Algorithm With Extreme Learning Machine for Scene Recognition. IEEE Access, 6, pp. 30424-30436. Online publication date: 2018.
https://doi.org/10.1109/ACCESS.2018.2845298
Xu W, Liu W, Chi H, Huang X, Yang J (2018). Multi-task classification with sequential instances and tasks. Signal Processing: Image Communication, 64, pp. 59-67. Online publication date: May-2018.
https://doi.org/10.1016/j.image.2018.02.013
Liu C, Zhang Q, Lu B, Li C (2017). Feature Encodings and Poolings for Action and Event Recognition: A Comprehensive Survey. Information, 8:4, article 134. Online publication date: 29-Oct-2017.
https://doi.org/10.3390/info8040134
Li P, Ma H, Ming A (2017). A non-rigid 3D model retrieval method based on scale-invariant heat kernel signature features. Multimedia Tools and Applications, 76:7, pp. 10207-10230. Online publication date: 1-Apr-2017.
https://dl.acm.org/doi/10.1007/s11042-016-3606-9
Feng Q, Zhou R, Xu C, Cheng Y, Testa B, Yin H, Weippl E, Katzenbeisser S, Kruegel C, Myers A, Halevi S (2016). Scalable Graph-based Bug Search for Firmware Images. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 480-491. Online publication date: 24-Oct-2016.
https://dl.acm.org/doi/10.1145/2976749.2978370
Jiang L, Bourdeau J, Hendler J, Nkambou R, Horrocks I, Zhao B (2016). Web-scale Multimedia Search for Internet Video Content. Proceedings of the 25th International Conference Companion on World Wide Web, pp. 311-316. Online publication date: 11-Apr-2016.
https://dl.acm.org/doi/10.1145/2872518.2888599
Wu L, Yu Y, Gu J (2016). A scene recognition method using sparse features with layout-sensitive pooling and extreme learning machine. 2016 IEEE International Conference on Information and Automation (ICIA), pp. 178-183. Online publication date: Aug-2016.
https://doi.org/10.1109/ICInfA.2016.7831818
Jiang L, Yu S, Meng D, Mitamura T, Hauptmann A, Hauptmann A, Ngo C, Xue X, Jiang Y, Snoek C, Vasconcelos N (2015). Bridging the Ultimate Semantic Gap. Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, pp. 27-34. Online publication date: 22-Jun-2015.
https://dl.acm.org/doi/10.1145/2671188.2749399
Jiang L, Yu S, Meng D, Mitamura T, Hauptmann A (2015). Text-to-video: a semantic search engine for internet videos. International Journal of Multimedia Information Retrieval, 5:1, pp. 3-18. Online publication date: 24-Dec-2015.
https://doi.org/10.1007/s13735-015-0093-0
