DOI: 10.1145/2733373.2806237

Fast and Accurate Content-based Semantic Search in 100M Internet Videos

Published: 13 October 2015

Abstract

Large-scale content-based semantic search in video is an interesting and fundamental problem in multimedia analysis and retrieval. Existing methods index a video by its raw concept detection scores, which are dense and inconsistent, and thus cannot scale to the "big data" that are readily available on the Internet. This paper proposes a scalable solution. The key is a novel step called concept adjustment, which represents a video by a few salient and consistent concepts that can be efficiently indexed by a modified inverted index. The proposed adjustment model relies on a concise optimization framework with interpretations. The proposed index leverages the text-based inverted index for video retrieval. Experimental results validate the efficacy and efficiency of the proposed method, and show that it can scale up semantic search while maintaining state-of-the-art search performance. Specifically, the proposed method (with reranking) achieves the best result on the challenging TRECVID Multimedia Event Detection (MED) zero-example task, and it takes only 0.2 seconds on a single CPU core to search a collection of 100 million Internet videos.
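The abstract describes two steps: turning dense, noisy concept-detector scores into a sparse set of salient concepts, and answering queries from a text-style inverted index built over those concepts. The sketch below illustrates the general idea only, under stated assumptions: the video names, concepts, and scores are hypothetical; a simple top-k threshold stands in for the paper's optimization-based adjustment model; and an in-memory dictionary stands in for the Lucene-style inverted index the paper builds on.

from collections import defaultdict

# Hypothetical dense concept-detector scores for three videos.
# In the paper's setting these would come from visual/audio/ASR concept detectors.
dense_scores = {
    "video_001": {"dog": 0.91, "park": 0.40, "grass": 0.35, "car": 0.03},
    "video_002": {"car": 0.88, "road": 0.76, "dog": 0.05},
    "video_003": {"beach": 0.81, "dog": 0.64, "car": 0.12},
}

def adjust_concepts(scores, k=2, threshold=0.3):
    """Sparsify a dense score vector: keep at most k concepts whose score
    exceeds the threshold. This top-k rule is only a stand-in for the
    paper's optimization-based concept adjustment."""
    salient = sorted(
        ((c, s) for c, s in scores.items() if s >= threshold),
        key=lambda cs: cs[1],
        reverse=True,
    )
    return dict(salient[:k])

def build_inverted_index(videos):
    """Map each adjusted (salient) concept to the videos that contain it,
    mimicking a text-style inverted index such as Lucene."""
    index = defaultdict(list)
    for vid, scores in videos.items():
        for concept, score in adjust_concepts(scores).items():
            index[concept].append((vid, score))
    return index

def search(index, query_concepts):
    """Rank videos by the sum of their scores for the queried concepts;
    only the postings lists of the query concepts are scanned."""
    hits = defaultdict(float)
    for concept in query_concepts:
        for vid, score in index.get(concept, []):
            hits[vid] += score
    return sorted(hits.items(), key=lambda vs: vs[1], reverse=True)

index = build_inverted_index(dense_scores)
print(search(index, ["dog", "park"]))   # [('video_001', 1.31...), ('video_003', 0.64)]

In this toy setup, a query touches only the short postings lists of its concepts rather than every video, which is what makes inverted-index retrieval fast; the paper's contribution is producing a concept representation sparse and consistent enough for such an index to work at the scale of 100 million videos.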


Published In

MM '15: Proceedings of the 23rd ACM international conference on Multimedia
October 2015
1402 pages
ISBN: 9781450334594
DOI: 10.1145/2733373

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. big data
  2. content-based retrieval
  3. internet video search
  4. multimedia event detection
  5. semantic search
  6. zero shot

Qualifiers

  • Research-article

Conference

MM '15: ACM Multimedia Conference
October 26 - 30, 2015
Brisbane, Australia

Acceptance Rates

MM '15 paper acceptance rate: 56 of 252 submissions (22%)
Overall acceptance rate: 2,145 of 8,556 submissions (25%)
