DOI: 10.1145/2393347.2393412
research-article

Leveraging high-level and low-level features for multimedia event detection

Published: 29 October 2012

Abstract

This paper addresses the challenge of Multimedia Event Detection by proposing a novel method, based on collective classification, for fusing high-level and low-level features. The method consists of three steps: training a classifier on low-level features; encoding high-level features into graphs; and diffusing the scores over the established graphs to obtain the final prediction. The final prediction is derived from multiple graphs, each of which corresponds to one high-level feature. The paper investigates two graph construction methods, using logarithmic and exponential loss functions respectively, and two collective classification algorithms, namely Gibbs sampling and Markov random walk. The theoretical analysis demonstrates that the proposed method converges and is computationally scalable. The empirical analysis on the TRECVID 2011 Multimedia Event Detection dataset validates its outstanding performance compared with state-of-the-art methods, with the added benefit of interpretability.
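
The three steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the graph construction (the paper uses logarithmic and exponential loss functions), so the sketch assumes a Gaussian-kernel k-nearest-neighbour graph as a stand-in, a damped random-walk update for the diffusion, and a plain average across graphs. The names build_graph, diffuse, fuse, alpha and k are all hypothetical, and the low-level classifier scores are assumed to be given.

```python
import numpy as np

def build_graph(high_level, k=2):
    """Row-stochastic kNN graph over videos from ONE high-level feature.

    Assumption: Gaussian-kernel similarity; the paper's own construction
    (logarithmic / exponential loss) is not reproduced here.
    """
    n = high_level.shape[0]
    diff = high_level[:, None, :] - high_level[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)           # pairwise squared distances
    positive = dist2[dist2 > 0]
    sigma2 = np.median(positive) if positive.size else 1.0
    weights = np.exp(-dist2 / sigma2)
    np.fill_diagonal(weights, 0.0)            # no self-loops
    # Keep only the k strongest neighbours in every row.
    idx = np.argsort(-weights, axis=1)[:, :k]
    mask = np.zeros_like(weights, dtype=bool)
    mask[np.arange(n)[:, None], idx] = True
    weights = np.where(mask, weights, 0.0)
    # Normalise rows so each row is a probability distribution.
    return weights / weights.sum(axis=1, keepdims=True)

def diffuse(transition, initial_scores, alpha=0.5, iters=50):
    """Damped random-walk diffusion of classifier scores over one graph."""
    scores = initial_scores.copy()
    for _ in range(iters):
        # Mix neighbours' current scores with the original classifier score.
        scores = alpha * transition @ scores + (1 - alpha) * initial_scores
    return scores

def fuse(initial_scores, high_level_features, alpha=0.5):
    """Diffuse over one graph per high-level feature, then average (assumed)."""
    diffused = [diffuse(build_graph(f), initial_scores, alpha)
                for f in high_level_features]
    return np.mean(diffused, axis=0)

# Toy data: six videos, one 1-D high-level feature forming two clusters.
concept = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
low_level_scores = np.array([0.9, 0.8, 0.3, 0.1, 0.2, 0.7])
fused_scores = fuse(low_level_scores, [concept])
```

In this toy setting, a video whose low-level score disagrees with its neighbours in the high-level graph is pulled toward their scores, which is the intuition behind diffusing scores on a graph. Because each row of the transition matrix sums to one and alpha is below one, the update is a contraction and the iteration settles to a fixed point.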

    Published In

    MM '12: Proceedings of the 20th ACM international conference on Multimedia
    October 2012
    1584 pages
    ISBN: 9781450310895
    DOI: 10.1145/2393347
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. collective classification
    2. feature fusion
    3. multi-modal integration

    Conference

    MM '12: ACM Multimedia Conference
    October 29 - November 2, 2012
    Nara, Japan

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
