DOI:10.1145/2733373.2806228
research-article
Open access

Temporal Matching Kernel with Explicit Feature Maps

Published: 13 October 2015

Abstract

This paper proposes a framework for content-based video retrieval that addresses several tasks, such as particular event retrieval, copy detection, and video synchronization. Given a video query, the method efficiently retrieves, from a large collection, similar video events or near-duplicates with temporally consistent excerpts. As a byproduct of the representation, it provides a precise temporal alignment of the query and the detected video excerpts.
Our method converts a series of frame descriptors into a single visual-temporal descriptor, called a temporal invariant match kernel. This representation takes into account the relative positions of the visual frames: the frame descriptors are jointly encoded with their timestamps. When matching two videos, the method produces a score function for all possible relative timestamps, which is maximized to obtain both the similarity score and the relative time offset.
We then propose two complementary contributions that further improve detection and localization performance. The first is a novel query expansion method that takes advantage of the joint descriptor/timestamp representation to automatically align the first result set and produce an enriched temporal query. In contrast to other query expansion methods proposed for videos, it preserves the localization capability. Second, we improve the trade-off between localization quality and representation size by using several complementary temporal match kernels.
We evaluate our approach on benchmarks for particular event retrieval, copy detection and video synchronization. Our experiments show that our approach achieves excellent detection and localization results.
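
The idea of jointly encoding frame descriptors with their timestamps, then maximizing a score over relative time offsets, can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (tmk_descriptor, offset_scores), the Gaussian-shaped Fourier weights, the period of 64 frames, and the synthetic random descriptors are all assumptions made for illustration. Each video is summarized by one complex vector per frequency, and the score for every candidate offset is obtained from those summaries alone, without revisiting individual frames.

```python
import numpy as np

def tmk_descriptor(frames, times, omegas):
    """Encode frame descriptors jointly with their timestamps.

    frames: (n, D) frame descriptors, times: (n,) timestamps,
    omegas: (m,) angular frequencies of the temporal feature map.
    Returns an (m, D) complex array: sum_t x_t * exp(j * omega_i * t).
    """
    phase = np.exp(1j * np.outer(omegas, times))  # (m, n)
    return phase @ frames                         # (m, D)

def offset_scores(desc_query, desc_db, weights, omegas, offsets):
    """Score every candidate offset using only the two video descriptors.

    Equals sum_{t,t'} <x_t, y_t'> * k(t' - t - offset) for the periodic
    kernel k(u) = sum_i weights[i] * cos(omegas[i] * u).
    """
    # Per-frequency correlation between the two videos, shape (m,).
    corr = np.sum(desc_db * np.conj(desc_query), axis=1)
    # A time shift becomes a phase rotation in each frequency.
    rotation = np.exp(-1j * np.outer(offsets, omegas))  # (k, m)
    return np.real(rotation @ (weights * corr))         # (k,)

# Synthetic example (hypothetical data): the database video shows the same
# frames as the query, delayed by 5 time units.
rng = np.random.default_rng(0)
n_frames, dim, n_freq, period = 40, 64, 16, 64.0
frames = rng.standard_normal((n_frames, dim))
frames /= np.linalg.norm(frames, axis=1, keepdims=True)  # unit-norm descriptors

times_query = np.arange(n_frames, dtype=float)
times_db = times_query + 5.0

# Assumed kernel: truncated Fourier series with Gaussian-shaped weights.
omegas = 2.0 * np.pi * np.arange(n_freq) / period
weights = np.exp(-0.5 * omegas ** 2)

desc_query = tmk_descriptor(frames, times_query, omegas)
desc_db = tmk_descriptor(frames, times_db, omegas)

offsets = np.arange(-20, 21)
scores = offset_scores(desc_query, desc_db, weights, omegas, offsets)
best_offset = offsets[np.argmax(scores)]
print("estimated offset:", best_offset, "score:", scores.max())
```

In this sketch the maximum of the score function serves as the similarity, and the offset that attains it serves as the temporal alignment. Because the assumed kernel is periodic, offsets are only identifiable within one period, so the candidate range is kept shorter than the period.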

Published In

MM '15: Proceedings of the 23rd ACM international conference on Multimedia
October 2015
1402 pages
ISBN:9781450334594
DOI:10.1145/2733373
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. event detection
  2. explicit feature map
  3. multiple kernel
  4. temporal match kernel
  5. time consistent query expansion
  6. video synchronization

Qualifiers

  • Research-article

Funding Sources

  • ERC project Viamass
  • KAKENHI

Conference

MM '15: ACM Multimedia Conference
October 26 - 30, 2015
Brisbane, Australia

Acceptance Rates

MM '15 Paper Acceptance Rate 56 of 252 submissions, 22%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
