research-article

Blip10000: a social video dataset containing SPUG content for tagging and retrieval

Authors:

Sebastian Schmiedeke,

Isabelle Ferrané,

Maria Eskevich,

Christoph Kofler,

Martha A. Larson,

Yannick Estève,

Gareth J. F. Jones,

Thomas Sikora

MMSys '13: Proceedings of the 4th ACM Multimedia Systems Conference

Pages 96 - 101

https://doi.org/10.1145/2483977.2483988

Published: 28 February 2013

Abstract

The growing volume of available digital multimedia content is inspiring new types of user interaction with video data. Users want to find content easily by searching and browsing. This calls for techniques that support automatic categorisation, content search, and linking to related information. In this work, we present a dataset that contains comprehensive semi-professional user-generated (SPUG) content, including audiovisual content, user-contributed metadata, automatic speech recognition transcripts, automatic shot boundary files, and social information for multiple 'social levels'. We describe the principal characteristics of this dataset and present results that have been achieved on different tasks.

Index Terms

1. Information systems
  1. Information retrieval
  2. Information storage systems

Published In

MMSys '13: Proceedings of the 4th ACM Multimedia Systems Conference

February 2013

304 pages

ISBN:9781450318945

DOI:10.1145/2483977

General Chair:
Carsten Griwodz
Simula Research Laboratory & University of Oslo, Norway

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Seventh Framework Programme

Conference

MMSys '13

MMSys '13: Multimedia Systems Conference 2013

February 28 - March 1, 2013

Oslo, Norway

Acceptance Rates

MMSys '13 Paper Acceptance Rate 15 of 63 submissions, 24%;

Overall Acceptance Rate 176 of 530 submissions, 33%
