Interactive Search or Sequential Browsing? A Detailed Analysis of the Video Browser Showdown 2018

Published: 13 February 2019

Abstract

This work summarizes the findings of the 7th iteration of the Video Browser Showdown (VBS), a competition organized as a workshop at the 24th International Conference on Multimedia Modeling in Bangkok. The competition focuses on video retrieval scenarios in which the target scenes were either previously seen by the searcher or described by another person (i.e., no example shot is available). During the event, nine teams used their video retrieval tools to compete in providing access to a shared collection of 600 hours of video content. The article describes the evaluation objectives, rules, scoring, tasks, and all participating tools. In addition, we provide insights into how the different teams interacted with their video browsers, made possible by a novel interaction logging mechanism introduced in this iteration of the VBS. The results collected by the VBS evaluation server confirm that finding one particular scene in the collection within a limited time remains challenging for many of the approaches showcased at the event. Given only a short textual description, finding the correct scene is even harder. In ad hoc search with multiple relevant scenes, most tools found at least one relevant scene, but recall was an issue for many teams. The logs also reveal that even though recent advances in machine learning narrow the classical semantic gap, user-centric interfaces are still required to mediate access to specific content. Finally, we present open challenges and lessons learned for future VBS events.

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 15, Issue 1
    February 2019
    265 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3309717
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Accepted: 01 November 2018
    Revised: 01 November 2018
    Received: 01 July 2018
    Published in TOMM Volume 15, Issue 1

    Author Tags

    1. Interactive video retrieval
    2. content-based methods
    3. evaluation campaigns
    4. video browsing

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Swiss National Science Foundation (SNSF)
    • European Regional Development Fund and the Carinthian Economic Promotion Fund (KWF)
    • Universität Klagenfurt and Lakeside Labs GmbH, Klagenfurt, Austria
    • Council of the Hong Kong Special Administrative Region, China
    • Czech Science Foundation (GAČR)
    • Horizon 2020 Research and Innovation Programme V4Design
    • CHIST-ERA project IMOTION
