DOI: 10.1145/3323503.3349548
research-article

An optimization model for temporal video lecture segmentation using word2vec and acoustic features

Published: 29 October 2019

Abstract

Video lectures are part of our daily lives, whether we use them to learn something new, to review content for exams, or simply out of curiosity. People increasingly search for video lectures that address what they need. Unfortunately, finding specific content in this type of video is not an easy task. Many video lectures are long and cover several topics, not all of which are relevant to the user who found the video. As a result, users spend a great deal of time searching for the topic of interest amid content that is irrelevant to them. Temporally segmenting video lectures into topics can solve this problem by allowing users to navigate non-linearly through all the topics of a lecture. However, temporal video lecture segmentation is a difficult task and needs to be automated. For this reason, in this paper we propose an optimization model for the temporal video lecture segmentation problem. The model uses as features the Word2Vec representation of the lectures' audio transcripts and low-level acoustic characteristics. To find the best video partition, a genetic algorithm with local search is used. We performed experiments on two datasets, and the results showed that our proposal outperforms state-of-the-art methods and achieves good results for different kinds of video lectures.
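
The abstract does not spell out the objective function, the solution encoding, or the genetic operators, so the following Python sketch only illustrates the general idea under stated assumptions; it is not the authors' model. It assumes each lecture is cut into fixed windows, that each window is represented by the mean Word2Vec vector of its transcript words concatenated with its low-level acoustic features, and that a segmentation is a bit string in which bit i marks a topic boundary after window i. The objective (cosine cohesion of each window with its segment centroid, minus a fixed per-segment penalty), binary tournament selection, one-point crossover, bit-flip mutation, and first-improvement hill climbing as the local search are all assumptions, as are the names window_vector, fitness, local_search, and memetic_segmentation.

```python
import math
import random


def window_vector(tokens, embeddings, acoustic):
    """Hypothetical window representation: mean of the Word2Vec vectors of the
    window's transcript tokens, followed by its acoustic features."""
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[t] for t in tokens if t in embeddings]  # skip OOV words
    mean = [sum(v[i] for v in known) / len(known) if known else 0.0
            for i in range(dim)]
    return mean + list(acoustic)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def segments(bits):
    """bits[i] == 1 marks a boundary between window i and window i + 1."""
    start, out = 0, []
    for i, b in enumerate(bits):
        if b:
            out.append((start, i + 1))
            start = i + 1
    out.append((start, len(bits) + 1))
    return out


def fitness(bits, vectors, penalty):
    """Assumed objective: cosine cohesion of every window with its segment
    centroid, minus a fixed penalty per segment to discourage over-splitting."""
    segs = segments(bits)
    score = 0.0
    for s, e in segs:
        seg = vectors[s:e]
        centroid = [sum(col) / len(seg) for col in zip(*seg)]
        score += sum(cosine(v, centroid) for v in seg)
    return score - penalty * len(segs)


def local_search(bits, vectors, penalty):
    """First-improvement hill climbing over single boundary flips."""
    bits = list(bits)
    best = fitness(bits, vectors, penalty)
    improved = True
    while improved:
        improved = False
        for i in range(len(bits)):
            bits[i] ^= 1
            f = fitness(bits, vectors, penalty)
            if f > best + 1e-12:
                best, improved = f, True
            else:
                bits[i] ^= 1  # undo a flip that did not help
    return bits, best


def memetic_segmentation(vectors, penalty=0.5, pop_size=20, generations=30,
                         mutation_rate=0.05, seed=0):
    """Hypothetical genetic algorithm whose offspring are refined by
    local_search. Returns (boundary indices, fitness of the best individual)."""
    rng = random.Random(seed)
    n_bits = len(vectors) - 1
    pop = [local_search([int(rng.random() < 0.2) for _ in range(n_bits)],
                        vectors, penalty) for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)  # binary tournament on (bits, score) pairs
        return a if a[1] >= b[1] else b

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            (p1, _), (p2, _) = tournament(), tournament()
            cut = rng.randrange(1, n_bits) if n_bits > 1 else 0  # one-point crossover
            child = [b ^ 1 if rng.random() < mutation_rate else b  # bit-flip mutation
                     for b in p1[:cut] + p2[cut:]]
            children.append(local_search(child, vectors, penalty))
        # Elitist replacement: keep the best pop_size of parents and children.
        pop = sorted(pop + children, key=lambda ind: ind[1], reverse=True)[:pop_size]
    best_bits, best_score = pop[0]
    return [i for i, b in enumerate(best_bits) if b], best_score
```

For example, with ten windows whose vectors switch from [1.0, 0.0] to [0.0, 1.0] after the fifth window, memetic_segmentation(vectors) returns ([4], 9.0), that is, a single boundary after window index 4. The per-segment penalty is what keeps the search from splitting every window into its own segment, since a singleton segment always has perfect cohesion.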

Published In

WebMedia '19: Proceedings of the 25th Brazillian Symposium on Multimedia and the Web
October 2019
537 pages
ISBN:9781450367639
DOI:10.1145/3323503
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ASR
  2. audio analysis
  3. genetic algorithm
  4. temporal segmentation
  5. video lectures

Qualifiers

  • Research-article

Funding Sources

  • Coordenação de Aperfeiçoamento de Pessoal de Nível Superior

Conference

WebMedia '19: Brazilian Symposium on Multimedia and the Web
October 29 - November 1, 2019
Rio de Janeiro, Brazil

Acceptance Rates

Overall Acceptance Rate 270 of 873 submissions, 31%

Article Metrics

  • Downloads (last 12 months): 14
  • Downloads (last 6 weeks): 3
Reflects downloads up to 25 Dec 2024
