An optimization model for temporal video lecture segmentation using word2vec and acoustic features

Authors:

Eduardo R. Soares,

Eduardo Barrére

WebMedia '19: Proceedings of the 25th Brazillian Symposium on Multimedia and the Web

Pages 513 - 520

https://doi.org/10.1145/3323503.3349548

Published: 29 October 2019

Abstract

Video lectures are part of our daily lives. Whether to learn something new, to review content for exams, or just out of curiosity, people are increasingly looking for video lectures that address their needs. Unfortunately, finding specific content in this type of video is not an easy task. Many video lectures are long and cover several topics, and not all of these topics are relevant to the user who has found the video. As a result, the user spends a lot of time trying to find a topic of interest amid content that is irrelevant to them. Temporal segmentation of video lectures into topics can solve this problem by allowing users to navigate non-linearly through all the topics of a video lecture. However, temporal video lecture segmentation is not an easy task and needs to be automated. For this reason, in this paper we propose an optimization model for the temporal video lecture segmentation problem. This model uses as features the Word2Vec representation of the video lecture's audio transcripts and low-level acoustic characteristics. To find the best video partition, a genetic algorithm with local search is used. We performed experiments on two data sets, and the results showed that our proposal is able to outperform state-of-the-art methods and achieve good results for different kinds of video lectures.
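
As a rough illustration of the kind of search the abstract describes, the Python sketch below evolves binary boundary vectors with a genetic algorithm and refines each child by bit-flip hill climbing. It is not the authors' model. The fitness function (cosine cohesion of each segment around its centroid, minus a per-boundary penalty), the toy two-dimensional vectors standing in for Word2Vec and acoustic features, and every name and parameter value (`fitness`, `local_search`, `genetic_segmentation`, `penalty`, `pop_size`, and so on) are assumptions made for this example only.

```python
import math
import random

# Hypothetical per-chunk features: a stand-in for averaged Word2Vec vectors
# concatenated with acoustic features. Chunks 0-3 and 4-7 form two "topics".
FEATS = [[1.0, 0.1], [0.9, 0.0], [1.0, 0.05], [0.95, 0.1],
         [0.1, 1.0], [0.0, 0.9], [0.05, 1.0], [0.1, 0.95]]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def segments(bits):
    """Turn boundary bits into (start, end) chunk ranges, end exclusive.
    bits[i] == 1 means a topic boundary between chunk i and chunk i + 1."""
    cuts = [i + 1 for i, b in enumerate(bits) if b]
    return list(zip([0] + cuts, cuts + [len(bits) + 1]))

def fitness(bits, feats, penalty=0.3):
    """Assumed objective: within-segment cohesion minus a cost per boundary."""
    total = 0.0
    for start, end in segments(bits):
        seg = feats[start:end]
        centroid = [sum(col) / len(seg) for col in zip(*seg)]
        total += sum(cosine(x, centroid) for x in seg)
    return total - penalty * sum(bits)

def local_search(bits, feats):
    """Hill climbing over single bit flips until no flip improves fitness."""
    bits = list(bits)
    best = fitness(bits, feats)
    improved = True
    while improved:
        improved = False
        for i in range(len(bits)):
            bits[i] ^= 1                      # try flipping one boundary
            score = fitness(bits, feats)
            if score > best:
                best, improved = score, True  # keep the flip
            else:
                bits[i] ^= 1                  # undo the flip
    return bits

def genetic_segmentation(feats, pop_size=12, generations=15,
                         mutation=0.1, seed=0):
    """Return the cut positions of the best partition found."""
    rng = random.Random(seed)
    n = len(feats) - 1
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda b: fitness(b, feats), reverse=True)
        parents = pop[:pop_size // 2]         # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)         # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < mutation) for bit in child]
            children.append(local_search(child, feats))
        pop = parents + children
    best = max(pop, key=lambda b: fitness(b, feats))
    return [i + 1 for i, b in enumerate(best) if b]

print(genetic_segmentation(FEATS))
```

The returned list holds the indices of the chunks that start a new segment. On real lectures, the toy vectors would be replaced by features computed from transcripts and audio, and both the objective and the penalty would need to be chosen and tuned against annotated data.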

Index Terms

1. Applied computing
  1. Document management and text processing
    1. Document preparation
  2. Education
    1. E-learning
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Information extraction

Published In

WebMedia '19: Proceedings of the 25th Brazillian Symposium on Multimedia and the Web

October 2019

537 pages

ISBN: 9781450367639

DOI: 10.1145/3323503

General Chairs: Joel dos Santos (CEFET/RJ), Débora Christina Muchaluat Saade (UFF), Maria da Graça C. Pimentel (University of Sao Paulo, Brazil), Alessandra Alaniz Macedo (University of Sao Paulo, Brazil)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior

Conference

WebMedia '19

WebMedia '19: Brazilian Symposium on Multimedia and the Web

October 29 - November 1, 2019

Rio de Janeiro, Brazil

Acceptance Rates

Overall Acceptance Rate 270 of 873 submissions, 31%

Bibliometrics

Article Metrics

Total Citations: 8
Total Downloads: 248
Downloads (Last 12 months): 14
Downloads (Last 6 weeks): 3

Reflects downloads up to 25 Dec 2024

Cited By

Halawa A, Gamalel-Din S, Nasr A (2023). EXPLOITING BERT FOR MALFORMED SEGMENTATION DETECTION TO IMPROVE SCIENTIFIC WRITINGS. Applied Computer Science 19:2 (126-141). Online publication date: 30-Jun-2023. https://doi.org/10.35784/acs-2023-20
Liu X, Gu W, Ota K, Hasegawa S (2023). Design of Voice Style Detection of Lecture Archives. TENCON 2023 - 2023 IEEE Region 10 Conference (TENCON) (1139-1144). Online publication date: 31-Oct-2023. https://doi.org/10.1109/TENCON58879.2023.10322336
Mishra G, Raj A, Kumar A, Kasaudhan A, Kumar Mishra P, Maini T (2023). Indexing and Segmentation of Video Contents: A Review. 2023 4th International Conference for Emerging Technology (INCET) (1-9). Online publication date: 26-May-2023. https://doi.org/10.1109/INCET57972.2023.10170589
Wu J, Sun Y, Kong Y, Shu H, Senhadji L (2023). AHMN: A multi-modal network for long MOOC videos chapter segmentation. Multimedia Tools and Applications 83:40 (88523-88541). Online publication date: 20-Nov-2023. https://doi.org/10.1007/s11042-023-17654-2
Shihab F, Ekmekci D (2023). Tweet Classification on the Base of Sentiments Using Deep Learning. Computer Vision and Robotics (139-156). Online publication date: 28-Apr-2023. https://doi.org/10.1007/978-981-19-7892-0_12
Chand D, Ogul H (2021). A Framework for Lecture Video Segmentation from Extracted Speech Content. 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI) (000299-000304). Online publication date: 21-Jan-2021. https://doi.org/10.1109/SAMI50585.2021.9378632
Davila K, Xu F, Setlur S, Govindaraju V (2021). FCN-LectureNet: Extractive Summarization of Whiteboard and Chalkboard Lecture Videos. IEEE Access 9 (104469-104484). Online publication date: 2021. https://doi.org/10.1109/ACCESS.2021.3099427
Xu F, Davila K, Setlur S, Govindaraju V (2021). Skeleton-Based Methods for Speaker Action Classification on Lecture Videos. Pattern Recognition. ICPR International Workshops and Challenges (250-264). Online publication date: 5-Mar-2021. https://doi.org/10.1007/978-3-030-68799-1_18
