research-article

Open access

Narrative Graph for Narrative Generation from Long Videos

Authors:

Rishabh Sheoran,

Mohan Kankanhalli

NarSUM '23: Proceedings of the 2nd Workshop on User-centric Narrative Summarization of Long Videos

Pages 31 - 40

https://doi.org/10.1145/3607540.3617142

Published: 29 October 2023

Abstract

Advances in camera technology and cloud storage have led to a surge in video content creation and made videos more accessible. However, raw, unprocessed, lengthy videos can be unengaging to watch. Videos with human-authored narratives (such as those on YouTube) are captivating, but creating them requires tremendous effort and skill, so the approach scales poorly. To address this, we propose an algorithmic narrator that generates topic-specific narratives in real time from raw videos, inspired by ChatGPT's natural language processing capabilities. Specifically, we propose a novel narrative graph structure that captures narrative-worthy, semantically enriched factual information and establishes temporal and causal links between narrative segments. The narrative graph is then fed to the algorithmic narrator to generate a textual narrative summary. Our comprehensive empirical study demonstrates the potential of algorithmic narrators and narrative graphs for creating engaging and coherent narratives, offering insights for the future of video content consumption.
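The abstract does not specify the graph schema, so the following is only a minimal Python sketch of what a narrative graph with temporal and causal links between segments might look like, and how it could be flattened into text for a language-model narrator. The class names (`Segment`, `NarrativeGraph`), the fields, the ordering rule, and the prompt layout are all illustrative assumptions, not the authors' implementation.

```python
from __future__ import annotations

import heapq
from collections import defaultdict
from dataclasses import dataclass, field

# Assumed edge types, taken from the abstract's wording ("temporal and causal links").
RELATIONS = ("temporal", "causal")


@dataclass
class Segment:
    """Hypothetical narrative segment: an id, a start time, and extracted facts."""
    seg_id: str
    start_s: float
    facts: list[str]


@dataclass
class NarrativeGraph:
    """Hypothetical container: segments as nodes, typed directed links as edges."""
    segments: dict[str, Segment] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add_segment(self, seg: Segment) -> None:
        self.segments[seg.seg_id] = seg

    def link(self, src: str, dst: str, relation: str) -> None:
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        if src not in self.segments or dst not in self.segments:
            raise KeyError("both endpoints must be added before linking")
        self.edges.append((src, dst, relation))

    def ordered_segments(self) -> list[Segment]:
        """Topological order over the links; ties are broken by start time."""
        indegree = {sid: 0 for sid in self.segments}
        successors = defaultdict(list)
        for src, dst, _ in self.edges:
            successors[src].append(dst)
            indegree[dst] += 1
        ready = [(self.segments[s].start_s, s) for s, d in indegree.items() if d == 0]
        heapq.heapify(ready)
        order = []
        while ready:
            _, sid = heapq.heappop(ready)
            order.append(self.segments[sid])
            for nxt in successors[sid]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    heapq.heappush(ready, (self.segments[nxt].start_s, nxt))
        if len(order) != len(self.segments):
            raise ValueError("links contain a cycle")
        return order

    def to_prompt(self) -> str:
        """Flatten the graph into plain text that a narrator model could consume."""
        lines = ["Segments:"]
        for seg in self.ordered_segments():
            lines.append(f"- {seg.seg_id} @ {seg.start_s:.0f}s: {'; '.join(seg.facts)}")
        lines.append("Links:")
        for src, dst, relation in self.edges:
            lines.append(f"- {src} -> {dst} ({relation})")
        return "\n".join(lines)


# Made-up example data, inserted out of order on purpose.
g = NarrativeGraph()
g.add_segment(Segment("s3", 95.0, ["crowd disperses"]))
g.add_segment(Segment("s1", 0.0, ["vendor opens stall"]))
g.add_segment(Segment("s2", 40.0, ["customers queue"]))
g.link("s1", "s2", "causal")
g.link("s2", "s3", "temporal")
print(g.to_prompt())
```

In this sketch the links, rather than insertion order, decide the order in which segments are presented, which is one simple way to keep a generated narrative consistent with the recorded temporal and causal structure.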

Index Terms

Narrative Graph for Narrative Generation from Long Videos
1. Applied computing
2. Computing methodologies
  1. Artificial intelligence
  2. Machine learning


Published In

NarSUM '23: Proceedings of the 2nd Workshop on User-centric Narrative Summarization of Long Videos

October 2023

82 pages

ISBN:9798400702778

DOI:10.1145/3607540

General Chairs:
Mohan S. Kankanhalli (National University of Singapore),
Ioannis (Yiannis) Patras (Queen Mary University of London)

Program Chairs:
Jianquan Liu (NEC Corporation, Japan),
Yongkang Wong (National University of Singapore),
Takahiro Komamizu (Nagoya University, Japan)

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Research Foundation, Singapore

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29, 2023

Ottawa ON, Canada
