DOI: 10.1145/3503161.3549202
Research article | Open access

Compute to Tell the Tale: Goal-Driven Narrative Generation

Published: 10 October 2022

Abstract

Man is by nature a social animal. One important facet of human evolution is narrative imagination, fictional or factual, and the telling of the tale to other individuals. Factual narratives, such as news, journalism, and field reports, are based on real-world events and often require extensive human effort to create. In the era of big data, where video capture devices are available everywhere, a massive amount of raw video (including life-logging, dashcam, and surveillance footage) is generated daily. As a result, it is practically impossible for humans to digest and analyze all of this video data. This paper reviews the problem of computational narrative generation, in which a goal-driven narrative (in the form of text, with or without video) is generated from one or more long videos. Importantly, the narrative generation problem is distinguished from the existing literature by its focus on a comprehensive understanding of the user's goal, narrative structure, and open-domain input. We tentatively outline a general narrative generation framework and discuss the potential research problems and challenges in this direction. Informed by the real-world impact of narrative generation, we then illustrate several practical use cases on a Video Logging as a Service platform that enables users to get more out of their data through a goal-driven, intelligent storytelling AI agent.
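
The paper outlines its narrative generation framework only at a conceptual level. Purely as an illustration of the kind of pipeline the abstract describes (user goal, perception over long videos, goal-conditioned event selection, narrative planning, text realization), one might organize it as in the following Python sketch. Every class, function, and parameter name here is a hypothetical assumption for exposition, not the authors' implementation.

```python
from dataclasses import dataclass, field

# Hypothetical sketch only: the paper describes the framework conceptually,
# so all names and signatures below are illustrative assumptions.

@dataclass
class UserGoal:
    intent: str                                   # e.g. "report unattended objects"
    keywords: list = field(default_factory=list)  # entities/actions of interest

@dataclass
class Event:
    start: float          # seconds into the footage
    end: float
    description: str      # textual description produced by perception models

def perceive(video_paths):
    """Stand-in for perception (detection, tracking, captioning) that turns
    raw long videos into time-stamped events; a real system would plug in
    trained vision models here."""
    raise NotImplementedError

def select_relevant(events, goal):
    """Goal-conditioned filtering: keep events mentioning the user's keywords."""
    return [e for e in events if any(k in e.description for k in goal.keywords)]

def plan_narrative(events):
    """Impose a narrative structure; chronological order is the simplest choice."""
    return sorted(events, key=lambda e: e.start)

def narrate(events, goal):
    """Surface realization: turn the ordered events into a short report."""
    lines = [f"Report on: {goal.intent}"]
    lines += [f"[{e.start:.0f}s-{e.end:.0f}s] {e.description}" for e in events]
    return "\n".join(lines)

if __name__ == "__main__":
    # Dummy events in place of real perception output.
    events = [
        Event(120.0, 135.0, "a person leaves a bag near the entrance"),
        Event(30.0, 42.0, "a delivery truck parks outside"),
    ]
    goal = UserGoal(intent="unattended objects", keywords=["bag"])
    print(narrate(plan_narrative(select_relevant(events, goal)), goal))
```

In such a design the user goal conditions both the retrieval step and the final text, which is what separates goal-driven narration from generic video captioning or summarization.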

Supplementary Material

MP4 File (mmbni17.mp4)
This work introduces a novel goal-driven computational factual narrative generation task for long videos. We draw inspiration from the social science literature and discuss the challenges of the proposed task. This video provides a brief overview of the paper.




Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN: 9781450392037
DOI: 10.1145/3503161
This work is licensed under a Creative Commons Attribution 4.0 International License.


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. computational narrative generation
  2. video analytics

Qualifiers

  • Research-article

Funding Sources

  • National Research Foundation Singapore

Conference

MM '22

Acceptance Rates

Overall acceptance rate: 2,145 of 8,556 submissions (25%)

