Research Article
DOI: 10.1145/3607540.3617138

A Study on the Use of Attention for Explaining Video Summarization

Published: 29 October 2023

Abstract

In this paper, we present our study on the use of attention for explaining video summarization. We build on a recent work that formulates this task, called XAI-SUM, and extend it by: a) taking into account two additional network architectures, and b) introducing two novel explanation signals that relate to the entropy and diversity of attention weights. In total, we examine the effectiveness of seven types of explanation, using three state-of-the-art attention-based network architectures (CA-SUM, VASNet, SUM-GDA) and two datasets (SumMe, TVSum) for video summarization. The conducted evaluations show that the inherent attention weights are more suitable for explaining network architectures that integrate mechanisms for estimating attentive diversity (SUM-GDA) and uniqueness (CA-SUM). The explanation of simpler architectures (VASNet) can benefit from taking into account estimates of the strength of the input vectors; another option is to consider the entropy of the attention weights.
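
To make the two newly introduced explanation signals concrete, the following is a minimal illustrative sketch (in Python/NumPy) of how per-frame entropy and diversity scores could be computed from an attention matrix. It assumes a (T, T) matrix A whose row i holds the attention distribution produced for frame i; the function names and the specific diversity formulation (one minus the mean pairwise cosine similarity of attention distributions) are our own assumptions for illustration, not the exact definitions used in the paper.

```python
import numpy as np

def entropy_signal(A, eps=1e-12):
    # Shannon entropy of each frame's attention distribution; rows of A
    # sum to 1. Lower entropy = attention concentrated on few frames.
    return -np.sum(A * np.log(A + eps), axis=1)

def diversity_signal(A):
    # Diversity sketched here as one minus the mean cosine similarity
    # between a frame's attention distribution and those of all other
    # frames; higher values = a more distinctive attention pattern.
    A_hat = A / np.clip(np.linalg.norm(A, axis=1, keepdims=True), 1e-12, None)
    sim = A_hat @ A_hat.T                          # pairwise cosine similarities
    T = A.shape[0]
    mean_sim = (sim.sum(axis=1) - 1.0) / (T - 1)   # exclude self-similarity (= 1)
    return 1.0 - mean_sim

# Toy usage: a random row-stochastic attention matrix over 5 frames.
rng = np.random.default_rng(0)
A = rng.random((5, 5))
A /= A.sum(axis=1, keepdims=True)
print(entropy_signal(A))
print(diversity_signal(A))
```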

Supplementary Material

MP4 File (nars13-video.mp4)
In this video, we present our study on the use of attention for explaining the output of network architectures for video summarization. We start by discussing why explainable video summarization is important, and briefly describe existing approaches for explaining network architectures that analyze video data. Next, we present the methodology applied for evaluating the use of attention as an explanation for video summarization, and report on the network architectures, explanation signals, and evaluation measures used. Finally, we discuss the results of the conducted quantitative and qualitative evaluations, and outline the conclusions of our study.
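
The author tags below mention replacement functions and sanity violation, which point to a perturbation-style evaluation of the explanation signals. As a rough, generic sketch of such a protocol (not the paper's exact procedure), one can replace the frame features that an explanation flags as most influential, re-run the summarizer, and measure how strongly its frame-importance scores change, e.g. via Kendall's tau rank correlation. The names below (replacement_check, model_fn) and the zero-masking replacement are illustrative assumptions.

```python
import numpy as np
from scipy.stats import kendalltau

def replacement_check(model_fn, features, explanation, k=5):
    # model_fn maps (T, D) frame features to (T,) frame-importance scores;
    # explanation holds (T,) per-frame explanation scores.
    original = model_fn(features)
    top = np.argsort(explanation)[-k:]   # frames flagged as most influential
    perturbed = features.copy()
    perturbed[top] = 0.0                 # one possible replacement: zero-masking
    tau, _ = kendalltau(original, model_fn(perturbed))
    return tau  # a low tau means the flagged frames strongly drive the output
```

Under this sketch, a faithful explanation should yield a clearly lower tau than replacing the same number of randomly chosen frames.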



    Published In

    NarSUM '23: Proceedings of the 2nd Workshop on User-centric Narrative Summarization of Long Videos
    October 2023
    82 pages
    ISBN: 9798400702778
    DOI: 10.1145/3607540

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. attention mechanism
    2. explainable ai
    3. explanation signals
    4. replacement functions
    5. sanity violation
    6. video summarization


