DOI: 10.1145/3635636.3656192
Research article | Open access

VideoMap: Supporting Video Exploration, Brainstorming, and Prototyping in the Latent Space

Published: 23 June 2024

Abstract

Video editing is a creative and complex endeavor, and we believe there is potential to reimagine the video editing interface to better support its creative and exploratory nature. We take inspiration from latent space exploration tools that help users find patterns and connections within complex datasets. We present VideoMap, a proof-of-concept video editing interface that operates on video frames projected onto a latent space. We support intuitive navigation through map-inspired navigational elements and facilitate transitioning between different latent spaces through swappable lenses. We built three VideoMap components to support editors in three common video tasks. In a user study with both professionals and non-professionals, editors found that VideoMap helps reduce grunt work, offers a user-friendly experience, provides an inspirational way of editing, and effectively supports the exploratory nature of video editing. We further demonstrate the versatility of VideoMap by implementing three extended applications. For interactive examples, we invite you to visit our project page: https://chuanenlin.com/videomap.
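The abstract does not specify how frames are mapped into the latent space, so as a rough illustration of the general idea — embedding each frame as a high-dimensional feature vector and reducing it to 2D "map" coordinates — here is a minimal sketch. It uses synthetic vectors as stand-ins for real image-encoder features and a hand-rolled PCA projection rather than whatever embedding model and reduction method the authors actually used (methods such as UMAP or t-SNE are common alternatives); the function name `project_frames_to_map` is hypothetical.

```python
import numpy as np

def project_frames_to_map(frame_embeddings: np.ndarray) -> np.ndarray:
    """Project (n_frames, d) frame embeddings onto 2D map coordinates via PCA.

    The top-2 right singular vectors of the mean-centered data span the
    plane of maximum variance; projecting onto them gives each frame an
    (x, y) position for plotting on a map-like canvas.
    """
    centered = frame_embeddings - frame_embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Stand-in for embeddings of 100 frames with 512-dim features.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 512))
coords = project_frames_to_map(embeddings)
print(coords.shape)  # (100, 2)
```

Swapping the projection (e.g. for UMAP) or the embedding source would correspond to the "swappable lenses" idea: the same frames land at different map positions depending on which latent space is active.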



      Published In

C&C '24: Proceedings of the 16th Conference on Creativity & Cognition
June 2024, 718 pages
ISBN: 9798400704857
DOI: 10.1145/3635636
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. latent space visualization
      2. video editing interface

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

C&C '24: Creativity and Cognition
June 23-26, 2024
Chicago, IL, USA

      Acceptance Rates

      Overall Acceptance Rate 108 of 371 submissions, 29%
