DOI: 10.1145/3472306.3478333
Extended abstract

Speech2Properties2Gestures: Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech

Published: 14 September 2021

Abstract

We propose a new framework for gesture generation, aiming to allow data-driven approaches to produce more semantically rich gestures. Our approach first predicts whether to gesture, and then predicts the properties of that gesture. Those properties are then used as conditioning for a modern probabilistic gesture-generation model capable of high-quality output, enabling it to generate gestures that are both diverse and representational. Follow-ups and more information can be found on the project page: https://svito-zar.github.io/speech2properties2gestures/
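
To make the proposed pipeline concrete, below is a minimal structural sketch in PyTorch of the first two stages (the gesture/no-gesture decision and the gesture-property prediction). This is an illustration under stated assumptions, not the authors' implementation: the GRU encoder, the feature dimensions, and the number of property classes are all hypothetical choices.

# Minimal sketch of the first two pipeline stages, assuming PyTorch.
# All module choices, dimensions, and names are hypothetical
# illustrations; the paper does not prescribe this implementation.
import torch
import torch.nn as nn

class GesturePropertyPredictor(nn.Module):
    """Predicts (1) whether to gesture and (2) gesture properties, per frame."""

    def __init__(self, speech_dim=128, hidden_dim=256, n_properties=8):
        super().__init__()
        # Sequence encoder over per-frame speech features (audio and/or text).
        self.encoder = nn.GRU(speech_dim, hidden_dim, batch_first=True)
        # Stage 1: binary gesture/no-gesture decision for each frame.
        self.gesture_head = nn.Linear(hidden_dim, 1)
        # Stage 2: multi-label gesture-property prediction for each frame.
        self.property_head = nn.Linear(hidden_dim, n_properties)

    def forward(self, speech_features):
        h, _ = self.encoder(speech_features)     # (batch, frames, hidden_dim)
        gesture_logits = self.gesture_head(h)    # (batch, frames, 1)
        property_logits = self.property_head(h)  # (batch, frames, n_properties)
        return gesture_logits, property_logits

# Usage sketch with dummy inputs:
model = GesturePropertyPredictor()
speech = torch.randn(2, 50, 128)  # two utterances, 50 frames, 128-dim features
gesture_logits, property_logits = model(speech)

In a full system, frames with a high predicted gesture probability would have their property vector passed along with the speech features as conditioning input to the third stage, a probabilistic motion generator such as a normalising-flow model.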


      Published In

      IVA '21: Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents
      September 2021
      238 pages
      ISBN: 9781450386197
      DOI: 10.1145/3472306

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Badges

      • Honorable Mention Short Paper

      Author Tags

      1. gesture generation
      2. representational gestures
      3. virtual agents

      Qualifiers

      • Extended-abstract
      • Research
      • Refereed limited

      Funding Sources

      • Wallenberg AI, Autonomous Systems and Software Program
      • Swedish Foundation for Strategic Research

      Conference

      IVA '21

      Acceptance Rates

      Overall Acceptance Rate 53 of 196 submissions, 27%
