DOI: 10.1145/3472306.3478333
Extended abstract

Speech2Properties2Gestures: Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech

Published: 14 September 2021

Abstract

We propose a new framework for gesture generation, aiming to allow data-driven approaches to produce more semantically rich gestures. Our approach first predicts whether to gesture, and then predicts the properties of that gesture. Those properties are then used as conditioning for a modern probabilistic gesture-generation model capable of high-quality output, enabling it to generate gestures that are both diverse and representational. Follow-ups and more information can be found on the project page: https://svito-zar.github.io/speech2properties2gestures/
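
To make the proposed pipeline concrete, below is a minimal structural sketch in PyTorch of the first two stages (the gesture/no-gesture decision and the gesture-property prediction). This is an illustration under stated assumptions, not the authors' implementation: the GRU encoder, the feature dimensions, and the number of property classes are all hypothetical choices.

# Minimal sketch of the first two pipeline stages, assuming PyTorch.
# All module choices, dimensions, and names are hypothetical
# illustrations; the paper does not prescribe this implementation.
import torch
import torch.nn as nn

class GesturePropertyPredictor(nn.Module):
    """Predicts (1) whether to gesture and (2) gesture properties, per frame."""

    def __init__(self, speech_dim=128, hidden_dim=256, n_properties=8):
        super().__init__()
        # Sequence encoder over per-frame speech features (audio and/or text).
        self.encoder = nn.GRU(speech_dim, hidden_dim, batch_first=True)
        # Stage 1: binary gesture/no-gesture decision for each frame.
        self.gesture_head = nn.Linear(hidden_dim, 1)
        # Stage 2: multi-label gesture-property prediction for each frame.
        self.property_head = nn.Linear(hidden_dim, n_properties)

    def forward(self, speech_features):
        h, _ = self.encoder(speech_features)     # (batch, frames, hidden_dim)
        gesture_logits = self.gesture_head(h)    # (batch, frames, 1)
        property_logits = self.property_head(h)  # (batch, frames, n_properties)
        return gesture_logits, property_logits

# Usage sketch with dummy inputs:
model = GesturePropertyPredictor()
speech = torch.randn(2, 50, 128)  # two utterances, 50 frames, 128-dim features
gesture_logits, property_logits = model(speech)

In a full system, frames with a high predicted gesture probability would have their property vector passed along with the speech features as conditioning input to the third stage, a probabilistic motion generator such as a normalising-flow model.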


      Published In

      IVA '21: Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents
      September 2021
      238 pages
      ISBN: 9781450386197
      DOI: 10.1145/3472306

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Badges

      • Honorable Mention Short Paper

      Author Tags

      1. gesture generation
      2. representational gestures
      3. virtual agents

      Qualifiers

      • Extended-abstract
      • Research
      • Refereed limited

      Funding Sources

      • Wallenberg AI, Autonomous Systems and Software Program
      • Swedish Foundation for Strategic Research

      Conference

      IVA '21

      Acceptance Rates

      Overall Acceptance Rate 53 of 196 submissions, 27%
