DOI: 10.5555/3463952.3464084

CMCF: An Architecture for Realtime Gesture Generation by Clustering Gestures by Motion and Communicative Function

Published: 03 May 2021

Abstract

Gestures augment speech by performing a variety of communicative functions in humans and virtual agents, and are often related to speech by complex semantic, rhetorical, prosodic, and affective elements. In this paper we briefly present an architecture for human-like gesturing in virtual agents that is designed to realize complex speech-to-gesture mappings by exploiting existing machine-learning-based parsing tools and techniques to extract these functional elements from speech. We then explore the rhetorical branch of this architecture in depth, objectively assessing whether existing rhetorical parsing techniques can classify gestures into classes with distinct movement properties. To do this, we take a corpus of spontaneously generated gestures and correlate their movement with co-speech utterances. We cluster gestures first by their rhetorical properties and then by their movement. Our objective analysis suggests that some rhetorical structures are identifiable from our movement features, while others require further exploration. We discuss possible explanations for these findings and propose future experiments that may further reveal the richness of the mapping between speech and motion. This work builds towards a real-time gesture generator that performs gestures conveying rich communicative functions.
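The core analysis the abstract describes, clustering gestures by movement features and checking whether those clusters line up with rhetorical classes, can be sketched with scikit-learn, which the paper uses. Everything below is an illustrative assumption, not the authors' pipeline: the movement features, the synthetic data, the choice of k-means, and the use of the adjusted Rand index as the agreement measure.

```python
# Minimal sketch (not the authors' code): cluster toy "gesture movement
# features" and measure how well the clusters recover hypothetical
# rhetorical-class labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# Toy movement features per gesture (e.g. peak velocity, extent, duration).
# Three well-separated synthetic groups stand in for gestures with
# distinct motion profiles.
movement = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(30, 3))
    for center in ([0, 0, 0], [2, 2, 0], [0, 2, 2])
])

# Hypothetical rhetorical-class label for each gesture (0, 1, or 2).
rhetorical_labels = np.repeat([0, 1, 2], 30)

# Cluster by movement alone, then ask how well those clusters match the
# rhetorical classes: an adjusted Rand index near 1 means the classes are
# identifiable from movement; near 0 means they are not.
movement_clusters = KMeans(n_clusters=3, n_init=10,
                           random_state=0).fit_predict(movement)
ari = adjusted_rand_score(rhetorical_labels, movement_clusters)
print(f"adjusted Rand index: {ari:.2f}")
```

On real data the interesting outcome is exactly the mixed result the abstract reports: some rhetorical classes yielding high agreement with movement clusters and others near chance.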



Published In
AAMAS '21: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems
May 2021
1899 pages
ISBN:9781450383073

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC


Author Tags

  1. clustering
  2. communication
  3. gesture generation
  4. machine learning
  5. rhetorical parsing

Qualifiers

  • Research-article

Funding Sources

  • Royal Society Wolfson Award
  • UKRI Centre for Doctoral Training in Socially Intelligent Artificial Agents

Conference

AAMAS '21

Acceptance Rates

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%
