DOI: 10.1145/3527188.3561941

Motion and Meaning: Data-Driven Analyses of The Relationship Between Gesture and Communicative Semantics

Published: 05 December 2022

Abstract

Gestures convey critical information within social interactions. As such, the success of virtual agents (VAs) in both building social relationships and achieving their goals depends heavily on the information conveyed in their gestures. Because of the precision required for effective gesture behavior, it is prudent to retain some designer control over these conversational gestures. To exercise that control practically, however, we must first understand how gestural motion conveys meaning. One consideration in this relationship between motion and meaning is the notion of Ideational Units: at a given point in time, only some parts of a gesture’s motion may convey meaning, while other parts may be held over from the previous gesture. In this paper, we develop, demonstrate, and release a set of tools that quantify the relationship between the semantics conveyed in a gesture’s co-speech utterance and the fine-grained motion of that gesture, allowing us to explore the complex relationship between motion and meaning. In particular, we use spectral motion clustering to discern patterns of motion that tend to be associated with semantic concepts, at both the aggregate and individual-speaker level. We then discuss how these tools can serve as a framework for automated gesture generation and interpretation in virtual agents, as well as an analysis framework for fundamental gesture research.
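
As an illustration of the kind of spectral motion clustering the abstract refers to, the sketch below groups gesture motion trajectories by pairwise dynamic-time-warping distance and then applies spectral clustering to the resulting affinity matrix. It is a minimal, hypothetical example: the function name cluster_gestures, the feature representation (per-frame joint positions), and the parameters n_clusters and gamma are illustrative assumptions, not the authors' released tooling.

```python
# Hypothetical sketch of spectral clustering over gesture motion trajectories.
# Pairwise multivariate DTW distances are converted into an RBF affinity matrix,
# which is then clustered with scikit-learn's SpectralClustering.
import numpy as np
from dtaidistance import dtw_ndim
from sklearn.cluster import SpectralClustering

def cluster_gestures(gestures, n_clusters=8, gamma=1.0):
    """gestures: list of (frames, features) arrays of motion data,
    possibly of different lengths; returns one cluster label per gesture."""
    n = len(gestures)
    dist = np.zeros((n, n))
    # Pairwise multivariate DTW distance between gesture trajectories.
    for i in range(n):
        for j in range(i + 1, n):
            d = dtw_ndim.distance(gestures[i], gestures[j])
            dist[i, j] = dist[j, i] = d
    # Turn distances into similarities (RBF kernel) for spectral clustering.
    affinity = np.exp(-gamma * (dist / dist.max()) ** 2)
    labels = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", random_state=0
    ).fit_predict(affinity)
    return labels

# Usage example on synthetic data: 20 fake gestures, each a random
# variable-length trajectory of 3D wrist positions.
rng = np.random.default_rng(0)
fake_gestures = [rng.standard_normal((int(rng.integers(40, 80)), 3)) for _ in range(20)]
print(cluster_gestures(fake_gestures, n_clusters=4))
```

In practice, the motion features fed to such a pipeline (joint positions, velocities, or hand-shape descriptors) and the choice of affinity kernel strongly shape which motion patterns end up grouped together.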


Information

Published In

HAI '22: Proceedings of the 10th International Conference on Human-Agent Interaction
December 2022
352 pages
ISBN: 9781450393232
DOI: 10.1145/3527188
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 December 2022


Author Tags

  1. Analysis Techniques
  2. Animation
  3. Clustering
  4. Gesture
  5. Human-Agent Interaction
  6. Motion Capture
  7. Virtual Humans

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

HAI '22
HAI '22: International Conference on Human-Agent Interaction
December 5 - 8, 2022
Christchurch, New Zealand

Acceptance Rates

Overall Acceptance Rate 121 of 404 submissions, 30%
