LLM Knows Body Language, Too: Translating Speech Voices into Human Gestures

Chenghao Xu, Guangtao Lyu, Jiexi Yan, Muli Yang, Cheng Deng


Abstract
In response to the growing demand for digital human representations, considerable progress has been made in generating realistic human gestures from speech. Despite these achievements, generated sequences frequently contain unintended, meaningless, or unrealistic gestures. To address this challenge, we propose a gesture translation paradigm, GesTran, which leverages large language models (LLMs) to deepen the understanding of the connection between speech and gesture, and which generates human gestures sequentially by treating them as a unique form of body language. The first stage of the framework employs a transformer-based auto-encoder network to encode human gestures into discrete symbols. The second stage uses a pre-trained LLM to model the relationship between speech and gesture, translating speech into gesture by treating the gesture symbols as unique language tokens within the LLM. Extensive and impartial experiments on the public TED and TED-Expressive datasets show that our method achieves state-of-the-art performance.
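To make the first stage concrete, the following minimal sketch shows one common way continuous pose frames can be turned into discrete symbols: each frame is assigned the index of its nearest codebook entry, and the index is rendered as a token string that a language model vocabulary could be extended with. The abstract does not specify the quantization scheme, so nearest-neighbour codebook lookup is an assumption here, as are the function names `quantize` and `to_tokens`, the `<gesture_k>` token format, and the toy two-dimensional codebook. In the paper the symbols come from a transformer-based auto-encoder, not from raw frames.

```python
from typing import List, Sequence


def quantize(frames: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Assign each frame the index of its nearest codebook entry.

    Distance is squared Euclidean. This lookup is an illustrative
    assumption, not the quantizer described in the paper.
    """
    def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sqdist(frame, codebook[k]))
        for frame in frames
    ]


def to_tokens(indices: Sequence[int]) -> List[str]:
    """Render codebook indices as hypothetical gesture token strings."""
    return [f"<gesture_{i}>" for i in indices]


# Toy data: three codebook entries and three 2-D "pose" frames.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
frames = [[0.1, -0.1], [0.9, 0.2], [0.2, 0.8]]

indices = quantize(frames, codebook)
tokens = to_tokens(indices)
print(indices)  # [0, 1, 2]
print(tokens)   # ['<gesture_0>', '<gesture_1>', '<gesture_2>']
```

Once gestures are represented as token sequences of this kind, speech-to-gesture generation can be framed as a sequence-to-sequence translation problem, which is the framing the abstract describes for the second stage.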
Anthology ID:
2024.acl-long.273
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
5004–5013
URL:
https://aclanthology.org/2024.acl-long.273
DOI:
10.18653/v1/2024.acl-long.273
Cite (ACL):
Chenghao Xu, Guangtao Lyu, Jiexi Yan, Muli Yang, and Cheng Deng. 2024. LLM Knows Body Language, Too: Translating Speech Voices into Human Gestures. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5004–5013, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
LLM Knows Body Language, Too: Translating Speech Voices into Human Gestures (Xu et al., ACL 2024)
PDF:
https://aclanthology.org/2024.acl-long.273.pdf