DOI: 10.1145/2522848.2522890

Timing and entrainment of multimodal backchanneling behavior for an embodied conversational agent

Published: 09 December 2013

Abstract

We report on an analysis of feedback behavior in an Active Listening Corpus, as produced verbally, visually (head movement), and bimodally. The behavior is modeled in an embodied conversational agent, which is displayed in conversation with a real human to human observers for perceptual evaluation. Five strategies for the timing of backchannels are compared: copying the timing of the original human listener; producing backchannels at randomly selected times; producing backchannels according to high-level (global) timing distributions relative to the interlocutor's utterances and pauses; producing them according to local entrainment to the interlocutor's vowels; or according to both. Human observers judge that models with global timing distributions miss fewer opportunities for backchanneling than random timing.



Published In

ICMI '13: Proceedings of the 15th ACM International Conference on Multimodal Interaction
December 2013, 630 pages
ISBN: 9781450321297
DOI: 10.1145/2522848

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. backchannels
    2. embodied conversational agents
    3. entrainment

    Qualifiers

    • Poster


Acceptance Rates

ICMI '13 Paper Acceptance Rate: 49 of 133 submissions, 37%
Overall Acceptance Rate: 453 of 1,080 submissions, 42%
