DOI: 10.1145/1647314.1647332
poster

Multimodal end-of-turn prediction in multi-party meetings

Published: 02 November 2009

Abstract

One of the many skills required to engage properly in a conversation is knowing how to apply the rules of engagement appropriately. A virtual human or robot should, for instance, be able to tell when it is being addressed or when the speaker is about to hand over the turn. This paper presents a multimodal approach to end-of-speaker-turn prediction that uses sequential probabilistic models (Conditional Random Fields) to learn a model from observations of real-life multi-party meetings. Although the results are not as good as expected, we provide insight, based on the literature and our own results, into which modalities are important when taking a multimodal approach to the problem.
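
As a rough illustration of how a linear-chain Conditional Random Field assigns an end-of-turn label to each frame of a sequence, the sketch below runs Viterbi decoding over a toy example. Everything in it is an assumption made for illustration: the label set (`hold`, `end`), the feature names (`speech`, `pause`, `falling_pitch`, `gaze_at_listener`), and the hand-set weights are hypothetical and are not taken from the paper, whose actual features and trained model are not described on this page.

```python
from typing import Dict, List, Tuple

# Hypothetical label set: the speaker keeps the turn vs. yields it.
LABELS = ("hold", "end")

# Hand-set illustrative weights (NOT learned, NOT from the paper).
# A real CRF would estimate these from annotated meeting data.
EMISSION: Dict[Tuple[str, str], float] = {
    ("pause", "end"): 1.5,
    ("falling_pitch", "end"): 1.0,
    ("gaze_at_listener", "end"): 0.8,
    ("speech", "hold"): 1.2,
}
TRANSITION: Dict[Tuple[str, str], float] = {
    ("hold", "hold"): 0.5,
    ("hold", "end"): -0.5,
    ("end", "end"): 0.5,
    ("end", "hold"): -1.0,
}


def emission_score(features: List[str], label: str) -> float:
    """Sum the weights of all active features for the given label."""
    return sum(EMISSION.get((f, label), 0.0) for f in features)


def viterbi(frames: List[List[str]]) -> List[str]:
    """Return the highest-scoring label sequence for the observed frames."""
    if not frames:
        return []
    # best[t][y] = score of the best path that ends in label y at frame t
    best = [{y: emission_score(frames[0], y) for y in LABELS}]
    back = []  # back[t-1][y] = best previous label for y at frame t
    for feats in frames[1:]:
        scores, pointers = {}, {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + TRANSITION[(p, y)])
            scores[y] = best[-1][prev] + TRANSITION[(prev, y)] + emission_score(feats, y)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy observation sequence: one list of active (hypothetical) cues per frame.
frames = [
    ["speech"],
    ["speech"],
    ["speech", "falling_pitch"],
    ["pause", "gaze_at_listener"],
]
predicted = viterbi(frames)
print(predicted)  # ['hold', 'hold', 'hold', 'end']
```

The sketch covers decoding only. Training the weights, which is the part the abstract refers to as learning a model from meeting observations, is omitted here.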

References

[1]
M. Argyle and M. Cook. Gaze and mutual gaze. Cambridge University Press, London, United Kingdom, 1976.
[2]
M. Atterer, T. Baumann, and D. Schlangen. Towards incremental end-of-utterance detection in dialogue systems. In Proceedings of International Conference on Computational Linguistics, 2008.
[3]
P. Barkhuysen, E. Krahmer, and M. Swerts. The interplay between auditory and visual cues for end-of-utterance detection. Journal of Acoustical Society of America, 123(1):354 -- 365, 2008.
[4]
P. Boersma and V. van Heuven. Speak and unspeak with praat. Glot International, 5(9-10):341--347, November 2001.
[5]
J. Cassell, Y. I. Nakano, T. W. Bickmore, C. L. Sidner, and C. Rich. Non-verbal cues for discourse structure. In ACL '01: Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, pages 114--123, Morristown, NJ, USA, 2001. Association for Computational Linguistics.
[6]
J. Cassell, J. Sullivan, S. Prevost, and E. F. Churchill. Embodied Conversational Agents. MIT Press, Cambridge Massachusetts, London England, 2000.
[7]
J. Cassell, O. E. Torres, and S. Prevost. Turn taking vs. discourse structure: How best to model multimodal conversation. In Machine Conversations, pages 143--154. Kluwer, 1998.
[8]
J. de Ruiter, H. Mitterer, and N. Enfield. Projecting the end of a speaker's turn: A cognitive cornerstone of conversation. Language, 82(3):515 -- 535, 2006.
[9]
S. Duncan. Some signals and rules for taking speaking turns in conversations. Journal of Personality and Social Psychology, 23(2):283 -- 292, 1972.
[10]
S. Duncan and G. Niederehe. On signalling that it's your turn to speak. Journal of Experimental Social Psychology, 10:234--247, 1974.
[11]
O. Fuentes, D. Vera, and T. Solorio. A filter-based approach to detect end-of-utterances from prosody in dialog systems. In HLT-NAACL (Short Papers), pages 45--48. The Association for Computational Linguistics, 2007.
[12]
J. Fung, D. Hakkani-Tur, M. Magimai-Doss, E. Shriberg, S. Cuendet, and N. Mirghafori. Prosodic features and feature selection for multi-lingual sentence segmentation. In Proceedings of Interspeech 2007, pages 2585--2588, 2007.
[13]
C. Goodwin. Conversational Organization: interaction between speakers and hearers. Academic Press, 1981.
[14]
D. Heylen. Head gestures, gaze and the principles of conversational structure. International Journal of Humanoid Robotics, 3(3):241--267, 2006.
[15]
D. Heylen. Listening heads. In I. Wachsmuth and G. Knoblich, editors, Modeling Communication with robots and virtual humans, volume 4930 of Lecture Notes in Artificial Intelligence, pages 241--259. Springer Verlag, Berlin, 2008.
[16]
http://corpus.amiproject.org. The AMI Meeting Corpus, May 2009.
[17]
A. Kendon. Some functions of gaze direction in social interaction. Acta Psychologica, 26:22--63, 1967.
[18]
J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: probabilistic models for segmenting and labelling sequence data. In ICML, 2001.
[19]
T. Minato, Y. Yoshikawa, T. Noda, S. Ikemoto, H. Ishiguro, and M. Asada. CB2: A child robot with biomimetic body for cognitive developmental robotics. In IROS 2008: Proceedings of the IEEE/RSJ 2008 International Conference on Intelligent RObots and Systems, pages 193--200, 2008.
[20]
L.-P. Morency, I. de Kok, and J. Gratch. Context-based recognition during human interactions: Automatic feature selection and encoding dictionary. In ICMI '08: Proceedings of the 10th International Conference on Multimodal Interfaces, pages 181--188, New York, NY, USA, 2008. ACM.
[21]
L.-P. Morency, I. de Kok, and J. Gratch. Predicting listener backchannels: A probabilistic multimodal approach. In Intelligent Virtual Agents (IVA '08), pages 176--190, 2008.
[22]
D. C. O'Connell, S. Kowal, and E. Kaltenbacher. Turn-taking: A critical analysis of the research tradition. Journal of Psycholinguistic Research, 19(6):345 -- 373, 1990.
[23]
L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257--286, 1989.
[24]
R. J. Rienks, R. Poppe, and D. Heylen. Differences in head orientation behavior for speakers and listeners: an experiment in a virtual environment. Transactions on Applied Perception, 7(1):accepted for publication, 2010.
[25]
H. Sacks, E. A. Schegloff, and G. Jefferson. A simplest systematics for the organization of turn-taking for conversation. Language, 50(4):696 -- 735, 1974.
[26]
D. Sakamoto, T. Kanda, T. Ono, H. Ishiguro, and N. Hagita. Android as a telecommunication medium with a human-like presence. In HRI '07: Proceedings of the ACM/IEEE international conference on Human-robot interaction, pages 193--200, New York, NY, USA, 2007. ACM.
[27]
D. Schlangen. From reaction to prediction: Experiments with computational models of turn-taking. In Proceedings of Interspeech 2006, 2006.
[28]
T. Sikorski and J. F. Allen. A task-based evaluation of the trains-95 dialogue system. In ECAI '96: Workshop on Dialogue Processing in Spoken Language Systems, pages 207--220, London, UK, 1997. Springer-Verlag.
[29]
R. Vertegaal, R. Slagter, G. van der Veer, and A. Nijholt. Eye gaze patterns in conversations: There is more to conversational agents than meets the eyes. In Proceedings of CHI'01, pages 301 -- 308. ACM, 2001.
[30]
R. Vertegaal, G. van der Veer, and H. Vons. Effects of gaze on multiparty mediated communication. In Proceedings of Graphics Interface, pages 95 -- 102, Montreal, Canada, 2000. Morgan Kaufmann Publishers.
[31]
N. Ward and W. Tsukahara. Prosodic features which cue back-channel responses in english and japanese. Journal of Pragmatics, 32(8):1177--1207, 2000.

Published In

ICMI-MLMI '09: Proceedings of the 2009 international conference on Multimodal interfaces
November 2009
374 pages
ISBN: 9781605587721
DOI: 10.1145/1647314
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. end-of-turn prediction
  2. multimodal
  3. probabilistic model

Qualifiers

  • Poster

Conference

ICMI-MLMI '09
Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%
