Abstract
This paper describes the Virtual Guide, a multimodal dialogue system represented by an embodied conversational agent that helps users find their way in a virtual environment while adapting its affective linguistic style to that of the user. We discuss the modular architecture of the system and describe the entire loop from multimodal input analysis to multimodal output generation. We also describe how the Virtual Guide detects the level of politeness of the user’s utterances in real time during the dialogue and aligns its own language to that of the user, using different politeness strategies. Finally, we report on our first user tests and discuss some potential extensions to improve the system.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Hofs, D., Theune, M. & op den Akker, R. Natural interaction with a virtual guide in a virtual environment. J Multimodal User Interfaces 3, 141–153 (2010). https://doi.org/10.1007/s12193-009-0024-6
DOI: https://doi.org/10.1007/s12193-009-0024-6