DOI: 10.1145/2927006.2927011

Towards a Multimedia Knowledge-Based Agent with Social Competence and Human Interaction Capabilities

Published: 06 June 2016

Abstract

We present work in progress on an intelligent embodied conversation agent for the basic care and healthcare domain. In contrast to most existing agents, the presented agent is designed to have the linguistic, cultural, social, and emotional competence needed to interact with elderly users and migrants. It is composed of an ontology-based, reasoning-driven dialogue manager, modules for multimodal communication analysis and generation, and a search engine that retrieves from the web the multimedia background content needed to conduct a conversation on a given topic.
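To make the architecture described in the abstract concrete, the sketch below shows one way the three components could be wired into a turn-based pipeline. This is a minimal illustration, assuming a simple request/response control flow; all class and method names (MultimodalAnalyzer, OntologyDialogueManager, MultimediaRetriever, MultimodalGenerator) are hypothetical and are not APIs from the paper.

```python
# Minimal sketch of the pipeline described in the abstract.
# All names are hypothetical; the paper does not specify an API.

class MultimodalAnalyzer:
    """Maps raw user input (speech, gesture, etc.) to a semantic representation."""
    def analyze(self, user_input: str) -> dict:
        return {"intent": "ask_topic", "topic": user_input, "emotion": "neutral"}

class OntologyDialogueManager:
    """Ontology-based, reasoning-driven selection of the next dialogue move."""
    def next_move(self, semantics: dict) -> dict:
        # A real implementation would query an ontology and reason over the
        # user profile (language, culture) and the dialogue history.
        return {"act": "inform", "topic": semantics["topic"]}

class MultimediaRetriever:
    """Retrieves multimedia background content from the web for the current topic."""
    def search(self, topic: str) -> list[str]:
        return [f"https://example.org/media?q={topic}"]  # placeholder URL

class MultimodalGenerator:
    """Renders the dialogue move as output for the embodied agent."""
    def generate(self, move: dict, media: list[str]) -> str:
        return f"Let me tell you about {move['topic']} (see {media[0]})."

def turn(user_input: str) -> str:
    """One dialogue turn: analyze, decide, retrieve, generate."""
    semantics = MultimodalAnalyzer().analyze(user_input)
    move = OntologyDialogueManager().next_move(semantics)
    media = MultimediaRetriever().search(move["topic"])
    return MultimodalGenerator().generate(move, media)

if __name__ == "__main__":
    print(turn("healthy eating"))
```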


Cited By

  • (2019) Converness: Ontology-driven conversational awareness and context understanding in multimodal dialogue systems. Expert Systems 37(1). DOI: 10.1111/exsy.12378. Online publication date: 12-Feb-2019.


Published In

MARMI '16: Proceedings of the 1st International Workshop on Multimedia Analysis and Retrieval for Multimodal Interaction
June 2016
46 pages
ISBN: 9781450343626
DOI: 10.1145/2927006
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. dialogue
  2. embodied agent
  3. multimodal communication
  4. retrieval

Qualifiers

  • Research-article

Funding Sources

  • European Commission

Conference

ICMR'16

Acceptance Rates

MARMI '16 paper acceptance rate: 6 of 7 submissions (86%).
Overall acceptance rate: 6 of 7 submissions (86%).
