DOI: 10.1145/2927006.2927011

Towards a Multimedia Knowledge-Based Agent with Social Competence and Human Interaction Capabilities

Published: 06 June 2016

Abstract

We present work in progress on an intelligent embodied conversation agent for the basic care and healthcare domain. In contrast to most existing agents, the presented agent is designed to have the linguistic, cultural, social, and emotional competence needed to interact with elderly users and migrants. It is composed of an ontology-based, reasoning-driven dialogue manager, modules for multimodal communication analysis and generation, and a search engine that retrieves from the web the multimedia background content needed to conduct a conversation on a given topic.
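To make the architecture described in the abstract concrete, the sketch below shows one way the three components could be wired into a turn-based pipeline. This is a minimal illustration, assuming a simple request/response control flow; all class and method names (MultimodalAnalyzer, OntologyDialogueManager, MultimediaRetriever, MultimodalGenerator) are hypothetical and are not APIs from the paper.

```python
# Minimal sketch of the pipeline described in the abstract.
# All names are hypothetical; the paper does not specify an API.

class MultimodalAnalyzer:
    """Maps raw user input (speech, gesture, etc.) to a semantic representation."""
    def analyze(self, user_input: str) -> dict:
        return {"intent": "ask_topic", "topic": user_input, "emotion": "neutral"}

class OntologyDialogueManager:
    """Ontology-based, reasoning-driven selection of the next dialogue move."""
    def next_move(self, semantics: dict) -> dict:
        # A real implementation would query an ontology and reason over the
        # user profile (language, culture) and the dialogue history.
        return {"act": "inform", "topic": semantics["topic"]}

class MultimediaRetriever:
    """Retrieves multimedia background content from the web for the current topic."""
    def search(self, topic: str) -> list[str]:
        return [f"https://example.org/media?q={topic}"]  # placeholder URL

class MultimodalGenerator:
    """Renders the dialogue move as output for the embodied agent."""
    def generate(self, move: dict, media: list[str]) -> str:
        return f"Let me tell you about {move['topic']} (see {media[0]})."

def turn(user_input: str) -> str:
    """One dialogue turn: analyze, decide, retrieve, generate."""
    semantics = MultimodalAnalyzer().analyze(user_input)
    move = OntologyDialogueManager().next_move(semantics)
    media = MultimediaRetriever().search(move["topic"])
    return MultimodalGenerator().generate(move, media)

if __name__ == "__main__":
    print(turn("healthy eating"))
```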


Cited By

  • (2019) Converness: Ontology-driven conversational awareness and context understanding in multimodal dialogue systems. Expert Systems 37(1). DOI: 10.1111/exsy.12378. Online publication date: 12-Feb-2019.


Published In

MARMI '16: Proceedings of the 1st International Workshop on Multimedia Analysis and Retrieval for Multimodal Interaction
June 2016
46 pages
ISBN: 9781450343626
DOI: 10.1145/2927006
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. dialogue
  2. embodied agent
  3. multimodal communication
  4. retrieval

Qualifiers

  • Research-article

Funding Sources

  • European Commission

Conference

ICMR'16

Acceptance Rates

MARMI '16 paper acceptance rate: 6 of 7 submissions (86%).
Overall acceptance rate: 6 of 7 submissions (86%).
