DOI: 10.1145/3136755.3136813

Multimodal analysis of vocal collaborative search: a public corpus and results

Published: 03 November 2017

Abstract

Intelligent agents have the potential to help with many tasks. Information-seeking and voice-enabled search assistants are becoming very common. However, questions remain as to the extent to which these agents should sense and respond to emotional signals. We designed a set of information-seeking tasks and recruited participants to complete them with the help of a human intermediary. In total, we collected data from 22 pairs of individuals, each completing five search tasks. The participants could communicate only by voice, over a VoIP service. Using automated methods, we extracted facial action, voice prosody, and linguistic features from the audio-visual recordings. We analyzed which characteristics of these interactions correlated with successful communication and understanding between the pairs. We found that participants who were expressive in channels that the voice-only medium did not transmit (e.g., facial actions and gaze) were rated as communicating poorly and as less helpful and understanding. A way of reinstating nonverbal cues in these interactions would improve the experience, even when the tasks are purely information-seeking exercises. The dataset used for this analysis contains over 15 hours of video, audio, transcripts, and reported ratings. It is publicly available to researchers at http://aka.ms/MISCv1.
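As a rough, hypothetical sketch of the kind of analysis the abstract describes (not the authors' actual pipeline or tooling), the snippet below extracts two coarse prosodic statistics from a single recording and computes a rank correlation between one of them and per-session communication ratings. The file name "session.wav", the use of librosa as the feature extractor, and all numeric values are placeholders for illustration only.

```python
# Illustrative sketch only: coarse prosodic features for one recording,
# then a rank correlation against hypothetical per-session ratings.
import numpy as np
import librosa
from scipy import stats

def prosody_features(wav_path):
    """Return simple pitch and energy statistics for one audio file."""
    y, sr = librosa.load(wav_path, sr=16000)
    # Frame-wise fundamental frequency; unvoiced frames come back as NaN.
    f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=75, fmax=400, sr=sr)
    rms = librosa.feature.rms(y=y)[0]
    return {
        "f0_mean": float(np.nanmean(f0)),
        "f0_std": float(np.nanstd(f0)),
        "energy_mean": float(rms.mean()),
    }

# Hypothetical per-session values: pitch variability and a 1-5 rating
# of how well the pair communicated (placeholder numbers, not real data).
f0_std = np.array([22.1, 35.4, 18.7, 41.0, 27.3])
ratings = np.array([4.0, 3.0, 5.0, 2.0, 4.0])

# Spearman rank correlation between the prosodic feature and the rating.
rho, p = stats.spearmanr(f0_std, ratings)
print(f"rho={rho:.2f}, p={p:.3f}")
```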




Published In

ICMI '17: Proceedings of the 19th ACM International Conference on Multimodal Interaction
November 2017
676 pages
ISBN: 9781450355438
DOI: 10.1145/3136755
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Multimodal
  2. agent
  3. dataset
  4. search

Qualifiers

  • Research-article

Conference

ICMI '17

Acceptance Rates

ICMI '17 Paper Acceptance Rate: 65 of 149 submissions, 44%
Overall Acceptance Rate: 453 of 1,080 submissions, 42%
