
In the Mood for Vlog: Multimodal Inference in Conversational Social Video

Published: 30 June 2015

Abstract

The prevalent “share what's on your mind” paradigm of social media can be examined from the perspective of mood: short-term affective states revealed by the shared data. This view takes on new relevance given the emergence of conversational social video as a popular genre among viewers looking for entertainment and among video contributors as a channel for debate, expertise sharing, and artistic expression. From the perspective of human behavior understanding, in conversational social video both verbal and nonverbal information is conveyed by speakers and decoded by viewers. We present a systematic study of classification and ranking of mood impressions in social video, using vlogs from YouTube. Our approach considers eleven natural mood categories labeled through crowdsourcing by external observers on a diverse set of conversational vlogs. We extract a comprehensive set of nonverbal and verbal behavioral cues from the audio and video channels to characterize the mood of vloggers. We then implement and validate vlog classification and vlog ranking tasks using supervised learning methods. Following a reliability and correlation analysis of the mood impression data, our study demonstrates that, while the problem is challenging, several mood categories can be inferred with promising performance. Furthermore, multimodal features perform consistently better than single-channel features. Finally, we show that addressing mood as a ranking problem is a promising practical direction for several of the mood categories studied.
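
The abstract describes two supervised tasks over per-vlog multimodal features: classifying whether a vlog conveys a given mood and ranking vlogs by that mood. Below is a minimal sketch of how such a pipeline could be wired up; the abstract does not specify the learners or metrics, so the random-forest classifier, the synthetic feature matrix, the crowdsourced score distribution, and the ROC AUC / Kendall's tau evaluation are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.stats import kendalltau
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Hypothetical per-vlog multimodal feature matrix: audio cues (e.g., pitch,
# speaking rate), visual cues (e.g., motion, smile activity), and verbal
# cues (e.g., word-category counts), concatenated per vlog.
n_vlogs, n_features = 200, 30
X = rng.normal(size=(n_vlogs, n_features))

# Hypothetical crowdsourced mood scores for one category, averaged over
# annotators; binarized at the median for the classification task.
scores = rng.uniform(1.0, 5.0, size=n_vlogs)
y = (scores > np.median(scores)).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)

# Classification task: cross-validated probability that each vlog conveys
# the mood, evaluated with ROC AUC.
proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
print(f"classification ROC AUC: {roc_auc_score(y, proba):.3f}")

# Ranking task: order vlogs by predicted probability and compare with the
# ordering induced by the crowdsourced scores via Kendall's tau.
tau, _ = kendalltau(proba, scores)
print(f"ranking Kendall's tau: {tau:.3f}")
```

Scoring the ranking task by classifier probability is only the simplest baseline; a dedicated learning-to-rank method trained on pairwise preferences would be the natural alternative for treating mood as a ranking problem.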


    Published In

    ACM Transactions on Interactive Intelligent Systems, Volume 5, Issue 2
    Special Issue on Behavior Understanding for Arts and Entertainment (Part 1 of 2)
    July 2015
    144 pages
    ISSN:2160-6455
    EISSN:2160-6463
    DOI:10.1145/2799389

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 June 2015
    Accepted: 01 March 2015
    Revised: 01 February 2015
    Received: 01 April 2014
    Published in TIIS Volume 5, Issue 2

    Author Tags

    1. Social video
    2. mood
    3. nonverbal behavior
    4. verbal content
    5. video blogs
    6. vlogs

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • NISHA project (NTT-Idiap Social Behavior Analysis Initiative)
    • SNSF UBImpressed project (Ubiquitous First Impressions and Ubiquitous Awareness)
