
Multimodal user state and trait recognition: an overview

Published: 01 October 2018

References

[1]
M. Abouelenien, M. Burzo, and R. Mihalcea. 2015. Cascaded multimodal analysis of alertness related features for drivers safety applications. In Proceedings of the 8th ACM International Conference on PErvasive Technologies Related to Assistive Environments, p. 59. ACM. 134
[2]
E. Alpaydin. 2018. Classifying multimodal data. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krueger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition, Ch. 2. Morgan & Claypool Publishers, San Rafael, CA.
[3]
K. Altun and K. E. MacLean. 2015. Recognizing affect in human touch of a robot. Pattern Recognition Letters, vol. 66, pp. 31--40. 148
[4]
O. Amft and G. Tröster. 2006. Methods for detection and classification of normal swallowing from muscle activation and sound. In Pervasive Health Conference and Workshops, 2006, pp. 1--10. IEEE. 134
[5]
E. O. Andreeva, P. Aarabi, M. G. Philiastides, K. Mohajer, and M. Emami. 2004. Driver drowsiness detection using multimodal sensor fusion. In Defense and Security, pp. 380--390. International Society for Optics and Photonics. 134
[6]
P. Baggia, D. C. Burnett, J. Carter, D. A. Dahl, G. McCobb, and D. Raggett. 2009. EMMA: Extensible MultiModal Annotation Markup Language. W3C Recommendation. 140
[7]
J. N. Bailenson, N. Yee, S. Brave, D. Merget, and D. Koslow. 2007. Virtual interpersonal touch: expressing and recognizing emotions through haptic devices. Human-Computer Interaction, 22(3): 325--353. 148
[8]
T. Baltrusaitis, C. Ahuja, and L.-Ph. Morency. 2018. Multimodal machine learning. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krueger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition, Ch. 1. Morgan & Claypool Publishers, San Rafael, CA.
[9]
C. Banea, R. Mihalcea, and J. Wiebe. 2011. Multilingual sentiment and subjectivity. In I. Zitouni and D. Bikel, editors, Multilingual Natural Language Processing. Prentice Hall.
[10]
A. Batliner, S. Steidl, B. Schuller, D. Seppi, T. Vogt, J. Wagner, L. Devillers, L. Vidrascu, V. Aharonson, L. Kessous, and N. Amir. 2011. Whodunnit---Searching for the most important feature types signalling emotion-related user states in speech. Computer Speech & Language, 25(1): 4--28. 146
[11]
L. Batrinca, B. Lepri, and F. Pianesi. 2011. Multimodal recognition of personality during short self-presentations. In Proceedings of the 2011 Joint ACM Workshop on Human Gesture and Behavior Understanding, pp. 27--28. ACM. 134
[12]
L. Batrinca, B. Lepri, N. Mana, and F. Pianesi. 2012. Multimodal recognition of personality traits in human-computer collaborative tasks. In Proceedings of the 14th ACM International Conference on Multimodal Interaction, pp. 39--46. ACM. 134
[13]
F. Benamara, C. Cesarano, A. Picariello, D. Reforgiato, and V.S. Subrahmanian. 2007. Sentiment analysis: Adjectives and adverbs are better than adjectives alone. In Proceedings International Conference on Weblogs and Social Media, pp. 1--7, Boulder, CO. 146
[14]
G. G. Berntson, J. T. Bigger, D. L. Eckberg, P. Grossman, P. G. Kaufmann, M. Malik, H. N. Nagaraja, S. W. Porges, J. P. Saul, P. H. Stone, and M. W. van der Molen. 1997. Heart rate variability: origins, methods, and interpretive caveats. Psychophysiology, 34(6): 623--648. 137
[15]
K. C. Berridge. 2000. Reward learning: Reinforcement, incentives, and expectations. Psychology of Learning and Motivation, 40: 223--278. 142
[16]
H. Bořil, P. Boyraz, and J. H. L. Hansen. 2012. Towards multimodal driver's stress detection. In Digital Signal Processing for In-vehicle Systems and Safety, pp. 3--19. Springer. 134
[17]
H. Brugman and A. Russel. 2004. Annotating Multi-media / Multi-modal resources with ELAN. In Proceedings of LREC, pp. 2065--2068, Lisbon, Portugal. 149
[18]
F. Burkhardt, C. Pelachaud, B. Schuller, and E. Zovato. 2017. Emotion ML. In D. Dahl, editor, Multimodal Interaction with W3C Standards: Towards Natural User Interfaces to Everything, pp. 65--80. Springer, Berlin/Heidelberg. 140
[19]
M. Burzo, M. Abouelenien, V. Perez-Rosas, and R. Mihalcea. 2018. Multimodal deception detection. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krueger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition, Ch. 13. Morgan & Claypool Publishers, San Rafael, CA.
[20]
E. Cambria, B. Schuller, Y. Xia, and C. Havasi. 2013. New avenues in opinion mining and sentiment analysis. IEEE Intelligent Systems Magazine, 28(2): 15--21. 146
[21]
H. E. Çetingül, E. Erzin, Y. Yemez, and A. M. Tekalp. 2006. Multimodal speaker/speech recognition using lip motion, lip texture and audio. Signal Processing, 86(12): 3549--3558. 134
[22]
G. Chanel, J. Kronegg, D. Grandjean, and T. Pun. 2006. Emotion assessment: Arousal evaluation using eeg's and peripheral physiological signals. In LNCS vol. 4105, pp. 530--537. 148
[23]
G. Chanel, K. Ansari-Asl, and T. Pun. 2007. Valence-arousal evaluation using physiological signals in an emotion recall paradigm. In Proceedings of SMC, pp. 2662--2667, Montreal, QC. IEEE. 148
[24]
G. Chanel, J. J. M. Kierkels, M. Soleymani, and T. Pun. 2009. Short-term emotion assessment in a recall paradigm. International Journal of Human-Computer Studies, 67(8): 607--627. 137, 148
[25]
J. F. Cohn, T. S. Kruez, I. Matthews, Y. Yang, M. H. Nguyen, M. T. Padilla, F. Zhou, and F. De La Torre. 2009. Detecting depression from facial actions and vocal prosody. In Affective Computing and Intelligent Interaction and Workshops, 2009. ACII 2009. 3rd International Conference on, pp. 1--7. IEEE. 134
[26]
J. F. Cohn, N. Cummins, J. Epps, R. Goecke, J. Joshi, and S. Scherer. 2018. Multimodal assessment of depression and related disorders based on behavioural signals. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krueger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition, Ch. 12. Morgan & Claypool Publishers, San Rafael, CA.
[27]
R. Cowie, E. Douglas-Cowie, S. Savvidou, E. McMahon, M. Sawey, and M. Schröder. 2000. Feeltrace: An instrument for recording perceived emotion in real time. In Proceedings of ISCA Workshop on Speech and Emotion, pp. 19--24, Newcastle, UK.
[28]
R. Cowie, G. McKeown, and E. Douglas-Cowie. 2012. Tracing emotion: an overview. International Journal of Synthetic Emotions, 3(1): 1--17. 149
[29]
N. Dael, M. Mortillaro, and K. R. Scherer. 2012. The body action and posture coding system (bap): Development and reliability. Journal of Nonverbal Behavior, 36(2): 97--121. 146
[30]
D. Davidov, O. Tsur, and A. Rappoport. 2010. Semi-supervised recognition of sarcastic sentences in Twitter and Amazon. In Proceedings of CoNLL, pp. 107--116, Uppsala, Sweden.
[31]
R. J. Davidson and N. A. Fox. 1982. Asymmetrical brain activity discriminates between positive and negative affective stimuli in human infants. Science, 218: 1235--1237. 148
[32]
J. Deng and B. Schuller. 2012. Confidence measures in speech emotion recognition based on semi-supervised learning. In Proceedings of INTERSPEECH, Portland, OR. ISCA. 140
[33]
J. Deng, Z. Zhang, F. Eyben, and B. Schuller. 2014. Autoencoder-based unsupervised domain adaptation for speech emotion recognition. IEEE Signal Processing Letters, 21(9): 1068--1072. 143
[34]
S. K. D'Mello, N. Bosch, and H. Chen. 2018. Multimodal-multisensor affect detection. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krueger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition, Ch. 6. Morgan & Claypool Publishers, San Rafael, CA.
[35]
L. El Asri, R. Laroche, and O. Pietquin. 2012. Reward function learning for dialogue management. In Proceedings Sixth Starting AI Researchers' Symposium - STAIRS, pp. 95--106. 142
[36]
F. Eyben, M. Wöllmer, and B. Schuller. 2010. openSMILE - The Munich versatile and fast open-source audio feature extractor. In Proceedings of MM, pp. 1459--1462, Florence, Italy. ACM. 150
[37]
F. Eyben, M. Wöllmer, M. Valstar, H. Gunes, B. Schuller, and M. Pantic. 2011. String-based audiovisual fusion of behavioural events for the assessment of dimensional affect. In Proceedings of FG, pp. 322--329, Santa Barbara, CA. IEEE. 146
[38]
F. Eyben, F. Weninger, L. Paletta, and B. Schuller. 2013. The acoustics of eye contact---Detecting visual attention from conversational audio cues. In Proceedings 6th Workshop on Eye Gaze in Intelligent Human Machine Interaction: Gaze in Multimodal Interaction (GAZEIN 2013), held in conjunction with ICMI 2013, pp. 7--12, Sydney, Australia. ACM. 149
[39]
M. Farrús, P. Ejarque, A. Temko, and J. Hernando. 2007. Histogram equalization in svm multimodal person verification. In Advances in Biometrics, pp. 819--827. Springer. 134
[40]
S. M. Feraru, D. Schuller, and B. Schuller. 2015. Cross-language acoustic emotion recognition: an overview and some tendencies. In Proceedings of ACII, pp. 125--131, Xi'an, P.R. China. IEEE. 152
[41]
Y. Gao, N. Bianchi-Berthouze, and H. Meng. 2012. What does touch tell us about emotions in touchscreen-based gameplay? ACM Transactions on Computer-Human Interaction, 19(4/31). 148
[42]
J. T. Geiger, M. Kneissl, B. Schuller, and G. Rigoll. 2014. Acoustic gait-based person identification using hidden Markov models. In Proceedings of the Personality Mapping Challenge & Workshop (MAPTRAITS 2014, Satellite of ICMI), pp. 25--30, Istanbul, Turkey. ACM. 145
[43]
C. Georgakis, S. Petridis, and M. Pantic. 2014. Discriminating native from non-native speech using fusion of visual cues. In Proceedings of the ACM International Conference on Multimedia, pp. 1177--1180. ACM. 134
[44]
D. Glowinski, N. Dael, A. Camurri, G. Volpe, M. Mortillaro, and K. Scherer. 2011. Towards a minimal representation of affective gestures. IEEE Transactions on Affective Computing, 2(2): 106--118. 150
[45]
H. Gunes and M. Pantic. 2010. Automatic, dimensional and continuous emotion recognition. International Journal of Synthetic Emotions, 1(1): 68--99. 137
[46]
H. Gunes and B. Schuller. 2013. Categorical and dimensional affect analysis in continuous input: current trends and future directions. Image and Vision Computing Journal Special Issue, 31(2): 120--136. 144
[47]
H. Gunes, B. Schuller, O. Celiktutan, E. Sariyanidi, and F. Eyben, editors. 2014. Proceedings of the Personality Mapping Challenge & Workshop (MAPTRAITS 2014), Istanbul, Turkey. ACM. Satellite of the 16th ACM International Conference on Multimodal Interaction (ICMI). 133, 134, 150
[48]
A. Haag, S. Goronzy, P. Schaich, and J. Williams. 2004. Emotion recognition using bio-sensors: First steps towards an automatic system. In LNCS 3068, pp. 36--48. 148
[49]
M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. 2009. The weka data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1): 10--18. 150
[50]
S. Hantke, T. Appel, F. Eyben, and B. Schuller. 2015. iHEARu-PLAY: Introducing a game for crowdsourced data collection for affective computing. In Proceedings of the 1st International Workshop on Automatic Sentiment Analysis in the Wild (WASA 2015) held in conjunction with ACII, pp. 891--897, Xi'an, P.R. China. IEEE. 149
[51]
M. Hofmann, J. Geiger, S. Bachmann, B. Schuller, and G. Rigoll. 2013. The TUM Gait from Audio, Image and Depth (GAID) Database: Multimodal Recognition of Subjects and Traits. Journal of Visual Communication and Image Representation Special Issue on Visual Understanding Application with RGB-D Cameras, 25(1): 195--206. 134, 146
[52]
G. Huang and Y. Wang. 2007. Gender classification based on fusion of multi-view gait sequences. In Computer Vision-ACCV 2007, pp. 462--471. Springer. 134
[53]
R. Jenke, A. Peer, and M. Buss. 2014. Feature extraction and selection for emotion recognition from eeg. IEEE Transactions on Affective Computing, 5(3): 327--339. 148
[54]
G. Keren, A. E.-D. Mousa, O. Pietquin, S. Zafeiriou, and B. Schuller. 2018. Deep learning for multisensorial and multimodal interaction. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krueger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition, Ch. 4. Morgan & Claypool Publishers, San Rafael, CA.
[55]
M. M. Khan, R. D. Ward, and M. Ingleby. 2006. Infrared thermal sensing of positive and negative affective states. In Proceedings of the International Conference on Robotics, Automation and Mechatronics, pp. 1--6. IEEE. 147
[56]
M. Kipp. 2001. Anvil - a generic annotation tool for multimodal dialogue. In Proceedings of the 7th European Conference on Speech Communication and Technology, pp. 1367--1370. 149
[57]
A. Kleinsmith and N. Bianchi-Berthouze. 2007. Recognizing affective dimensions from body posture. In Proceedings of ACII, pp. 48--58, Lisbon, Portugal. 147
[58]
A. Kleinsmith, P. R. De Silva, and N. Bianchi-Berthouze. 2005. Recognizing emotion from postures: Cross-cultural differences in user modeling. In Proceedings of the Conference on User Modeling, pp. 50--59, Edinburgh, UK. 147
[59]
T. Ko. 2005. Multimodal biometric identification for large user population using fingerprint, face and iris recognition. In Applied Imagery and Pattern Recognition Workshop, 2005. Proceedings 34th, p. 6. IEEE. 134
[60]
A. Kreilinger, H. Hiebel, and G. Müller-Putz. 2015. Single versus multiple events error potential detection in a BCI-controlled car game with continuous and discrete feedback. IEEE Transactions on Biomedical Engineering, (3): 519--29. 149
[61]
M. Kusserow, O. Amft, and G. Tröster. 2009. Bodyant: Miniature wireless sensors for naturalistic monitoring of daily activity. In Proceedings of the International Conference on Body Area Networks, pp. 1--8, Sydney, Australia. 148
[62]
M. Li, V. Rozgić, G. Thatte, S. Lee, A. Emken, M. Annavaram, U. Mitra, D. Spruijt-Metz, and S. Narayanan. 2010a. Multimodal physical activity recognition by fusing temporal and cepstral information. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 18(4): 369--380. 134
[63]
X. Li, X. Zhao, Y. Fu, and Y. Liu. 2010b. Bimodal gender recognition from face and fingerprint. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pp. 2590--2597. IEEE. 134
[64]
G. Littlewort, J. Whitehill, T. Wu, I. R. Fasel, M. G. Frank, J. R. Movellan, and M. S. Bartlett. 2011. The computer expression recognition toolbox (cert). In Proceedings of FG, pp. 298--305, Santa Barbara, CA. IEEE. 150
[65]
C. Liu, P. Rani, and N. Sarkar. 2005. An empirical study of machine learning techniques for affect recognition in human-robot interaction. In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2662--2667. 148
[66]
X. Lu, H. Chen, and A. K. Jain. 2005. Multimodal facial gender and ethnicity identification. In Advances in Biometrics, pp. 554--561. Springer. 134
[67]
K. Matsumoto and F. Ren. 2011. Estimation of word emotions based on part of speech and positional information. Computers in Human Behavior, 27(5): 1553--1564. 146
[68]
F. Matta, U. Saeed, C. Mallauran, and J.-L. Dugelay. 2008. Facial gender recognition using multiple sources of visual information. In Multimedia Signal Processing, 2008 IEEE 10th Workshop on, pp. 785--790. IEEE. 134
[69]
U. Maurer, A. Smailagic, D. P. Siewiorek, and M. Deisher. 2006. Activity recognition and monitoring using multiple sensors on different body positions. In Wearable and Implantable Body Sensor Networks, 2006. BSN 2006. International Workshop on, 4 pp. IEEE. 134
[70]
I. McCowan, D. Gatica-Perez, S. Bengio, G. Lathoud, M. Barnard, and D. Zhang. 2005. Automatic analysis of multimodal group actions in meetings. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3): 305--317. 134
[71]
G. McKeown, M. Valstar, R. Cowie, M. Pantic, and M. Schröder. 2012. The semaine database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent. IEEE Transactions on Affective Computing, 3(1): 5--17. 150
[72]
H. K. Meeren, C. C. Van Heijnsbergen, and B. De Gelder. 2005. Rapid perceptual integration of facial expression and emotional body language. Proceedings of the National Academy of Sciences of the USA, 102: 16518--16523. 145
[73]
W. A. Melder, K. P. Truong, M. D. Uyl, D. A. Van Leeuwen, M. A. Neerincx, L. R. Loos, and B. Plum. 2007. Affective multimodal mirror: sensing and eliciting laughter. In Proceedings of the International Workshop on Human-centered Multimedia, pp. 31--40. ACM. 134
[74]
A. Metallinou, A. Katsamanis, Y. Wang, and S. Narayanan. 2011. Tracking changes in continuous emotion states using body language and prosodic cues. In Proceedings of ICASSP, pp. 2288--2291, Prague, Czech Republic. IEEE. 147
[75]
F. Metze, A. Batliner, F. Eyben, T. Polzehl, B. Schuller, and S. Steidl. 2010. Emotion recognition using imperfect speech recognition. In Proceedings INTERSPEECH, pp. 478--481, Makuhari, Japan. ISCA. 146
[76]
T. L. Nwe, H. Sun, N. Ma, and H. Li. 2010. Speaker diarization in meeting audio for single distant microphone. In Proceedings of INTERSPEECH, pp. 1505--1508, Makuhari, Japan. ISCA. 152
[77]
S. Oviatt, J. F. Grafsgaard, L. Chen, and X. Ochoa. 2018. Multimodal learning analytics: Assessing learners' mental state during the process of learning. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krueger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition, Ch. 11. Morgan & Claypool Publishers, San Rafael, CA.
[78]
Y. Panagakis, O. Rudovic, and M. Pantic. 2018. Learning for multi-modal and context-sensitive interfaces. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krueger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition, Ch. 3. Morgan & Claypool Publishers, San Rafael, CA.
[79]
M. Pantic and M.S. Bartlett. 2007. Machine analysis of facial expressions. In K. Delac and M. Grgic, editors, Face Recognition, pp. 377--416. I-Tech Education and Publishing, Vienna, Austria. 146, 147
[80]
F. Pianesi, N. Mana, A. Cappelletti, B. Lepri, and M. Zancanaro. 2008. Multimodal recognition of personality traits in social interactions. In Proceedings of the 10th International Conference on Multimodal Interfaces, pp. 53--60. ACM. 134
[81]
R. W. Picard, E. Vyzas, and J. Healey. 2001. Toward machine emotional intelligence: analysis of affective physiological state. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10): 1175--1191. 148
[82]
F. Pokorny, F. Graf, F. Pernkopf, and B. Schuller. 2015. Detection of negative emotions in speech signals using bags-of-audio-words. In Proceedings of the 1st International Workshop on Automatic Sentiment Analysis in the Wild (WASA 2015) held in conjunction with ACII, pp. 879--884, Xi'an, P.R. China. IEEE. 137, 142
[83]
T. Polzehl, A. Schmitt, and F. Metze. 2010. Approaching multi-lingual emotion recognition from speech---on language dependency of acoustic/prosodic features for anger detection. In Proceedings of Speech Prosody. ISCA. 152
[84]
R. Poppe. 2007. Vision-based human motion analysis: An overview. Computer Vision and Image Understanding, 108(1-2): 4--18. 147
[85]
R. Poppe. 2010. A survey on vision-based human action recognition. Image and Vision Computing, 28(6): 976--990. 147
[86]
T. Pun, T. I. Alecu, G. Chanel, J. Kronegg, and S. Voloshynovskiy. 2006. Brain-computer interaction research at the computer vision and multimedia laboratory, University of Geneva. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 14(2): 210--213. 148
[87]
T. Pursche, J. Krajewski, and R. Moeller. 2012. Video-based heart rate measurement from human faces. In Consumer Electronics (ICCE), 2012 IEEE International Conference on, pp. 544--545. IEEE. 149
[88]
F. Putze, J.-P. Jarvis, and T. Schultz. 2010. Multimodal recognition of cognitive workload for multitasking in the car. In Pattern Recognition (ICPR), 2010 20th International Conference on, pp. 3748--3751. IEEE. 134
[89]
T. Qin, J. K. Burgoon, J. P. Blair, and J. F. Nunamaker Jr. 2005. Modality effects in deception detection and applications in automatic-deception-detection. In System Sciences, 2005. HICSS'05. Proceedings of the 38th Annual Hawaii International Conference on, pp. 23b-23b. IEEE. 134
[90]
F. Ringeval, A. Sonderegger, J. Sauer, and D. Lalanne. 2013. Introducing the recola multimodal corpus of remote collaborative and affective interactions. In Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on, pp. 1--8. IEEE. 150
[91]
F. Ringeval, E. Marchi, M. Méhu, K. Scherer, and B. Schuller. 2015. Face reading from speech---predicting facial action units from audio cues. In Proceedings INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, pp. 1977--1981, Dresden, Germany. ISCA. 149
[92]
H. Sagha, J. Deng, M. Gavryukova, J. Han, and B. Schuller. 2016. Cross lingual speech emotion recognition using canonical correlation analysis on principal component subspace. In Proceedings 41st IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2016, Shanghai, P.R. China. IEEE. 152
[93]
L. Salahuddin, J. Cho, M. G. Jeong, and D. Kim. 2007. Ultra short term analysis of heart rate variability for monitoring mental stress in mobile settings. In Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 39--48. 137
[94]
D. Sanchez-Cortes, O. Aran, D. B. Jayagopi, M. Mast, and D. Gatica-Perez. 2013. Emergent leaders through looking and speaking: from audio-visual data to multimodal recognition. Journal on Multimodal User Interfaces, 7(1-2): 39--53. 134
[95]
M. E. Sargin, E. Erzin, Y. Yemez, and A. M. Tekalp. 2006. Multimodal speaker identification using canonical correlation analysis. In Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on, vol. 1, pp. I--I. IEEE. 134
[96]
D. A. Sauter, F. Eisner, P. Ekman, and S. K. Scott. 2010. Cross-cultural recognition of basic emotions through nonverbal emotional vocalizations. Proceedings of the National Academy of Sciences of the U.S.A., 107(6): 2408--2412. 151
[97]
K. R. Scherer and T. Brosch. 2009. Culture-specific appraisal biases contribute to emotion dispositions. European Journal of Personality, 23: 265--288. 151, 152
[98]
K. R. Scherer, R. Banse, and H. G. Wallbott. 2001. Emotion inferences from vocal expression correlate across languages and cultures. Journal of Cross-Cultural Psychology, 32(1): 76--92. 151
[99]
M. Schröder, H. Pirker, and M. Lamolle. 2006. First suggestions for an emotion annotation and representation language. In Proceedings LREC, vol. 6, pp. 88--92, Genoa, Italy. ELRA. 140
[100]
M. Schröder, E. Bevacqua, R. Cowie, F. Eyben, H. Gunes, D. Heylen, M. ter Maat, G. McKeown, S. Pammi, M. Pantic, C. Pelachaud, B. Schuller, E. de Sevin, M. Valstar, and M. Wöllmer. 2012. Building autonomous sensitive artificial listeners. IEEE Transactions on Affective Computing, pp. 1--20. 152
[101]
B. Schuller. 2013. Intelligent Audio Analysis. Signals and Communication Technology. Springer. 135, 137
[102]
B. Schuller and A. Batliner. 2013. Computational Paralinguistics: Emotion, Affect and Personality in Speech and Language Processing. Wiley. 132, 141
[103]
B. Schuller, M. Lang, and G. Rigoll. 2002. Multimodal emotion recognition in audiovisual communication. In Proceedings of ICME, vol. 1, pp. 745--748, Lausanne, Switzerland. IEEE.
[104]
B. Schuller, R. Müller, F. Eyben, J. Gast, B. Hörnler, M. Wöllmer, G. Rigoll, A. Höthker, and H. Konosu. 2009. Being Bored? Recognising natural interest by extensive audiovisual integration for real-life application. Image and Vision Computing Journal, 27(12): 1760--1774. 134, 146
[105]
B. Schuller, A. Batliner, S. Steidl, and D. Seppi. 2011a. Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge. Speech Communication, 53(9/10): 1062--1087. 151
[106]
B. Schuller, M. Valstar, R. Cowie, and M. Pantic. 2011b. Avec 2011---the first audio/visual emotion challenge and workshop - an introduction. In Proceedings of the 1st International Audio/Visual Emotion Challenge and Workshop, pp. 415--424, Memphis, TN. 133, 134
[107]
B. Schuller, F. Friedmann, and F. Eyben. 2013. Automatic recognition of physiological parameters in the human voice: heart rate and skin conductance. In Proceedings 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013, pp. 7219--7223, Vancouver, Canada. IEEE. 149
[108]
B. Schuller, A. El-Desoky Mousa, and V. Vasileios. 2015a. Sentiment analysis and opinion mining: on optimal parameters and performances. WIREs Data Mining and Knowledge Discovery, 5: 255--263. 146
[109]
B. Schuller, S. Steidl, A. Batliner, S. Hantke, F. Hönig, J.R. Orozco-Arroyave, E. Nöth, Y. Zhang, and F. Weninger. 2015b. The INTERSPEECH 2015 computational paralinguistics challenge: degree of nativeness, Parkinson's & eating condition. In Proceedings INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, pp. 478--482, Dresden, Germany. ISCA. 150
[110]
B. Settles, M. Craven, and S. Ray. 2008. Multiple-instance active learning. In Proceedings of NIPS, pp. 1289--1296, Vancouver, BC, Canada. 142
[111]
C. Shan, S. Gong, and P. W. McOwan. 2007. Learning gender from human gaits and faces. In Advanced Video and Signal Based Surveillance, 2007. AVSS 2007. IEEE Conference on, pp. 505--510. IEEE. 134
[112]
C. Shan, S. Gong, and P. W. McOwan. 2008. Fusing gait and face cues for human gender recognition. Neurocomputing, 71(10): 1931--1938. 134
[113]
N. Sharma and T. Gedeon. 2012. Objective measures, sensors and computational techniques for stress recognition and classification: A survey. Computer Methods and Programs in Biomedicine, 108(3): 1287--1301. 134
[114]
R. Socher, M. Ganjoo, C. D. Manning, and A. Ng. 2013. Zero-shot learning through cross-modal transfer. In NIPS'13 Proceedings of the 26th International Conference on Neural Information Processing Systems, vol. 1, pp. 935--943. 143
[115]
C. Strapparava and R. Mihalcea. 2010. Annotating and identifying emotions in text. In G. Armano, M. de Gemmis, G. Semeraro, and E. Vargiu, editors, Intelligent Information Access, Studies in Computational Intelligence, vol. 301, pp. 21--38. Springer Berlin/Heidelberg. ISBN 978-3-642-13999-4. 146
[116]
A. Stuhlsatz, C. Meyer, F. Eyben, T. Zielke, G. Meier, and B. Schuller. 2011. Deep neural networks for acoustic emotion recognition: Raising the benchmarks. In Proceedings of ICASSP, pp. 5688--5691, Prague, Czech Republic. IEEE. 142
[117]
V. S. Subrahmanian and D. Reforgiato. 2008. AVA: adjective-verb-adverb combinations for sentiment analysis. Intelligent Systems, 23(4): 43--50. 146
[118]
R. S. Sutton and A. G. Barto. 1998. Reinforcement Learning: An Introduction, volume 1. MIT Press, Cambridge, MA. 141
[119]
G. Trigeorgis, K. Bousmalis, S. Zafeiriou, and B. Schuller. 2014. A deep semi-NMF model for learning hidden representations. In Proceedings of ICML, vol. 32, pp. 1692--1700, Beijing, China. IMLS. 142
[120]
G. Trigeorgis, M. A. Nicolaou, S. Zafeiriou, and B. Schuller. 2015. Towards deep alignment of multimodal data. In Proceedings 2015 Multimodal Machine Learning Workshop held in conjunction with NIPS 2015 (MMML@NIPS), Montréal, QC. NIPS. 142
[121]
G. Trigeorgis, F. Ringeval, R. Brückner, E. Marchi, M. Nicolaou, B. Schuller, and S. Zafeiriou. 2016. Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network. In Proceedings 41st IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2016, Shanghai, P.R. China. IEEE. 137
[122]
P. Tsiamyrtzis, J. Dowdall, D. Shastri, I. T. Pavlidis, M. G. Frank, and P. Ekman. 2007. Imaging facial physiology for the detection of deceit. International Journal of Computer Vision, 71(2): 197--214. 147, 148
[123]
J. Van den Stock, R. Righart, and B. De Gelder. 2007. Body expressions influence recognition of emotions in the face and voice. Emotion, 7(3): 487--494. 145
[124]
S. van Wingerden, T. J. Uebbing, M. M. Jung, and M. Poel. 2014. A neural network based approach to social touch classification. In Proceedings of the 2nd International Workshop on Emotion representations and modelling in Human-Computer Interaction systems, ERM4HCI, pp. 7--12, Istanbul, Turkey. ACM. 148
[125]
A. Vinciarelli and A. Esposito. 2018. Multimodal analysis of social signals. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krueger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition, Ch. 7. Morgan & Claypool Publishers, San Rafael, CA.
[126]
T. Vogt, E. André, and N. Bee. 2008. Emovoice - a framework for online recognition of emotions from voice. In Proceedings of IEEE PIT, volume 5078 of LNCS, pp. 188--199. Springer, Kloster Irsee. 150
[127]
J. Wagner and E. André. 2018. Real-time sensing of affect and social signals in a multimodal framework: a practical approach. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krueger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition, Ch. 8. Morgan & Claypool Publishers, San Rafael, CA.
[128]
J. Wagner, F. Lingenfelser, T. Baur, I. Damian, F. Kistler, and E. André. 2013. The social signal interpretation (ssi) framework: multimodal signal processing and recognition in real-time. In Proceedings of the 21st ACM International Conference on Multimedia, pp. 831--834. ACM. 149
[129]
F. Weninger and B. Schuller. 2012. Optimization and parallelization of monaural source separation algorithms in the openBliSSART toolkit. Journal of Signal Processing Systems, 69(3): 267--277. 150
[130]
F. Weninger, J. Bergmann, and B. Schuller. 2015. Introducing CURRENNT: the Munich open-source CUDA RecurREnt neural network toolkit. Journal of Machine Learning Research, 16: 547--551. 150
[131]
M. Wöllmer, A. Metallinou, F. Eyben, B. Schuller, and S. Narayanan. 2010. Context-sensitive multimodal emotion recognition from speech and facial expression using bidirectional lstm modeling. In Proceedings of INTERSPEECH, pp. 2362--2365, Makuhari, Japan. ISCA. 147
[132]
M. Wöllmer, C. Blaschke, T. Schindl, B. Schuller, B. Färber, S. Mayer, and B. Trefflich. 2011. On-line driver distraction detection using long short-term memory. IEEE Transactions on Intelligent Transportation Systems, 12(2): 574--582. 134
[133]
M. Wöllmer, F. Weninger, T. Knaup, B. Schuller, C. Sun, K. Sagae, and L. P. Morency. 2013. YouTube movie reviews: Sentiment analysis in an audiovisual context. IEEE Intelligent Systems, 28(2): 2--8. 134
[134]
Y. Yoshitomi, S. I. Kim, T. Kawano, and T. Kitazoe. 2000. Effect of sensor fusion for recognition of emotional states using voice, face image and thermal image of face. In IEEE International Workshop on Robot and Human Interactive Communication, pp. 178--183. 147
[135]
Y. Zhang, E. Coutinho, Z. Zhang, M. Adam, and B. Schuller. 2015a. Introducing rater reliability and correlation based dynamic active learning. In Proceedings of ACII, pp. 70--76, Xi'an, P.R. China. IEEE. 142
[136]
Z. Zhang, E. Coutinho, J. Deng, and B. Schuller. 2015b. Cooperative Learning and its Application to Emotion Recognition from Speech. IEEE ACM Transactions on Audio, Speech and Language Processing, 23(1): 115--126. 142
[137]
Y. Zhang, Y. Zhou, J. Shen, and B. Schuller. 2016a. Semi-autonomous data enrichment based on cross-task labelling of missing targets for holistic speech analysis. In Proceedings 41st IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2016, Shanghai, P.R. China. IEEE. 141
[138]
Z. Zhang, F. Ringeval, B. Dong, E. Coutinho, E. Marchi, and B. Schuller. 2016b. Enhanced semi-supervised learning for multimodal emotion recognition. In Proceedings 41st IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2016, Shanghai, P.R. China. IEEE. 141
[139]
J. Zhou, K. Yu, F. Chen, Y. Wang, and S. Z. Arshad. 2018. Multimodal behavioural and physiological signals as indicators of cognitive load. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krueger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition, Ch. 10. Morgan & Claypool Publishers, San Rafael, CA.

Published In

The Handbook of Multimodal-Multisensor Interfaces: Signal Processing, Architectures, and Detection of Emotion and Cognition - Volume 2. Association for Computing Machinery and Morgan & Claypool, October 2018. 2034 pages. ISBN 9781970001716. DOI 10.1145/3107990.
