Proceeding Downloads
Mattpod: A Design Proposal for a Multi-Sensory Solo Dining Experience
The consumption of a meal is not just a bodily requirement but can also carry significant symbolic meaning. Solo dining is often contrasted with a shared eating experience and portrayed as an inferior way of eating a meal due to lacking essential social ...
Towards Automatic Prediction of Non-Expert Perceived Speech Fluency Ratings
Automatic speech fluency prediction has been mainly approached from the perspective of computer aided language learning, where the system tends to predict ratings similar to those of the human experts. Speech fluency prediction, however, can be ...
Endowing Spiking Neural Networks with Homeostatic Adaptivity for APS-DVS Bimodal Scenarios
Plastic changes with intrinsic dynamics in synaptic efficacy underlie the cellular level of expression of brain functions regarding multimodal information processing. Among diverse plasticity mechanisms, synaptic scaling exerts indispensable effects on ...
Training Computational Models of Group Processes without Groundtruth: the Self- vs External Assessment’s Dilemma
Supervised learning relies on the availability and reliability of the labels used to train computational models. In research areas such as Affective Computing and Social Signal Processing, such labels are usually extracted from multiple self- and/or ...
Speaker Motion Patterns during Self-repairs in Natural Dialogue
An important milestone for any agent in regular interaction with humans is to achieve natural and efficient methods of communication. Such strategies should be derived from the hallmarks of human-human interaction. So far, the work in embodied ...
Real-time Public Speaking Anxiety Prediction Model for Oral Presentations
Oral presentation skills are essential for most people’s academic and career development. However, due to public speaking anxiety, many people find oral presentations challenging and often avoid them to the detriment of their careers. Public speaking ...
To Improve Is to Change: Towards Improving Mood Prediction by Learning Changes in Emotion
Although the terms mood and emotion are closely related and often used interchangeably, they are distinguished based on their duration, intensity and attribution. To date, hardly any computational models have (a) examined mood recognition, and (b) ...
Towards Integration of Embodiment Features for Prosodic Prominence Prediction from Text
Prosodic prominence prediction is an important task in the area of speech processing and especially forms an essential part of modern text-to-speech systems. Previous work has broadly focused on acoustic and linguistic features (such as syntactic and ...
Symbiosis: Design and Development of Novel Soft Robotic Structures for Interactive Public Spaces
- Aaron Chooi,
- Thileepan Stalin,
- Aby Raj Plamootil Mathai,
- Arturo Castillo Ugalde,
- Yixiao Wang,
- Elgar Kanhere,
- Gumawang Hiramandala,
- Deborah Loh,
- Pablo Valdivia Y Alvarado
High-rise concrete structures and crowded public spaces are familiar scenes in today’s fast-paced world, leaving humans with less restorative time in nature. Therefore, interior designers and architects have strived to amalgamate nature-...
Impact of aesthetic movie highlights on semantics and emotions: a preliminary analysis
Aesthetic highlight detection is a challenge for understanding affective processes underlying emotional movie experience. Aesthetic highlights in movies are scenes with aesthetic values and attributes in terms of form and content. Deep understanding of ...
Understanding Interviewees’ Perceptions and Behaviour towards Verbally and Non-verbally Expressive Virtual Interviewing Agents
Recent technological advancements have boosted the usage of virtual interviewing platforms where the candidates interact with a virtual interviewing agent or an avatar that has human-like behavior instead of face-to-face interviews. As a result, it is ...
An Emotional Respiration Speech Dataset
Natural interaction with human-like embodied agents, such as social robots or virtual agents, relies on the generation of realistic non-verbal behaviours, including body language, gaze and facial expressions. Humans can read and interpret somatic ...
Automatic facial expressions, gaze direction and head movements generation of a virtual agent
In this article, we present two models to jointly and automatically generate the head, facial and gaze movements of a virtual agent from acoustic speech features. Two architectures are explored: a Generative Adversarial Network and an Adversarial ...
Can you tell that I’m confused? An overhearer study for German backchannels by an embodied agent
In spoken interaction, humans constantly display and interpret each others’ state of understanding. For an embodied agent, displaying its internal state of understanding in an efficient manner can be an important means for making a user-interaction ...
ReCell: replicating recurrent cell for auto-regressive pose generation
This paper describes FineMotion’s gesture-generating system entry for the GENEA Challenge 2022. Our system is based on an auto-regressive approach imitating a recurrent cell. Combined with a special windowed auto-encoder and training approach, this system ...
Predicting User Confidence in Video Recordings with Spatio-Temporal Multimodal Analytics
A critical component of effective communication is the ability to project confidence. In video presentations (e.g., video interviews), there are many factors that influence perceived confidence by a listener. Advances in computer vision, speech ...
How can Interaction Data be Contextualized with Mobile Sensing to Enhance Learning Engagement Assessment in Distance Learning?
Multimodal learning analytics can enrich interaction data with contextual information through mobile sensing, for example, information about the physical environment, movement, physiological signals, or smart wearable usage. Through the use of smart ...
Exploring the Benefits of Spatialized Multimodal Psychophysiological Insights for User Experience Research
Conducting psychophysiological investigations outside of lab settings has a lot of potential for academic applications as well as for industries concerned about the quality of their user and customer experience. Prior work employing in-the-wild ...
Predicting evaluations of entrepreneurial pitches based on multimodal nonverbal behavioral cues and self-reported characteristics
Acquiring funding for a startup venture often involves pitching a business idea to potential investors. Insight into the nonverbal behavioral cues that impact the investment decision making process can help entrepreneurs to improve their persuasion ...
Contextual modulation of affect: Comparing humans and deep neural networks
When inferring emotions, humans rely on a number of cues, including not only facial expressions, body posture, but also expressor-external, contextual information. The goal of the present study was to compare the impact of such contextual information ...
Improving Supervised Learning in Conversational Analysis through Reusing Preprocessing Data as Auxiliary Supervisors
Emotion recognition systems are trained using noisy human labels and often require heavy preprocessing during multi-modal feature extraction. Using noisy labels in single-task learning increases the risk of over-fitting. Auxiliary tasks could improve ...
Investigating Transformer Encoders and Fusion Strategies for Speech Emotion Recognition in Emergency Call Center Conversations
There has been growing interest in using deep learning techniques to recognize emotions from speech. However, real-life emotion datasets collected in call centers are relatively rare and small, making the use of deep learning techniques quite ...
An Architecture Supporting Configurable Autonomous Multimodal Joint-Attention-Therapy for Various Robotic Systems
In this paper, we present a software architecture for robot-assisted, configurable, and autonomous Joint-Attention-Training scenarios to support autism therapy. The focus of the work is the expandability of the architecture for the use of different ...
Exploring Facial Metric Normalization For Within- and Between-Subject Comparisons in a Multimodal Health Monitoring Agent
- Oliver Roesler,
- Hardik Kothare,
- William Burke,
- Michael Neumann,
- Jackson Liscombe,
- Andrew Cornish,
- Doug Habberstad,
- David Pautler,
- David Suendermann-Oeft,
- Vikram Ramanarayanan
The use of facial metrics obtained through remote web-based platforms has shown promising results for at-home assessment of facial function in multiple neurological and mental disorders. However, an important factor influencing the utility of the ...
Enabling Non-Technical Domain Experts to Create Robot-Assisted Therapeutic Scenarios via Visual Programming
In this paper, we present a visual programming software for enabling non-technical domain experts to create robot-assisted therapy scenarios for multiple robotic platforms. Our new approach is evaluated by comparing it with Choregraphe, the standard ...
Towards Multimodal Dialog-Based Speech & Facial Biomarkers of Schizophrenia
- Vanessa Richter,
- Michael Neumann,
- Hardik Kothare,
- Oliver Roesler,
- Jackson Liscombe,
- David Suendermann-Oeft,
- Sebastian Prokop,
- Anzalee Khan,
- Christian Yavorsky,
- Jean-Pierre Lindenmayer,
- Vikram Ramanarayanan
We present a scalable multimodal dialog platform for the remote digital assessment and monitoring of schizophrenia. Patients diagnosed with schizophrenia and healthy controls interacted with Tina, a virtual conversational agent, as she guided them ...
A Wavelet-based Approach for Multimodal Prediction of Alexithymia from Physiological Signals
- Valeria Filippou,
- Nikolas Theodosiou,
- Mihalis Nicolaou,
- Elena Constantinou,
- Georgia Panayiotou,
- Marios Theodorou
Alexithymia is a trait reflecting a person’s difficulty in identifying and expressing their emotions, and it has been linked to various forms of psychopathology. The identification of alexithymia might have therapeutic, preventive and diagnostic benefits. ...
Head Movement Patterns during Face-to-Face Conversations Vary with Age
- Denisa Qori McDonald,
- Casey J. Zampella,
- Evangelos Sariyanidi,
- Aashvi Manakiwala,
- Ellis DeJardin,
- John D. Herrington,
- Robert T. Schultz,
- Birkan Tunc
Advances in computational behavior analysis have the potential to increase our understanding of behavioral patterns and developmental trajectories in neurotypical individuals, as well as in individuals with mental health conditions marked by motor, ...
Predicting Backchannel Signaling in Child-Caregiver Multimodal Conversations
Conversation requires cooperative social interaction between interlocutors. In particular, active listening through backchannel signaling (hereafter BC), i.e., showing attention through verbal (short responses like “Yeah”) and non-verbal behaviors (e.g. ...
Approbation of the Child's Emotional Development Method (CEDM)
The paper presents a description of the methodological approach for assessing the development of the emotional sphere in children aged 5-16 years with typical and atypical development. The purpose of the Child's Emotional Development Method (CEDM) is to ...
Index Terms
- Companion Publication of the 2022 International Conference on Multimodal Interaction