Analyzing free-standing conversational groups: A multimodal approach

X Alameda-Pineda, Y Yan, E Ricci, O Lanz… - Proceedings of the 23rd ACM International Conference on Multimedia, 2015 - dl.acm.org
During natural social gatherings, humans tend to organize themselves into so-called free-standing conversational groups. In this context, robust head and body pose estimates can facilitate a higher-level description of the ongoing interplay. Importantly, visual information alone, as typically obtained with a distributed camera network, might not suffice to achieve the robustness sought. Recent advances in wearable sensing technology therefore open the door to richer, multimodal information flows. In this paper we propose to cast the head and body pose estimation problem as a matrix completion task. We introduce a framework able to fuse multimodal data emanating from a combination of distributed and wearable sensors, taking into account temporal consistency, head/body coupling, and the noise inherent to the scenario. We report results on the novel and challenging SALSA dataset, containing visual, auditory, and infrared recordings of 18 people interacting in a regular indoor environment. We demonstrate the soundness of the proposed method and its usability for higher-level tasks such as the detection of F-formations and the discovery of social attention attractors.
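To give a flavor of the matrix-completion formulation mentioned in the abstract, the sketch below shows a generic SoftImpute-style algorithm (iterative SVD soft-thresholding) that fills missing entries of a matrix with a low-rank estimate. This is only an illustrative baseline, not the authors' method: their framework additionally models temporal consistency, head/body coupling, and sensor noise. The function name `soft_impute`, the threshold `tau`, and the toy data are assumptions for illustration.

```python
import numpy as np

def soft_impute(X, mask, tau=1.0, n_iters=100):
    """Fill missing entries of X (where mask is False) with a low-rank estimate.

    Generic SoftImpute sketch: alternate between an SVD with soft-thresholded
    singular values and re-imposing the observed entries.
    """
    Z = np.where(mask, X, 0.0)           # initialize missing entries at 0
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        s = np.maximum(s - tau, 0.0)     # soft-threshold the singular values
        low_rank = (U * s) @ Vt          # low-rank reconstruction
        Z = np.where(mask, X, low_rank)  # keep observed entries fixed
    return Z

# Toy usage: recover a rank-1 matrix with ~30% of entries missing.
rng = np.random.default_rng(0)
u, v = rng.standard_normal(8), rng.standard_normal(6)
M = np.outer(u, v)                       # ground-truth rank-1 matrix
mask = rng.random(M.shape) > 0.3         # True = observed entry
completed = soft_impute(M, mask, tau=0.1, n_iters=200)
err = np.abs(completed - M)[~mask].mean()  # error on the missing entries
```

In the paper's setting, the rows/columns of such a matrix would hold multimodal feature vectors and pose labels, with unlabeled or unreliable entries treated as missing; the low-rank assumption is what lets observations from different sensors regularize one another.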