OpenSense: A Platform for Multimodal Data Acquisition and Behavior Perception

Authors: Kalin Stefanov, Mohammad Soleymani

ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction

Pages 660 - 664

https://doi.org/10.1145/3382507.3418832

Published: 22 October 2020

Abstract

Automatic multimodal acquisition and understanding of social signals is an essential building block for natural and effective human-machine collaboration and communication. This paper introduces OpenSense, a platform for real-time multimodal acquisition and recognition of social signals. OpenSense enables precisely synchronized and coordinated acquisition and processing of human behavioral signals. Powered by Microsoft's Platform for Situated Intelligence, OpenSense supports a range of sensor devices and machine learning tools, and it encourages developers to add new components to the system through straightforward mechanisms for component integration. The platform also offers an intuitive graphical user interface for building application pipelines from existing components. OpenSense is freely available for academic research.

Supplementary Material

MP4 File (3382507.3418832.mp4)

Presentation video for the paper "OpenSense: A Platform for Multimodal Data Acquisition and Behavior Perception"


Published In

ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction

October 2020

920 pages

ISBN: 9781450375818

DOI: 10.1145/3382507

General Chairs: Khiet Truong (University of Twente, the Netherlands), Dirk Heylen (University of Twente, the Netherlands), Mary Czerwinski (Microsoft Research, USA)

Program Chairs: Nadia Berthouze (University College London, United Kingdom), Mohamed Chetouani (Sorbonne University, France), Mikio Nakano (C4A Research Institute, Japan)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGCHI: ACM Special Interest Group on Computer-Human Interaction

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

Army Research Office

Conference

ICMI '20

Sponsor:

SIGCHI

ICMI '20: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION

October 25 - 29, 2020

Virtual Event, Netherlands

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%

Article Metrics

Total Citations: 13
Total Downloads: 861
Downloads (last 12 months): 251
Downloads (last 6 weeks): 19

Reflects downloads up to 03 Jan 2025

Cited By

Arjmand M, Nouraei F, Steenstra I, Bickmore T (2024). Empathic Grounding: Explorations using Multimodal Interaction and Large Language Models with Conversational Agents. Proceedings of the 24th ACM International Conference on Intelligent Virtual Agents, 1-10. Online publication date: 16-Sep-2024.
https://dl.acm.org/doi/10.1145/3652988.3673949
Chang D, Yin Y, Li Z, Tran M, Soleymani M (2024). LibreFace: An Open-Source Toolkit for Deep Facial Expression Analysis. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 8190-8200. Online publication date: 3-Jan-2024.
https://doi.org/10.1109/WACV57701.2024.00802
George T, Park H, Lee U (2024). FT Xtraction: Feature extraction and visualization of conversational video data for social and emotional analysis. SoftwareX 27, 101827. Online publication date: Sep-2024.
https://doi.org/10.1016/j.softx.2024.101827
Bian Y, Küster D, Liu H, Krumhuber E (2023). Understanding Naturalistic Facial Expressions with Deep Learning and Multimodal Large Language Models. Sensors 24:1, 126. Online publication date: 26-Dec-2023.
https://doi.org/10.3390/s24010126
Andrist S, Bohus D, Li Z, Soleymani M (2023). Platform for Situated Intelligence and OpenSense: A Tutorial on Building Multimodal Interactive Applications for Research. Companion Publication of the 25th International Conference on Multimodal Interaction, 105-106. Online publication date: 9-Oct-2023.
https://dl.acm.org/doi/10.1145/3610661.3617603
Nargund A, Aponte A, Caetano A, Sra M (2023). ModBand: Design of a Modular Headband for Multimodal Data Collection and Inference. Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 1-3. Online publication date: 29-Oct-2023.
https://dl.acm.org/doi/10.1145/3586182.3616682
Gainer A, Aptaker A, Artstein R, Cobbins D, Core M, Gordon C, Leuski A, Li Z, Merchant C, Nelson D, Soleymani M, Traum D, Lugrin B, Latoschik M, von Mammen S, Kopp S, Pécune F, Pelachaud C (2023). DIVIS. Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents, 1-2. Online publication date: 19-Sep-2023.
https://dl.acm.org/doi/10.1145/3570945.3607328
Woo D, Byeon G, Song E, Yu S (2023). A Study on the Application of OpenPose for the Prevention of Collision of VR HMD Wearers. 2023 International Conference on Electronics, Information, and Communication (ICEIC), 1-3. Online publication date: 5-Feb-2023.
https://doi.org/10.1109/ICEIC57457.2023.10049936
Ragel R, Rey R, Páez Á, Ponce J, Nakamura K, Caballero F, Merino L, Gómez R (2023). Multi-modal Data Fusion for People Perception in the Social Robot Haru. Social Robotics, 174-187. Online publication date: 1-Feb-2023.
https://doi.org/10.1007/978-3-031-24667-8_16
Hartholt A, Mozgai S (2022). Platforms and Tools for SIA Research and Development. The Handbook on Socially Interactive Agents, 261-304. Online publication date: 27-Oct-2022.
https://dl.acm.org/doi/10.1145/3563659.3563668
