DOI: 10.1145/3657242.3658601
short-paper
Open access

An AI-Powered Computer Vision Module for Social Interactive Agents

Published: 19 June 2024

Abstract

Social interactive agents play a crucial role in various domains, providing intelligent assistance in healthcare, entertainment, and education. Recent advances in Artificial Intelligence (AI) have shown promising potential to enhance the autonomy of these agents. However, the lack of standardization in their development often results in complex functionalities that are difficult to transfer across platforms. In this study, we introduce a general-purpose, AI-powered computer vision module designed to address this challenge. The module has a modular structure that makes it easy to extend and to integrate into diverse environments. It currently supports seven tasks (face detection, person detection, facial recognition, facial expression recognition, facial landmark estimation, age and gender estimation, and background subtraction) and offers up to 21 computer vision methods. We also integrate explainability functionalities to enhance user trust in the system. In future work, we aim to expand the module with new tasks and methods to meet evolving needs. Our goal is to streamline the integration of AI capabilities into social interactive agents, simplifying their development and enhancing their utility across various applications.
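The abstract describes the module's architecture only at a high level: a set of tasks, each served by several interchangeable methods behind a common structure. As a rough illustration of how such a task/method design might be organized, here is a minimal Python sketch. It is hypothetical and not the authors' code: the `VisionModule` class, the `register`/`run` names, the task and method identifiers, and the placeholder detector are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical uniform interface: every method takes an image and returns a result.
Method = Callable[[Any], Any]


@dataclass
class VisionModule:
    """Hypothetical two-level registry: task name -> method name -> callable."""

    _tasks: Dict[str, Dict[str, Method]] = field(default_factory=dict)

    def register(self, task: str, method: str) -> Callable[[Method], Method]:
        """Return a decorator that adds a function under (task, method)."""
        def decorator(fn: Method) -> Method:
            methods = self._tasks.setdefault(task, {})
            if method in methods:
                raise ValueError(f"{task}/{method} is already registered")
            methods[method] = fn
            return fn
        return decorator

    def tasks(self) -> List[str]:
        """Names of all tasks that have at least one method."""
        return sorted(self._tasks)

    def methods(self, task: str) -> List[str]:
        """Names of the methods available for one task (empty if unknown)."""
        return sorted(self._tasks.get(task, {}))

    def method_count(self) -> int:
        """Total number of methods across all tasks."""
        return sum(len(m) for m in self._tasks.values())

    def run(self, task: str, method: str, image: Any) -> Any:
        """Dispatch an image to the chosen method of the chosen task."""
        try:
            fn = self._tasks[task][method]
        except KeyError:
            raise KeyError(f"no method {method!r} registered for task {task!r}") from None
        return fn(image)


vision = VisionModule()


@vision.register("face_detection", "center_placeholder")
def detect_center_face(image: List[List[int]]) -> List[tuple]:
    # Placeholder, not a real detector: returns one (x, y, w, h) box
    # covering the central half of the image in each dimension.
    h, w = len(image), len(image[0])
    return [(w // 4, h // 4, w // 2, h // 2)]


frame = [[0] * 8 for _ in range(6)]  # 6x8 grayscale image as nested lists
print(vision.run("face_detection", "center_placeholder", frame))  # prints [(2, 1, 4, 3)]
```

In this sketch, adding a task or method means registering one more function under a new key rather than editing the dispatcher, which is one way to obtain the kind of extensibility the abstract describes. A real model would replace the placeholder function.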

Information

Published In

ACM Other conferences
Interacción '24: Proceedings of the XXIV International Conference on Human Computer Interaction
June 2024
155 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Artificial Intelligence
  2. Computer Vision
  3. Human-Computer Interaction
  4. Social Interactive Agents

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Funding Sources

  • MCIN/AEI/10.13039/501100011033

Conference

INTERACCION 2024

Acceptance Rates

Overall Acceptance Rate 109 of 163 submissions, 67%

Article Metrics

  • Total citations: 0
  • Total downloads: 277
  • Downloads (last 12 months): 277
  • Downloads (last 6 weeks): 32

Reflects downloads up to 09 Jan 2025.
