DOI: 10.1145/3678957.3685719
research-article

Decoding Contact: Automatic Estimation of Contact Signatures in Parent-Infant Free Play Interactions

Published: 04 November 2024

Abstract

Parent-child interactions (PCIs) involve frequent physical contact between the two actors. Quantifying this contact provides valuable input for assessing the nature of the interaction or the relation between parent and child. Here, we explore the application of vision-based techniques to automatically detect contact signatures at each frame of video recordings of playful parent-infant interactions. We employ two separate models: (i) a multimodal convolutional neural network (CNN) that integrates 2D pose and body part information, and (ii) a unimodal graph convolutional neural network (GCN) that uses only 2D pose. We showcase the potential and limitations of automatic contact signature estimation through quantitative and qualitative assessments on a parent-infant free play interaction dataset consisting of 100 parent-child dyadic interactions, covering 20 hours. Additionally, systematic experiments provide insight into various design choices. By releasing our annotations and code, we aim to enable further research on automatic contact signature estimation during free play interactions between parents and infants.
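To make the second model type more concrete, the following is a minimal sketch of a single graph-convolution step applied to 2D pose keypoints from one frame. It is not the architecture used in the paper, which the abstract does not specify. The four-joint toy skeleton, the edge list, the feature width of 8, and the random weights are all hypothetical and chosen only for illustration.

```python
import numpy as np


def normalized_adjacency(num_joints, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    a = np.eye(num_joints)  # self-loops so each joint keeps its own features
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    deg = a.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ a @ d_inv_sqrt


def graph_conv(x, a_hat, w):
    """One graph-convolution step: ReLU(A_hat @ X @ W)."""
    return np.maximum(a_hat @ x @ w, 0.0)


# Hypothetical toy skeleton: 0 = head, 1 = torso, 2 = left hand, 3 = right hand.
EDGES = [(0, 1), (1, 2), (1, 3)]
A_HAT = normalized_adjacency(4, EDGES)

rng = np.random.default_rng(0)
pose = rng.random((4, 2))              # one frame: (x, y) per joint, made-up values
weights = rng.standard_normal((2, 8))  # untrained, illustrative weights
features = graph_conv(pose, A_HAT, weights)
print(features.shape)
```

Each joint's output feature mixes its own coordinates with those of its skeleton neighbours, which is the basic mechanism that lets a pose-only model reason about body configuration without image input.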

Published In

ICMI '24: Proceedings of the 26th International Conference on Multimodal Interaction
November 2024
725 pages
ISBN:9798400704628
DOI:10.1145/3678957
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Contact Detection
  2. Free Play
  3. Graph Convolutional Neural Network
  4. Interaction Analysis
  5. Parent-Child Interaction
  6. Pose Estimation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMI '24
ICMI '24: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION
November 4 - 8, 2024
San Jose, Costa Rica

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%
